Make sitemap with A-Parser

Discussion in 'Share Experience' started by Support, Sep 18, 2015.

  1. Support

    Support Administrator
    Staff Member A-Parser Enterprise

    Joined:
    Mar 16, 2012
    Messages:
    4,547
    Likes Received:
    2,164
    Sitemap (Wiki) - this is XML-file with the information for the search engines (such as Yandex, Google, Bing) about web pages, which are to be indexing. It helps search engines index the site more intelligently. Some SEO-experts consider the lack of such map a gross error. On the Internet there are many services and tools for creating these maps, as well as for their validation. We try to create sitemap using A-parser.

    So, let's see what exactly we need to do.
    1. Check all internal links on the site.
    2. Find out all the necessary parameters for each link.
    3. Save all this information to a file sitemap.xml.
    Sitemap protocol is described on the website sitemaps.org. For solving this problemus perfectly will approach the parser HTML::LinkExtractor HTML::LinkExtractor. He is able to collect from the site links to the specified depth and return the server response headers (which typically contains the date of last change). For the result output will respond template engine Template Toolkit, which will generate the right kind of output file.

    [​IMG]
    [​IMG]
    Code:
    eyJwcmVzZXQiOiJzaXRlbWFwIiwidmFsdWUiOnsicHJlc2V0Ijoic2l0ZW1hcCIs
    InBhcnNlcnMiOltbIkhUTUw6OkxpbmtFeHRyYWN0b3IiLCJkZWZhdWx0Iix7InR5
    cGUiOiJvcHRpb25zIiwiaWQiOiJwYXJzZUxldmVsIiwidmFsdWUiOjN9LHsidHlw
    ZSI6Im92ZXJyaWRlIiwiaWQiOiJmb3JtYXRyZXN1bHQiLCJ2YWx1ZSI6Ijx1cmw+
    XFxuPGxvYz5bJSBxdWVyeSB8IHhtbCAlXTwvbG9jPlslIElGIGxtWWVhciAhPSAn
    bm9uZScgJV1cXG48bGFzdG1vZD4kbG1ZZWFyLVslIFNXSVRDSCBsbU1TdHJpbmcg
    JV1bJSBDQVNFICdKYW4nICVdMDFbJSBtPScwMScgJV1bJSBDQVNFICdGZWInICVd
    MDJbJSBtPScwMicgJV1bJSBDQVNFICdNYXInICVdMDNbJSBtPScwMycgJV1bJSBD
    QVNFICdBcHInICVdMDRbJSBtPScwNCcgJV1bJSBDQVNFICdNYXknICVdMDVbJSBt
    PScwNScgJV1bJSBDQVNFICdKdW4nICVdMDZbJSBtPScwNicgJV1bJSBDQVNFICdK
    dWwnICVdMDdbJSBtPScwNycgJV1bJSBDQVNFICdBdWcnICVdMDhbJSBtPScwOCcg
    JV1bJSBDQVNFICdTZXAnICVdMDlbJSBtPScwOScgJV1bJSBDQVNFICdPY3QnICVd
    MTBbJSBtPScxMCcgJV1bJSBDQVNFICdOb3YnICVdMTFbJSBtPScxMScgJV1bJSBD
    QVNFICdEZWMnICVdMTJbJSBtPScxMicgJV1bJSBFTkQgJV0tJGxtRGF5PC9sYXN0
    bW9kPlxcbjxjaGFuZ2VmcmVxPlslIFVTRSBkYXRlO2NhbGMgPSBkYXRlLmNhbGM7
    RGQgPSBjYWxjLkRlbHRhX0RheXMobG1ZZWFyLG0sbG1EYXksIGRhdGUuZm9ybWF0
    KGRhdGUubm93LCAnJVknKSxkYXRlLmZvcm1hdChkYXRlLm5vdywgJyVtJyksZGF0
    ZS5mb3JtYXQoZGF0ZS5ub3csICclZCcpKSAlXVslIElGIERkID09IDAgJV1ob3Vy
    bHlbJSBFTkQgJV1bJSBJRiBEZCA9PSAxICVdZGFpbHlbJSBFTkQgJV1bJSBJRiBE
    ZCA+IDEgYW5kIERkIDwgMzAgJV13ZWVrbHlbJSBFTkQgJV1bJSBJRiBEZCA+PSAz
    MCBhbmQgRGQgPCAzNjUgJV1tb250aGx5WyUgRU5EICVdWyUgSUYgRGQgPj0gMzY1
    ICVdeWVhcmx5WyUgRU5EICVdPC9jaGFuZ2VmcmVxPlslIEVORCAlXVxuPHByaW9y
    aXR5PlslIElGIHF1ZXJ5Lmx2bCA9PSAwICVdMVslIEVMU0UgJV1bJSBVU0UgTWF0
    aDsgTWF0aC5pbnQoMTAgLyAocXVlcnkubHZsICsgMSkpIC8gMTAgKyAwLjMgJV1b
    JSBFTkQgJV08L3ByaW9yaXR5PlxuPC91cmw+XFxuIn0seyJ0eXBlIjoidW5pcXVl
    IiwicmVzdWx0IjpbImludGxpbmtzIiwibGluayJdLCJ1bmlxdWVUeXBlIjoic3Ry
    aW5nIiwidW5pcXVlR2xvYmFsIjpmYWxzZX0seyJ0eXBlIjoiZmlsdGVyIiwicmVz
    dWx0IjoiaGVhZGVycyIsImZpbHRlclR5cGUiOiJyZW1hdGNoIiwidmFsdWUiOiJ0
    ZXh0XFwvaHRtbHxhcHBsaWNhdGlvblxcL3htbCIsIm9wdGlvbiI6ImlzIn0seyJ0
    eXBlIjoiY3VzdG9tUmVzdWx0IiwicmVzdWx0IjoiaGVhZGVycyIsInJlZ2V4Ijoi
    bGFzdC1tb2RpZmllZDouKz8oXFxkezEsMn0pLj8oXFx3ezN9KS4/KFxcZHs0fSki
    LCJyZWdleFR5cGUiOiIiLCJyZXN1bHRUeXBlIjoiZmxhdCIsImFycmF5TmFtZSI6
    IiIsInJlc3VsdHMiOlsibG1EYXkiLCJsbU1TdHJpbmciLCJsbVllYXIiXX1dXSwi
    cmVzdWx0c0Zvcm1hdCI6IiRwMS5wcmVzZXQiLCJyZXN1bHRzU2F2ZVRvIjoiZmls
    ZSIsInJlc3VsdHNGaWxlTmFtZSI6InNpdGVtYXAueG1sIiwiYWRkaXRpb25hbEZv
    cm1hdHMiOltdLCJyZXN1bHRzVW5pcXVlIjoibm8iLCJxdWVyeUZvcm1hdCI6WyIk
    cXVlcnkiXSwidW5pcXVlUXVlcmllcyI6dHJ1ZSwic2F2ZUZhaWxlZFF1ZXJpZXMi
    OmZhbHNlLCJpdGVyYXRvck9wdGlvbnMiOnsib25BbGxMZXZlbHMiOmZhbHNlLCJx
    dWVyeUJ1aWxkZXJzQWZ0ZXJJdGVyYXRvciI6ZmFsc2V9LCJyZXN1bHRzT3B0aW9u
    cyI6eyJvdmVyd3JpdGUiOnRydWV9LCJkb0xvZyI6Im5vIiwia2VlcFVuaXF1ZSI6
    Ik5vIiwibW9yZU9wdGlvbnMiOnRydWUsInJlc3VsdHNQcmVwZW5kIjoiPD94bWwg
    dmVyc2lvbj1cIjEuMFwiIGVuY29kaW5nPVwiVVRGLThcIj8+XG48dXJsc2V0IHht
    bG5zPVwiaHR0cDovL3d3dy5zaXRlbWFwcy5vcmcvc2NoZW1hcy9zaXRlbWFwLzAu
    OVwiPlxuIiwicmVzdWx0c0FwcGVuZCI6IjwvdXJsc2V0PiIsInF1ZXJ5QnVpbGRl
    cnMiOltdLCJyZXN1bHRzQnVpbGRlcnMiOltdLCJjb25maWdPdmVycmlkZXMiOltd
    fX0=

    Notes:
    • Not every server and not for each page returned a header last-modified;
    • based on availability last-modified header for the current link will be (or not, if the header is not returned) describes the parameters <lastmod> and <changefreq>;
    • <changefreq> is calculated based on the period of time that has elapsed since the last modification of the page;
    • <priority> is calculated depending on the depth of the page;
    • depth of collecting of links regulated by parameter Parse to level;
    • not recommended to set a large depth, because limit in one sitemap.xml is 50000 URL.
    In an end result sitemap looks like this:
    [​IMG]
     
  2. mpspringer

    mpspringer A-Parser Pro License
    A-Parser Pro

    Joined:
    Oct 11, 2014
    Messages:
    1
    Likes Received:
    0
    Per Google guidelines, "all sitemap formats limit a single sitemap to 10MB (uncompressed) and 50,000 URLs. If you have a larger file or more URLs, you will have to break your list into multiple sitemaps." Can the output split the files when 50k urls is reached?
     
  3. Support

    Support Administrator
    Staff Member A-Parser Enterprise

    Joined:
    Mar 16, 2012
    Messages:
    4,547
    Likes Received:
    2,164
    To use multiple sitemap files, you must also generate an index file sitemap_index.xml. In this case, this impossible to make. But separated into multiple sitemap files every 50,000 URL - possible. To do this, in the File name field enter the following:
    Code:
    [% l = l + 1; USE Math; "sitemap"_ Math.int(l div 50000 + 1) _".xml" %]
    But for this construction has worked in this preset - must wait for the implementation of this improvement.
     
  4. Forbidden

    Forbidden Administrator
    Staff Member A-Parser Enterprise

    Joined:
    Mar 9, 2013
    Messages:
    3,337
    Likes Received:
    1,795
    Done in 1.1.300
     
    Support likes this.

Share This Page