Scrape internal links and give report?

Discussion in 'A-Parser Support Forum' started by r3dn4x, Nov 9, 2016.

  1. r3dn4x

    A-Parser Enterprise License

    Joined: Jan 22, 2014; Messages: 14; Likes Received: 2
    I'm sure it's possible, but I'm having trouble figuring out the best way to scrape all internal links on a domain and output a report like the one Google Webmaster Tools provides.

    Example:

    /page-1/, 293 links
    /page-2/, 192 links

    Then, for each page, I want a breakdown of all the links and the anchor texts they use.
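Once a crawl has produced (source page, target page, anchor text) records for every internal link, a report like the one above comes down to counting how many internal links point at each page and tallying their anchors. This is a minimal Python sketch, assuming the crawl output has already been parsed into such tuples. `build_link_report`, `format_report`, and the sample data are hypothetical and not part of A-Parser.

```python
from collections import Counter, defaultdict


def build_link_report(edges):
    """Turn (source, target, anchor) edges into per-target inbound-link rows.

    Every edge counts once, so two links from the same page both count.
    Returns (target, link_count, anchor_counter) tuples, most-linked first.
    """
    counts = Counter()
    anchors = defaultdict(Counter)
    for _source, target, anchor in edges:
        counts[target] += 1
        anchors[target][anchor] += 1
    return sorted(
        ((target, n, anchors[target]) for target, n in counts.items()),
        key=lambda row: (-row[1], row[0]),  # most links first, then by path
    )


def format_report(rows):
    """Render rows in the '/page-1/, 293 links' style, with anchors indented."""
    lines = []
    for target, n, anchor_counts in rows:
        lines.append(f"{target}, {n} link{'s' if n != 1 else ''}")
        for anchor, k in anchor_counts.most_common():
            lines.append(f"    {anchor!r}: {k}")
    return "\n".join(lines)


if __name__ == "__main__":
    edges = [  # hypothetical sample data
        ("/", "/page-1/", "Page one"),
        ("/page-2/", "/page-1/", "Page one"),
        ("/page-3/", "/page-1/", "first page"),
        ("/", "/page-2/", "Page two"),
    ]
    print(format_report(build_link_report(edges)))
```

Counting every edge, not every distinct linking page, is a design choice. If the report should count each linking page only once, deduplicate `(source, target)` pairs first.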
     
  2. Support

    Support Administrator
    Staff Member A-Parser Enterprise

    Joined: Mar 16, 2012; Messages: 4,557; Likes Received: 2,167
    Use the HTML::LinkExtractor parser with the "Parse to level" option, which follows links across the site down to the specified depth, and set the result format to whatever form you need. Enabling unique queries is also recommended.
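Crawling to a fixed depth while skipping repeated queries works much like a breadth-first search with a visited set. The Python sketch below illustrates that idea only. It is not how A-Parser implements "Parse to level", and the depth convention (`max_depth=1` means only the start page is fetched) is an assumption. To stay self-contained, it reads pages through a caller-supplied `fetch` function (here a hypothetical in-memory `SITE` dict) instead of over HTTP. It uses the standard-library `html.parser` to collect `<a href>` links along with their anchor text.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, anchor))
            self._href = None


def crawl(start_url, fetch, max_depth):
    """Breadth-first crawl of internal links, fetching each URL at most once.

    fetch(url) returns the page HTML or None. max_depth counts fetched levels
    (1 = only the start page). Returns a list of (source, target, anchor) edges.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}  # the "unique queries" idea: never enqueue a URL twice
    queue = deque([(start_url, 0)])
    edges = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        collector = LinkCollector()
        collector.feed(html)
        for href, anchor in collector.links:
            if not href:
                continue
            target = urldefrag(urljoin(url, href))[0]  # absolute URL, no #fragment
            if urlparse(target).netloc != host:
                continue  # external link: not part of an internal-links report
            edges.append((url, target, anchor))
            if depth + 1 < max_depth and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return edges


if __name__ == "__main__":
    SITE = {  # hypothetical in-memory site standing in for HTTP fetches
        "http://example.com/": '<a href="/page-1/">Page one</a> <a href="https://other.org/">x</a>',
        "http://example.com/page-1/": '<a href="/page-2/">Page two</a> <a href="/">Home</a>',
        "http://example.com/page-2/": '<a href="/page-1/">Back</a>',
    }
    for edge in crawl("http://example.com/", SITE.get, max_depth=3):
        print(edge)
```

The returned edges have the (source, target, anchor) shape that a per-page inbound-link count needs. Links found on the deepest fetched level are still recorded, even though their targets are not fetched.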
    Code:
    eyJwcmVzZXQiOiJkZWZhdWx0IiwidmFsdWUiOnsicHJlc2V0IjoiZGVmYXVsdCIs
    InBhcnNlcnMiOltbIkhUTUw6OkxpbmtFeHRyYWN0b3IiLCJkZWZhdWx0Iix7InR5
    cGUiOiJvcHRpb25zIiwiaWQiOiJwYXJzZUxldmVsIiwidmFsdWUiOjN9XV0sInJl
    c3VsdHNGb3JtYXQiOiIkcXVlcnk6ICRwMS5pbnRjb3VudFxcbiRwMS5pbnRsaW5r
    cy5mb3JtYXQoJyRsaW5rICRjbGVhbmFuY2hvclxcbicpIiwicmVzdWx0c1NhdmVU
    byI6ImZpbGUiLCJyZXN1bHRzRmlsZU5hbWUiOiIkZGF0ZWZpbGUuZm9ybWF0KCku
    dHh0IiwiYWRkaXRpb25hbEZvcm1hdHMiOltdLCJyZXN1bHRzVW5pcXVlIjoibm8i
    LCJxdWVyeUZvcm1hdCI6WyIkcXVlcnkiXSwidW5pcXVlUXVlcmllcyI6dHJ1ZSwi
    c2F2ZUZhaWxlZFF1ZXJpZXMiOmZhbHNlLCJpdGVyYXRvck9wdGlvbnMiOnsib25B
    bGxMZXZlbHMiOmZhbHNlLCJxdWVyeUJ1aWxkZXJzQWZ0ZXJJdGVyYXRvciI6ZmFs
    c2UsInF1ZXJ5QnVpbGRlcnNPbkFsbExldmVscyI6ZmFsc2V9LCJyZXN1bHRzT3B0
    aW9ucyI6eyJvdmVyd3JpdGUiOmZhbHNlfSwiZG9Mb2ciOiJubyIsImtlZXBVbmlx
    dWUiOiJObyIsIm1vcmVPcHRpb25zIjpmYWxzZSwicmVzdWx0c1ByZXBlbmQiOiIi
    LCJyZXN1bHRzQXBwZW5kIjoiIiwicXVlcnlCdWlsZGVycyI6W10sInJlc3VsdHNC
    dWlsZGVycyI6W10sImNvbmZpZ092ZXJyaWRlcyI6W10sInJ1blRhc2tPbkNvbXBs
    ZXRlIjpudWxsLCJ1c2VSZXN1bHRzRmlsZUFzUXVlcmllc0ZpbGUiOmZhbHNlLCJy
    dW5UYXNrT25Db21wbGV0ZUNvbmZpZyI6ImRlZmF1bHQiLCJ0b29sc0pTIjoiIn19
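The preset above is base64-encoded JSON. Decoded, it configures `HTML::LinkExtractor` with `parseLevel` set to 3, turns on `uniqueQueries`, and uses the results format `$query: $p1.intcount\n$p1.intlinks.format('$link $cleananchor\n')`. That format appears to print each crawled page with its internal-link count, followed by one `link anchor` line per internal link. If you want to inspect the settings before importing, here is a short Python sketch (`decode_preset` is a hypothetical helper name):

```python
import base64
import json


def decode_preset(text):
    """Decode an exported preset string (base64 of a JSON object) into a dict."""
    compact = "".join(text.split())  # drop the line breaks added when pasting
    return json.loads(base64.b64decode(compact))
```

Pass the whole pasted string, line breaks included, to `decode_preset` and print the result with `json.dumps(preset, indent=2)`.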
    

    Example of results:
     
