Parsing links loaded via JS

Discussion in 'Share Experience' started by Support, Sep 18, 2015.

    Today, using the site http://www.chegg.com/ as an example, we will learn how to parse information that is loaded by a JS script.
    1) On this site the content really is loaded by a JS script. But the script itself has to receive the data from somewhere, which means it sends a request somewhere. We just need to find this request and then use it. For this we use Developer Tools (Ctrl + Shift + I, Network tab), which are available in the Chrome browser. Open the link above and analyze it. You will see a lot of different requests and responses. Look through the responses for the information that interests us (for example, you can search for the first title: "Cengage Advantage Books ...") and you will find the request we need.
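If there are too many responses to review by hand, the search can also be done offline. The sketch below assumes the Network tab contents were exported as a HAR file (Chrome offers such an export) and looks for a known string in the response bodies. The function name `find_entries` and the sample data are invented for illustration; real HAR files may also store some bodies base64-encoded, which this sketch does not handle.

```python
import json


def load_har(path: str) -> dict:
    """Read a HAR export (plain JSON) from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def find_entries(har: dict, needle: str) -> list[str]:
    """Return the URLs of all requests whose response body contains `needle`."""
    urls = []
    for entry in har.get("log", {}).get("entries", []):
        # The body may be missing (e.g. redirects), so fall back to "".
        body = entry.get("response", {}).get("content", {}).get("text") or ""
        if needle in body:
            urls.append(entry["request"]["url"])
    return urls


# Tiny invented HAR fragment; the URLs are placeholders, not the real site.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "http://example.com/page.html"},
                "response": {"content": {"text": "<html>no data here</html>"}},
            },
            {
                "request": {"url": "http://example.com/api/search?page=1"},
                "response": {
                    "content": {"text": '{"title":"Cengage Advantage Books ..."}'}
                },
            },
        ]
    }
}

matches = find_entries(sample_har, "Cengage Advantage Books")
print(matches)
```

With a real export, `find_entries(load_har("export.har"), "Cengage Advantage Books")` would be the equivalent call; the file name is a placeholder.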
    [​IMG]
    2) Check and analyze the request we found. To start, open it in a new browser tab and see whether it returns what we need.
    [​IMG]
    As you can see, it does: there is the book title and a link to it (highlighted in green), as well as a lot of other information. If you look closely, you can see that this data is in JSON format. You can verify this by pasting the data into the Template tester in A-Parser and clicking Prettify JSON (highlighted with a green square).
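The same check can be done outside A-Parser. The sketch below uses Python's standard `json` module: if the text parses, it is valid JSON, and `json.dumps` with `indent` gives a readable view. The sample string and its keys are invented, not the real response.

```python
import json


def is_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Invented sample; note the "\/" escapes, which are legal in JSON.
raw = '{"results":[{"title":"Example Book","pdp":"\\/books\\/example-1"}],"page":1}'

data = json.loads(raw)
pretty = json.dumps(data, indent=2, ensure_ascii=False)
print(is_json(raw))
print(pretty)
```

After parsing, the escaped slashes are already decoded, so `data["results"][0]["pdp"]` holds a plain path.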
    [​IMG]
    3) As you can see, that was only the first page, which contains only 10 results. But what if you need all the rest? Let's analyze the request we found earlier:
    If you put 2 in place of the part shown in bold and open the link in a browser, you will see that the output is exactly what we need (page 2), in the already familiar JSON. Now all that remains is to load all of this into A-Parser.
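The page substitution can be sketched as follows. The request URL itself is not reproduced in this post, so the address and the parameter name `page` below are placeholders, and `with_page` is a hypothetical helper, not part of A-Parser.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int, param: str = "page") -> str:
    """Return `url` with the query parameter `param` set to `page`."""
    parts = urlsplit(url)
    # Keep every other parameter, drop the old page value, append the new one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL; substitute the request found in Developer Tools.
base = "http://example.com/api/search?q=books&page=1"

urls = [with_page(base, n) for n in range(1, 4)]
for url in urls:
    print(url)
```

Using `range(1, 1001)` instead would generate pages 1 through 1000.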
    4) For parsing we will use Net::HTTP. To extract the necessary data from the content, we use Parse custom result with a regular expression. By the way, you must be very careful with the regular expression in this example, because the response also contains "relatedItems" (something like similar books), and if they are matched too, the first page gives 18 results instead of 10.
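The "relatedItems" pitfall can be illustrated with a small sketch. The field names `primaryAuthor`, `title` and `pdp` and the leading `{"type":` anchor follow the regular expression stored in the preset at the end of this post, as far as it can be read from the exported settings; the sample response and its nesting are invented, so treat this as an illustration, not a description of the real data.

```python
import re

# Invented response: two books, the first with one nested related item.
sample = (
    '{"results":['
    '{"type":"book","primaryAuthor":"A. Author","title":"First Book",'
    '"pdp":"\\/books\\/first-1",'
    '"relatedItems":[{"kind":"related","primaryAuthor":"R. Other",'
    '"title":"Related Book","pdp":"\\/books\\/related-9"}]},'
    '{"type":"book","primaryAuthor":"B. Writer","title":"Second Book",'
    '"pdp":"\\/books\\/second-2","relatedItems":[]}'
    ']}'
)

FIELDS = r'"primaryAuthor":"(.*?)".+?"title":"(.+?)".+?"pdp":"(.+?)"'

# Loose pattern: matches any object that has the three fields.
loose = re.compile(FIELDS)
# Strict pattern: the object must start with {"type": right after "," or "[".
strict = re.compile(r'[,\[]\{"type":.+?' + FIELDS)

loose_titles = [m[1] for m in loose.findall(sample)]
strict_titles = [m[1] for m in strict.findall(sample)]
print(loose_titles)   # includes the related item
print(strict_titles)  # only the main results
```

In this sample the related item does not start with `{"type":`, which is what lets the anchored pattern skip it. If the real related items looked the same as the main results, a regular expression alone would not be enough to tell them apart.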
    [​IMG]

    Notes:
    • The input is the request found earlier, but with the number-enumeration macro substituted for the page number. In this example all pages from the 1st to the 1000th are requested.
    • The result is a file with the following information: Author - Title: Link to the page. You can change the format of the result as you like by changing the regular expression and the data it collects, as well as the Result format itself. Please note that the Result builder is also used here to replace "\/" with "/", because in JSON the "/" character is escaped.
    The resulting file will be as follows:
    Code:
    eyJwcmVzZXQiOiJodHRwOi8vYS1wYXJzZXIuY29tL3RocmVhZHMvMTY5OSIsInZh
    bHVlIjp7InByZXNldCI6Imh0dHA6Ly9hLXBhcnNlci5jb20vdGhyZWFkcy8xNjk5
    IiwicGFyc2VycyI6W1siTmV0OjpIVFRQIiwiZGVmYXVsdCIseyJ0eXBlIjoib3Zl
    cnJpZGUiLCJpZCI6ImZvcm1hdHJlc3VsdCIsInZhbHVlIjoiJGJvb2tzLmZvcm1h
    dCgnJGF1dGhvciAtICR0aXRsZTogaHR0cDovL3d3dy5jaGVnZy5jb20vJGxpbmtc
    XG4nKSJ9LHsidHlwZSI6ImN1c3RvbVJlc3VsdCIsInJlc3VsdCI6ImRhdGEiLCJy
    ZWdleCI6IlssfFxcW117XCJ0eXBlXCI6Lis/XCJwcmltYXJ5QXV0aG9yXCI6XCIo
    Lio/KVwiLis/XCJ0aXRsZVwiOlwiKC4rPylcIi4rP1wicGRwXCI6XCIoLis/KVwi
    IiwicmVnZXhUeXBlIjoiZyIsInJlc3VsdFR5cGUiOiJhcnJheSIsImFycmF5TmFt
    ZSI6ImJvb2tzIiwicmVzdWx0cyI6WyJhdXRob3IiLCJ0aXRsZSIsImxpbmsiXX1d
    XSwicmVzdWx0c0Zvcm1hdCI6IiRwMS5wcmVzZXQiLCJyZXN1bHRzU2F2ZVRvIjoi
    ZmlsZSIsInJlc3VsdHNGaWxlTmFtZSI6IiRkYXRlZmlsZS5mb3JtYXQoKS50eHQi
    LCJhZGRpdGlvbmFsRm9ybWF0cyI6W10sInJlc3VsdHNVbmlxdWUiOiJubyIsInF1
    ZXJ5Rm9ybWF0IjpbIiRxdWVyeSJdLCJ1bmlxdWVRdWVyaWVzIjpmYWxzZSwic2F2
    ZUZhaWxlZFF1ZXJpZXMiOmZhbHNlLCJpdGVyYXRvck9wdGlvbnMiOnsib25BbGxM
    ZXZlbHMiOmZhbHNlLCJxdWVyeUJ1aWxkZXJzQWZ0ZXJJdGVyYXRvciI6ZmFsc2V9
    LCJyZXN1bHRzT3B0aW9ucyI6eyJvdmVyd3JpdGUiOmZhbHNlfSwiZG9Mb2ciOiJu
    byIsImtlZXBVbmlxdWUiOiJObyIsIm1vcmVPcHRpb25zIjpmYWxzZSwicmVzdWx0
    c1ByZXBlbmQiOiIiLCJyZXN1bHRzQXBwZW5kIjoiIiwicXVlcnlCdWlsZGVycyI6
    W10sInJlc3VsdHNCdWlsZGVycyI6W3sic291cmNlIjpbMCxbImJvb2tzIiwibGlu
    ayJdXSwidHlwZSI6InN0cmluZ1JlcGxhY2UiLCJhcnJheSI6ImJvb2tzIiwic2Vh
    cmNoIjoiXFwvIiwicmVwbGFjZSI6Ii8iLCJ0byI6ImxpbmsifV0sImNvbmZpZ092
    ZXJyaWRlcyI6W119fQ==
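The "\/" replacement mentioned in the notes can be reproduced outside A-Parser. This is a minimal sketch: `unescape_link` and `format_line` are hypothetical helpers, the sample values are invented, and the output line only follows the "Author - Title: Link" layout described above.

```python
import json


def unescape_link(link: str) -> str:
    """Turn the JSON-escaped "\\/" into "/", like the Result builder step."""
    return link.replace("\\/", "/")


def format_line(author: str, title: str, link: str) -> str:
    """Build one result line in the 'Author - Title: Link' layout."""
    return f"{author} - {title}: {unescape_link(link)}"


# A link as captured by a regular expression from raw JSON text (invented).
raw_link = "books\\/example-1"

line = format_line("A. Author", "Example Book", raw_link)
print(line)

# A JSON parser performs the same unescaping when decoding a string value.
decoded = json.loads('"' + raw_link + '"')
print(decoded)
```

The replacement is only needed because the regular expression works on the raw text; a JSON parser decodes "\/" on its own.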
     
