Maintaining scraped folder structure?

Discussion in 'A-Parser Support Forum' started by Webmin, Mar 3, 2016.

  1. Webmin

    Webmin A-Parser Pro License
    A-Parser Pro

    Joined:
    Jan 18, 2016
    Messages:
    3
    Likes Received:
    0
    It doesn't look like it is currently possible to keep the folder structure of the input queries when outputting scraped information.

    For example, if the input query text file looks something like this:

    cars
    cars/ford
    cars/ford/focus/
    cars/ford/focus/red
    cars/porsche
    cars/porsche/carrera
    cars/porsche/carrera/black
    cars/porsche/carrera/black/new
    cars/porsche/carrera/black/used
    cars/nissan
    cars/nissan/xtrail
    dogs/
    dogs/bulldog/black
    dogs/labrador/golden
    dogs/labrador/golden/large
    dogs/labrador/brown/large
    ...

    Is it possible to maintain the same structure in the output files?

    Also, how hard would it be to output all of the scraped information straight into an SQL file, so that a database of the information could be created (again keeping the same structure)?

    Thank you.
     
    #1 Webmin, Mar 3, 2016
    Last edited: Mar 3, 2016
  2. Support

    Support Administrator
    Staff Member A-Parser Enterprise

    Joined:
    Mar 16, 2012
    Messages:
    4,503
    Likes Received:
    2,148
    In this case, use $query in the file name format, for example:
    Code:
    $query/$datefile.format().txt
    This will create the folder structure:
    [IMG]
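As a rough illustration of why a query containing slashes ends up as nested folders, here is a minimal Python sketch. It is not how A-Parser is implemented; the helper `output_path` and the file name `results.txt` are hypothetical stand-ins for whatever `$datefile.format()` expands to.

```python
import tempfile
from pathlib import Path


def output_path(base: Path, query: str, filename: str) -> Path:
    """Treat each '/'-separated part of the query as a folder level."""
    # Drop empty parts so "dogs/" and "dogs" map to the same folder.
    parts = [part for part in query.split("/") if part]
    return base.joinpath(*parts, filename)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for query in ["cars", "cars/ford/focus/", "dogs/labrador/golden"]:
        target = output_path(base, query, "results.txt")  # hypothetical file name
        # Create any missing intermediate folders before writing the file.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(f"results for {query}\n")

    # List what was created, relative to the output root.
    for path in sorted(base.rglob("*.txt")):
        print(path.relative_to(base).as_posix())
```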

    For SQL output, you can use a result format like:
    Code:
    INSERT INTO blah VALUES('$query', '$p1.pr', ...)\n
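One thing to watch with a plain-text template like this is that scraped values containing a single quote would break the statement. The Python sketch below shows the kind of line the format is meant to produce, with quotes doubled in the standard SQL way. The helpers `sql_literal` and `make_insert` and the sample values are hypothetical, and the trailing semicolon is an assumption about what the importing database expects.

```python
from typing import Sequence


def sql_literal(value: str) -> str:
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def make_insert(table: str, values: Sequence[str]) -> str:
    """Build one INSERT statement for a row of string values."""
    quoted = ", ".join(sql_literal(v) for v in values)
    # The trailing semicolon is assumed; most SQL importers need it
    # to separate one statement from the next.
    return f"INSERT INTO {table} VALUES({quoted});"


# Sample row: the query path plus one scraped value containing a quote.
print(make_insert("blah", ["cars/ford/focus/red", "O'Brien Motors"]))
```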
     
  3. Webmin

    Thanks for the reply.

    I should have made it clear that my input file isn't in a folder structure and that a regex is used to extract the information needed from it. My input text file actually looks something like this:

    Bedroom (#1)
    Bedroom (#1) Bedding (#2)
    Bedroom (#1) Bedding (#2) Bed Pillows (#20445)
    Bedroom (#1) Bedding (#2) Bed Skirts (#20450)
    Bedroom (#1) Bedding (#2) Bed-in-a-Bag (#20469)
    Bedroom (#1) Bedding (#2) Blankets & Throws (#175750)
    Bedroom (#1) Bedding (#2) Canopies & Netting (#48090)
    Bedroom (#1) Bedding (#2) Comforters & Sets (#45462)
    Bedroom (#1) Bedding (#2) Decorative Bed Pillows (#115630)
    Bedroom (#1) Bedding (#2) Duvet Covers & Sets (#37644)
    Bedroom (#1) Bedding (#2) Mattress Pads & Feather Beds (#175751)
    Bedroom (#1) Bedding (#2) Other Bedding (#25815)
    Bedroom (#1) Bedding (#2) Pillow Shams (#43397)
    Bedroom (#1) Bedding (#2) Quilts, Bedspreads & Coverlets (#175749)

    I know this currently isn't achievable when using a regex on the input file, so this is more of a request for the functionality to be added (or is there a workaround?).

    Thanks.
     
  4. Support

    Then use this file name format:
    Code:
    [% query.replace('\s*\(.+?\)\s*', '/') %]$datefile.format().txt
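To see what that replacement does to the sample queries, the same pattern can be tried outside A-Parser. This is a Python sketch using `re.sub`, which replaces every match; it assumes the template's `replace` behaves the same way. The function name `query_to_folder` is hypothetical.

```python
import re

# Same pattern as in the file name format: optional whitespace, a
# parenthesised group such as "(#20445)", then optional whitespace.
PAREN_ID = re.compile(r"\s*\(.+?\)\s*")


def query_to_folder(query: str) -> str:
    """Replace every parenthesised ID with a path separator."""
    return PAREN_ID.sub("/", query)


samples = [
    "Bedroom (#1)",
    "Bedroom (#1) Bedding (#2)",
    "Bedroom (#1) Bedding (#2) Bed Pillows (#20445)",
]
for sample in samples:
    print(query_to_folder(sample))
# Prints:
# Bedroom/
# Bedroom/Bedding/
# Bedroom/Bedding/Bed Pillows/
```

Each result ends with a slash because the last ID is replaced too, so the file name that follows in the format lands inside the deepest folder.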
     
  5. Webmin

    Thank you, I will try it and let you know how I get on.

    I sent you a PM with the actual file I will be using as the input query text file, in case it differs slightly from the example I typed up above. Could you let me know whether the regular expression above needs any changes?

    Thank you.
     
