Example Use-Case
A simple example of a common implementation/use-case of the logical definitions is as follows:
- Requestor
  - Takes in an array of URLs and generates a basic GET request from each
  - e.g. ["/product/:ID1", "/product/:ID2" ...]
- Matcher
  - Matches the above URLs because they share the same page structure and can thus use the same extractor
  - e.g. Regex: /product/*/ or a more specific one
  - Requires the Response Content-Type header to be HTML to match the parser
- Parser
  - A built-in HTML parser
- Extractor
  - Uses a basic XPath Scraping Definition provided by the user to extract relevant data
- Data plugin
  - Goes into a database, or in this case maybe just prints JSON to stdout
Thus, the above could all be contained in one simple command-line tool.
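As a rough illustration of how the five pieces could fit together in one small tool, here is a minimal Python sketch. It is not the project's actual API: every name in it (`requestor`, `matcher`, `parser`, `extractor`, `data_plugin`, `fake_fetch`, `run`, `DEFINITION`) is hypothetical, the URL pattern is one possible "more specific" regex, and the standard library's `xml.etree.ElementTree` stands in for a real HTML parser, so it only accepts well-formed markup and a limited XPath subset. The fetch step returns canned pages so that the example runs offline.

```python
import json
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Request:
    method: str
    url: str


@dataclass
class Response:
    url: str
    content_type: str
    body: str


def requestor(urls):
    """Requestor: turn an array of URLs into basic GET requests."""
    return [Request("GET", url) for url in urls]


# Assumed pattern: one possible "more specific" regex for product pages.
PRODUCT_URL = re.compile(r"^/product/[^/]+/?$")


def matcher(response):
    """Matcher: accept product pages whose Content-Type header is HTML."""
    is_product = PRODUCT_URL.match(response.url) is not None
    return is_product and response.content_type.startswith("text/html")


def parser(response):
    """Parser: stand-in for a built-in HTML parser (well-formed markup only)."""
    return ET.fromstring(response.body)


def extractor(tree, definition):
    """Extractor: apply a user-supplied {field: XPath} scraping definition."""
    record = {}
    for field, xpath in definition.items():
        node = tree.find(xpath)
        has_text = node is not None and node.text
        record[field] = node.text.strip() if has_text else None
    return record


def data_plugin(record):
    """Data plugin: print the record as JSON to stdout instead of storing it."""
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


def fake_fetch(request):
    """Hypothetical transport returning a canned page (no network access)."""
    product_id = request.url.rstrip("/").rsplit("/", 1)[-1]
    body = (
        "<html><body>"
        f"<h1 class='title'>Product {product_id}</h1>"
        "<span class='price'>9.99</span>"
        "</body></html>"
    )
    return Response(request.url, "text/html; charset=utf-8", body)


def run(urls, definition, fetch=fake_fetch):
    """Wire the stages together: request, match, parse, extract, output."""
    results = []
    for request in requestor(urls):
        response = fetch(request)
        if not matcher(response):
            continue  # URL or Content-Type did not match; skip this page
        results.append(data_plugin(extractor(parser(response), definition)))
    return results


# Assumed shape of a user-provided XPath Scraping Definition.
DEFINITION = {
    "title": ".//h1[@class='title']",
    "price": ".//span[@class='price']",
}

if __name__ == "__main__":
    run(["/product/1", "/product/2", "/about"], DEFINITION)
```

In this sketch the only things a user would supply are the URL list and the scraping definition; `/about` is skipped because it does not match the product pattern. A real tool would swap `fake_fetch` for an HTTP client and the parser for one that tolerates malformed HTML.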
What does the user actually set?
You can read the JSON Serialization Specification here.