OpenScraping Project
A declarative, scalable, and extensible specification and reference client for web scraping
Declarative
OpenScraping defines a simple, declarative Scraping Definition format that both scraping clients (such as web extensions or websites) and humans can use to specify which parts of a web page to scrape.
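To make the idea of a declarative definition concrete, here is a minimal sketch in Python. It is not the OpenScraping Scraping Definition format: the `DEFINITION` layout (field name mapped to a path expression), the `scrape` function, and the sample `PAGE` are all assumptions made for illustration, and the standard-library XML parser used here only handles well-formed markup.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical definition: field name -> ElementTree path expression.
# This layout is an illustration only, not the actual OpenScraping format.
DEFINITION = {
    "title": ".//h1",
    "author": ".//span[@class='author']",
}


def scrape(markup: str, definition: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply every path in the definition to a well-formed page."""
    root = ET.fromstring(markup)
    result: Dict[str, Optional[str]] = {}
    for field, path in definition.items():
        node = root.find(path)
        # A field with no matching element (or no text) is reported as None.
        result[field] = node.text.strip() if node is not None and node.text else None
    return result


PAGE = "<html><body><h1>Hello</h1><span class='author'>Ada</span></body></html>"
print(scrape(PAGE, DEFINITION))  # {'title': 'Hello', 'author': 'Ada'}
```

The point of the sketch is that the definition is plain data: it says what to extract, while the client decides how to fetch and parse the page.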
Extensible
Each component of OpenScraping is designed to be swapped out for a custom plugin, so you can supply your own requester, parser, data service, and more.
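One way swappable components could be structured is sketched below in Python. This is not the OpenScraping plugin API: the `Requester` protocol, the `FileRequester` and `StaticRequester` classes, and the `Pipeline` wrapper are hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, Protocol


class Requester(Protocol):
    """Hypothetical plugin interface: turn a location into page markup."""

    def fetch(self, location: str) -> str: ...


class FileRequester:
    """Reads a previously saved page from disk."""

    def fetch(self, location: str) -> str:
        return Path(location).read_text(encoding="utf-8")


class StaticRequester:
    """In-memory stand-in, handy for tests and examples."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages

    def fetch(self, location: str) -> str:
        return self.pages[location]


class Pipeline:
    """Combines any requester with any parser callable."""

    def __init__(self, requester: Requester, parser: Callable[[str], dict]) -> None:
        self.requester = requester
        self.parser = parser

    def run(self, location: str) -> dict:
        return self.parser(self.requester.fetch(location))


# Swapping the requester changes where markup comes from; the parser is untouched.
pages = {"saved/article.html": "<h1>Hello</h1>"}
pipeline = Pipeline(StaticRequester(pages), parser=lambda markup: {"length": len(markup)})
print(pipeline.run("saved/article.html"))
```

Because `Pipeline` depends only on the `fetch` method, a requester that performs live HTTP requests could replace `StaticRequester` without changes to the parser or the pipeline.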
Scalable
OpenScraping is designed to support both online scraping (e.g. fetching new data from sites in real time) and offline scraping (batch processing of previously saved web pages or archives). It is also designed with horizontal scaling across distributed clusters in mind.