Web Scraper Cloud Parser feature release

December 23, 2019


We are happy to finally introduce a Parser feature for Web Scraper Cloud.

Post-processing scraped data usually means writing a custom script or spending extra time editing the data by hand in spreadsheet software. The Parser takes care of this step and makes it easier.

Its modular design lets the user create, chain and configure multiple parsers for each column, so post-processing can range from very simple to more sophisticated.

Web Scraper Cloud Parser Feature Modular Design Parsers

The Parser includes the following parser types:

RegEx Match;
Replace Text;
Remove Whitespaces;
Strip HTML.

Each parser type handles a different kind of data processing. The “Replace Text” parser, as the name indicates, replaces or removes a string. The “Remove Whitespaces” parser cleans up fields scraped by the Text Selector by removing white spaces and unnecessary new lines from the text. A useful trick is that multiple parsers can be created for the same column, so the processing can be tailored to the data.
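To make the four parser types more concrete, here is a minimal Python sketch of what each one roughly does to a single value. It is only an illustration: the function names and their exact behaviour are our assumptions for this example, not the actual implementation behind Web Scraper Cloud.

```python
import re
from html import unescape


def regex_match(text, pattern):
    """Return the first part of text that matches pattern, or "" if none does."""
    match = re.search(pattern, text)
    return match.group(0) if match else ""


def replace_text(text, old, new=""):
    """Replace every occurrence of old with new (an empty new removes it)."""
    return text.replace(old, new)


def remove_whitespaces(text):
    """Collapse runs of spaces and new lines into one space and trim the ends."""
    return " ".join(text.split())


def strip_html(text):
    """Drop anything that looks like an HTML tag, then decode HTML entities."""
    return unescape(re.sub(r"<[^>]+>", "", text))
```

For example, remove_whitespaces("  Blue\n   shirt ") gives "Blue shirt", and replace_text("$24.99", "$") gives "24.99".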

Aside from the parser types, the Parser feature can also create a virtual column, which lets the user combine information from two or more source columns and apply parsers to the combined result. There is also a “Remove Column” function, which removes irrelevant columns from the final scraped data.
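The idea behind virtual columns and column removal can be sketched in a few lines of Python. This is a rough illustration that treats each scraped record as a dictionary; the helper names, the separator and the sample columns are hypothetical and not part of Web Scraper Cloud.

```python
def add_virtual_column(rows, name, sources, separator=" "):
    """Add a new column built by joining the values of the source columns."""
    return [
        {**row, name: separator.join(str(row.get(source, "")) for source in sources)}
        for row in rows
    ]


def remove_columns(rows, names):
    """Return the rows without the listed columns."""
    drop = set(names)
    return [{key: value for key, value in row.items() if key not in drop} for row in rows]


# Hypothetical sample data, for illustration only.
rows = [{"brand": "Acme", "model": "X200", "web-scraper-order": "1-1"}]
rows = add_virtual_column(rows, "title", ["brand", "model"])
rows = remove_columns(rows, ["web-scraper-order"])
```

After these two steps the single sample row holds "brand", "model" and a combined "title" column, and the "web-scraper-order" column is gone.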

Let’s take an example. After scraping multiple pages of an e-commerce website, the data comes out looking something like this:

Web Scraper Cloud Parser Feature Data Preview

Data like this is not practical for further analysis; it is hard to read and review. To make the data easier to use in this particular example, we started by removing all the unnecessary columns, such as “web-scraper-order” and “web-scraper-start-url”, by applying the “Remove Column” function of the Parser feature.

Web Scraper Cloud Parser Feature Remove Columns Function

Then, with the “Replace Text” parser, we make sure that the “price” data output has no dollar sign.

Web Scraper Cloud Parser Replace Text Function

And finally, we create a simple “RegEx Match” parser to extract only numbers from the “reviews” column.

Web Scraper Cloud Parser Feature RegEx Regular Expression Parser Function

After these steps, all the necessary data processing is done, and the data output is simpler and easier to use and analyse.

Web Scraper Cloud Parser Feature Neat Data Processing
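For readers who like to see the logic spelled out, the three steps of this example (remove columns, replace text, RegEx match) could look roughly like this in Python. The sample row, its values and the clean_row helper are made up for illustration; in Web Scraper Cloud the same steps are configured in the Parser interface, with no code involved.

```python
import re

# Columns we do not want in the final output.
COLUMNS_TO_REMOVE = {"web-scraper-order", "web-scraper-start-url"}


def clean_row(row):
    """Apply the three steps of the example to one scraped record."""
    # Step 1: remove the unnecessary columns.
    cleaned = {key: value for key, value in row.items() if key not in COLUMNS_TO_REMOVE}
    # Step 2: "Replace Text" on the price column, removing the dollar sign.
    cleaned["price"] = cleaned["price"].replace("$", "")
    # Step 3: "RegEx Match" on the reviews column, keeping only the number.
    match = re.search(r"\d+", cleaned["reviews"])
    cleaned["reviews"] = match.group(0) if match else ""
    return cleaned


# Hypothetical scraped record, for illustration only.
raw_row = {
    "web-scraper-order": "1577096400-1",
    "web-scraper-start-url": "https://example.com/products",
    "name": "Example laptop",
    "price": "$295.99",
    "reviews": "14 reviews",
}
clean = clean_row(raw_row)
```

With this sample row, clean holds only the "name", "price" and "reviews" columns, with "295.99" as the price and "14" as the number of reviews.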

The Parser feature is easy to use. A basic knowledge of RegEx is useful but not required. The modular design allows chaining multiple parsers together, so the user can create, for example, several very simple “Replace Text” operations rather than a single parser with a more complicated configuration.
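The chaining idea can be sketched in Python as a list of small functions applied one after another. The replace and chain helpers below are hypothetical names chosen for this example, and the sketch only mirrors the concept rather than how the Parser works internally.

```python
from functools import reduce


def replace(old, new=""):
    """Build a simple parser that replaces old with new in a text value."""
    return lambda text: text.replace(old, new)


def chain(*parsers):
    """Combine parsers into one that applies them in order, left to right."""
    return lambda text: reduce(lambda value, parser: parser(value), parsers, text)


# Several very simple steps instead of one complicated regular expression.
clean_price = chain(replace("$"), replace(","), replace("USD"), str.strip)
```

Here clean_price("$1,299.00 USD") returns "1299.00". Each step stays trivial to read, and a step can be added or removed without touching the others.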

We have also made a video, which you can find on our YouTube channel!
