Web Scraper Cloud Parser feature release

Web Scraper Cloud, Data post processing, Parser

We are happy to finally introduce a Parser feature for Web Scraper Cloud.

Post-processing scraped data usually means writing a custom script or spending extra time editing the data by hand in spreadsheet software. The Parser takes care of this step and makes it much easier.


Its modular design lets you create, chain and configure multiple parsers for each column, so you can build the post-processing that best suits your data, from very simple to more sophisticated.

Web Scraper Cloud Parser Feature Modular Design Parsers

The Parser includes the following parser types:

  • RegEx Match;
  • Replace Text;
  • Remove Whitespaces;
  • Strip HTML.

Each parser type handles a different data processing task. The “Replace Text” parser, as the name indicates, replaces or removes a string. The “Remove Whitespaces” parser cleans up fields scraped by the Text Selector by removing extra white space and unnecessary new lines from the text. A handy trick is that you can create multiple parsers for the same column and combine them to get exactly the processing you need.
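To make the idea of several parsers on one column concrete, here is a minimal Python sketch of what a “Strip HTML” parser followed by a “Remove Whitespaces” parser might do to a single value. This is only an illustration of the equivalent transformation, not how Web Scraper Cloud implements it; the function names, the sample value, and the choice to collapse white space into single spaces are all assumptions.

```python
import re
from html import unescape


def strip_html(value: str) -> str:
    # Hypothetical stand-in for "Strip HTML": drop anything that looks
    # like a tag, then decode entities such as &quot;.
    return unescape(re.sub(r"<[^>]+>", "", value))


def remove_whitespaces(value: str) -> str:
    # Hypothetical stand-in for "Remove Whitespaces": collapse runs of
    # spaces, tabs and new lines into single spaces and trim both ends.
    return " ".join(value.split())


def apply_parsers(value: str, parsers) -> str:
    # Apply the parsers one after another, in the order given.
    for parser in parsers:
        value = parser(value)
    return value


raw = "<p>  Acer Aspire\n   15.6&quot; </p>"  # made-up sample value
cleaned = apply_parsers(raw, [strip_html, remove_whitespaces])
print(cleaned)
```

The order of the list is the order in which the parsers run, which mirrors chaining parsers on the same column.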

Besides the parser types, the Parser feature can also create a virtual column, which combines information from two or more source columns and can have parsers applied to it. There is also a “Remove Column” function, which removes columns so that irrelevant data does not end up in the final scraped data.
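As a rough illustration of a virtual column, the Python sketch below joins two source columns into a new one and leaves the originals untouched. The helper name, the column names, and the use of a space as the separator are assumptions made for this example; the actual feature is configured in the Web Scraper Cloud interface.

```python
def add_virtual_column(row: dict, name: str, sources: list, separator: str = " ") -> dict:
    # Build a new column from the values of the source columns.
    # The original row is copied, so the source columns stay as they are.
    new_row = dict(row)
    new_row[name] = separator.join(str(row[source]) for source in sources)
    return new_row


row = {"brand": "Acer", "model": "Aspire 3"}  # made-up sample row
result = add_virtual_column(row, "title", ["brand", "model"])
print(result["title"])
```

Once the combined value exists as its own column, any parser can be applied to it in the same way as to a scraped column.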

Let’s take an example. After scraping multiple pages of an e-commerce website, the data comes out looking something like this:

Web Scraper Cloud Parser Feature Data Preview

Data like this is not practical for further analysis; it is hard to read and review. To make the data easier to use, in this example we started by removing all the unnecessary columns, such as “web-scraper-order” and “web-scraper-start-url”, with the “Remove Column” function of the Parser feature.

Web Scraper Cloud Parser Feature Remove Columns Function
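For readers who think in code, removing columns amounts to something like the following Python sketch. The helper and the sample row are hypothetical; only the two column names come from the example above.

```python
def remove_columns(rows: list, unwanted: set) -> list:
    # Return new rows that keep every column except the unwanted ones.
    return [
        {column: value for column, value in row.items() if column not in unwanted}
        for row in rows
    ]


# Made-up sample row for illustration only.
rows = [
    {
        "web-scraper-order": "1",
        "web-scraper-start-url": "https://example.com/",
        "name": "Laptop",
        "price": "$295.99",
    }
]
trimmed = remove_columns(rows, {"web-scraper-order", "web-scraper-start-url"})
print(trimmed)
```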

Then, with the “Replace Text” parser, we make sure that the “price” data is output without a dollar sign.

Web Scraper Cloud Parser Replace Text Function
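In code terms, this step is a plain string replacement where the dollar sign is replaced with nothing. The Python sketch below uses a hypothetical helper and a made-up price value.

```python
def replace_text(value: str, search: str, replacement: str = "") -> str:
    # Replace every occurrence of `search`; an empty replacement removes it.
    return value.replace(search, replacement)


price = replace_text("$295.99", "$")  # made-up sample price
print(price)
```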

And finally, we create a simple “RegEx Match” parser to extract only numbers from the “reviews” column.

Web Scraper Cloud Parser Feature RegEx Regular Expression Parser Function
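A regular expression for “only numbers” can be as short as `\d+`. The Python sketch below shows the idea with a hypothetical helper and a made-up “reviews” value; returning an empty string when nothing matches is an assumption of this example, not documented Parser behavior.

```python
import re


def regex_match(value: str, pattern: str) -> str:
    # Return the first part of the value that matches the pattern.
    # Assumption: an empty string is returned when nothing matches.
    match = re.search(pattern, value)
    return match.group(0) if match else ""


reviews = regex_match("14 reviews", r"\d+")  # made-up sample value
print(reviews)
```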

After these steps, all the necessary data processing is done, and the output is simpler and easier to use and analyze.

Web Scraper Cloud Parser Feature Neat Data Processing

The Parser feature is easy to use. Basic knowledge of RegEx is useful but not required. The modular design lets you chain multiple parsers together, so you can, for example, use several very simple “Replace Text” operations rather than a single parser with a more complicated configuration.


