Scraping E-commerce the Classical Way

September 16, 2020


Scraping-Ecommerce-Classical-Way-Blog-Image-Web-Scraper-Blog

Technology keeps growing, data has become the main driver of modern, fast-growing companies, and online business has evolved rapidly in recent years. That comes as no surprise: ordering and reserving goods and services online without leaving the house is a huge time saver, and it is accessible to anyone with a stable internet connection.

It seems that everything is centred around data nowadays; it is the fuel that moves businesses forward in this digital age. With that comes the importance of data collection. In this blog post, we will show the classical way of applying Web Scraper to your e-commerce data extraction needs.

The Classical Method

The classical method is the first and most basic, but also the most intuitive, way of scraping with Web Scraper. You map the site with the point-and-click interface, setting the path that the scraper follows to reach and extract the target data: first the category selectors, then the subcategories, the product links, and finally the prices, names, descriptions, and so on.

This might sound confusing, so let us walk through an example.

To start working with Web Scraper, we first need to create a sitemap. We will then add selectors to it that define which data needs to be retrieved and how.

To do that, click “Create new sitemap”, choose a sitemap name, paste in the URL of the website that the scraper should use as its starting point, and then click “Save Sitemap”.

Create-Sitemap-First-Step-Web-Scraping-Ecommerce-Web-Scraper-Blog
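For readers who prefer to see the structure behind the form, a sitemap can also be thought of as a small JSON document. The sketch below builds one in Python. The key names (`_id`, `startUrl`, `selectors`) are an assumption based on how exported sitemaps commonly look, and the sitemap name and start URL are placeholders, since this tutorial does not state them.

```python
import json

# Placeholder values: the tutorial does not give the sitemap name or start URL.
sitemap = {
    "_id": "ecommerce-classical",              # hypothetical sitemap name
    "startUrl": ["https://www.example.com/"],  # hypothetical starting point
    "selectors": [],                           # filled in by the steps that follow
}

# Serialise the sitemap so it can be inspected or stored as text.
sitemap_json = json.dumps(sitemap, indent=2)
print(sitemap_json)
```

At this stage the selector list is empty; every selector created in the following steps would become one entry in it.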

Now, let us create the first selector. In our case this one is a bit harder because of the website we have chosen. To select the two categories for women and men, we use the browser's developer tools (“Inspect”) to examine the menu markup and enter a custom selector like this:

.accessible-navmenu > li[data-behavior="mega_menu"] > a:contains("Womens"), .accessible-navmenu > li[data-behavior="mega_menu"] > a:contains("Mens")

Make sure that this is a link selector and that the “multiple” option is checked, since more than one entry is needed. We name our first selector “gender” and click “Save selector”.

Creating-First-Selector-Gender-Category-Link-Selector-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog
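To make the custom selector above easier to read, here is a minimal sketch of how the “gender” selector might look as a sitemap entry, followed by a tiny simulation of the text matching. Note that `:contains()` is a jQuery-style extension rather than standard CSS. The sketch assumes it performs a case-sensitive substring match, which is why “Mens” does not accidentally match “Womens”. The dictionary keys, the `SelectorLink` type string, and the menu labels are assumptions for illustration.

```python
# Sketch of the "gender" link selector as a sitemap entry.
# Key names and the "SelectorLink" type string are assumed, not confirmed here.
gender_selector = {
    "id": "gender",
    "type": "SelectorLink",
    "parentSelectors": ["_root"],
    "selector": (
        '.accessible-navmenu > li[data-behavior="mega_menu"] > a:contains("Womens"), '
        '.accessible-navmenu > li[data-behavior="mega_menu"] > a:contains("Mens")'
    ),
    "multiple": True,  # two menu links are expected, so "multiple" is on
}

# The comma separates two alternative selectors, one per menu entry.
alternatives = [part.strip() for part in gender_selector["selector"].split(", ")]

# Hypothetical menu labels, used only to illustrate the substring matching.
menu_labels = ["Womens", "Mens", "Kids", "Sale"]
needles = ["Womens", "Mens"]
matched = [label for label in menu_labels if any(n in label for n in needles)]
print(matched)  # ['Womens', 'Mens']
```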

Now, going one level deeper, we click on the previously created “gender” link selector in the toolbar and, on the website, go to the women's section. This is necessary because we are now going to create a child selector, that is, a selector nested under another selector, forming a new branch from the previous one. It is another link selector, which will visit each of the subcategories, and we call it “category-url”.

Selecting-Subcategories-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog

Now, let’s visit the first subcategory on the website and, in the developer tools panel, select the previously created “category-url” selector. We have reached the product-list page, and this is where we are going to create our pagination selector. In general, a pagination link selector is created by selecting the pagination links, which are usually below (footer) or above (header) the page content. This case is a little more difficult, however, so we build our pagination selector from the information found via “Inspect”:

.search-results__footer a.c-btn--block[rel="next"]

Creating-Pagination-Link-Selector-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog

Another very important part of creating the pagination selector is to designate it as a child of both the “category-url” selector and itself, so that it has two parent selectors. Making it its own parent lets the scraper keep following the “next” link from each new page. Do not forget to tick the “multiple” option, since several pages need to be scraped, and then click “Save selector”.

Now, on the same product-list page and under “category-url”, we create the “product-url” link selector. Here, it is also important to select the “pagination” link selector as the second parent selector, so that products on every page are collected, not only those on the first one!
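The idea of a pagination selector that is its own parent can be illustrated with a small simulation. This is only a sketch of the crawl logic, not Web Scraper's actual implementation: the pages, URLs, and product links below are made up, and the site is represented as an in-memory dictionary so the example runs without any network access.

```python
# Hypothetical in-memory site: each page lists product links and an optional "next" link.
PAGES = {
    "/women/shoes": {"products": ["/p/1", "/p/2"], "next": "/women/shoes?page=2"},
    "/women/shoes?page=2": {"products": ["/p/3"], "next": "/women/shoes?page=3"},
    "/women/shoes?page=3": {"products": ["/p/4"], "next": None},
}


def collect_product_urls(category_url):
    """Follow "next" links from a category page and gather product links from every page."""
    seen, queue, products = set(), [category_url], []
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue  # never visit the same page twice
        seen.add(url)
        page = PAGES[url]
        # Child selectors run on the category page AND on every paginated page.
        products.extend(page["products"])
        # Pagination applied to a page reached via pagination: it is its own parent.
        if page["next"]:
            queue.append(page["next"])
    return products


print(collect_product_urls("/women/shoes"))  # ['/p/1', '/p/2', '/p/3', '/p/4']
```

If the pagination step were applied only to the first category page, the crawl would stop after page 2, which is why the selector needs itself as a second parent.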

Creating-Product-URL-Link-Selectors-Scraping-Classical-Way-Web-Scraper-Blog

The third and final selector under “category-url” is a text selector that indicates which category the product comes from. It is important not to check the “multiple” option, since only one entry is needed. This selector also needs two parent selectors: “category-url” and “pagination”.

Category-Indicator-Text-Selector-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog

Now we open the first product on the website and select the previously created “product-url” selector. We have reached the last steps on this e-commerce site. For the scraper to collect the product titles, we create a text selector. It is important that “multiple” is not checked, since each “product-url” visit needs to retrieve only that one product's title.

Product-Title-Text-Selector-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog

Next, we create another text selector to retrieve the price. Keep in mind that it needs to be a child of the “product-url” link selector, the same as the “product-title” selector. As with “product-title”, we leave “multiple” unticked.

Product-Price-Text-Selector-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog

As the last step on this e-commerce web page, we create a text selector that retrieves the colour of the product.

Product-Color-Text-Selector-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog

Great! Now, to make sure that all the selectors are in the right places, we check the selector graph. The order of the selectors plays a crucial role when scraping with Web Scraper, especially with the classical method. It tells the scraper when a specific page has to be visited and what has to be retrieved from it.

Selector-Graph-Tree-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog
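The same hierarchy can be written down and printed as a text tree, which is a handy way to double-check the parent assignments described in the steps above. This is a rough sketch: the parent lists follow the tutorial, while the id `category-name` for the category text selector is a hypothetical name, since the tutorial does not name that selector.

```python
from collections import defaultdict

# (selector id, parent selectors), following the steps of the tutorial.
# "category-name" is a hypothetical id for the unnamed category text selector.
SELECTORS = [
    ("gender", ["_root"]),
    ("category-url", ["gender"]),
    ("pagination", ["category-url", "pagination"]),
    ("product-url", ["category-url", "pagination"]),
    ("category-name", ["category-url", "pagination"]),
    ("product-title", ["product-url"]),
    ("product-price", ["product-url"]),
    ("product-color", ["product-url"]),
]


def build_children(selectors):
    """Invert the parent lists into a parent -> children mapping."""
    children = defaultdict(list)
    for selector_id, parents in selectors:
        for parent in parents:
            children[parent].append(selector_id)
    return children


def render(children, node="_root", depth=0, path=()):
    """Return the tree as indented lines, marking self-references instead of recursing forever."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        if child == node or child in path:
            lines.append("  " * (depth + 1) + child + " (repeats)")
        else:
            lines.extend(render(children, child, depth + 1, path + (node,)))
    return lines


tree_lines = render(build_children(SELECTORS))
print("\n".join(tree_lines))
```

In the printed tree, “product-url” and its three text selectors appear twice, once under “category-url” and once under “pagination”, which reflects the two parent selectors assigned earlier.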

Having made sure that our selectors are in the right positions, we click “Scrape” and watch the scraper run and collect the data for us.

Scraping-Data-Ecommerce-Classical-Way-Web-Scraper-Blog

Parser (Additional)

That is all there is to scraping an e-commerce site with the classical method. However, since we love our data transformed and cleaned, we imported the sitemap into Cloud Scraper and applied the Parser feature. First, we simply deleted the unnecessary columns, such as “web-scraper-order” and “web-scraper-start-url”.

Deleting-Unnecessary-Columns-Parser-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog

Then, with a simple “Replace text” parser, we removed the repeated word “Colour” from the “product-color” column, so that each cell contains only the colour itself.

Replacing-Unnecessary-Strings-Parser-Data-Cleaning-Transformation-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog

Lastly, we created another “Replace text” parser for the “product-price” column to delete the “£” currency symbol.

Removing-Symbols-Parser-Data-Transformation-Cleaning-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog

Now our data looks neater and more consistent.

Parser-Output-Data-Transformation-Cleaning-Scraping-Ecommerce-Classical-Way-Web-Scraper-Blog
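For readers who export the raw data instead of using the Parser, the three cleaning steps described above can be approximated in a few lines of Python. This is a sketch and not the Parser feature itself: the sample row is made up, and the exact format of the raw colour and price cells is an assumption.

```python
# Hypothetical scraped row; the values are invented for illustration.
rows = [
    {
        "web-scraper-order": "1600000000-1",
        "web-scraper-start-url": "https://www.example.com/",
        "product-title": "Example Trainer",
        "product-price": "£45.00",
        "product-color": "Colour Black",
    },
]

# Columns that carry no product information.
DROP_COLUMNS = {"web-scraper-order", "web-scraper-start-url"}


def clean_row(row):
    """Drop helper columns, then strip the word "Colour" and the pound sign."""
    cleaned = {key: value for key, value in row.items() if key not in DROP_COLUMNS}
    cleaned["product-color"] = cleaned["product-color"].replace("Colour", "").strip()
    cleaned["product-price"] = cleaned["product-price"].replace("£", "").strip()
    return cleaned


cleaned_rows = [clean_row(row) for row in rows]
print(cleaned_rows)
```

The price is kept as text here, as in the “Replace text” step; converting it to a number would be a separate transformation.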

That is all! The classical way of scraping data can be a hassle, depending on the web page and the data you are looking to gather; however, it can be quite interesting to designate the selectors manually and watch the scraper work.

For more information about the different selectors, visit the documentation.

For more information about the Parser feature or the tutorial video of this blog, see our YouTube channel.
