Documentation

Selectors

Web Scraper has multiple selectors for extracting different types of data and for interacting with the website in different ways. The selectors can be divided into three groups:

  • Data extraction selectors, which extract data.
  • Link selectors, which handle site navigation.
  • Element selectors, which select the elements that separate multiple records.

Selector types

Data extraction selectors

Data extraction selectors simply return data from the selected element. For example, the Text selector extracts text from the selected element. These selectors can be used as data extraction selectors:

Link selectors

Link selectors extract URLs from links that can later be opened for data extraction. For example, if a sitemap tree contains a Link selector with three child Text selectors, Web Scraper extracts all URLs with the Link selector, then opens each link and uses the child data extraction selectors to extract data. A Link selector can also have Link selectors as child selectors; these child Link selectors are then used for further page navigation. These are the currently available Link selectors:
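The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Web Scraper's actual implementation: the in-memory `PAGES` dictionary stands in for real web pages, and the selector fields `id`, `type`, `key`, and `children` are hypothetical names chosen for this example.

```python
# Hypothetical in-memory "website": URL -> page content.
PAGES = {
    "/list": {"links": ["/item/1", "/item/2"], "title": "List"},
    "/item/1": {"links": [], "title": "Item one"},
    "/item/2": {"links": [], "title": "Item two"},
}

def run(selectors, url, pages):
    """Apply a selector tree to the page at `url` and return data rows."""
    page = pages[url]
    record = {}
    link_selectors = []
    for sel in selectors:
        if sel["type"] == "text":
            # Data extraction selector: read a value from the current page.
            record[sel["id"]] = page[sel["key"]]
        else:
            link_selectors.append(sel)
    if not link_selectors:
        return [record]
    rows = []
    for sel in link_selectors:
        # Link selector: collect URLs, open each one, run the child selectors there.
        for target in page[sel["key"]]:
            for child_row in run(sel["children"], target, pages):
                rows.append({**record, sel["id"]: target, **child_row})
    return rows

# A Link selector with one child Text selector (hypothetical structure).
SITEMAP = [
    {
        "id": "item_link",
        "type": "link",
        "key": "links",
        "children": [{"id": "title", "type": "text", "key": "title"}],
    }
]

rows = run(SITEMAP, "/list", PAGES)
```

Because `run` calls itself for the children of every Link selector, a Link selector nested under another Link selector simply navigates one level deeper.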

Element selectors

Element selectors select elements that contain multiple data elements. For example, an Element selector might be used to select a list of items on an e-commerce site. The selector returns each selected element as a parent element to its child selectors. An Element selector's child selectors extract data only from within the element that the Element selector passed to them. These are the currently available Element selectors:

Selector configuration options

Each selector has configuration options. The most common ones are listed here. Options that are specific to a single selector are described in that selector's documentation.

  • selector - CSS selector that selects the element the selector will work on.
  • multiple - should be checked when this selector is going to extract multiple records (data rows). Data extracted from two or more selectors with multiple checked won't be merged into a single record.
  • delay - delay before the selector is used.
  • parent selectors - the parent selectors of this selector, which define its place in the selector tree.
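To make the parent selectors option more concrete, here is a minimal Python sketch of how a flat list of selector configurations could be turned into a selector tree. The dictionary keys mirror the option names above, but the exact key spellings, the `id` field, and the `_root` parent name are assumptions made for this example, not a verified description of Web Scraper's sitemap format.

```python
from collections import defaultdict

# Hypothetical selector configurations; key names are illustrative only.
selectors = [
    {"id": "item", "selector": "li.item", "multiple": True,
     "delay": 0, "parentSelectors": ["_root"]},
    {"id": "name", "selector": "span.name", "multiple": False,
     "delay": 0, "parentSelectors": ["item"]},
    {"id": "price", "selector": "span.price", "multiple": False,
     "delay": 0, "parentSelectors": ["item"]},
]

def build_tree(selector_list):
    """Map each parent id to the ids of the selectors that name it as a parent."""
    children = defaultdict(list)
    for sel in selector_list:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel["id"])
    return dict(children)

tree = build_tree(selectors)
```

Each selector only names its parents; the tree itself is derived from those references, which is why a selector can appear under more than one parent.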

Note! A common mistake when using the multiple option is to create two sibling selectors with multiple checked and expect the scraper to join their values in pairs. For example, if you selected pagination links and navigation links, these links couldn't be logically joined in pairs. The correct way is to select a wrapper element with an Element selector and add the data selectors as its child selectors, with the multiple option not checked.
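The difference between the two approaches can be shown with a small Python sketch. It is an illustration only: the markup is invented, and it uses the limited XPath support in Python's standard `xml.etree.ElementTree` module in place of CSS selectors, because the standard library has no CSS selector engine.

```python
import xml.etree.ElementTree as ET

# Invented example markup: the second item has no price.
PAGE = """
<ul>
  <li class="item"><span class="name">Lamp</span><span class="price">10</span></li>
  <li class="item"><span class="name">Desk</span></li>
  <li class="item"><span class="name">Chair</span><span class="price">25</span></li>
</ul>
"""
root = ET.fromstring(PAGE)

# Mistake: two independent "multiple" selectors produce two unrelated lists.
names = [e.text for e in root.findall(".//span[@class='name']")]
prices = [e.text for e in root.findall(".//span[@class='price']")]

# Correct: select the wrapper element, then extract one value per field inside it.
records = []
for item in root.findall(".//li[@class='item']"):
    name = item.find("span[@class='name']")
    price = item.find("span[@class='price']")
    records.append({
        "name": name.text if name is not None else None,
        "price": price.text if price is not None else None,
    })
```

Here `names` has three entries and `prices` has two, so there is no reliable way to tell which price belongs to which name. The wrapper approach keeps each name and price together and leaves the missing price empty.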

Selecting elements

Select tool

Web Scraper has a point-and-click select tool that makes selecting elements accessible to all users. It can be opened by clicking Select in the selector creation interface. Elements that would be selected on click are highlighted in yellow, and already selected elements are highlighted in red. Elements can be deselected by clicking them again while the select tool is still active.

Fig. 1: Point and click select tool

Hotkeys

Hotkeys can be enabled by clicking on the select tool after opening it or by selecting an element.

  • P: expand the selected selector to its parent selector.
  • C: change the selected selector to a child selector. Can be used only after the P key has been used.
  • S: select the highlighted element. Useful for selecting elements that change, or that change something on the page, when clicked.
  • Hold Shift: select multiple element groups.

Fig. 2: Select elements with hotkeys
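One way to picture the P and C hotkeys is as a cursor that moves along the chain of ancestors of the clicked element. The Python sketch below models only that idea; the class name, its methods, and the example selectors are hypothetical and do not describe how the select tool is implemented.

```python
class SelectionCursor:
    """Tracks which element in an ancestor chain is currently selected."""

    def __init__(self, ancestors):
        # ancestors[0] is the clicked element; each later entry is its parent.
        self.ancestors = ancestors
        self.depth = 0

    @property
    def current(self):
        return self.ancestors[self.depth]

    def press_p(self):
        # P: move up to the parent, stopping at the outermost ancestor.
        if self.depth < len(self.ancestors) - 1:
            self.depth += 1
        return self.current

    def press_c(self):
        # C: move back down, which is only possible after P was pressed.
        if self.depth > 0:
            self.depth -= 1
        return self.current


cursor = SelectionCursor(["span.name", "li.item", "ul"])
```

Pressing C before P leaves the cursor on the clicked element, which matches the rule that C can be used only after P.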

Manual element selection

Although we are always trying to improve the select tool so that it finds the best CSS selector for the selected elements, there are still situations where a CSS selector has to be specified manually. Writing CSS selectors requires some HTML and CSS knowledge. You can find more information about CSS selectors on the CSS selector page.
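As a rough illustration of what a hand-written CSS selector matches, the following Python sketch handles only the simplest forms, `tag` and `tag.class`, using the standard `html.parser` module. It is a deliberately limited stand-in for a real CSS engine: it does not support nesting of the same tag, combinators, or attribute selectors, and the markup and helper names are invented for this example.

```python
from html.parser import HTMLParser

class SimpleSelect(HTMLParser):
    """Collects the text of elements matching a 'tag' or 'tag.class' selector."""

    def __init__(self, selector):
        super().__init__()
        # "h2.title" -> tag "h2", class "title"; "h2" -> tag "h2", no class.
        self.tag, _, self.cls = selector.partition(".")
        self.capturing = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (not self.cls or self.cls in classes):
            self.capturing = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.results[-1] += data

def select_text(html, selector):
    parser = SimpleSelect(selector)
    parser.feed(html)
    parser.close()
    return parser.results

HTML = '<h2 class="title big">Lamp</h2><h2 class="note">Sale</h2><p class="title">x</p>'
titles = select_text(HTML, "h2.title")
headings = select_text(HTML, "h2")
```

The selector `h2.title` matches only the first heading, because it requires both the tag name and the class, while `h2` matches every second-level heading regardless of class.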
