A Quick Guide to CSS and jQuery Selectors for Web Scraper

June 28, 2022

JQuery selectors, web scraper, Scraping public data, scraper, CSS selectors

In CSS and jQuery, selectors are expressions that match specific elements in the DOM. They are useful when you want to style or act on certain elements of a document without touching the rest. With careful use of selectors, you can manipulate a page exactly as intended or isolate the specific data you want to scrape.

Requirements

To get the most from this article, you’ll need at least basic knowledge of HTML and the DOM, plus a little experience with CSS and jQuery. We’ll show how to use selectors in the Web Scraper Chrome extension, so installing it will make it easier to follow along with the guide.

What to Expect

In this article, we’ll cover 19 of the most relevant CSS selectors as well as the :nth-child and :nth-of-type selectors. Afterwards, we’ll cover a few jQuery selectors, focusing on how you can chain them like if-then statements. In all of the examples, we’ll be isolating elements on the webscraper.io home page.

CSS Selectors

The CSS selectors we’re going to cover isolate HTML elements by class, id, element type, and attribute. Once you understand the notation and logic of the most common selectors, you can combine them concisely to be more precise about which elements you select from a page.

.class

The class selector selects all elements that have the specified class. For example, .under-hero__content selects every element on the page with the under-hero__content class.

.class1.class2

The class selector can be extended with a second class when an element carries more than one. Both classes must be on the same element, and there is no space between them. In this example, we select specific blocks of text by referencing two classes, home-cta__title and home-cta__title--testi.

.class1 .class2

By leaving a space between the class names, you select elements with the second class that are descendants of an element with the first class. In this example, .cell .home-features__text selects the .home-features__text element, with its header and subtext, inside each .cell element.
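
The difference between the last two selectors is easy to miss, so here is a minimal sketch of it in plain Python. It only mimics the matching rule with the standard library and is not how a browser or Web Scraper evaluates selectors. The markup and the featured class are hypothetical; cell, home-features__text, and home-cta__title are borrowed from the examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, written only to illustrate the two rules.
HTML = """
<div>
  <div class="cell">
    <p class="home-features__text">Feature text</p>
  </div>
  <h2 class="home-cta__title featured">Featured title</h2>
  <h2 class="home-cta__title">Plain title</h2>
</div>
"""

def classes(el):
    """Return the element's class attribute as a set of class names."""
    return set(el.get("class", "").split())

root = ET.fromstring(HTML)

# .home-cta__title.featured : ONE element that carries both classes.
both = [el.text for el in root.iter()
        if {"home-cta__title", "featured"} <= classes(el)]

# .cell .home-features__text : an element with the second class located
# anywhere inside a different element that has the first class.
nested = [inner.text
          for outer in root.iter() if "cell" in classes(outer)
          for inner in outer.iter()
          if inner is not outer and "home-features__text" in classes(inner)]

print(both)    # ['Featured title']
print(nested)  # ['Feature text']
```

Note that the plain title is skipped by the first rule because it lacks the second class, even though it shares the first.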

#id

Prefixing an id with a hash (#) selects the element with that id. Because an id should be unique within a page, this normally matches a single element. Here, #menu-main-menu matches the main navigation bar.

*

The universal selector is a catch-all that selects every element on a page. On the example page, it captures all page elements.

Element-Type

You can select every element of a given type, such as p or div, by referencing the element type. In this example, only the span elements are selected.

Element-Type.class

Combining an element type with a class selector narrows the selection. Unlike the class selector alone, this matches only elements of that type with the corresponding class. Since a class can apply to several elements of the same type, it can still yield more than one result. Here we match only div elements with the home-cta__text class.

Element-Type#id

Combining an element type with an id selector picks out one specific element on the page. Because every id should be unique in the document, this behaves like the id selector alone, with the extra condition that the element must be of the given type, so it yields at most one result. Here we match the li element whose id corresponds to the first menu item.

Element-Type-1, Element-Type-2

You can list multiple element types separated by commas to widen your selection. Every element that matches any of the listed selectors is selected. Here we capture both p and h3 elements.

Element-Type-1 Element-Type-2

You can limit your selection to elements of one type that sit inside another type. A direct parent-child relationship isn’t required, as long as Element-Type-2 is somewhere within Element-Type-1. Here we isolate span elements found within h2 elements.

Element-Type-1>Element-Type-2

You can limit your selection to elements which are direct children of another element. This differs from the previous selector because Element-Type-1 must be the immediate parent of Element-Type-2. Here we select span elements immediately within a p element.
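
To make the contrast between the space (descendant) and > (child) forms concrete, the following is a small sketch in Python using only the standard library. The markup is hypothetical, and the loops merely imitate the two rules rather than reproduce a real selector engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one span directly inside a p, one nested deeper.
HTML = """
<div>
  <p><span>direct child</span></p>
  <p><em><span>nested deeper</span></em></p>
</div>
"""
root = ET.fromstring(HTML)

# p span : any span located anywhere inside a p.
descendant = [s.text for p in root.iter("p") for s in p.iter("span")]

# p > span : only spans whose immediate parent is a p.
# Iterating over an element yields its direct children only.
child = [s.text for p in root.iter("p") for s in p if s.tag == "span"]

print(descendant)  # ['direct child', 'nested deeper']
print(child)       # ['direct child']
```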

Element-Type-1+Element-Type-2

You can limit your selection to an Element-Type-2 placed directly after an Element-Type-1. The two elements must be siblings, that is, share the same parent, and nothing may sit between them. Here we select the p elements that immediately follow an h2 element.

Element-Type-1~Element-Type-2

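This selector is a looser version of the previous one: it selects every Element-Type-2 that comes after an Element-Type-1 under the same parent, whether or not it follows immediately. Here, p~ul selects ul elements that have a p element somewhere before them among their siblings.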
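
The two sibling combinators can be sketched as simple list logic. The example below is an illustration in Python, assuming the children of one parent are given as a list of tag names; the function names and the sample list are hypothetical.

```python
def adjacent_sibling(siblings, first, second):
    """Indexes matched by first+second: a `second` directly after a `first`."""
    return [i for i in range(1, len(siblings))
            if siblings[i] == second and siblings[i - 1] == first]

def general_sibling(siblings, first, second):
    """Indexes matched by first~second: a `second` with a `first` anywhere before it."""
    matches = []
    seen_first = False
    for i, tag in enumerate(siblings):
        if tag == second and seen_first:
            matches.append(i)
        if tag == first:
            seen_first = True
    return matches

# Hypothetical children of a single parent element, in document order.
siblings = ["ul", "h2", "p", "ul", "p", "ul"]

print(adjacent_sibling(siblings, "h2", "p"))  # [2]
print(general_sibling(siblings, "p", "ul"))   # [3, 5]
```

The first ul (index 0) is not matched by p~ul because no p comes before it.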

[attribute]

Selects all elements that have the attribute, regardless of its value. Here we select elements where the target attribute exists, whatever it contains.

[attribute=value]

Selects all elements where the attribute matches a specific value exactly. Here the class attribute must equal “home-features__text” and nothing else. For an element with only that one class, this behaves like the class selector .home-features__text, but unlike the class selector it will not match an element that carries additional classes.

[attribute~=value]

Selects all elements where the attribute, read as a whitespace-separated list of words, contains a word equal to the value. In this case we’re selecting elements whose class attribute contains the word “cell”.

[attribute|=value]

Selects all elements where the attribute either equals the value or starts with the value immediately followed by a hyphen. Here we look for class attributes that equal “home” or start with “home-”.

element[attribute^="value"]

Selects every element of the given type whose attribute begins with the value. Here we’re looking only for class attributes beginning with “button”, so we can identify all the buttons on a page.

element[attribute$="value"]

Selects every element of the given type whose attribute ends with the value. In this case, we look for href attributes ending in “pricing-section” to select the links that point to pricing.

element[attribute*="value"]

Selects every element of the given type whose attribute contains the value anywhere in it. In this case, we look for href attributes containing “test”. This works best with generated attribute values: if one item has [href='test-123'] and another has [href='test-345'], the selector [href*='test'] returns both elements.
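
Since the seven attribute selectors differ only in how they compare strings, it can help to see them side by side. The function below is a rough sketch in Python of the comparison each operator performs on a single attribute value. The name attr_matches and the sample values are hypothetical, and real engines have extra details (for example case-sensitivity flags) that are left out here.

```python
def attr_matches(operator, actual, value=""):
    """Mimic a CSS attribute operator on one attribute value.

    `actual` is the element's attribute value, or None if the attribute is absent.
    `operator` is "" for [attr], otherwise one of = ~= |= ^= $= *=.
    """
    if actual is None:
        return False                       # attribute missing: nothing matches
    if operator == "":
        return True                        # [attr]: presence is enough
    if operator == "=":
        return actual == value             # exact match of the whole value
    if operator == "~=":
        return value in actual.split()     # one whitespace-separated word
    if operator == "|=":
        return actual == value or actual.startswith(value + "-")
    if operator == "^=":
        return value != "" and actual.startswith(value)
    if operator == "$=":
        return value != "" and actual.endswith(value)
    if operator == "*=":
        return value != "" and value in actual
    raise ValueError(f"unknown operator: {operator!r}")

# An element with two classes: exact match fails, word match succeeds.
print(attr_matches("=", "cell home-features__text", "home-features__text"))  # False
print(attr_matches("~=", "cell home-features__text", "cell"))                # True
# |= needs the value alone or followed by a hyphen.
print(attr_matches("|=", "home-cta__title", "home"))                         # True
print(attr_matches("|=", "homepage", "home"))                                # False
# Substring match finds both generated values.
print(attr_matches("*=", "test-123", "test"), attr_matches("*=", "test-345", "test"))  # True True
```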

element:nth-child(#)

The :nth-child selector matches an element only if it is the #th child of its parent, counting all of the parent’s children regardless of type. We don’t have to name the parent element. Here we’re selecting a p element which is the 2nd child of its parent; in this case, it follows a div which is the first child.

element:nth-of-type(#)

The :nth-of-type selector matches the element at the # position among its siblings of the same element type. Siblings of other types are ignored when counting. Here we’re selecting the 2nd p element in a group of siblings, which is not necessarily the 2nd element overall.
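
The difference between :nth-child and :nth-of-type comes down to which siblings are counted. Here is a minimal sketch in Python, assuming the children of one parent are given as a list of tag names and that # is a plain number (formulas such as 2n+1 are not handled). The function names and sample list are hypothetical.

```python
def nth_child(siblings, tag, n):
    """Index matched by tag:nth-child(n), or None. n is 1-based, as in CSS."""
    if 1 <= n <= len(siblings) and siblings[n - 1] == tag:
        return n - 1
    return None

def nth_of_type(siblings, tag, n):
    """Index matched by tag:nth-of-type(n), or None. Only `tag` siblings are counted."""
    same_type = [i for i, t in enumerate(siblings) if t == tag]
    return same_type[n - 1] if 1 <= n <= len(same_type) else None

# Hypothetical children of a single parent element, in document order.
siblings = ["div", "p", "h3", "p"]

print(nth_child(siblings, "p", 2))    # 1    (the 2nd child happens to be a p)
print(nth_of_type(siblings, "p", 2))  # 3    (the 2nd p is the 4th child)
print(nth_child(siblings, "p", 3))    # None (the 3rd child is an h3, not a p)
```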

jQuery Selectors

jQuery Selectors Overview

We’re going to cover some of the main jQuery selectors, in particular :contains() and :has(). Then we’ll show how you can chain the two together for more specific element selection.

element:contains('text')

The :contains() selector picks out elements that contain the given text string. The match is case sensitive. Here we want an h3 containing the string “Point”.

element:not(:contains('text'))

By wrapping the contains selector in :not(), we can choose elements which do not contain the given text string. Here we want the h3 elements not containing “Point”.

Element-Type-1:has(Element-Type-2)

Matches an Element-Type-1 only if there is an Element-Type-2 anywhere among its descendants. Here we select ul elements containing li elements.

Element-Type-1:not(:has(Element-Type-2))

Matches an Element-Type-1 only if it does not have an Element-Type-2 anywhere among its descendants. Here we select link (a) elements which do not have img elements under them.
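
The four jQuery selectors above boil down to two tests, "contains this text" and "has this descendant", each of which can be negated. The sketch below imitates them in Python with the standard library. It is not jQuery itself, and the markup and helper names (contains, has) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, written only to illustrate the four selectors.
HTML = """
<div>
  <h3>Point and click</h3>
  <h3>Export data</h3>
  <a href="/a"><img src="logo.png" /></a>
  <a href="/b">Pricing</a>
</div>
"""
root = ET.fromstring(HTML)

def contains(el, text):
    """Like :contains('text'): case-sensitive search in the element's full text."""
    return text in "".join(el.itertext())

def has(el, tag):
    """Like :has(tag): True if any descendant (not the element itself) is a `tag`."""
    return any(d is not el for d in el.iter(tag))

h3_with = [el.text for el in root.iter("h3") if contains(el, "Point")]         # h3:contains('Point')
h3_without = [el.text for el in root.iter("h3") if not contains(el, "Point")]  # h3:not(:contains('Point'))
a_with_img = [el.get("href") for el in root.iter("a") if has(el, "img")]       # a:has(img)
a_without_img = [el.get("href") for el in root.iter("a") if not has(el, "img")]  # a:not(:has(img))

print(h3_with, h3_without)        # ['Point and click'] ['Export data']
print(a_with_img, a_without_img)  # ['/a'] ['/b']
```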

Chaining These Selectors

There are many other jQuery selectors, but these are especially valuable because you can rely on them almost exclusively to isolate elements on a page logically. You can think of both :has() and :contains() as the if-clause in an if-then statement: if a div has a p element, or contains some text, then perform some action in jQuery. Conversely, you can use :not() to invert the logic.

You can also chain them together to isolate specific elements on a page you’re scraping. Here we look for all li elements that have a descendant which does not contain the text “Pricing”.

Here’s another example where we chain CSS selector conditions with jQuery selector conditions. By specifying the hierarchy between the .dropdown class, the select element, and the option elements which do not contain the “Select colour” text, we’re able to isolate the three colour options for this product.
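
As a rough illustration of this kind of chain, the sketch below imitates a selector along the lines of .dropdown select option:not(:contains('Select colour')) in plain Python. The exact selector, the markup, and the colour names are assumptions made for the example; only the .dropdown class, the select and option elements, and the “Select colour” text come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the three colour names are invented for illustration.
HTML = """
<div class="dropdown">
  <select>
    <option>Select colour</option>
    <option>Red</option>
    <option>Green</option>
    <option>Blue</option>
  </select>
</div>
"""
root = ET.fromstring(HTML)

options = [
    opt.text
    # .dropdown : any element whose class list includes "dropdown"
    for box in root.iter() if "dropdown" in box.get("class", "").split()
    # (space) select : a select anywhere inside it
    for sel in box.iter("select")
    # (space) option : an option anywhere inside the select
    for opt in sel.iter("option")
    # :not(:contains('Select colour')) : drop the placeholder option
    if "Select colour" not in "".join(opt.itertext())
]

print(options)  # ['Red', 'Green', 'Blue']
```

Each condition narrows the result of the one before it, which is the if-then reading of chained selectors.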

Summary

As you’ve seen here, CSS and jQuery selectors can be very useful for web scraping. We showed examples of some of the most valuable ones you’d typically use to set up a scraping job. By knowing both sets of selectors, you can maintain greater flexibility when it comes to selection while also having the power to isolate whatever text, images, links, or other elements you need.

