The Text selector extracts text from the selected element and from all of its child elements. HTML is stripped and only the text is returned. The selector ignores text within `<script>` and `<style>` tags. Line break `<br>` tags are replaced with newline characters. You can additionally apply a regular expression to the resulting data.
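The extraction rules described above can be approximated with a short Python sketch. This is only an illustration built on the standard library's `html.parser`, not the selector's actual implementation; the `TextExtractor` class and `extract_text` function are hypothetical names introduced here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text, skipping <script>/<style> and turning <br> into newlines."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Return the text of an HTML fragment, including text of child elements."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


fragment = "<p>Price: <b>14.99$</b><br>In stock<script>track()</script></p>"
print(extract_text(fragment))
```

The text of the nested `<b>` element is kept, the `<br>` becomes a newline, and the script content is dropped.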
The regular expression attribute can be used to extract a substring of the text that the selector extracts. When a regular expression is used, the whole match (group 0) is returned as the result.
www.regexr.com is a great site where you can learn about regular expressions and try them out.
Here are some examples that you might find useful:
text | regex | result |
---|---|---|
price: 14.99$ | `[0-9]+\.[0-9]+` | 14.99 |
id: H83JKDX4 | `[A-Z0-9]{8}` | H83JKDX4 |
date: 2014-08-20 | `[0-9]{4}\-[0-9]{2}\-[0-9]{2}` | 2014-08-20 |
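The examples in the table can be tried out with a small Python sketch that returns the whole match (group 0). It uses Python's `re` module, which behaves the same way for these particular patterns but may differ from the selector's own regular expression engine in other cases. The `extract_with_regex` helper is a hypothetical name, and returning `None` when nothing matches is an assumption of this sketch.

```python
import re


def extract_with_regex(text, pattern):
    """Return the whole match (group 0) of the first match in text.

    Returning None when there is no match is an assumption of this sketch.
    """
    match = re.search(pattern, text)
    return match.group(0) if match else None


examples = [
    ("price: 14.99$", r"[0-9]+\.[0-9]+"),
    ("id: H83JKDX4", r"[A-Z0-9]{8}"),
    ("date: 2014-08-20", r"[0-9]{4}\-[0-9]{2}\-[0-9]{2}"),
]

for text, pattern in examples:
    print(text, "->", extract_with_regex(text, pattern))
```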
For example, suppose you are scraping a news site that has one article per page. The page might contain the article, its title, the publication date and the author. A Link selector can navigate the scraper to each of these article pages. Multiple Text selectors can extract the title, date, author and article. The Multiple option should be left unchecked for these Text selectors because only one record is extracted from each page.
E-commerce sites usually have multiple items per page. To scrape these items, you need an Element selector that selects the item wrapper elements and multiple Text selectors that select data within each item wrapper element.
For example, suppose you want to extract the comments on an article. There are multiple comments on a single page and you only need the comment text (if you need other comment attributes, see the example above). You can use a Text selector to extract these comments. The Text selector's Multiple option should be checked because you will be extracting multiple records.
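The effect of the Multiple option on the number of output records can be modelled with a small Python sketch. It is a simplified illustration, not the scraper's code: `build_records` is a hypothetical helper, and keeping only the first match when Multiple is unchecked is an assumption of this sketch.

```python
def build_records(column, matched_texts, multiple):
    """Model how the Multiple option shapes the output records.

    Assumption: with Multiple unchecked, only the first match is kept,
    and a record with a None value is produced when nothing matches.
    """
    if multiple:
        # One record per matched element, e.g. one per comment.
        return [{column: text} for text in matched_texts]
    first = matched_texts[0] if matched_texts else None
    return [{column: first}]


comments = ["Great article!", "I disagree.", "Thanks for sharing."]

# Multiple checked: each comment becomes its own record.
print(build_records("comment", comments, multiple=True))

# Multiple unchecked: a single record, as on a one-article-per-page site.
print(build_records("title", ["Article title"], multiple=False))
```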