How to find all the local restaurants in Yellow Pages using Web Scraper
September 02, 2019
Yellow Pages, Tutorial
Have you ever planned a trip to a different state, city or even country without knowing where to find a delicious dinner? Yellow Pages is a go-to site for finding restaurants in a specific area, but wading through its seemingly endless result pages can be frustrating. Luckily, Web Scraper solves this by letting you extract all the information you need from Yellow Pages, so you can enjoy great meals during your trip.
In this tutorial, we will head to New York and find restaurants by extracting their addresses, phone numbers and website links with Web Scraper.
We have included a pre-made Sitemap (Data extraction configuration) at the bottom of the tutorial, so you can seamlessly import it to Web Scraper.
What is Web Scraper?
Web Scraper is a tool that lets you extract data from modern, dynamic websites and automate your scraping in Web Scraper Cloud. Currently, it is the most popular scraping extension, with 250,000 active users. Web Scraper doesn’t require any coding (no knowledge of Python or C# is needed), nor does it require installing new software. It simply works in a web browser.
How to install Web Scraper?
Before running your scraper, you need to install it via the Chrome Extension store: https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en
Once it is installed, you should see the Web Scraper icon in the top right corner of your browser. Now:
- Press the three vertical dots next to the icon;
- Choose “More tools” then “Developer tools”.
The Developer tools panel will open on the right side of your screen.
- Press the three vertical dots at the top of the panel;
- Change the “Dock side” so that the panel is placed at the bottom of the screen;
- There you will find the Web Scraper tab.
How to create Sitemap?
It all starts with creating a sitemap, but before you do that you need to choose where you want to extract the information from. In this case it will be YellowPages.com, section “Restaurants” and location “New York, NY”.
Once you have chosen the categories you are interested in, you can start creating the sitemap. To do that:
- Click “Create new sitemap” and “Create Sitemap”;
- Choose a name for your sitemap. In this case we will go with “yellowpages”;
- Use the search query URL as the Sitemap Start URL. In this example the URL is: https://www.yellowpages.com/new-york-ny/restaurants
What about multiple pages / pagination?
In order to scrape data from multiple pages, you can: 1. Go to the second page of the restaurant results; 2. Copy this page’s URL; 3. Use this link as your Sitemap Start URL.
The only difference between the first Start URL and the second one is that “?page=2” was added to the URL.
Now replace the “2” with “[1-x]”, where x is the number of the last page you want to scrape.
In this example we have 5 pages; therefore, the end result Sitemap URL should look like this: https://www.yellowpages.com/new-york-ny/restaurants?page=[1-5]
Lastly, click “Create Sitemap”. The first step is done!
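To make the range notation concrete, here is a minimal Python sketch of what a Start URL such as “?page=[1-5]” stands for. It only illustrates the idea and is not Web Scraper's own implementation; the function name `expand_range_url` is made up for this example, and the sketch handles only the simple “[start-end]” form used in this tutorial.

```python
import re

# Matches a simple numeric range such as "[1-5]" inside a URL.
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range_url(url: str) -> list[str]:
    """Expand the first "[start-end]" range in a URL into one URL per page."""
    match = RANGE_PATTERN.search(url)
    if match is None:
        # No range in the URL, so there is only one page to visit.
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{page}{suffix}" for page in range(start, end + 1)]


urls = expand_range_url(
    "https://www.yellowpages.com/new-york-ny/restaurants?page=[1-5]"
)
for page_url in urls:
    print(page_url)
```

In other words, one Start URL with a range describes five separate result pages, from “?page=1” to “?page=5”.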
How to create selectors for data extraction?
You should now see a sitemap with the name you gave it and a blue “Add new selector” button. Press it.
The first selector will wrap each restaurant, so let's name it “Element wrapper selector”. You can see that each restaurant and the information about it sits in a box that contains many elements.
- This is why your chosen selector type will be “Element”;
- Press “Select”;
- Go to the first Restaurant “box”;
- Click on it and then click on the next couple of restaurants;
After you have done that, the scraper should automatically select the rest of the restaurant boxes.
- Now press “Done selecting!”;
- Tick “Multiple” checkbox, which will instruct the scraper to extract data from every wrapper element.
- Once you click “Save selector”, your first selector is created.
The wrapper selector is now in place. The remaining selectors depend on what kind of information you want to extract. In this specific case we are interested in the phone number, address and website URL of each restaurant.
- Click on the selector that you have just created to go into it. You will now create selectors that will be executed within each wrapper element.
- Choose “Add new selector”. Now name this selector as you wish (in this case the first selector will be named “Restaurant’s Phone Number”);
- The selector type will be “Text”;
- Press “Select”;
- Now choose only the text with the telephone number of the restaurant;
- Click “Save selector”.
Similarly, you can create other text selectors to extract the address, title, description and other information you need from Yellow Pages.
Web Scraper also lets you see how the scraped data will look. To do that, use the “Data preview” button. This way you can check whether the selector returns the data you were looking for.
Let’s create a selector to extract website links. Repeat the steps from the previous selector; the only difference is the selector type, which will be “Link”.
Your selectors are all set! Now click on “Sitemap yellowpages” and “Scrape”!
The last step is to download scraped data:
- Click “Sitemap yellowpages”
- Choose “Export data as CSV”
- Then simply click “Download now!”
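Once you have the CSV file, you can process it with any tool you like. As a minimal sketch, the Python snippet below keeps only the rows that have a value in a given column, for example to drop restaurants without a phone number. The function name `rows_with_value`, the sample data and the column names “name” and “phone” are made up for illustration; in a real export, check the header row of your file for the actual column names.

```python
import csv
import io


def rows_with_value(csv_file, column):
    """Return the rows whose `column` is present and non-empty."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if (row.get(column) or "").strip()]


# Made-up sample standing in for an exported file (hypothetical columns).
sample = io.StringIO(
    "name,phone\n"
    "Example Diner,(212) 555-0100\n"
    "No Phone Cafe,\n"
)

with_phone = rows_with_value(sample, "phone")
for row in with_phone:
    print(row["name"], row["phone"])
```

To run this on a downloaded file instead of the sample, pass an opened file, for example `open("yellowpages.csv", newline="", encoding="utf-8")`, where the file name is whatever your browser saved.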
Once you have mastered this, you can make a number of sitemaps and selectors for different types of elements to find what you are looking for.
Conclusion
Web Scraper lets you scrape data from websites like Yellow Pages to find the information you need the most. Anyone can find the right data by remembering the basic steps: 1. Create a sitemap; 2. Create the parent element and child selectors; 3. Check the “Data preview”; 4. Scrape!
It might seem complicated at first, but as you continue to scrape, it will become a piece of cake. We hope this was helpful and that it makes your scraping experience better. Stay tuned for our upcoming posts!
Sample Sitemap
Here is a sample sitemap that we made:
{"_id":"yellowpages","startUrl":["https://www.yellowpages.com/new-york-ny/restaurants?page=[1-5]"],"selectors":[{"id":"Element wrapper selector","type":"SelectorElement","parentSelectors":["_root"],"selector":".result div.srp-listing","multiple":true,"delay":0},{"id":"Restaurant’s Phone Number","type":"SelectorText","parentSelectors":["Element wrapper selector"],"selector":"div.phones","multiple":false,"regex":"","delay":0},{"id":"Restaurant's Address","type":"SelectorText","parentSelectors":["Element wrapper selector"],"selector":"div.street-address","multiple":false,"regex":"","delay":0},{"id":"Restaurant's Website","type":"SelectorLink","parentSelectors":["Element wrapper selector"],"selector":"a.track-visit-website","multiple":false,"delay":0}]}
You can import it into Web Scraper and either use it as your template or use it to simply see all the steps we made. To do so you should: 1. Copy the sitemap above; 2. Go to “Create new sitemap” then “Import Sitemap”; 3. In the field “Sitemap JSON” paste the copied sitemap; 4. Rename the sitemap if needed; 5. Lastly click “Import Sitemap”.
After you have done that, you should be able to see the sitemap, parent element and all the selectors we have made.
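If you would like to inspect the structure of a sitemap before importing it, the following Python sketch groups selector ids by their parent. The embedded JSON is a trimmed copy of the sample sitemap above (the phone number selector is left out to keep it short), and `selector_tree` is a helper name invented for this example.

```python
import json

# Trimmed copy of the sample sitemap above, reformatted for readability.
SITEMAP_JSON = """
{
  "_id": "yellowpages",
  "startUrl": ["https://www.yellowpages.com/new-york-ny/restaurants?page=[1-5]"],
  "selectors": [
    {"id": "Element wrapper selector", "type": "SelectorElement",
     "parentSelectors": ["_root"], "selector": ".result div.srp-listing",
     "multiple": true, "delay": 0},
    {"id": "Restaurant's Address", "type": "SelectorText",
     "parentSelectors": ["Element wrapper selector"],
     "selector": "div.street-address", "multiple": false, "regex": "", "delay": 0},
    {"id": "Restaurant's Website", "type": "SelectorLink",
     "parentSelectors": ["Element wrapper selector"],
     "selector": "a.track-visit-website", "multiple": false, "delay": 0}
  ]
}
"""


def selector_tree(sitemap):
    """Map each parent selector id to the ids of its child selectors."""
    children = {}
    for selector in sitemap["selectors"]:
        for parent in selector["parentSelectors"]:
            children.setdefault(parent, []).append(selector["id"])
    return children


sitemap = json.loads(SITEMAP_JSON)
tree = selector_tree(sitemap)
for parent, child_ids in tree.items():
    print(parent, "->", child_ids)
```

The output shows the same hierarchy you built by hand: the wrapper selector sits under “_root”, and the address and website selectors sit under the wrapper.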
If you want to scrape a different section of Yellow Pages, you can change the start URL:
- Click on “Sitemap yellowpages”;
- Choose “Edit metadata”;
- Now you can paste any URL you want to.
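The same change can also be made on the sitemap JSON before importing it. The sketch below copies a sitemap and swaps its name and start URL. The helper `with_start_url`, the new sitemap name and the hotels URL are hypothetical examples, and the selector list is left empty here only to keep the snippet short; in practice you would keep the selectors from the sample sitemap and confirm they still match the new section.

```python
import copy
import json


def with_start_url(sitemap, new_id, start_url):
    """Return a copy of the sitemap with a different name and start URL."""
    updated = copy.deepcopy(sitemap)  # leave the original sitemap untouched
    updated["_id"] = new_id
    updated["startUrl"] = [start_url]
    return updated


base = {
    "_id": "yellowpages",
    "startUrl": ["https://www.yellowpages.com/new-york-ny/restaurants?page=[1-5]"],
    "selectors": [],  # shortened for this example
}

# Hypothetical target section; check the real URL in your browser first.
hotels = with_start_url(
    base,
    "yellowpages-hotels",
    "https://www.yellowpages.com/new-york-ny/hotels?page=[1-5]",
)
print(json.dumps(hotels))
```

The printed JSON can then be pasted into the “Sitemap JSON” field described above.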
Legal
Before scraping any website, always check whether the site allows automated data extraction. At the time of writing, the YellowPages.com robots.txt file doesn’t restrict automated access to the website by robots.
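If you prefer to check robots.txt rules programmatically, Python's standard library includes a parser for them. The sketch below tests two URLs against a small set of made-up rules; the rules and the example.com URLs are illustrative only and are not the actual contents of any site's robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Check whether `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration, not a real site's robots.txt.
example_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

listing_ok = is_allowed(
    example_rules, "*", "https://example.com/new-york-ny/restaurants"
)
private_ok = is_allowed(example_rules, "*", "https://example.com/private/page")
print(listing_ok, private_ok)
```

For a live site, `RobotFileParser` can also download the file itself through its `set_url` and `read` methods. Keep in mind that robots.txt is only one part of the picture, and a site's terms of service may contain additional rules.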