Web Scraper 0.5.0 Release
August 21, 2020
Update, release
We are happy to announce that Web Scraper 0.5.0 has been released! This release contains new features such as a new data selection UI engine, a new page load detection system, a welcome page, and a whole lot more!
User Interface (UI)
The most visually noticeable feature of the new release is the new data selection UI engine. It is faster and more resilient to websites whose CSS rules break the UI. For example, selecting a link with a mouse click in the old UI would, in most cases, trigger a redirect that reset the selection toolbar. Now, when you select a link or a button, the new element selection UI prevents redirects and other events so that the page state stays unchanged.
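To illustrate how a selection click can be kept from navigating away, here is a minimal TypeScript sketch. It is not Web Scraper's actual code: the `CancellableEvent` interface and the `interceptSelectionClick` helper are hypothetical names introduced for this example, and the sketch assumes the selection UI listens for clicks in the capture phase.

```typescript
// Minimal event shape so the sketch also runs outside a browser; a real DOM
// MouseEvent satisfies it structurally.
interface CancellableEvent {
  preventDefault(): void;
  stopImmediatePropagation(): void;
}

// Hypothetical helper: records the clicked target and cancels the click so
// the page does not navigate or react to it.
function interceptSelectionClick<T>(
  event: CancellableEvent,
  target: T,
  selected: T[],
): void {
  event.preventDefault();           // blocks the default action, e.g. following a link
  event.stopImmediatePropagation(); // keeps listeners that would run later from firing
  selected.push(target);
}

// In a browser, the helper could be wired up in the capture phase so it runs
// before the page's own click handlers on the target element:
//
//   const selected: EventTarget[] = [];
//   document.addEventListener(
//     "click",
//     (e) => { if (e.target) interceptSelectionClick(e, e.target, selected); },
//     true, // useCapture
//   );
```

Registering the listener with `useCapture` set to `true` is what lets it see the click before handlers attached to the link or button itself.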
Welcome Page
To reduce the learning curve, we have launched a welcome page with the new release. Once the extension is added to your browser, a startup guide opens and introduces the first steps of data extraction with Web Scraper. The welcome page works as a step-by-step guide: it explains how to move the toolbar to the bottom of the page, where and how to add selectors, and how to launch your very first scraping job.
Load Detection System
A new page load detection engine has been added. It handles many edge cases, such as:
- Immediate redirect after page load;
- Service workers;
- URL hash (fragment) changes;
- Faster load detection when there is a slow-loading asset;
- No failure on an error page that immediately redirects to a successful page;
- Data extraction will be retried if a redirect occurs during the data extraction process;
- Improved content-type checking.
Nowadays, many websites redirect to another URL while their pages load. This is done for various reasons, such as browser fingerprinting that allows only real browsers to access a website, URL shortening, preventing broken links when web pages are moved, and guiding navigation into and out of a website. These redirects can confuse a scraper about whether the page has fully loaded, which slows scraping down and increases data loss. We developed the load detection system to deal with exactly these problems. Thanks to this feature, our scraper can detect whether the website has fully loaded and starts retrieving information only once all of the data has been rendered, which makes data extraction with Web Scraper faster and more reliable.
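As a rough illustration of the idea, the TypeScript sketch below treats a page as ready only when a load event has arrived after the most recent navigation and no further redirect has followed for a short quiet period. This is a simplified model under stated assumptions, not Web Scraper's implementation: the `PageEvent` type, the `settledUrl` function, and the 500 ms `QUIET_WINDOW_MS` value are all hypothetical.

```typescript
// Hypothetical event kinds a load detector might observe for one tab.
type PageEvent =
  | { kind: "navigation"; url: string; at: number } // includes redirects
  | { kind: "load"; at: number };

const QUIET_WINDOW_MS = 500; // assumed value, not taken from Web Scraper

// Returns the URL that looks safe to extract from, or null if the page has
// not finished loading or may still be in the middle of a redirect.
function settledUrl(events: PageEvent[], now: number): string | null {
  let url: string | null = null;
  let loadedAt: number | null = null;
  for (const event of events) {
    if (event.kind === "navigation") {
      url = event.url; // a new navigation supersedes the previous page
      loadedAt = null; // and invalidates any earlier load event
    } else {
      loadedAt = event.at;
    }
  }
  if (url === null || loadedAt === null) return null;
  // Wait for a quiet period after the load so an immediate redirect is caught.
  return now - loadedAt >= QUIET_WINDOW_MS ? url : null;
}

// Example: a short link loads at 100 ms, redirects at 150 ms, and the final
// page loads at 400 ms.
const exampleEvents: PageEvent[] = [
  { kind: "navigation", url: "https://example.com/short", at: 0 },
  { kind: "load", at: 100 },
  { kind: "navigation", url: "https://example.com/final", at: 150 },
  { kind: "load", at: 400 },
];
console.log(settledUrl(exampleEvents, 600)); // null: still inside the quiet window
console.log(settledUrl(exampleEvents, 900)); // "https://example.com/final"
```

In this model, the load event of the first page is discarded as soon as the redirect is seen, so extraction would only begin on the final page.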
If you want to discuss web scraping, request features, ask questions, or report bugs, visit our friendly forum at https://forum.webscraper.io/