Scraping public data. Is it legal?
March 03, 2022
web scraper, GDPR, data extraction, linkedin, Scraping public data, web scraping, facebook
We’ve all been hearing about the ongoing cases of companies filing lawsuits against web scraping activities. Worth mentioning are the cases of LinkedIn vs. HiQ on public data scraping and Facebook Inc. vs. BrandTotal Ltd. on scraping data from Facebook. What is more interesting is that one case was decided in favour of the data scraper, and the other in favour of the company whose platform was being scraped. Such different outcomes raise the question, especially in light of the recent General Data Protection Regulation (GDPR), of how lawful web scraping of publicly available data actually is.
Why do similar cases get different outcomes?
HiQ Labs Inc. (HiQ) is a data analytics company that used automated bots to scrape information that LinkedIn users had posted on their profiles. LinkedIn Corp. sent HiQ a cease-and-desist letter demanding that HiQ stop accessing and scraping data from LinkedIn’s servers, and in response HiQ filed a lawsuit against LinkedIn. So far the courts have ruled in favour of HiQ by allowing access to publicly available data, but the outcome is not yet certain, as the proceedings might continue in a higher court.
However, in the case of Facebook Inc. vs. BrandTotal Ltd., where BrandTotal Ltd. used Google Chrome extensions to scrape data from Facebook users, the court ruled in favour of Facebook Inc., and that was the final decision. The main difference between the two cases lies in how the web scraping was carried out. In the latter case, web scraping was done in a fraudulent manner. First, it was done through an extension whose own Terms of Service did not mention anywhere that it collects vast amounts of data from Facebook users. Secondly, Facebook Inc.'s user Terms of Service include a clause which forbids users to “access or collect data from Facebook's products "using automated means" without Facebook's permission”, so any such scraping of data constitutes a breach of contract.
As both cases show, scraping publicly available data can be lawful, but it can also be unlawful if the necessary precautions and procedures have not been followed. It is highly advisable to follow any new developments on the lawfulness of data scraping, because HiQ vs. LinkedIn may become a landmark case on this matter in the near future.
In both HiQ vs. LinkedIn and Facebook Inc. vs. BrandTotal Ltd., the question at hand is whether the Computer Fraud and Abuse Act (CFAA) applies to data that’s publicly available online. But how would things look under GDPR?
Data scraping of publicly available data and GDPR
If you have decided to scrape publicly available data, there are several things you should take into account before starting:
1. Establish a legal basis for using the data, understand the amount you are intending to scrape and from what source
So you have made a decision to scrape data. Start off by checking whether the source is publicly available, e.g., it does not require registration before accessing data, and it does not explicitly state that using this data is prohibited. After you have established that the source is indeed public, be clear about why you are scraping the data, so that you can present a legitimate reason for data acquisition if needed. If you want, you can get acquainted with the six legal bases for lawful processing in Article 6 of GDPR. Lastly, work out how much data will be scraped and make sure that no unnecessary data will be acquired.
2. Does the data contain special category data?
After understanding how much data is in scope, make sure that no special category data are included and scraped without need, to avoid any disputes. Special category data are any data regarding a person's race, ethnicity, sexual orientation, health data, biometric data, etc. However, if you have come to the conclusion that special category data are what you need, be especially careful to follow all the steps and to ask the data subject for permission to use the data wherever possible. GDPR requires explicit consent for the use of special category data. A data protection impact assessment may need to be carried out in certain cases, for example when vast amounts of special category data are involved. If the data you scrape is as simple as emails, phone numbers and names, this point does not apply to it.
3. Read through any terms and conditions of the database publisher, if applicable
If the data subject has published the information themselves, it is publicly available information. But if it sits on a social media platform, e.g. LinkedIn, Facebook, or any other platform, read through the Terms and Conditions of that platform so that you do not end up in a situation similar to BrandTotal Ltd.'s. It is crucial to understand whether the holder of the data prohibits the use of the data, whether the data subject has been informed that third parties may use their published data, and what the platform's policies on scraping are.
4. For extra protection, you can always check the position of local data protection authorities on the use of personal data from public sources.
In some countries, for instance, in France, the use of public data for marketing purposes is prohibited by national law.
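The four checks above could be written down before each scraping job. The following is only a minimal sketch of such a checklist, not legal advice: the class name ScrapePlan, the function open_issues and the field names used for special category data are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field

# The six legal bases listed in Article 6(1) of GDPR.
LEGAL_BASES = {
    "consent", "contract", "legal_obligation",
    "vital_interests", "public_task", "legitimate_interests",
}

# Hypothetical field names standing in for special category data.
SPECIAL_CATEGORY_FIELDS = {
    "race", "ethnicity", "sexual_orientation", "health", "biometric",
}


@dataclass
class ScrapePlan:
    source: str
    requires_registration: bool
    legal_basis: str
    fields: list[str] = field(default_factory=list)
    explicit_consent: bool = False
    terms_reviewed: bool = False
    local_dpa_checked: bool = False


def open_issues(plan: ScrapePlan) -> list[str]:
    """Return the checks from steps 1-4 that are still unresolved."""
    issues = []
    if plan.requires_registration:
        issues.append("source requires registration, so it may not be public")
    if plan.legal_basis not in LEGAL_BASES:
        issues.append("no recognised Article 6 legal basis recorded")
    special = sorted(set(plan.fields) & SPECIAL_CATEGORY_FIELDS)
    if special and not plan.explicit_consent:
        issues.append(
            "special category fields without explicit consent: " + ", ".join(special)
        )
    if not plan.terms_reviewed:
        issues.append("platform terms and conditions not reviewed")
    if not plan.local_dpa_checked:
        issues.append("local data protection authority guidance not checked")
    return issues
```

An empty list from open_issues only means that every item on this illustrative checklist has been ticked. Whether the scraping is actually lawful still depends on the answers behind each tick.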
Data scraping does not relieve you of the duty to comply with GDPR principles
After following the precautionary steps before scraping publicly available data, you must consider how GDPR applies to what you do with it. When dealing with or scraping any data, whether publicly available or not, certain GDPR principles must still be taken into account.
- Data minimisation - make sure that the scraped data is limited to what is necessary
- Purpose limitation - do not obtain data that serves no purpose
- Accuracy - data acquired or in storage should be accurate and updated when it changes over time
- Lawfulness, fairness and transparency - the data processing must be carried out lawfully and transparently, not by following BrandTotal Ltd’s example
- Storage limitation - store only what is necessary and delete it when it is no longer needed
- Integrity and confidentiality - ensure the maximum protection for the data acquired
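Data minimisation and storage limitation in particular can be built into the scraping pipeline itself. The sketch below is one possible way to do that. It assumes that only a name and an email address are needed for the stated purpose and uses a 180-day retention period. Both are illustrative assumptions, not figures taken from GDPR, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: only these fields are needed for the stated purpose.
NEEDED_FIELDS = ("name", "email")
# Assumption: an illustrative retention period, not a figure from GDPR.
RETENTION = timedelta(days=180)


def minimise(raw: dict, source: str, now: datetime) -> dict:
    """Keep only the needed fields and note where and when they were collected."""
    record = {key: raw[key] for key in NEEDED_FIELDS if key in raw}
    record["source"] = source
    record["collected_at"] = now
    return record


def drop_expired(records: list[dict], now: datetime) -> list[dict]:
    """Return only the records that are still inside the retention period."""
    return [r for r in records if now - r["collected_at"] <= RETENTION]
```

Recording the source and collection time alongside each record also makes it easier to say later where a given piece of data was acquired.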
Can I scrape publicly available e-mails and phone numbers?
If you have followed all the previous tips and taken everything into consideration, then yes, you can. Just be sure you don’t use them for marketing purposes in France, give people an option to opt out of marketing emails if you do carry out marketing based on the acquired data, and be ready to explain where you acquired the data. Remember that the data of legal persons are not subject to GDPR, so you can use publicly available legal person data for marketing purposes. It’s the data of natural persons you should be careful with.
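The rules of thumb in this answer could be applied to a mailing list along the following lines. This is a deliberately conservative sketch that encodes only what the paragraph above says, and it leaves out every contact in France regardless of whether it belongs to a natural or a legal person. The Contact class, its fields and the function marketing_recipients are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    email: str
    country: str  # two-letter country code, e.g. "FR"
    source: str   # where the address was found


def marketing_recipients(contacts, opted_out):
    """Yield contacts that are not in France and have not opted out."""
    blocked = {address.lower() for address in opted_out}
    for contact in contacts:
        if contact.country.upper() == "FR":
            continue  # the text advises against marketing use in France
        if contact.email.lower() in blocked:
            continue  # respect opt-outs from marketing emails
        yield contact
```

Keeping the source on each contact means the question of where the data was acquired can be answered per address.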
To conclude… a disclaimer.
Keep in mind that every data scraping activity is carried out at the personal liability of the person doing the scraping, so this advice from Web Scraper should not be treated as complete. Remember that being acquainted with GDPR and the documents available is a great start! GDPR was designed to prohibit fraudulent activities with data, to provide transparency and safety for data subjects' data, and to limit the use of unnecessary data, not to stop businesses from operating. Scraping data under GDPR may seem difficult, but it is achievable with the appropriate means.
Article by: Data Protection Specialist Elvīra Krēķe from Legally