Scraping public data does not violate the main US anti-hacking law, according to the US Court of Appeals for the Ninth Circuit. The court recently ruled that scraping data from a public website does not constitute computer fraud under the Computer Fraud and Abuse Act (CFAA).
In 2017, hiQ sued LinkedIn over the company's efforts to stop it from scraping data from users' public profiles. A federal district court found that the CFAA, which outlaws accessing a protected computer without authorization, likely did not apply because the data was public, and it ordered LinkedIn to stop blocking hiQ from scraping it.
In 2019, the US Court of Appeals for the Ninth Circuit affirmed that decision and ruled in hiQ's favor. In March 2020, LinkedIn petitioned the US Supreme Court, arguing that its technical barriers and its cease-and-desist letter had revoked any authorization hiQ had. On that view, any subsequent scraping was unauthorized and violated the CFAA. In 2021, the Supreme Court sent the case back to the Ninth Circuit for reconsideration in light of its ruling in Van Buren v. United States.
US Court of Appeals for the Ninth Circuit filing clears the air about public data scraping
The Ninth Circuit's opinion states:
“At issue was whether, once hiQ received LinkedIn’s cease-and-desist letter, any further scraping and use of LinkedIn’s data was ‘without authorization’ within the meaning of the CFAA. The panel concluded that hiQ raised a serious question as to whether the CFAA ‘without authorization’ concept is inapplicable where, as here, prior authorization is not generally required but a particular person—or bot—is refused access.”
Several of LinkedIn’s technical security measures to prevent data scraping are highlighted in the filing:
- Using its 'robots.txt' file to bar search engine crawlers and other bots from accessing its servers, except approved ones such as Google's.
- A system called 'Quicksand' that detects scraping by identifying non-human behavior.
- A system called 'Sentinel' that slows down activity from untrusted IP addresses.
- A system called 'Org Block' that generates a list of known IP addresses associated with large-scale scraping.
LinkedIn says it blocks approximately 95 million automated attempts to scrape data every day.
The Ninth Circuit reaffirmed the earlier decision, finding that "the balance of hardships tips sharply in hiQ's favor" and that cutting hiQ off from LinkedIn's public data would put hiQ's very existence at risk.
hiQ's CEO stressed how much the company depends on access to public data:
“hiQ’s entire business depends on being able to access public LinkedIn member profiles. There is no current viable alternative to LinkedIn’s member database to obtain data for hiQ’s Keeper and Skill Mapper services.”
LinkedIn's attorneys, however, argued in the company's petition that the decision has broader ramifications:
“Under the Ninth Circuit’s rule, every company with a public portion of its website that is integral to the operation of its business – from online retailers like Ticketmaster and Amazon to social networking platforms like Twitter – will be exposed to invasive bots deployed by free-riders unless they place those websites entirely behind password barricades.”
“But if that happens, those websites will no longer be indexable by search engines, which will make information less available to discovery by the primary means by which people obtain information on the Internet.”
AI companies will be pleased by the US appeals court's decision
AI companies that rely on mass data scraping, often to feed their algorithms, will undoubtedly be pleased with the court's decision.
Authorities and privacy groups have targeted companies such as Clearview AI, which scrapes billions of photos from public websites to power its facial recognition technology.
Clearview AI lawyer Tor Ekeland told CoinDesk that the “common law has never recognised a right to privacy for your face.”
LinkedIn's challenge was ultimately rejected, but mass data scraping remains highly divisive. Supporters will argue that the appeals court decided correctly, while opponents will voice reservations about normalizing the practice.