Dataconomy

Mind Your Internet Manners: When, Where and How to Crawl for Data

by Ran Geva
March 22, 2018
in Tech

A wealth of information, an ocean of data – and more funny cat videos than you could watch in a lifetime. The internet is all that and more, at the service of humanity that seeks to know more, do more and be more than ever before.

Much of that data is out there for the benefit of web users, within limits, of course. Some websites are happy to share their data with others; some aren’t. Websites that provide services to users (stock tips, information about jobs and salaries, TV or movie recommendations) need to fetch the data their visitors are looking for in order to make the service useful, and they get that data by sending out web crawlers. Other sites use the data they collect privately to advance their own businesses, and in turn share the data they generate for the benefit of mankind.

Trust makes the internet go ’round

Of course, a proper web crawling system observes the “rules of the internet road”: the rules laid out in a site’s robots.txt file. It also avoids collecting data from pages that sit behind a login, since a login indicates that the data is not there to be freely collected. As in many other aspects of life, sharing web data is based on trust; bad players who break that trust make things harder for the decent folk who collect data according to the rules. Failing to observe those rules creates internet chaos and destroys the bonds of trust between data owners and the internet community. Plus, it’s just bad manners.
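To make these two rules concrete, here is a minimal Python sketch of a pre-crawl check, assuming the crawler has already downloaded the site’s robots.txt text and, for the second check, the page itself. The robots.txt part uses the standard library’s urllib.robotparser. The login test is a rough heuristic of our own, not a standard: it treats an HTTP 401 or 403 response, or a page containing a password field, as a sign that the data sits behind a login. The helper names, the user agent ExampleBot and the sample URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _PasswordFieldFinder(HTMLParser):
    """Records whether the HTML contains an <input type="password"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True


def looks_like_login_wall(status: int, html: str) -> bool:
    """Hypothetical heuristic: a 401/403 status or a password field suggests a login."""
    if status in (401, 403):
        return True
    finder = _PasswordFieldFinder()
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(robots_allows(rules, "ExampleBot", "https://example.com/public/jobs"))
    print(robots_allows(rules, "ExampleBot", "https://example.com/private/data"))
    print(looks_like_login_wall(200, '<form><input type="password" name="pw"></form>'))
```

In practice, robots_allows would run before a page is requested at all, and looks_like_login_wall afterwards, before anything from the page is stored; the crawler skips the page whenever either check says no.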

The internet community needs to keep that trust intact, because web crawling democratizes access to information and makes otherwise hard-to-reach data easily available to the people who need it. An employment site, for example, could use government data on jobs and salaries to show users a realistic salary for their profession, based on experience and location. Investment analysis sites might crawl a site that publishes stock prices, price history and trends, and use that data to build forecasts.

When trust is broken

A recent case involving a company called hiQ Labs and LinkedIn illustrates what can go wrong in the trust relationship. The company has been scraping the public profiles of LinkedIn users to keep track of their careers, gathering data from public profiles only. LinkedIn, however, took offense at this, claiming that its data was not there for crawlers to “raid.” It should be noted that LinkedIn keeps much of its data behind a login screen, indicating that it does have rules it expects crawlers to observe. LinkedIn accused hiQ of violating the Computer Fraud and Abuse Act (CFAA), in effect charging it with the internet equivalent of wire fraud.

For its part, hiQ maintains that it did nothing wrong and did not violate any laws or agreements. According to hiQ’s attorneys, “To choke off speech and the precursor of speech, the gathering of facts and the analysis of information, is a dangerous path down which we should not go.” However egregious LinkedIn considers hiQ’s tactics, the court has so far sided with hiQ, finding LinkedIn’s CFAA claim out of place. LinkedIn, which believes it has a strong case, is appealing.

Walking the thin line

This highlights the thin line between legitimate crawling and impolite (at the very least) scraping. As mentioned, the internet is in a sense a cooperative: sites that provide services, as well as others who need data, must cooperate with those providing the data. In the final analysis, the “transaction” of crawling and sharing depends on the goodwill of both sides. Information is there to be used, not abused, and when it is abused, it ruins things for the rest of us.
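Politeness also covers how hard a crawler leans on a site. The article does not prescribe a mechanism, but one common courtesy is to space out requests to the same host. Below is a minimal Python sketch of such a per-host throttle; the class name is hypothetical, the one-second default gap is an arbitrary assumption, and the clock and sleep functions are injectable only so the behavior can be checked without actually waiting.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class PoliteThrottle:
    """Enforces a minimum gap between consecutive requests to the same host."""

    def __init__(
        self,
        min_delay_s: float = 1.0,  # assumed default; tune per site
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_delay_s = min_delay_s
        self._clock = clock
        self._sleep = sleep
        self._last_request: Dict[str, float] = {}  # host -> time of last request

    def wait_for(self, url: str) -> float:
        """Sleep until url's host may be contacted again; return the seconds waited."""
        host = urlsplit(url).netloc.lower()
        now = self._clock()
        last = self._last_request.get(host)
        waited = 0.0
        if last is not None:
            waited = max(0.0, last + self.min_delay_s - now)
            if waited > 0:
                self._sleep(waited)
        # Record the moment this request is allowed to go out.
        self._last_request[host] = now + waited
        return waited
```

A crawler would call wait_for(url) immediately before each request. If a site’s robots.txt sets a Crawl-delay, urllib.robotparser’s RobotFileParser.crawl_delay(user_agent) returns it (or None when absent), and a crawler could use the larger of that value and its own minimum.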

Those who believe in responsible crawling would do well to work together to root out the players who give them a bad reputation. Done properly, web crawling opens up information, promotes freedom and enhances democracy.

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.