The future of web scraping

Matt Ober is a general partner at Social Leverage. Matt was most recently the Chief Data Scientist at Third Point, where he built the data analytics and technology platform used to enhance the firm’s investment capabilities in equity, structured credit, venture capital, and cryptocurrency. Prior to joining Third Point, Matt was the Head of Data Strategy at WorldQuant and part of the WorldQuant Ventures founding team, focused on private investments in fintech, data, and technology companies.

Web scraping is one of those topics we are going to keep hearing about for the next few years. Every AI company is scraping. Everyone is trying to get access to data, and the question of what you can scrape from the web versus what you have to pay for continues to be debated.

Bright Data, the Israeli data scraping company, just won its lawsuit against X. This is important because Bright Data uses sophisticated scraping technology that lets it get around anti-scraping measures. One of the key things Bright Data is reportedly doing is scraping only data that doesn't require a login. I have written about this in the past, and I continue to believe it holds true: you can stop people from legally scraping your site if you require a click-through agreement or a login.

As web scraping technology continues to get more sophisticated, it will be interesting to see whether we get a wave of acquisitions in the space. Some very sophisticated web scraping technology companies aren't especially valuable on their own. But if you take that technology and bring it in-house at a data-hungry company, you give yourself a ton of upside. For example, we already saw Babel Street acquire Vertical Knowledge.

Whether you are selling data, using it for training, or scraping data to combine with your other data assets, having sophisticated web scraping technology in-house is a real advantage and can help you build a moat. Maintaining scrapers is a pain when you don't have the right team and technology, and relying on third parties can get very expensive very quickly.

The scraping roll-ups are right around the corner!