AI is about to get expensive
by Matt Ober, General Partner at Social Leverage
The cost of training LLMs is about to rise. The downstream cost of leveraging AI, I believe, will either rise with it or be continuously funded by VC money in the near term. We are already seeing lawsuits pop up, such as the one involving the New York Times. Owners of data and content want to get paid!
This reminds me of web scraping, which hedge funds have been doing for the last decade. I have navigated the world of web scraping for the last 12 years, figuring out what is fair use and what is a gray area. At the top hedge funds I have worked at, we had a very clear rule that said, “If a company was charging for their data or content, we couldn’t scrape to get around paying.”
Data and content providers have fought these web scraping battles before, and we have seen them win. LinkedIn has had many lawsuits over the years with companies scraping its data, including, most recently, the hiQ case. At the end of the day, there is a lot still to be decided by the courts. We may see rulings in the next 12 to 24 months that make it clearer that web scraping is completely legal with no restrictions. I find that outcome hard to believe, but there is an argument that if it’s on the web, it’s fair use.
Over the last few months, I have heard from dozens of data and content owners, and they are currently charging, or plan to charge, different price points for any client that wants to use their data for AI training. This is a significant opportunity in my opinion: companies that are sitting on unique, proprietary data assets now have a potential new revenue stream and potential new clients. Many specific companies come to mind that will look for clients in 2024 who want to train AI models on that data.
I think the value of proprietary data will continue to grow in 2024, and I am excited to see companies build new revenue streams and learn what the market is willing to pay. The financial services industry is a perfect candidate to begin using unique datasets for training LLMs that are experts in the markets. Looking forward to seeing the early winners!