Data operations for capital markets with Clear Street

What

Clear Street is a NYC-based independent, non-bank prime broker replacing the legacy infrastructure used across capital markets. Founded in 2018, Clear Street secured the second phase of its $435M Series B in April 2023, valuing the platform at $2B. Building next-gen infrastructure for capital markets, Clear Street has seen investment from Prysm Capital, NEAR Foundation, NextGen Venture Partners, and others.

Why

To Emilio Schapira, VP of Engineering at Clear Street, the firm’s tech ops let it take advantage of new technology “without the baggage of legacy technology.” Clear Street handles 2.5% of total US equities volume, or about $10B to $15B per day.

“This can be a significant amount of data, so you’re dealing with problems of scalability and parallelization,” Schapira said.

How Clear Street handles data lets it adapt to new technologies quickly and deliver cutting-edge, reliable solutions to its clients. And the evolution of Clear Street’s data operations reflects its increasingly advanced in-house technological capabilities.

How

Clear Street’s data operations began with a central data operations team that used Python scripts to pull transactional data from different systems. “Then we quickly realized [that] we needed a data warehouse solution,” Schapira said. The company then invested in Snowflake, its main data warehouse, and used an engineering operations team to extract data and produce reports for stakeholders.

But that operational structure “resulted in quite a bit of a bottleneck.” The firm then moved to embedding data engineers within each vertical, where they produce the scripts their own team needs. These teams are complemented by centralized compliance and corporate finance teams.

Clear Street is now moving to a more centralized infrastructure operation, which will transform files dropped from external services, or data streams that go through Apache Kafka, and load them into its data warehouse. This creates more of a Platform-as-a-Service operational approach, because engineers define the schema of the data and then send it to the PaaS components.
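The schema-first handoff described above can be pictured with a minimal sketch. It is not Clear Street's implementation: `Field`, `DatasetSchema`, `validate`, `route`, and the `"warehouse"` and `"stream"` sink names are all hypothetical, and the sinks are plain callables standing in for a warehouse loader or a Kafka producer. The idea is only that a team declares a schema once, and a shared component checks each record against it before delivering the valid ones.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Field:
    """One column in a declared schema (hypothetical helper)."""
    name: str
    type: type
    required: bool = True


@dataclass(frozen=True)
class DatasetSchema:
    """What a team declares up front: field definitions and a target sink."""
    name: str
    fields: tuple[Field, ...]
    sink: str  # assumed sink names: "warehouse" or "stream"


def validate(schema: DatasetSchema, record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for field in schema.fields:
        value = record.get(field.name)
        if value is None:
            if field.required:
                errors.append(f"missing required field: {field.name}")
            continue
        if not isinstance(value, field.type):
            errors.append(
                f"{field.name}: expected {field.type.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


def route(
    schema: DatasetSchema,
    records: list[dict[str, Any]],
    sinks: dict[str, Callable[[str, list[dict[str, Any]]], None]],
) -> tuple[list[dict[str, Any]], list[tuple[dict[str, Any], list[str]]]]:
    """Validate records, deliver the valid ones to the declared sink,
    and return (accepted, rejected-with-reasons)."""
    accepted, rejected = [], []
    for record in records:
        problems = validate(schema, record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    sinks[schema.sink](schema.name, accepted)
    return accepted, rejected
```

In this sketch a vertical would only write the `DatasetSchema` declaration; validation and delivery live in the shared component, which is the division of labor the Platform-as-a-Service framing suggests.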

Where engineers previously used Python scripts that required ongoing maintenance, Clear Street’s data ops team is moving those transformations into the data warehouse. This allows more data and business analysts to build reconciliation reports for compliance using tools at Clear Street’s disposal, including business intelligence tools like Sigma.
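To make the idea of a reconciliation report concrete, here is a small sketch of the underlying comparison. It is an illustration under stated assumptions, not the firm's logic: the function name `reconcile`, the `trade_id` and `quantity` field names, and the three break labels are all hypothetical. In a warehouse-centric setup the same comparison would more likely be written as a SQL join; Python is used here only to keep the example self-contained.

```python
from decimal import Decimal


def reconcile(internal, external, key="trade_id", amount="quantity"):
    """Compare two record sets and return a sorted list of (key, break_type).

    Break types (hypothetical labels):
      missing_internal - the external source has the record, we do not
      missing_external - we have the record, the external source does not
      amount_mismatch  - both sides have it, but the amounts differ
    """
    internal_by_key = {row[key]: row for row in internal}
    external_by_key = {row[key]: row for row in external}

    breaks = []
    for k in sorted(internal_by_key.keys() | external_by_key.keys()):
        ours = internal_by_key.get(k)
        theirs = external_by_key.get(k)
        if ours is None:
            breaks.append((k, "missing_internal"))
        elif theirs is None:
            breaks.append((k, "missing_external"))
        # Compare via Decimal so 100 and "100.0" are treated as equal amounts.
        elif Decimal(str(ours[amount])) != Decimal(str(theirs[amount])):
            breaks.append((k, "amount_mismatch"))
    return breaks
```

Records that match on both sides produce no output, so the returned list is exactly the set of items an analyst would need to review.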

As it expands into more asset classes and geographies, Clear Street understands that the compliance landscape becomes more complex. The firm’s data warehouse structure lets its control team easily generate compliance-related reports.

Schapira said that, while generative AI has been “incredible” compared to previous iterations of artificial intelligence, the technology won’t replace data and compliance operations wholesale. It “maybe could replace that first line of scraping and looking and raising [flags],” he said. An investigator still has to go over the case and create a suspicious activity report. As a result, Clear Street has generative AI on its roadmap, but it’s not reshaping that roadmap in response to generative AI’s ascendance. Clear Street’s development team is making use of GitHub Copilot, but uses a system of two reviewers to vet the code the tool produces.