Big Data Needs Data Scientists to Make Sense of It All

In 2011, McKinsey caused a stir among aficionados of big data by saying the United States could face a shortage of 140,000 to 190,000 data scientists.

Universities heard the challenge, and opportunities and programs in data science and advanced data analytics began popping up around the country.

What does a data scientist do, and how does one develop the necessary knowledge and skills? We asked Bonnie Holub, who has a Ph.D. in AI and has watched the discipline develop while serving on the advisory boards of two university data science programs.

“They both had strong components of practitioner skills and engagement with industry beyond the walls of the university. The most successful programs included apprenticeships and often are catering to full-time working professionals who are engaged with industry.”

Holub, who previously worked at Honeywell, Cognizant and PwC, is managing partner and practice lead for data science in Teradata’s central region.

“You need industry knowledge and deep skills. I look for technical skills and business value-oriented mindsets. There are plenty of data scientists who are technically trained and understand statistics and can write the code, but those are just table stakes,” she added.

“We need people who can help develop KPIs and can build the reports and dashboards that companies can use to run their businesses globally in real time.”

A large international bank decided to use enterprise data to improve its fraud detection rate, which was running as low as 30%. Data scientists from Teradata developed machine learning and deep learning algorithms to detect fraudulent transactions. The result was a 50% detection rate for fraud and a 50% reduction in false positives, saving the bank millions.

Customer experience is a popular buzz phrase, but in some regulated areas, like consumer finance, it can be a requirement to resolve every complaint within a specified length of time. Quick response can also be useful for the business.

“Complaints may contain early warning signs for systemic problems,” said Holub in a presentation. Teradata developed a complaints analysis application that uses AI and machine learning to prioritize complaints, prescribe resolutions, identify emerging issues and drill down to look for causes. Banks saw dramatic time savings from identifying and solving global issues rather than treating them case by case.

Consumer finance organizations, like firms in other regulated industries, are often required to monitor and analyze communications with customers to keep misleading information from reaching them.

“The task of screening thousands of employee-to-customer communications can be monumental,” Holub said.

The benefit is saving millions in fines and keeping the firm’s name out of the news headlines, she added. To streamline the process, Teradata used machine learning to dramatically reduce false positives. Then it deployed natural language processing in a risk prediction workflow. Finally, it developed a user interface to deliver the results to compliance analysts.

False positives fell by a factor of 40, and valid positives tripled.