Why financial services firms evaluating new machine learning initiatives should leapfrog traditional techniques and deploy deep learning.
At first glance, deep learning has the hallmark hype of a technology trend that prudent institutions would do well to avoid. But beyond their celebrated successes in image recognition and translation, there are subtle and persuasive reasons for financial services organizations to deploy deep learning models, even if they haven’t used simpler machine learning methods before. Deep learning models can be far more powerful than traditional methods, easier to maintain and faster to develop. These advantages outweigh common concerns over talent shortages and model interpretability.
Deep learning models provide several capabilities that are invaluable for financial services applications. The most relevant is how naturally they handle complex transactional sequences of data, such as the entire credit card transaction history of a consumer, or the history of financial statements of a company.
Traditional models either require analysts to summarize past transactions in the form of a number of statistics, or reduce the transactions to a time series of a few continuous values. Neither approach does justice to the rich context of the whole transactional history. Recurrent deep learning architectures such as Long Short-Term Memory layers (LSTMs) are purpose-built for understanding such complex sequences in their entirety.
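As a concrete sketch of what this looks like in practice, a Keras model can consume a customer's padded transaction history directly. The sequence length, feature count, and prediction target below are illustrative assumptions, not details from the article:

```python
# Illustrative LSTM over raw transaction sequences (all shapes and the
# prediction target are assumptions for this sketch).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 50     # up to 50 most recent transactions per customer, zero-padded
N_FEATURES = 8   # e.g., amount, merchant category code, hour of day, ...

model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN, N_FEATURES)),
    layers.Masking(mask_value=0.0),         # skip zero-padded timesteps
    layers.LSTM(64),                        # reads the whole sequence in order
    layers.Dense(1, activation="sigmoid"),  # e.g., probability of default
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Train on (customers, timesteps, features) -> binary outcome
X = np.random.rand(32, SEQ_LEN, N_FEATURES).astype("float32")
y = np.random.randint(0, 2, size=32).astype("float32")
model.fit(X, y, epochs=1, verbose=0)
```

The key point is that no hand-built summary statistics are required: the LSTM consumes the transaction sequence itself.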
Deep learning models can also be trained to make predictions about individual identifiers — such as customer, entity, product or security identifiers — directly. For example, in predicting customer behavior, rather than having to rely solely on demographic data or summaries of past transactions as a proxy for each customer, deep learning models can use an embedding layer to observe each customer’s behavior in the entire dataset and predict their future behavior accordingly. This technique works even when you are dealing with millions or even billions of unique IDs.
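A minimal sketch of this pattern pairs an embedding of the raw customer ID with a few dense features; the vocabulary size, embedding width, and churn target below are assumptions for illustration:

```python
# Illustrative embedding over customer IDs (vocabulary size, embedding
# width, and the churn target are assumptions for this sketch).
from tensorflow import keras
from tensorflow.keras import layers

N_CUSTOMERS = 100_000  # unique customer IDs; scales far higher in practice
EMBED_DIM = 32         # learned vector per customer

customer_id = keras.Input(shape=(1,), dtype="int32")
other_feats = keras.Input(shape=(4,))   # e.g., tenure, balance, ...

# Each ID indexes a learned row of the embedding matrix
emb = layers.Flatten()(layers.Embedding(N_CUSTOMERS, EMBED_DIM)(customer_id))
x = layers.Concatenate()([emb, other_feats])
x = layers.Dense(64, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)  # e.g., churn probability

model = keras.Model([customer_id, other_feats], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```

Each customer's embedding vector is learned jointly with the rest of the model, so the network builds its own representation of individual behavior rather than relying on demographic proxies.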
In addition, deep learning models can handle complex unstructured datasets. They can process and learn from natural language data (underwriting documentation, analyst commentary, customer service communications) and image data (photographs, satellite imagery) that are associated with your transactions and outcomes of interest.
Deep learning models may not outperform specialized hand-tuned algorithms such as regression and gradient boosting machines on small datasets, but they reliably pull ahead as dataset sizes grow. As a result, when presented with millions of observations, they can be orders of magnitude more effective than traditional techniques.
Yet even when your datasets are smaller, there are good reasons to consider deep learning.
Newly developed frameworks like Google’s TensorFlow provide a unifying palette for solving machine learning problems and allow developers to avoid baking in costly, simplifying assumptions about the objective or the dynamics of the system being modeled. No amount of data or computation will save an algorithm if its training objective or assumptions are misaligned with reality. So the flexibility that these frameworks provide can be invaluable for novel applications.
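To illustrate the point about objectives, a framework like TensorFlow lets you optimize a bespoke loss directly rather than forcing the problem into a standard squared-error mold. The asymmetric penalty below is a hypothetical example, not something from the article:

```python
# Hypothetical asymmetric loss: under-predicting costs 3x more than
# over-predicting (the 3x factor is an illustrative assumption).
import tensorflow as tf

def asymmetric_loss(y_true, y_pred, under_weight=3.0):
    """Mean squared error, with under-predictions weighted more heavily."""
    err = y_true - y_pred
    return tf.reduce_mean(
        tf.where(err > 0.0, under_weight * tf.square(err), tf.square(err))
    )

# Any Keras model can then optimize this objective directly:
# model.compile(optimizer="adam", loss=asymmetric_loss)
```

Because the framework differentiates the loss automatically, changing the business objective is a one-function change rather than a new modeling approach.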
Finally, deep learning models can be more cost-effective to develop and maintain. Because they do not rely on humans to do complex feature engineering (e.g., computing statistics about sequences, text or images), you don’t need to spend months experimenting with features in development. Furthermore, the code for computing these features becomes a maintenance burden, and complex engineering pipelines are often required for computing them on your history of data and ensuring they are available in real time for making predictions. Deep learning obviates the need for all of this engineering complexity, and because the codebase is simpler, the models are often more stable and robust in production.
Reservations about deep learning models
The most common critique of deep learning models is that they are black boxes that are hard to interpret. But the same is true of most modern machine learning methods. Even decision trees and regression models built using SAS or scikit-learn in Python are difficult to interpret when there are tens or hundreds of features. If transparency is of utmost importance, then use a very simple model; otherwise, go all in.
Another common concern is that building deep learning models is an art that isn’t widely understood yet, and there is a shortage of talent capable of building effective models. While this is certainly true if you are competing for state-of-the-art performance on established image or language benchmarks, an effective architecture for a new application can start out quite simple. Complexity can then be added over time without having to change frameworks.
In the end, deep learning offers an incredibly powerful method for tackling machine learning problems in financial services. While it may not lead to the general artificial intelligence panacea that popular media imagines, given the exploding volumes of rich data being collected, it is clear that the most successful algorithms of the next ten years will be based on deep learning. While transparency and talent will remain concerns for machine learning in general, they shouldn’t prevent organizations from capitalizing on the advantages of deep learning now.
Jeremy Stanley is an independent consultant working with companies building data and machine learning capabilities in New York and San Francisco. He was most recently the VP of Data Science at Instacart, the $4bn same-day grocery delivery start-up based in San Francisco, where he led machine learning for logistics, quality, search, personalization and growth. He previously led machine learning and engineering in New York at Sailthru and Collective and consulted with financial services companies on machine learning at EY.