Fraud detection machine learning

Recent insights from a survey of 200 business executives indicate growing concern about fraud detection during customer onboarding at financial institutions. Synthetic fraud and deepfakes are rising to the top of the agenda: more than one in three executives rank synthetic fraud (37%) and deepfakes (35%) as top concerns. This concern is well-founded, considering that the FTC reported an estimated $8.8 billion lost to fraud in 2022.

The financial services industry is at the vanguard of combating fraud, yet the survey highlights that fraud detection is not the sole challenge. A significant 84% of executives admit to lacking complete confidence in their company's ability to adapt to evolving data privacy regulations. Furthermore, nearly half of these leaders believe that the onboarding process for customers is overly complex and lengthy.

Financial institutions must strike a delicate balance between effective fraud detection and:

  • Minimizing onboarding friction: Onboarding more users smoothly and efficiently.
  • Compliance with regulations: Ensuring the onboarding process adheres to legal standards.
  • Cost management: Avoiding excessive expenditure without compromising on fraud detection.

None of these business priorities can be allowed to overshadow the others. Achieving an optimal balance is a complex challenge for the industry, and this is where the implementation of fraud detection machine learning becomes critical. Machine learning, a branch of artificial intelligence (AI), provides a scalable solution for processes like Know Your Customer (KYC) and fraud detection. Despite the challenges in training fraud detection machine learning algorithms, the technology offers substantial benefits.

What is fraud detection using machine learning?

Machine learning, a subset of AI, allows systems to learn from data without being explicitly programmed. Although AI research dates back to the 1950s, machine learning has surged in recent years with the rise of deep learning.

The power of machine learning comes from the fact that statistical models can infer patterns from data with little human guidance. On some tasks, they can perform as well as or better than humans: image recognition (interpreting and categorizing images or videos), translation (translating text or speech from one language to another, e.g. Google Translate), and gameplay (playing chess and other games, e.g. AlphaGo).

In essence, fraud detection using machine learning involves utilizing these intelligent systems to identify and prevent fraudulent activities.

How to use AI and machine learning in fraud detection

Fraud detection is a critical component of identity verification during customer onboarding. It is crucial for meeting compliance requirements such as KYC and anti-money laundering (AML) rules, and for safeguarding against financial crime. Preventing fraud also helps businesses protect their customers, their reputation and their bottom line.

Using machine learning in fraud detection within banking and other financial institutions offers numerous benefits: it's faster, more cost-effective, and often more precise than human analysis. However, training machine learning solutions for optimal results necessitates a thoughtful and rigorous approach. Fraud detection is challenging — for machines as well as humans — and those challenges should be addressed when training algorithms.

The challenges of training machine learning algorithms

Fraud prevention is at odds with low friction

If stopping fraud were the only concern, businesses could do it tomorrow by blocking 100% of customers. But any business that did so would go bust pretty quickly. Businesses need strong fraud prevention capabilities to protect their customers and bottom line, but they also need to offer low-friction, seamless and intuitive sign-up processes for genuine customers. Making sign-up as easy and frictionless as possible is key to long-term growth.
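
The trade-off can be made concrete with a decision threshold. The sketch below assumes a model that outputs a fraud score between 0 and 1 for each applicant and blocks anyone at or above a chosen threshold; the scores, labels and function names are hypothetical. A threshold of 0 reproduces the "block 100% of customers" extreme: every fraudster is caught, but so is every genuine customer.

```python
from dataclasses import dataclass

@dataclass
class ThresholdResult:
    threshold: float
    fraud_caught: float     # share of fraudulent applicants blocked (recall)
    genuine_blocked: float  # share of genuine applicants blocked (friction)

def sweep_thresholds(scores, labels, thresholds):
    """Block every applicant whose fraud score is >= threshold.

    scores: hypothetical model fraud scores in [0, 1]; labels: 1 = fraud, 0 = genuine.
    """
    n_fraud = sum(labels)
    n_genuine = len(labels) - n_fraud
    results = []
    for t in thresholds:
        blocked = [s >= t for s in scores]
        caught = sum(1 for b, y in zip(blocked, labels) if b and y == 1)
        false_blocks = sum(1 for b, y in zip(blocked, labels) if b and y == 0)
        results.append(ThresholdResult(t, caught / n_fraud, false_blocks / n_genuine))
    return results

# Hypothetical scores for 4 fraudulent and 6 genuine applicants.
scores = [0.95, 0.80, 0.55, 0.30, 0.60, 0.40, 0.20, 0.10, 0.05, 0.02]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for r in sweep_thresholds(scores, labels, [0.0, 0.5, 0.9]):
    print(f"threshold={r.threshold:.1f} fraud caught={r.fraud_caught:.0%} "
          f"genuine blocked={r.genuine_blocked:.0%}")
```

Raising the threshold lets more genuine customers through but also lets more fraud slip past, so where to set it is a business decision as much as a technical one.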

Fraud is dynamic and ever-changing

Fraudsters are innovative and invent new methods of attack all the time. They are constantly looking for ways to get past business defenses. Remove one route, and they’ll try and try again until they find another. Fraud detection machine learning solutions must be innovative and adaptable to get ahead of the latest attack vectors. 
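
One common way to notice that traffic has shifted, for example because a new attack pattern has appeared, is to compare the current distribution of model scores (or input features) with a baseline using the Population Stability Index (PSI). The sketch below is a generic illustration, not a description of any particular vendor's monitoring; the bins, the example data and the 0.25 alert level (a widely used rule of thumb) are assumptions.

```python
import math

def bin_proportions(values, edges):
    """Share of values in each bin [edges[i], edges[i+1]); the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            is_last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[i + 1]):
                counts[i] += 1
                break
    return [c / len(values) for c in counts]

def population_stability_index(baseline, current, edges, eps=1e-6):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    expected = bin_proportions(baseline, edges)
    actual = bin_proportions(current, edges)
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical fraud scores in [0, 1]: last quarter's traffic vs this week's.
EDGES = [0.0, 0.25, 0.5, 0.75, 1.0]
baseline = [0.1] * 50 + [0.3] * 30 + [0.6] * 15 + [0.9] * 5
this_week = [0.1] * 20 + [0.3] * 20 + [0.6] * 30 + [0.9] * 30

psi = population_stability_index(baseline, this_week, EDGES)
ALERT_LEVEL = 0.25  # common rule of thumb, not a universal standard
print(f"PSI = {psi:.2f}, investigate = {psi > ALERT_LEVEL}")
```

A high PSI does not say what changed, only that something did; it is a prompt to investigate and, if needed, retrain.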

There are thousands of identity documents

Each country has several different ID types. And each of these IDs will have several versions in circulation at one time. Extrapolate this across the globe and that’s thousands of different documents, each with a different format, standard and varying character sets. Training machine learning models to detect fraud across all these documents and their different variations is a challenge to say the least. 

‘Noisy’ data can impact the algorithms

There is no guaranteed ground truth in fraud detection. Comparison with genuine specimen documents is one of the best ways to detect forgeries, and a lot of fraud is obvious and easy to catch. But with highly sophisticated forgeries, even the best experts might struggle to tell the difference. This is what's called a subtle signal: cases where the difference between a fraudulent and a genuine document is smaller than the difference between two genuine samples. When experts disagree, some training labels will inevitably be wrong, and the performance of fraud detection machine learning models depends on the quality of the data used to train them.
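
Disagreement between experts can itself be measured. A standard statistic for two reviewers labeling the same items is Cohen's kappa, which compares observed agreement with the agreement expected by chance. The sketch below uses hypothetical reviewer labels; it shows one common way to quantify label noise, not necessarily the one used in practice.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_observed - p_chance) / (1 - p_chance) for two raters."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters pick the same label independently.
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical labels from two expert reviewers for the same ten documents.
reviewer_a = ["fraud"] * 3 + ["genuine"] * 7
reviewer_b = ["fraud", "fraud", "genuine", "fraud"] + ["genuine"] * 6
print(f"kappa = {cohens_kappa(reviewer_a, reviewer_b):.2f}")
```

Here the reviewers agree on 8 of 10 documents, yet kappa is only about 0.52 once chance agreement is accounted for, a sign that the labels for these documents are noisy.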

Not enough data

Following the previous point, it can be hard to get enough good-quality data to train machine learning fraud detection algorithms. Sometimes we might only see a few documents with a certain type of fraud. But to accurately train machine learning models, you need large data sets. 
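
Scarce fraud examples also mean training data is usually heavily imbalanced, and a model can look accurate simply by predicting "genuine" every time. One widely used mitigation, sketched below, is to weight each class inversely to its frequency (the same formula scikit-learn uses for class_weight="balanced"). It does not replace collecting or generating more fraud samples, and the counts are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical training set: 990 genuine documents, only 10 fraudulent ones.
labels = ["genuine"] * 990 + ["fraud"] * 10
weights = balanced_class_weights(labels)
print(weights)  # each fraud example gets 99x the weight of a genuine one
```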

What machine learning models are used for fraud detection?

Every fraud detection machine learning algorithm is slightly different. But generally, training machine learning algorithms for fraud detection involves the following steps:

  1. Data sourcing: The first step is to source the data that will train the model. Often, this raw data is noisy and unlabeled.
  2. Quality control (QC) and labeling: Human experts label and curate the data, weeding out low-quality samples.
  3. Model training: Train the model using large-scale computing platforms.
  4. Model evaluation: Evaluate the model on data it has not seen during training (the holdout set) to check that it performs as expected.
  5. Model deployment: If the evaluation results are satisfactory, deploy the model to production.
  6. Monitoring: Monitor the model's performance in production, and adjust and improve it as needed.
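
Steps 3 to 5 can be illustrated at toy scale. The sketch below holds out part of the labeled data, "trains" a one-feature decision stump as a stand-in for a real model, evaluates it on the holdout set and applies a simple deployment gate. The feature, the synthetic data and the precision/recall bars in ready_to_deploy are all hypothetical.

```python
import random

def holdout_split(examples, holdout_fraction=0.2, seed=0):
    """Shuffle labeled examples and split off a holdout set the model never trains on."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]  # (train, holdout)

def train_stump(train):
    """Pick the threshold (from observed feature values) with the best training accuracy."""
    def accuracy(threshold):
        return sum((x >= threshold) == (y == 1) for x, y in train) / len(train)
    return max(sorted({x for x, _ in train}), key=accuracy)

def evaluate(threshold, holdout):
    """Precision and recall of 'flag as fraud if feature >= threshold' on held-out data."""
    tp = sum(1 for x, y in holdout if x >= threshold and y == 1)
    fp = sum(1 for x, y in holdout if x >= threshold and y == 0)
    fn = sum(1 for x, y in holdout if x < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def ready_to_deploy(precision, recall, min_precision=0.9, min_recall=0.8):
    """Hypothetical deployment gate: only ship models that clear both bars."""
    return precision >= min_precision and recall >= min_recall

# Synthetic data: (feature value, label) where label 1 = fraud.
examples = [(i / 100, 1 if i >= 70 else 0) for i in range(100)]
train, holdout = holdout_split(examples)
threshold = train_stump(train)
precision, recall = evaluate(threshold, holdout)
print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
print("deploy" if ready_to_deploy(precision, recall) else "keep current model")
```

Real pipelines use far larger datasets, richer models and more metrics, but the order of operations is the same: the holdout set is only touched once training is finished.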

[Figure: Training fraud detection machine learning models]

What to consider in fraud detection machine learning solutions

There are a lot of technical processes involved in building and training fraud detection machine learning models. It takes time, resources and a lot of expertise. As a result, many financial institutions use specialist third-party solutions to support their fraud detection efforts.

So what should businesses look for in a fraud detection solution that leverages AI and machine learning? What makes one fraud detection machine learning model better than another?

Data is key

As mentioned earlier, the best results come from training machine learning models on large volumes of high-quality data. This gives businesses a higher degree of certainty that

a) They’re catching more fraud, and
b) They’re catching fraud more accurately.
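
In standard evaluation terms, a) corresponds to recall and b) to precision. Writing TP for fraudulent applications correctly flagged, FP for genuine applications wrongly flagged and FN for fraud that slips through:

```latex
\text{recall} = \frac{TP}{TP + FN}, \qquad \text{precision} = \frac{TP}{TP + FP}
```

As a hypothetical example: if 60 fraudulent applications arrive and a model flags 50 applications, 45 of which are genuinely fraudulent, recall is 45/60 = 0.75 and precision is 45/50 = 0.90.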

At Onfido, we train our fraud detection machine learning models on a wide range of genuine and fraudulent datasets. Exposing the model to both genuine and fraudulent samples helps it get better over time at telling the two apart.

We even develop our own fraud samples in-house to ensure we have enough high-quality data to train the models. We also have an industrial-grade data cycle and machine learning operations, which helps us maintain state-of-the-art models in production for the benefit of our customers.

Look for expertise

Machine learning models aren’t built overnight. The best fraud detection solutions draw on years of research, training and expertise.

Onfido’s unique applied science and analytics team combines decades of experience in industrial research. Over the past 10 years, the team has built our fraud detection technology, combining existing off-the-shelf models with specialized in-house models. We have developed unique expertise in unsupervised, supervised and self-supervised machine learning. Combining classical computer vision with deep learning, we’ve designed several patented algorithms specifically for fraud detection.

Solid infrastructure

Identity verification is not a nice-to-have but a critical piece of internet infrastructure. Our customers and end users expect outstanding robustness and reliability from Onfido. This is why we've built a robust cloud-based computing platform that can handle and monitor our traffic in real time.

In addition, the data we handle is very sensitive, since it contains end users' private information. We take strong measures to protect it. We enforce the data deletion policies agreed with our customers through programmatic data deletion across our entire platform. Finally, a dedicated in-house security team ensures that we maintain state-of-the-art security standards across the company.
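
Programmatic deletion usually comes down to a retention check that runs automatically across every data store. The sketch below is a minimal, hypothetical version: StoredRecord, the per-customer retention periods and the in-memory store are illustrative assumptions, not Onfido's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class StoredRecord:
    """Hypothetical record of end-user data tied to a business customer."""
    record_id: str
    customer_id: str
    created_at: datetime

def records_due_for_deletion(records, retention_days, now):
    """Return IDs of records older than their customer's agreed retention period."""
    due = []
    for record in records:
        # A missing policy raises KeyError on purpose: fail loudly rather than keep data silently.
        limit = timedelta(days=retention_days[record.customer_id])
        if now - record.created_at >= limit:
            due.append(record.record_id)
    return due

def purge(store, retention_days, now):
    """Delete every due record from a dict-based store and return the deleted IDs."""
    due = records_due_for_deletion(store.values(), retention_days, now)
    for record_id in due:
        del store[record_id]
    return due

utc = timezone.utc
store = {
    "a": StoredRecord("a", "bank-1", datetime(2024, 1, 1, tzinfo=utc)),
    "b": StoredRecord("b", "bank-1", datetime(2024, 1, 15, tzinfo=utc)),
    "c": StoredRecord("c", "fintech-2", datetime(2024, 1, 20, tzinfo=utc)),
}
retention_days = {"bank-1": 30, "fintech-2": 7}  # hypothetical agreed policies
deleted = purge(store, retention_days, now=datetime(2024, 1, 31, tzinfo=utc))
print(deleted, sorted(store))
```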

Relationships with regulatory bodies

AI governance, how we handle personal data, and the future of fraud detection are all key considerations when building a successful long-term product. At Onfido, we have developed strong ties with regulatory bodies such as the ICO in the UK.

Anti-bias considerations

Fraud detection solutions need to work equally well for everybody. It’s crucial to build fair models and prevent bias from creeping in, especially when biometrics are involved.
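
Measuring bias starts with breaking error rates down by group. The sketch below computes the false rejection rate (genuine users wrongly rejected) for each group and the largest gap between groups. The group labels, counts and choice of metric are hypothetical; which groups to compare and what gap is acceptable are policy decisions, and this is not necessarily the methodology described in the whitepaper.

```python
from collections import defaultdict

def false_rejection_rates(results):
    """results: iterable of (group, is_genuine, was_rejected) tuples."""
    genuine = defaultdict(int)
    rejected = defaultdict(int)
    for group, is_genuine, was_rejected in results:
        if is_genuine:  # false rejections only apply to genuine users
            genuine[group] += 1
            rejected[group] += was_rejected
    return {group: rejected[group] / genuine[group] for group in genuine}

def largest_gap(rates):
    """Difference between the worst- and best-served groups."""
    return max(rates.values()) - min(rates.values())

# Hypothetical outcomes: group A has 2 of 100 genuine users rejected, group B 3 of 50.
results = (
    [("A", True, i < 2) for i in range(100)]
    + [("B", True, i < 3) for i in range(50)]
    + [("B", False, True)] * 5  # fraudulent attempts are excluded from the rate
)
rates = false_rejection_rates(results)
print(rates, f"gap = {largest_gap(rates):.2%}")
```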

Define, measure and mitigate bias

Learn more about what we’ve been doing at Onfido to define, measure, and mitigate biometric bias in our whitepaper Building AI without Bias.

The benefits of fraud detection using machine learning

Efficient and scalable

Businesses that opt for fraud detection using machine learning can process information much more quickly and in much larger volumes. This reduces the number of verification checks sent for manual review, so internal fraud teams spend less time manually reviewing documents.
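
A common pattern behind this is score-based routing: confident decisions are automated and only the uncertain middle band goes to a human. The thresholds and scores below are hypothetical; in practice they would be tuned against the trade-off between fraud prevention and low friction.

```python
from collections import Counter

def route(fraud_score, approve_below=0.2, reject_above=0.9):
    """Route an application by model fraud score (thresholds are hypothetical)."""
    if fraud_score < approve_below:
        return "approve"
    if fraud_score > reject_above:
        return "reject"
    return "manual_review"

scores = [0.05, 0.1, 0.15, 0.3, 0.5, 0.92, 0.97, 0.01, 0.02, 0.6]
decisions = Counter(route(s) for s in scores)
manual_share = decisions["manual_review"] / len(scores)
print(dict(decisions), f"manual review share: {manual_share:.0%}")
```

Widening the automated bands reduces manual workload but shifts more decisions onto the model, so the band edges are worth revisiting whenever the model changes.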

Manual reviews also aren’t scalable. For one thing, they are limited to business hours. For another, as businesses grow or see a sudden increase in applications, manual processes can cause bottlenecks. This pushes up costs as businesses have to hire more reviewers to cope with demand. Ultimately, these slowdowns are going to turn genuine end-users away. 

Repeatable and deterministic

It’s much easier (and quicker, and cheaper) to improve a machine learning model than to train a human analyst. Border guards, for instance, undergo years of training to identify fraudulent documents. Even then, humans are more likely to make errors, for example when they’re tired. By comparison, a model returns the same result for the same input every time, and we can deploy a new machine learning model across the globe in minutes.

More accurate

We can train machine learning algorithms to detect minute changes in documents that could point to potential signs of fraud. For highly sophisticated forgeries, even an expertly trained human eye might struggle to detect fraud. Fraud detection machine learning algorithms can pinpoint small changes across document layout, fonts, data consistency and much, much more.
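
Not every consistency signal needs to be learned. A well-documented, rule-based example is the check digit in a passport's machine readable zone (MRZ), defined in ICAO Doc 9303: each character is mapped to a value (digits as themselves, A to Z as 10 to 35, the filler "<" as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. A mismatch is a cheap signal that a field may have been altered, and checks like this can sit alongside, or feed into, a machine learning model. The sketch uses values from a widely reproduced ICAO 9303 specimen MRZ; it is an illustration, not a description of Onfido's models.

```python
DIGITS = "0123456789"

def mrz_char_value(ch):
    """Map an MRZ character to its ICAO 9303 value."""
    if ch in DIGITS:
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")

def mrz_check_digit(field):
    """Weighted sum with repeating weights 7, 3, 1, modulo 10."""
    weights = (7, 3, 1)
    return sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field)) % 10

def field_is_consistent(field, printed_check_digit):
    """True if the check digit printed on the document matches the field's content."""
    if len(printed_check_digit) != 1 or printed_check_digit not in DIGITS:
        return False
    return mrz_check_digit(field) == int(printed_check_digit)

# Document number from the specimen MRZ, then the same field with one character altered.
print(field_is_consistent("L898902C3", "6"))  # True
print(field_is_consistent("L898902C4", "6"))  # False
```

A check digit only catches edits that break the arithmetic; a careful forger can recompute it, which is why such rules complement rather than replace learned models.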

Cost effective

As a business grows, it will naturally onboard more customers and have to deal with more fraud. Relying on manual reviews means hiring more analysts to keep up with demand. By comparison, a single machine learning system can process all the data you throw at it, regardless of volume. This is much more scalable for businesses that see seasonal ebbs and flows in sign-ups. Fraud detection machine learning systems can help combat fraud as onboarding volumes grow, without dramatically increasing risk management costs.
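
The scaling argument is simple arithmetic. The sketch below estimates how many reviewers a manual process needs; every figure in it (daily volumes, share of applications sent to manual review, minutes per review, productive hours per reviewer) is a hypothetical assumption chosen for illustration.

```python
import math

def reviewers_needed(daily_applications, manual_review_rate, minutes_per_review,
                     hours_per_reviewer=7.5):
    """Reviewers required to clear one day's manual reviews (all inputs hypothetical)."""
    review_minutes = daily_applications * manual_review_rate * minutes_per_review
    return math.ceil(review_minutes / (hours_per_reviewer * 60))

for label, volume in [("normal day", 10_000), ("seasonal peak", 20_000)]:
    at_20_percent = reviewers_needed(volume, 0.20, 5)
    at_5_percent = reviewers_needed(volume, 0.05, 5)
    print(f"{label}: {at_20_percent} reviewers at a 20% review rate, {at_5_percent} at 5%")
```

In this example, doubling volume roughly doubles the manual headcount, while sending fewer cases to manual review cuts it by about three quarters.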

The future of fraud detection machine learning

For businesses prioritizing staying ahead of sophisticated fraud tactics, machine learning is an indispensable resource. It offers a scalable, robust, and cost-effective approach to fraud detection, managing the delicate balance between rapid customer onboarding and effective fraud protection. While machine learning requires advanced skills and infrastructure, its proper implementation will continue to bring unparalleled benefits to the industry.

Build trust with enhanced fraud prevention

Learn how the Onfido Real Identity Platform can help you strike the perfect balance between fraud prevention and customer acquisition.
