
How does an AI learn to detect accounting fraud?

In this article, we explain at a high level how an AI learns, keeping code and math to a minimum.


In accounting fraud, by definition, a fraudster has to invent transactions or change legitimate transactions.

However, we don't need to know how he manipulates transactions.

To catch him, we only need to know what legitimate accounting transactions look like.








What does legitimate accounting data look like?

Fraud experts have used Benford’s Law for a long time.

According to Benford’s Law, low leading digits (1) appear more frequently than high leading digits (2) in legitimate transactions.
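Benford’s Law puts a number on this: the expected share of amounts whose leading digit is d equals log10(1 + 1/d), which is about 30% for the digit 1 and under 5% for the digit 9. The sketch below shows one simple way to compare a set of amounts against that expectation. It is an illustration only, not AuditedAI's method; the function names and the two sample lists are made up for the example.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(amount):
    """First non-zero digit of a non-zero amount (sign and decimals ignored)."""
    # Scientific notation always starts with the leading digit, e.g. '4.2...e-02'.
    return int(f"{abs(amount):.15e}"[0])

def leading_digit_shares(amounts):
    """Observed share of each leading digit in a list of amounts."""
    counts = Counter(leading_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def benford_deviation(amounts):
    """Mean absolute gap between observed and expected shares (0 = perfect fit)."""
    expected = benford_expected()
    observed = leading_digit_shares(amounts)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9

# Hypothetical data: amounts that grow geometrically tend to follow Benford's Law,
# while a block of invented amounts just above 900 does not.
natural_like = [1.07 ** k for k in range(1, 500)]
fabricated = [900 + i for i in range(100)]

print("natural-like deviation:", round(benford_deviation(natural_like), 4))
print("fabricated deviation:  ", round(benford_deviation(fabricated), 4))
```

A larger deviation only signals that the digits look unusual. It is a reason to look closer, not proof of fraud.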


Image 1: A fraudster has to invent transactions or change legitimate transactions

Image 2: Legitimate accounting data looks almost universally the same

"Knowing what legitimate transactions look like, we trained an AI on 100 billion

accounting transactions to distinguish between legitimate and fraudulent ones."


AI is loosely based on how humans learn.

Like a human brain, an AI learns from inputs (1), outputs (2), and an objective function that guides the learning process.

In our case, the objective function of the AI is to minimize the number of errors in predicting accounting fraud.

Neural networks, the “brain” of an AI, have gained prominence due to their ability to learn complex patterns from data.

To learn anything, AI requires enormous amounts of data.

While general AIs require more data, task-focused AIs require less data.

For example, GPT-3 required about 300 billion data points (tokens, i.e., words and word pieces), while AuditedAI required 100 billion data points (accounting transactions).

The result? AI requires enormous amounts of data to learn complex patterns and to detect patterns it has never even seen before!


Image 3: AI mimics the way that biological neurons signal to each other


Image 4: In general, more data leads to higher accuracy for neural-network-based models such as Deep Learning

"Why not use Machine Learning? Generally, Machine Learning algorithms work well with tabular data requiring fewer data. However, accounting fraud is a different data format (i.e., long format) with billions of rows.
In this case, Deep Learning outperforms Machine Learning.” 





We start by feeding the AI our custom-made dataset of accounting transactions.

In the first few iterations (i.e., passes of the learning process), the AI does poorly at distinguishing between legitimate and fraudulent accounting transactions …

… however, the AI learns by doing many iterations and adjusting its prediction.

In other words, the AI learns the multidimensional patterns associated with accounting fraud.

After many iterations …

… the accuracy of the AI improves.

Similar to how humans acquire knowledge, the AI learns through an iterative process, improving its predictive power with each iteration.
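The loop of predicting, measuring the error, and adjusting can be sketched in a few lines. This is a toy illustration under stated assumptions, not AuditedAI's training code: the data is synthetic, the single input feature is made up, and the model is a one-weight logistic regression rather than a deep network. The model starts from a deliberately bad guess so that the early mistakes are visible.

```python
import math
import random

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def make_toy_data(n=200, seed=0):
    """Synthetic (feature, label) pairs; label 1 stands for 'fraudulent' (made-up data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Hypothetical feature: fraudulent rows are centered higher than legitimate ones.
        feature = rng.gauss(2.0 if label else -2.0, 1.0)
        data.append((feature, label))
    return data

def count_errors(data, w, b):
    """Number of rows where the thresholded prediction disagrees with the label."""
    return sum((sigmoid(w * x + b) >= 0.5) != bool(y) for x, y in data)

def train(data, iterations=50, learning_rate=0.1):
    """Gradient descent on the log loss; records the error count after each pass."""
    w, b = -1.0, 0.0  # deliberately bad starting guess
    history = [count_errors(data, w, b)]
    for _ in range(iterations):
        grad_w = grad_b = 0.0
        for x, y in data:
            diff = sigmoid(w * x + b) - y  # prediction minus truth
            grad_w += diff * x
            grad_b += diff
        # Nudge the parameters in the direction that reduces the loss.
        w -= learning_rate * grad_w / len(data)
        b -= learning_rate * grad_b / len(data)
        history.append(count_errors(data, w, b))
    return w, b, history

data = make_toy_data()
w, b, history = train(data)
print("errors before learning:", history[0], "of", len(data))
print("errors after learning: ", history[-1], "of", len(data))
```

Comparing the first and last entries of history shows the error count falling as the passes accumulate, which is the progression pictured in Images 5 to 7.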

As we will see next in the use cases, the AI learns to detect even very complicated fraud patterns.


Image 5: At the beginning of learning, the AI makes a lot of mistakes


Image 6: Through iterations, the AI learns to predict better


Image 7: After many passes, the AI learns to predict fraud

The result? AuditedAI can detect complex accounting fraud types automatically.


Examples include scams such as Wirecard, Ponzi schemes such as Madoff, and advanced techniques in which fraudsters attempt to fake legitimate accounting transactions (see: Use Cases).





PS: Deep Learning is a breakout success


Detecting accounting fraud is complex as it is a high-dimensional and non-linear problem. However, Deep Learning is a breakout success.


For nearly all of human civilization, our approach to understanding complexity has been theory, careful experiments, and more theory. With the Deep Learning approach, we say forget about theory; it’s too complex.


Let the model learn the complexity without us having to impose or imagine what the complexity might be or look like. In other words, we’re replacing rigorous, explicit mathematical theories with empirical black-box approximations.

How an AI learns to detect accounting fraud through inputs and outputs (sample code):


Sample model with (1) inputs of accounting data, (2) outputs as a probability, and (3) an objective function which minimizes the number of errors in predicting accounting fraud (binary cross entropy).
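As an illustration of those three parts, here is a minimal sketch in plain Python. It is not AuditedAI's actual model: the layer sizes, the three input features, and all function names are assumptions chosen to keep the example small. It shows only how inputs become a probability and how binary cross entropy scores that probability against the true label.

```python
import math
import random

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def init_model(n_inputs, n_hidden, seed=0):
    """Randomly initialised weights for a one-hidden-layer network (sizes are illustrative)."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": 0.0,
    }

def predict_proba(model, features):
    """(1) inputs -> hidden layer with ReLU -> (2) output: probability of fraud."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(model["w_hidden"], model["b_hidden"])
    ]
    logit = sum(w * h for w, h in zip(model["w_out"], hidden)) + model["b_out"]
    return sigmoid(logit)

def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """(3) objective: average penalty, largest when a confident prediction is wrong."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

# Hypothetical batch: two transactions, each described by three made-up, scaled features.
batch = [[0.2, 0.9, 0.1], [0.8, 0.1, 0.7]]
labels = [0, 1]  # 0 = legitimate, 1 = fraudulent

model = init_model(n_inputs=3, n_hidden=4)
probabilities = [predict_proba(model, row) for row in batch]
print("predicted fraud probabilities:", probabilities)
print("binary cross entropy:", binary_cross_entropy(labels, probabilities))
```

Training would then adjust the weights, pass after pass, to push this loss down. Binary cross entropy is used in place of a raw error count because it changes smoothly as the predicted probabilities change, which gives the learning process a direction to follow.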
