Fraudulent Transaction Detection Using Markov Model From Scratch

Sumeet Lalla
Artificial Intelligence in Plain English
4 min read · Nov 6, 2019


Finite State Machine storing customer’s transaction patterns

Intuition

Based on a customer’s purchasing pattern, each transaction can be categorized as Low, Medium or High. The thresholds separating these categories can be chosen by analyzing the customer’s past transactions.
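As a minimal sketch of this step, assuming the thresholds are taken as the tertiles of the customer’s past transaction amounts (one possible choice; the article does not say how the thresholds were derived), the categorization could look like the following. The function names `fit_thresholds` and `categorize` are hypothetical.

```python
from statistics import quantiles


def fit_thresholds(past_amounts):
    """Return (low_cut, high_cut) from past amounts.

    Assumption: the cut points are the tertiles of the history,
    so roughly a third of past transactions fall in each category.
    """
    low_cut, high_cut = quantiles(past_amounts, n=3)
    return low_cut, high_cut


def categorize(amount, low_cut, high_cut):
    """Map a transaction amount to the Low, Medium or High state."""
    if amount <= low_cut:
        return "Low"
    if amount <= high_cut:
        return "Medium"
    return "High"


# Hypothetical purchase history for one customer.
history = [5, 8, 12, 40, 55, 60, 300, 450, 500]
low_cut, high_cut = fit_thresholds(history)
states = [categorize(a, low_cut, high_cut) for a in history]
```

Any other rule for picking the two cut points (fixed price bands, for instance) would slot into `fit_thresholds` without changing the rest of the model.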

Approach

The customer’s buying patterns were recorded from a web application with which the model was integrated. If the customer purchases a low-priced item, the current state is Low. If the customer then purchases a medium-priced item, a transition from Low to Medium is recorded and the current state becomes Medium. State probabilities and transition probabilities are calculated from the number of transitions to and from each state. For example, if the current state after a transition is y, the transition probability can be calculated as:

$$P(x \to y) = \frac{N(x \to y)}{\sum_{i} N(x \to i)}$$

Here y is the current state after the transition and x is the previous state, from which the transition to y is made. The subscript i runs over the states of the system. The numerator is the number of transitions from x to y, while the denominator is the total number of transitions out of x, summed over every destination state i. The transition probability can be viewed as a conditional probability, i.e.

$$P(x \to y) = P(y \mid x)$$

where y and x are as defined above. The state probability can be defined as:

The state probability for the current state can be defined as:

$$P(y) = \sum_{x} P(x)\, P(y \mid x)$$

Each term is the state probability of a previous state x multiplied by the transition probability from x to the current state y. By the product rule this is the probability of reaching y through x, and summing over all previous states gives the probability of the current state.

Here the state probability is taken to be independent of the individual transitions and of the time at which the events occur.
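The transition-probability estimate described earlier can be sketched in a few lines of Python. This is an illustration, not the original PHP model: the function name `transition_probabilities` and the sample sequence are hypothetical, and rows for states that were never left are set to zero as an assumption.

```python
from collections import Counter

STATES = ("Low", "Medium", "High")


def transition_probabilities(sequence, states=STATES):
    """Estimate P(x -> y) from an ordered list of visited states.

    P(x -> y) = (number of x -> y transitions)
                / (number of transitions out of x to any state).
    """
    # Count each consecutive (previous, current) pair.
    counts = Counter(zip(sequence, sequence[1:]))
    probs = {}
    for x in states:
        total = sum(counts[(x, i)] for i in states)
        # Assumption: a state with no outgoing transitions gets a zero row.
        probs[x] = {
            y: (counts[(x, y)] / total if total else 0.0) for y in states
        }
    return probs


# Hypothetical pattern: L -> M -> L -> H -> L -> M
sequence = ["Low", "Medium", "Low", "High", "Low", "Medium"]
probs = transition_probabilities(sequence)
```

In this sample, Low is left three times (twice to Medium, once to High), so the Low row holds 2/3 for Medium and 1/3 for High.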

The above state probability calculation corresponds to the law of total probability. Writing it out for each of the three states gives three simultaneous linear equations in which the unknowns are the state probabilities. The equations were solved using Gaussian elimination, which applies because the corresponding coefficient matrix is square.
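The following is a sketch of how the three equations could be set up and solved with Gaussian elimination written from scratch. One detail is an assumption on my part: the three total-probability equations alone are linearly dependent when every row of the transition matrix sums to 1, so the sketch replaces the last equation with the normalization condition that the state probabilities sum to 1. The names `gaussian_solve` and `state_probabilities` are hypothetical.

```python
def gaussian_solve(a, b):
    """Solve the square system a x = b by Gaussian elimination
    with partial pivoting, followed by back substitution."""
    n = len(b)
    # Build the augmented matrix [a | b].
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the column from all rows below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def state_probabilities(p):
    """p[x][y] is P(x -> y); returns the state probabilities.

    Equation for state y: sum_x P(x) * p[x][y] - P(y) = 0.
    Assumption: the last equation is replaced by sum_x P(x) = 1.
    """
    n = len(p)
    a = [[p[x][y] - (1.0 if x == y else 0.0) for x in range(n)]
         for y in range(n)]
    b = [0.0] * n
    a[-1] = [1.0] * n
    b[-1] = 1.0
    return gaussian_solve(a, b)


# Hypothetical transition matrix, rows and columns ordered Low, Medium, High.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
pi = state_probabilities(P)
```

For this example matrix the solution is 0.25 for Low, 0.5 for Medium and 0.25 for High, which can be checked by substituting back into the total-probability equation.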

From the state probability calculation above, the individual probabilities of the Low, Medium and High states can be obtained. Comparing them, the state with the largest probability is the one the customer’s transaction pattern is most likely to fall in. The process can be repeated over a number of the customer’s transaction patterns and the mean of the state probabilities taken. This corresponds to training the model on a customer’s transaction pattern.
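A short sketch of this averaging step, assuming each transaction pattern has already produced one dictionary of state probabilities (the numbers and function names below are made up for illustration):

```python
def average_state_probabilities(runs):
    """Mean of each state's probability across several patterns."""
    states = runs[0].keys()
    return {s: sum(run[s] for run in runs) / len(runs) for s in states}


def dominant_state(probs):
    """State with the largest probability."""
    return max(probs, key=probs.get)


# Hypothetical state probabilities from two transaction patterns.
runs = [
    {"Low": 0.5, "Medium": 0.3, "High": 0.2},
    {"Low": 0.2, "Medium": 0.6, "High": 0.2},
]
mean_probs = average_state_probabilities(runs)
likely_state = dominant_state(mean_probs)
```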

In the next section, the transaction patterns will be segmented into fraudulent and not fraudulent.

Segmenting Transactions as Fraudulent/Not Fraudulent

Given the state and transition probabilities calculated above, when the customer buys items in a certain transaction pattern, the product of the state and transition probabilities along that pattern is calculated, and the final state is recorded when the pattern ends. The final state of interest here is High, and the products for patterns ending in High are plotted. Because the target labels Fraudulent/Not Fraudulent are unknown, an unsupervised machine learning technique, k-means clustering, can be used with the number of clusters set to 2. Based on how the points are distributed across the clusters, fraudulent patterns can be observed as outliers. Another approach is to define a threshold probability for transaction patterns ending in the High state: if the product of the probabilities calculated above is less than the threshold, the transaction pattern is segmented as fraudulent.
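Both approaches can be sketched as follows. Several details are assumptions rather than the article’s exact method: the product is taken as the state probability of the first state times each transition probability along the pattern, patterns that do not end in High are never flagged, and the clustering is a small one-dimensional 2-means written from scratch in place of a library implementation. All names and numbers are hypothetical.

```python
def pattern_score(pattern, state_prob, trans_prob):
    """Product of the first state's probability and every
    transition probability along the pattern."""
    score = state_prob[pattern[0]]
    for x, y in zip(pattern, pattern[1:]):
        score *= trans_prob[x][y]
    return score


def is_fraudulent(pattern, state_prob, trans_prob, threshold):
    """Threshold rule: flag a pattern ending in High whose score
    falls below the chosen threshold."""
    if pattern[-1] != "High":
        return False
    return pattern_score(pattern, state_prob, trans_prob) < threshold


def two_means_1d(values, iterations=100):
    """Split scores into two clusters (k-means with k = 2).

    Centroids start at the smallest and largest value.
    Label 0 is the lower-score cluster, label 1 the higher one.
    """
    low, high = min(values), max(values)
    labels = []
    for _ in range(iterations):
        # Assign each value to the nearer centroid.
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_low = sum(group0) / len(group0) if group0 else low
        new_high = sum(group1) / len(group1) if group1 else high
        if new_low == low and new_high == high:
            break  # centroids stopped moving
        low, high = new_low, new_high
    return labels, (low, high)


# Hypothetical trained probabilities for one customer.
state_prob = {"Low": 0.5, "Medium": 0.3, "High": 0.2}
trans_prob = {
    "Low": {"Low": 0.6, "Medium": 0.3, "High": 0.1},
    "Medium": {"Low": 0.3, "Medium": 0.5, "High": 0.2},
    "High": {"Low": 0.4, "Medium": 0.4, "High": 0.2},
}
pattern = ["Low", "Medium", "High"]
score = pattern_score(pattern, state_prob, trans_prob)
flagged = is_fraudulent(pattern, state_prob, trans_prob, threshold=0.05)

# Hypothetical scores of several patterns that all end in High.
scores = [0.001, 0.002, 0.0015, 0.2, 0.25, 0.22]
labels, centroids = two_means_1d(scores)
```

With the threshold rule the cut-off has to be chosen by hand, whereas the clustering lets the data suggest where the low-probability group separates from the rest.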

In the next section, the integration of the model with a web application is described.

Model Framework and Integration with Web API

The model was developed using a PHP framework and followed Test Driven Development, i.e. the model class was unit testable. Interactions between the user interface and the backend were handled using jQuery and AJAX. A relational SQL database, MariaDB, was used to store the transaction pattern for each user as a BLOB object type.
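To illustrate the storage idea only, here is a sketch that serializes a user’s transition counts and stores them in a BLOB column. It deliberately swaps in Python’s built-in `sqlite3` with an in-memory database as a stand-in for MariaDB and PHP, so that it runs without a server. The table name, column names and JSON serialization are all assumptions, not the original schema.

```python
import json
import sqlite3


def save_pattern(conn, user_id, counts):
    """Serialize the pattern to bytes and store it as a BLOB."""
    blob = json.dumps(counts).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO transaction_patterns (user_id, pattern) "
        "VALUES (?, ?)",
        (user_id, blob),
    )
    conn.commit()


def load_pattern(conn, user_id):
    """Fetch and deserialize a user's pattern, or None if absent."""
    row = conn.execute(
        "SELECT pattern FROM transaction_patterns WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return json.loads(row[0].decode("utf-8")) if row else None


# In-memory database used purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transaction_patterns ("
    "user_id INTEGER PRIMARY KEY, pattern BLOB)"
)

# Hypothetical transition counts for user 1.
counts = {"Low": {"Medium": 2, "High": 1}, "Medium": {"Low": 1}}
save_pattern(conn, 1, counts)
restored = load_pattern(conn, 1)
```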

Further Work

  1. Making object retrieval faster by using Memcached and Redis as caches for the relational database.
  2. Using NoSQL databases for faster CRUD operations.
  3. Making the model framework platform independent.
  4. Comparing the generated transaction pattern of a customer with those of other customers for predictive analytics and market segmentation for the seller.

For the code files, refer to the GitHub link.
