The recent, massive acceleration in ecommerce has brought a corresponding increase in online payments fraud. Worldwide, fraud is estimated to cost businesses more than $20 billion annually. And for every dollar lost to fraud, the total cost to businesses is much higher once increased operational costs, network fees, and customer churn are factored in.
Not only is fraud expensive, but sophisticated fraudsters are constantly finding new ways to exploit weaknesses, making fraud challenging to combat. That’s why we built Stripe Radar, a machine learning–based fraud prevention solution, fully integrated within the Stripe platform. Radar’s machine learning leverages the data from hundreds of billions of dollars in payments processed across the Stripe network each year to accurately detect fraud and quickly adapt to the latest trends, enabling you to grow without increasing fraud.
This guide introduces Stripe Radar and how we leverage the Stripe network to detect fraud, provides an overview of the machine learning techniques we use, explains how we think about the efficacy and performance of fraud detection systems, and describes how other tools in the Radar suite can help businesses optimize their fraud performance.
Introduction to online credit card fraud
A payment is considered fraudulent when the cardholder does not authorize the charge. For example, if a fraudster makes a purchase using a stolen card number that hasn’t yet been reported, the payment may be processed successfully. Then, when the cardholder discovers the fraudulent use of the card, they would contest the payment with their bank by filing a dispute (also known as a “chargeback”).
Businesses can challenge a chargeback by submitting evidence that the payment was valid. However, for card-not-present transactions, if the card network determines that the payment was truly fraudulent, the cardholder wins the dispute and the business is liable for the lost goods and associated fees.
Historically, businesses have used blunt, hand-written rules to predict and block suspected fraudulent charges. However, hard-coded rules (for example, blocking all credit cards used abroad) can end up blocking many good transactions. Machine learning, by contrast, can detect more nuanced patterns, helping you block fraud while maximizing revenue. In machine learning parlance, a false negative is when the system misses something it is designed to detect: in this case, a fraudulent transaction. A false positive is when the system flags something it shouldn’t have: for example, blocking a legitimate customer. Before we get into the details of machine learning, it’s important to understand the tradeoffs involved.
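To make these definitions concrete, the minimal sketch below labels each block/allow decision with its confusion-matrix outcome and tallies the results for a hard-coded “block cards used abroad” rule. The helper names (`classify_outcome`, `tally`, `home_country_rule`) and the four sample payments are hypothetical, chosen only for illustration; they are not part of any Stripe API.

```python
from collections import Counter


def classify_outcome(blocked: bool, is_fraud: bool) -> str:
    """Label one decision using the false positive / false negative terms above."""
    if blocked and is_fraud:
        return "true_positive"   # fraud correctly blocked
    if blocked and not is_fraud:
        return "false_positive"  # legitimate customer blocked (a false decline)
    if not blocked and is_fraud:
        return "false_negative"  # fraud missed; likely to become a dispute
    return "true_negative"       # legitimate payment allowed


def tally(decisions):
    """Count outcomes over (blocked, is_fraud) pairs."""
    return Counter(classify_outcome(blocked, fraud) for blocked, fraud in decisions)


def home_country_rule(card_is_foreign: bool) -> bool:
    """Hypothetical hard-coded rule: block every card used abroad."""
    return card_is_foreign


# Hypothetical payments as (card_is_foreign, is_fraud) pairs.
payments = [
    (True, False),   # a traveling customer: blocked anyway
    (True, True),    # foreign fraud: blocked
    (False, True),   # domestic fraud the rule cannot see
    (False, False),  # ordinary domestic purchase
]
decisions = [(home_country_rule(foreign), fraud) for foreign, fraud in payments]
print(tally(decisions))
```

Even in this tiny example, the rule produces one false positive and one false negative, which is the kind of bluntness that motivates learned models.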
With false negatives, businesses are often responsible for the original transaction amount plus chargeback fees (the cost associated with the bank reversing the card payment), higher network fees as a result of the dispute, and higher operational costs from reviewing charges or fighting disputes. Plus, if you incur too many disputes, you could end up in a network chargeback monitoring program, which can lead to higher costs or, in some cases, the inability to accept card payments.
False positives, or false declines, occur when a legitimate customer tries to make a purchase but is prevented from doing so. False declines cost the business both gross profit and reputation. In fact, in a recent survey, 33% of consumers said they wouldn’t shop at a business again after a false decline.
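The two error types carry different costs, and it can help to put rough numbers on each. The sketch below is a simplified per-payment cost model built only from the components named above; every field name and figure in it is a hypothetical placeholder, not a Stripe-provided value. The 0.33 churn probability loosely borrows the survey figure above, and the future-profit figure is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ErrorCostInputs:
    """Hypothetical per-payment inputs; replace them with your own figures."""
    amount: float              # original transaction amount
    chargeback_fee: float      # fee for the bank reversing the card payment
    extra_fees_and_ops: float  # added network fees plus review/dispute-handling effort
    gross_margin: float        # fraction of the amount kept as gross profit
    churn_probability: float   # chance a falsely declined customer never returns
    future_profit: float       # expected future gross profit from a retained customer

    def false_negative_cost(self) -> float:
        # Missed fraud: the amount is reversed, plus the chargeback fee and other overhead.
        return self.amount + self.chargeback_fee + self.extra_fees_and_ops

    def false_positive_cost(self) -> float:
        # False decline: this sale's gross profit is lost, plus the expected cost of churn.
        return self.amount * self.gross_margin + self.churn_probability * self.future_profit


costs = ErrorCostInputs(amount=100.0, chargeback_fee=15.0, extra_fees_and_ops=10.0,
                        gross_margin=0.30, churn_probability=0.33, future_profit=60.0)
print(costs.false_negative_cost())
print(costs.false_positive_cost())
```

With these made-up inputs, one missed fraud costs about 125 while one false decline costs about 50; a business with different margins or customer lifetime value would see a very different ratio.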
There is a tradeoff between preventing disputes (false negatives) and avoiding blocking legitimate customers (false positives): the fewer you have of one, the more you need to tolerate of the other. When you prevent more fraud, you’ll block more good customers. Conversely, reducing the number of false positives often lets more true fraud slip through the cracks. Businesses need to decide how to balance the two based on their margins, growth profile, and other factors.
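The tradeoff is easiest to see by sweeping a blocking threshold over risk scores. In this sketch the scores, their scale, and the ground-truth labels are all hypothetical; a payment is blocked when its score is at or above the threshold.

```python
def errors_at_threshold(scored, threshold):
    """Return (false_positives, false_negatives) when blocking scores >= threshold."""
    false_positives = sum(1 for score, fraud in scored if score >= threshold and not fraud)
    false_negatives = sum(1 for score, fraud in scored if score < threshold and fraud)
    return false_positives, false_negatives


# Hypothetical (risk_score, is_fraud) pairs.
scored = [(5, False), (20, False), (35, True), (50, False),
          (65, True), (80, False), (90, True)]

for threshold in (30, 60, 85):
    fp, fn = errors_at_threshold(scored, threshold)
    print(f"block at >= {threshold}: {fp} false positives, {fn} false negatives")
```

Raising the threshold from 30 to 85 moves the errors from (2 false positives, 0 false negatives) to (0, 2): neither extreme is free, and the right setting depends on what each error costs your business.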
If a business’s margins are thin (for example, if you sell food online), the cost of a single fraudulent transaction might need to be offset by the profit from hundreds of good transactions, making each false negative very expensive. Businesses with this profile may lean toward casting a wide net to stop potential fraud. For high-margin businesses, such as SaaS companies, the reverse is true: the lost revenue from one blocked legitimate customer may outweigh the cost of some additional fraud.
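The “hundreds of good transactions” arithmetic follows from dividing the full loss on one fraudulent order by the gross profit on one good order. This sketch uses a hypothetical $50 order, a hypothetical $15 chargeback fee, and illustrative margins of 1% and 80%; none of these figures come from Stripe data.

```python
import math


def good_sales_to_offset(fraud_loss: float, order_value: float, margin: float) -> int:
    """Number of legitimate orders whose gross profit covers one fraud loss."""
    profit_per_order = order_value * margin
    return math.ceil(fraud_loss / profit_per_order)


# Hypothetical fraud loss: a $50 order plus a $15 chargeback fee.
loss = 50 + 15
thin = good_sales_to_offset(fraud_loss=loss, order_value=50, margin=0.01)  # thin-margin food seller
high = good_sales_to_offset(fraud_loss=loss, order_value=50, margin=0.80)  # high-margin SaaS
print(thin, high)  # 130 2
```

At a 1% margin, one missed fraud wipes out the profit from 130 good orders, while at an 80% margin it takes only 2, which is why the two profiles lean in opposite directions.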
Improving performance with rules and manual reviews
With Radar for Fraud Teams, you can fine-tune your protection by directly adjusting your risk threshold to block or allow more payments. Alongside the automated machine learning models, Radar for Fraud Teams also lets individual businesses compose custom rules (for example, “block all transactions above $1,000 when the IP country does not match the card’s country”), request interventions, and manually review flagged payments in the Dashboard.
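Logically, the example rule above is a simple predicate over a few payment attributes. The Python sketch below expresses that logic for illustration only: the `Payment` type and its field names (`amount_usd`, `ip_country`, `card_country`) are hypothetical, and this is not how rules are written in Radar itself.

```python
from typing import TypedDict


class Payment(TypedDict):
    amount_usd: float   # payment amount in US dollars
    ip_country: str     # country code inferred from the customer's IP address
    card_country: str   # country code of the card's issuing bank


def should_block(payment: Payment) -> bool:
    """Example rule: block payments over $1,000 whose IP and card countries differ."""
    return payment["amount_usd"] > 1000 and payment["ip_country"] != payment["card_country"]


print(should_block({"amount_usd": 1200.0, "ip_country": "FR", "card_country": "US"}))  # True
print(should_block({"amount_usd": 1200.0, "ip_country": "US", "card_country": "US"}))  # False
```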
Such rules can be seen as simple “models” (they can be represented as decision trees, after all!), and they should be evaluated in the same way as models, with full consideration of the tradeoff between precision and recall. When you create a rule in Radar, we show historical statistics on how many matching transactions were actually disputed, refunded, or accepted, to help with these calculations before the rule goes live. Once it is live, you can see each rule’s impact on false positive and dispute rates.
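Here, precision is the share of payments a rule matches that turn out to be fraudulent, and recall is the share of all fraudulent payments the rule catches. Radar computes the historical counts for you; this sketch only illustrates the arithmetic. The `HistoricalPayment` type, the sample history, and the simplifying assumption that only “disputed” payments count as fraud are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HistoricalPayment:
    matches_rule: bool
    outcome: str  # "disputed", "refunded", or "accepted"


def evaluate_rule(history):
    """Precision and recall of a rule, treating disputed payments as fraud (a simplification)."""
    matched = [p for p in history if p.matches_rule]
    caught = sum(1 for p in matched if p.outcome == "disputed")
    total_fraud = sum(1 for p in history if p.outcome == "disputed")
    precision = caught / len(matched) if matched else 0.0
    recall = caught / total_fraud if total_fraud else 0.0
    return precision, recall


history = (
    [HistoricalPayment(True, "disputed")] * 3
    + [HistoricalPayment(True, "accepted")]
    + [HistoricalPayment(False, "disputed")] * 3
    + [HistoricalPayment(False, "refunded")]
    + [HistoricalPayment(False, "accepted")]
)
print(evaluate_rule(history))  # (0.75, 0.5)
```

In this made-up history, three of the four matched payments were disputed (precision 0.75), but the rule catches only three of the six disputed payments overall (recall 0.5).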
Just as important, rules, interventions, and manual reviews allow users to change the shape of the precision-recall curve in their favor by adding in proprietary, business-specific logic (rules) or by expending some additional effort (manual review).
If you realize that the machine learning algorithms frequently miss a certain type of fraud particular to your business (and that fraud is easily identifiable to you), you can compose a rule to block it. That targeted intervention will typically increase recall with little cost to precision, in effect moving your operating point onto a more favorable precision-recall curve.
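A toy example shows why a targeted rule can raise recall without hurting precision. The model decisions, the business-specific rule, and the labels below are all hypothetical; the combined policy blocks a payment if either the model or the rule flags it.

```python
def precision_recall(blocked, is_fraud):
    """Precision: share of blocked payments that are fraud. Recall: share of fraud blocked."""
    true_positives = sum(b and f for b, f in zip(blocked, is_fraud))
    n_blocked = sum(blocked)
    n_fraud = sum(is_fraud)
    precision = true_positives / n_blocked if n_blocked else 0.0
    recall = true_positives / n_fraud if n_fraud else 0.0
    return precision, recall


is_fraud = [True, True, True, True, False, False, False, False]
model_block = [True, True, False, False, True, False, False, False]
# Hypothetical rule targeting a fraud pattern the model misses (with one false positive).
rule_block = [False, False, True, True, False, True, False, False]
combined = [m or r for m, r in zip(model_block, rule_block)]

print(precision_recall(model_block, is_fraud))  # model alone
print(precision_recall(combined, is_fraud))     # model OR rule
```

Here the model alone reaches a precision of about 0.67 at a recall of 0.5; adding the rule lifts recall to 1.0 while precision stays at about 0.67.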
By sending some classes of transactions to manual review instead of blocking them outright, you can gain precision without a hit to recall. Similarly, by sending some transactions to manual review instead of allowing them outright, you can gain recall without a hit to precision.
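The effect of a manual review band can be simulated by routing payments three ways: block high scores, review a middle band, and allow the rest. This sketch makes a strong, hypothetical assumption that reviewers judge every reviewed payment correctly; the scores, labels, and thresholds are also made up for illustration.

```python
def route(score: int, block_at: int, review_at: int) -> str:
    """Three-way routing: block high scores, review a middle band, allow the rest."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "allow"


def evaluate(scored, block_at, review_at):
    """Return (precision, recall, reviews), assuming reviewers are always correct."""
    true_positives = n_blocked = reviews = 0
    n_fraud = sum(1 for _, fraud in scored if fraud)
    for score, fraud in scored:
        decision = route(score, block_at, review_at)
        if decision == "review":
            reviews += 1
            blocked = fraud  # hypothetical perfect reviewer blocks only true fraud
        else:
            blocked = decision == "block"
        if blocked:
            n_blocked += 1
            true_positives += fraud
    precision = true_positives / n_blocked if n_blocked else 0.0
    recall = true_positives / n_fraud if n_fraud else 0.0
    return precision, recall, reviews


# Hypothetical (risk_score, is_fraud) pairs.
scored = [(10, False), (45, True), (50, False), (65, False),
          (70, True), (85, True), (90, False), (95, True)]

print(evaluate(scored, block_at=60, review_at=60))  # block or allow only
print(evaluate(scored, block_at=80, review_at=60))  # review instead of blocking 60-79
print(evaluate(scored, block_at=60, review_at=40))  # review instead of allowing 40-59
```

Compared with blocking everything at 60 or above (precision 0.6, recall 0.75), reviewing the 60 to 79 band raises precision to 0.75 at the same recall, and reviewing the 40 to 59 band raises recall to 1.0 without lowering precision. Each option costs two manual reviews, and with imperfect reviewers the gains would shrink.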
Of course, in these cases you pay for the gains with additional human work (and you depend on the accuracy of your team’s assessments), but manual review, rules, and interventions that authenticate high-risk customers give you additional levers to optimize fraud outcomes.
We hope this guide helps you understand how machine learning is applied to fraud prevention at Stripe and how to gauge the efficacy of your fraud systems. You can learn more about Radar’s features or explore our docs.
If you have any questions or would like to learn more about Stripe Radar, please reach out.