Zero One: Shopify: On the Front Lines of Fraud
For businesses of all sizes, few things strike more fear, shake the customer experience, and wreak havoc on the balance sheet than fraud. Something has to be done. E-commerce vendors need a way, perhaps machine learning, to stay a step ahead of the miscreants.
Should fraud detection tap into the power of machine learning?
Not yet, says David Lennie, senior vice president of data and analytics at Shopify and former director of data science and engineering at Netflix. Humans need to learn before the machines do, he says.
Over the last couple of years, Lennie has been building and refining a fraud detection system for Shopify’s more than 500,000 merchant stores across 175 countries, mostly small and medium-sized businesses. The tech vendor’s cloud-based commerce platform also supports major brands, such as Red Bull, Nestle and GE.
Shopify’s massive sales-transaction volume (the platform processed more than 7,000 orders per minute at peak times during Black Friday and Cyber Monday) makes it an ideal place to develop a data-driven fraud detection system.
“I don’t think anyone else in the world has so much information about entrepreneurs and small businesses,” Lennie says.
Zero One sat down with Lennie to learn more about this important aspect of business and how data and technology are helping companies detect fraud.
How did you get into fraud detection?
Lennie: Any business person faces fraud as an everyday occurrence. If we solve it, then we’re solving that for everyone on the Shopify platform. We also know fraud modeling needs a lot of transactions, a lot of orders. We could add a lot of value using data and machine learning to build a continuously evolving algorithm for all of our merchants.
It seemed like a really obvious starting point.
The work started at the beginning of 2016, not long after I joined the company. We had been leveraging a third-party fraud solution to do a lot of the scoring work for us and were looking at a contract renewal.
We decided to see if we could at least meet the same targets, the same benchmarks as the industry leading tool. I had about three people just sort of tinkering with it, working on it on the side. Within the first quarter we found that we were able to get the same kind of performance for fraud detection as the solutions we had been using.
We kept at it in the second quarter and found that we were actually starting to be more effective than those models. Not across the board, but in certain categories. By the third quarter, we got consistently better than the other solution across the board.
How were you able to beat the third-party solution?
Lennie: You can get general fraud warnings from some outsourced solutions, but we were able to bring so much more context to the model. We also have all of the behavioral and usage data of the merchant. If you add in the context about your transactions, about your merchants, you’re able to move the needle and get better results.
For instance, an external engine might just be looking at the order and then scoring by type of product, source of the order, payment method. Those are great things everyone uses to drive their fraud models. But we have more of the interaction data, more of the traffic pattern, more of the other kind of enrichment data that we can layer in to add context and precision to the model.
A great example of contextual knowledge is the tourism industry, especially around selling tickets for tourist attractions. In a rules-based system, if the shipping address does not equal the billing address, it’s fraud. However, with the context that the shop’s industry is tourism, our algorithms give us the ability to say this feature is not as important for this shop. Now our model automatically picks up on cases like these.
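The contrast between a fixed rule and a context-aware feature can be sketched in Python. This is a minimal illustration under stated assumptions, not Shopify’s model: the `Order` type, the function names and the 0.5 fallback are hypothetical, and the per-industry risk is estimated simply as the share of address-mismatched orders that later proved fraudulent.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Order(NamedTuple):
    # Hypothetical minimal order record; real systems carry many more fields.
    industry: str
    shipping_address: str
    billing_address: str


def rule_based_is_fraud(order: Order) -> bool:
    """Rigid rule: any shipping/billing mismatch is treated as fraud."""
    return order.shipping_address != order.billing_address


def learn_mismatch_risk(history: Iterable[Tuple[Order, bool]]) -> Dict[str, float]:
    """Per industry, the share of address-mismatched orders later confirmed as fraud."""
    fraud: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for order, is_fraud in history:
        if order.shipping_address != order.billing_address:
            total[order.industry] += 1
            fraud[order.industry] += int(is_fraud)
    return {industry: fraud[industry] / total[industry] for industry in total}


def contextual_mismatch_risk(
    order: Order, risk_by_industry: Dict[str, float], default: float = 0.5
) -> float:
    """Risk contributed by an address mismatch, conditioned on the shop's industry.

    `default` is an assumed fallback for industries with no mismatch history.
    """
    if order.shipping_address == order.billing_address:
        return 0.0
    return risk_by_industry.get(order.industry, default)
```

Given a history in which mismatched tourism orders are rarely fraudulent, `contextual_mismatch_risk` returns a low value for a tourism shop, while `rule_based_is_fraud` still flags every such order.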
Where are you outperforming the third-party solution?
Lennie: We moved the needle on two key measures: false positives, which are orders flagged as risky that turn out to be legit sales, and true positives, which are orders correctly flagged as real fraud.
We made the most movement on reducing the false positives, because the contextual knowledge gave us a lot more confidence in what was happening and removed some of the perceived risk that was in the original model. We took false positives down initially by more than 85 percent.
We also improved our detection of true positives, the real fraud, by more than 10 percent. That’s always going to be harder to move because the real innovation in the fraud space comes from true positives. You’re always working to figure out ways to detect the latest and greatest fraud technique coming from very smart people on the other side.
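Benchmarks like these come from scoring the same set of orders, with known outcomes, under both the incumbent tool and the in-house model. A minimal sketch of that bookkeeping in Python, assuming each model produces a simple flagged/not-flagged decision per order; the function names are hypothetical and no real Shopify data is implied.

```python
from typing import Dict, Sequence


def count_outcomes(flagged: Sequence[bool], is_fraud: Sequence[bool]) -> Dict[str, int]:
    """Tally how a model's flags line up with confirmed outcomes for the same orders."""
    if len(flagged) != len(is_fraud):
        raise ValueError("flagged and is_fraud must have the same length")
    counts = {"true_positive": 0, "false_positive": 0, "false_negative": 0, "true_negative": 0}
    for was_flagged, was_fraud in zip(flagged, is_fraud):
        if was_flagged and was_fraud:
            counts["true_positive"] += 1   # real fraud that was caught
        elif was_flagged:
            counts["false_positive"] += 1  # legitimate sale flagged as risky
        elif was_fraud:
            counts["false_negative"] += 1  # fraud that slipped through
        else:
            counts["true_negative"] += 1   # legitimate sale correctly passed
    return counts


def percent_change(baseline: int, candidate: int) -> float:
    """Signed percent change from the baseline count (negative means a reduction)."""
    if baseline == 0:
        raise ValueError("baseline count must be non-zero")
    return 100.0 * (candidate - baseline) / baseline
```

Applying `percent_change` to the two models’ `false_positive` counts gives a reduction figure; applying it to their `true_positive` counts gives a detection gain.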
Is this where machine learning comes in?
Lennie: There are multiple layers to it.
We have a deep history of chargebacks, so we’re finding the transactions that eventually turned into chargebacks, then clustering them and pulling out the predictive attributes. We’re able to see which attributes are most predictive of fraud.
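Lennie describes clustering chargeback transactions to surface predictive attributes. The sketch below swaps in a simpler technique, ranking each attribute value by its lift over the overall chargeback rate, to show what “most predictive attribute” can mean in practice. The attribute names, the `min_support` parameter and the function itself are hypothetical; this is not Shopify’s method.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def rank_attributes_by_lift(
    orders: Sequence[Dict[str, str]],
    charged_back: Sequence[bool],
    min_support: int = 1,
) -> List[Tuple[str, str, float]]:
    """Rank (attribute, value) pairs by chargeback lift.

    Lift = chargeback rate among orders with that value / overall chargeback rate.
    Pairs seen fewer than `min_support` times are skipped to limit noise.
    """
    if len(orders) != len(charged_back) or not orders:
        raise ValueError("need equal-length, non-empty inputs")
    base_rate = sum(charged_back) / len(orders)
    if base_rate == 0:
        raise ValueError("no chargebacks in the sample, so lift is undefined")
    seen: Counter = Counter()
    bad: Counter = Counter()
    for order, was_charged_back in zip(orders, charged_back):
        for attribute, value in order.items():
            seen[(attribute, value)] += 1
            if was_charged_back:
                bad[(attribute, value)] += 1
    ranked = [
        (attribute, value, (bad[(attribute, value)] / count) / base_rate)
        for (attribute, value), count in seen.items()
        if count >= min_support
    ]
    # Highest lift first: these values are most over-represented among chargebacks.
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked
```

A lift well above 1.0 marks a value that shows up far more often among chargebacks than in orders overall; in practice a larger `min_support` keeps rare values from dominating the ranking.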
By layering in traffic pattern, demand, or search information, we can see a spike in orders that isn’t really correlated with other patterns from that shop or merchant. It looks fabricated or maybe packaged in a way that you have never seen. Some of the usage and transaction information on the demand side could then be used to increase the likelihood that the score would detect this as fraud.
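One way to express an order spike that “isn’t really correlated with other patterns from that shop” is to compare how unusual the current order count is against how unusual the current traffic is. A minimal sketch, assuming per-shop counts over comparable periods and a simple z-score test; the function names and the three-standard-deviation threshold are hypothetical, and a real system would feed this signal into a broader risk score rather than act on it alone.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def z_score(history: Sequence[float], current: float) -> float:
    """How many standard deviations `current` sits from the history's mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two points
    if sigma == 0:
        if current == mu:
            return 0.0
        return math.inf if current > mu else -math.inf
    return (current - mu) / sigma


def uncorrelated_order_spike(
    order_history: Sequence[float],
    traffic_history: Sequence[float],
    orders_now: float,
    traffic_now: float,
    threshold: float = 3.0,
) -> bool:
    """True when orders spike but the shop's traffic does not spike with them.

    `threshold` (in standard deviations) is an assumed value, not a tuned one.
    """
    order_z = z_score(order_history, orders_now)
    traffic_z = z_score(traffic_history, traffic_now)
    return order_z >= threshold and traffic_z < threshold
```

Checking traffic alongside orders is the point of the design: a genuine promotion tends to lift both, while a burst of orders with flat traffic is the kind of unexplained pattern Lennie describes.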
This is a programmed system right now. We’re revising the model ourselves on a regular basis. We continue to monitor the effectiveness to see if we’re in line with the chargeback rates. But it is not a learning model at this point.
Why not a true machine learning system? Are you concerned about the “black box” in fraud detection?
Lennie: Yeah, that’s one of the risks. Honestly, we want to spend a lot more time training ourselves before we really jump into the pure ML (machine learning) model. It’s important we learn about the characteristic behaviors of this space, how the data works, before we set it loose.
Machine learning and AI are tools and techniques; when they are necessary, or when they add value in solving your problem, you should absolutely apply them. But you should never have as your goal to make an ML-powered model.
Tom Kaneshige writes the Zero One blog covering digital transformation, AI, marketing tech and the Internet of Things for line-of-business executives. He is based in Silicon Valley.