The Importance of Humans in Machine Learning-Based Fraud Systems
By Nitesh Kumar, Head of Data Science, Affirm, Inc.
Machines can only predict future behavior that’s representative of the past
The effectiveness of a machine learning system depends on how well the model generalizes, i.e., performs on previously unseen instances or inputs. That generalization, in turn, depends on how well the training sample represents the unseen instances the model acts on. Unfortunately, the fraud game is inherently adversarial, so the problem isn't stationary: as the system gets better at stopping old fraud strategies by learning from historical examples, fraudsters develop novel attack vectors to beat it. This severely limits the generalizability and shelf life of a fraud detection model, so people are required to constantly monitor its actions and performance.
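One common way teams watch for this kind of drift in practice is to compare the live distribution of a model score or feature against its training-time distribution. The sketch below computes a population stability index (PSI), a standard drift statistic; the function name, binning scheme, and thresholds here are illustrative assumptions, not details from the article.

```python
import math
from collections import Counter

def psi(expected, actual, bins=10):
    """Population Stability Index between the training-time ("expected")
    and live ("actual") distributions of a model score or feature.
    A common rule of thumb: PSI > 0.25 signals meaningful drift."""
    lo, hi = min(expected), max(expected)
    step = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bucket_fracs(values):
        counts = Counter(
            min(max(int((v - lo) / step), 0), bins - 1) for v in values
        )
        # Floor at a tiny probability so the log below is always defined.
        return [max(counts.get(b, 0) / len(values), 1e-6) for b in range(bins)]

    e, a = bucket_fracs(expected), bucket_fracs(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring job might recompute this daily and page a human when the index crosses a chosen threshold, which is exactly the point where human review of the model's behavior becomes necessary.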
Humans are aware of context and capable of logical reasoning
Machines cannot incorporate new information well; they look only at what they are trained to look at. For example, if there is a security breach at a large phone service or email provider, a human can take that knowledge into account while reviewing cases. The machine, however, cannot spontaneously update itself to react appropriately to the new information.
No business wants to collect fraud outcomes by leaving its systems vulnerable to such attacks, which is where humans must step in
Humans are also capable of trying out complex approaches as they review cases. It is common for reviewers to contact the applicant to confirm whether fraud occurred. During the course of the conversation, the human expert might choose to do a variety of things: (a) ask the applicant to answer some questions associated with their past, (b) ask for their social security number, or (c) ask the user to complete a set of tasks sent through email to confirm the applicant's identity. Machines, on the other hand, are not capable of carrying out such detailed, wide-ranging approaches, which require taking action based on responses or feedback that cannot be predicted in advance.
An important consideration in fraud detection systems is how different cases link with one another. Most fraud detection systems create an underlying graph structure that connects cases through attributes they share. The hypothesis is that a fraudster attempts to get past the fraud system by making repeated attacks under different identities. However, across all such attacks some attribute stays constant, such as the device through which the attack was carried out or the requested shipping address. The graph structure helps connect these otherwise disparate applications and identify them as an attack. Since this graph tracks only a subset of all possible attributes at any point in time, it can miss connections simply because the attribute isn't represented on the graph. A human does not have this constraint, and can improvise attributes not available to the pre-designed graph. In other words, humans are free to reason about anything, not just a predetermined set of attributes.
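The linking idea above can be sketched with a few lines of union-find: two applications go into the same ring whenever they share any tracked attribute value. The data layout (a dict mapping application id to an attribute dict) and the function name are illustrative assumptions, not a description of any production system.

```python
from collections import defaultdict

def link_cases(applications):
    """Group applications that share any attribute value (device id,
    shipping address, ...) into rings, using union-find."""
    parent = {}

    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    seen = {}  # (attribute, value) -> first application id that had it
    for app_id, attrs in applications.items():
        find(app_id)  # register the node even if it shares nothing
        for key_value in attrs.items():
            if key_value in seen:
                union(app_id, seen[key_value])
            else:
                seen[key_value] = app_id

    rings = defaultdict(set)
    for app_id in applications:
        rings[find(app_id)].add(app_id)
    # Only multi-application components are interesting as potential attacks.
    return [ring for ring in rings.values() if len(ring) > 1]
```

Note that the graph only ever sees the attributes someone chose to track; if the fraudster's constant is, say, a writing quirk in the application text, no pre-built edge exists, which is the gap a human reviewer fills.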
Humans are better at annotating outliers and inliers
Machines can easily detect whether a case is an outlier (i.e., different from the instances the model was trained on). However, they cannot easily predict the outcome label, since the model was never trained on such an instance. Such outliers require human expertise and intuition, as mere extrapolation is seldom the right approach, and that is generally all a machine can do.
Similarly, when a certain attribute or instance is an inlier (i.e., observed more often than it was represented in the training sample), it could suggest an outcome different from what the model was trained on. The model, however, cannot confirm this on its own, because doing so requires additional context that only a human can take into account. The human expert can determine whether the inlier results from a data issue, such as all IPs being incorrectly recorded as the same value, or from a fraudster repeatedly attacking the system from the same IP.
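The inlier check itself is mechanical: compare how often a value appears in live traffic with how often the training data says it should. A minimal sketch, with hypothetical function and parameter names chosen for illustration:

```python
from collections import Counter

def flag_inliers(live_values, baseline_freq, ratio=5.0, min_count=3):
    """Flag attribute values (e.g. IP addresses) that appear far more
    often in live traffic than their training-time frequency predicts.

    baseline_freq maps value -> fraction of the training sample;
    unseen values get a default expectation of one occurrence."""
    total = len(live_values)
    counts = Counter(live_values)
    flagged = {}
    for value, count in counts.items():
        expected = baseline_freq.get(value, 1.0 / total) * total
        if count >= min_count and count / expected >= ratio:
            flagged[value] = count
    return flagged
```

What the machine cannot do is the next step: deciding whether a flagged IP means a logging bug or a live attack. That judgment call, as the paragraph above notes, falls to the human expert.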
Machines can help address scale and manage human intervention through active learning
As a business scales, it attracts more frequent and more novel attacks from fraudsters. It is hard for a rapidly growing, internet-scale business to hire and train the human reviewers required to keep up. Machines, however, can rank-order cases and surface only the most dubious transactions for manual review, based on the resources available. While there are different strategies for sorting cases for manual review, most boil down to two goals: (1) refer complex cases where the model is uncertain about its decision, and (2) obtain the outcome labels that will most improve the next generation of the model.
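Goal (1) is classic uncertainty sampling: a binary fraud score near 0.5 means the model is least sure of itself, so those cases go to humans first. A minimal sketch, assuming a fixed review capacity and precomputed scores (the names here are illustrative):

```python
def review_queue(case_ids, scores, capacity):
    """Rank cases for manual review by model uncertainty: scores
    closest to 0.5 are the most ambiguous, so they are queued first,
    up to the number of reviews the team can handle."""
    ranked = sorted(case_ids, key=lambda c: abs(scores[c] - 0.5))
    return ranked[:capacity]
```

The labels the reviewers produce on these ambiguous cases then feed goal (2): they are exactly the examples from which the next model generation learns the most, which is the active-learning loop the heading describes.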