4 Click Fraud Detection Techniques to Protect Your Campaigns
Click fraud is one of the most prevalent types of ad fraud, and several techniques can be used to tackle it, depending on factors such as the time available to analyze the click, the point in the flow, and the available data.
Each technique has its advantages and disadvantages, and there are several factors to keep in mind when applying one, including:
- Computing cost -- is the technique fast or slow to compute?
- Collected telemetry -- is some telemetry only available via secure connections?
- Tolerance for error -- are false positives worse than false negatives, or vice versa?
- Context -- is where the click occurred relevant?
With these factors in mind, we’ve divided the most common click fraud detection techniques into four groups: heuristic rules, statistical analysis, behavioral analysis, and machine learning. Here is how each one works.
1. Heuristic rules
Also known as rules-based detection, heuristic rules for ad fraud detection are predefined rules used to identify ad fraud patterns. These rules are used to identify traffic or user behavior that falls outside the standard range of established parameters defined by experts, or outside of the given context.
There are two categories of heuristic rule-based click fraud detection: Expert knowledge (rules created based on expert knowledge) and contextual (rules created based on the relevant context).
Heuristic rules based on expert knowledge usually have thresholds and targets and might look something like this:
“Block traffic coming from Apple-based browsers in Windows platform”
Contextual rules are used to block traffic that doesn’t fit within the ‘context’. For example, these rules might block traffic coming from locations outside of specified geographies, and might look like the following:
“Block traffic coming from Asia and Europe time zones in US-targeted campaigns”
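As a minimal sketch, the two example rules above could be expressed as predicates over a click record. The `Click` fields, the rule names, and the region labels are assumptions made for illustration; a real system would derive them from its own telemetry.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Click:
    # All field names are hypothetical, chosen for illustration only.
    browser_family: str    # e.g. "Safari", "Chrome"
    platform: str          # e.g. "Windows", "macOS", "iOS"
    timezone_region: str   # e.g. "America", "Europe", "Asia"
    campaign_target: str   # e.g. "US"


@dataclass
class Rule:
    name: str
    matches: Callable[[Click], bool]


def apple_browser_on_windows(click: Click) -> bool:
    # Expert-knowledge rule: an Apple browser reported on a Windows platform.
    return click.browser_family == "Safari" and click.platform == "Windows"


def foreign_timezone_in_us_campaign(click: Click) -> bool:
    # Contextual rule: Asia/Europe time zones in a US-targeted campaign.
    return (click.campaign_target == "US"
            and click.timezone_region in {"Asia", "Europe"})


RULES: List[Rule] = [
    Rule("apple_browser_on_windows", apple_browser_on_windows),
    Rule("foreign_timezone_in_us_campaign", foreign_timezone_in_us_campaign),
]


def triggered_rules(click: Click, rules: List[Rule] = RULES) -> List[str]:
    """Return the names of every rule the click violates."""
    return [rule.name for rule in rules if rule.matches(click)]


def should_block(click: Click) -> bool:
    """Block the click if at least one rule fires."""
    return bool(triggered_rules(click))
```

Keeping each rule as a named function makes it easier to review and retire rules individually as browsers, devices, and bots change.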
Let’s take a deeper look at both categories.
Expert knowledge rules can be further divided into two categories: device telemetry rules and network telemetry rules.
Device telemetry rules
These rules are created by obtaining telemetry about the user’s device and comparing it to “source of truth” data, e.g. checking whether a user who claims to browse from an iOS device actually supports an iOS-only video codec.
Network telemetry rules
These rules are created by comparing data obtained from the different network layers, from headers to TCP low-level data. Network telemetry rules require precise knowledge about how internet protocols work and are usually harder for fraudsters to circumvent.
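One commonly cited network-level check compares the operating system claimed in the User-Agent header with the initial IP TTL implied by the packets themselves. The sketch below assumes the typical defaults (128 for Windows, 64 for most Unix-like systems), which can be reconfigured and are affected by proxies and VPNs, so a mismatch is better treated as a signal than as proof. The function and table names are hypothetical.

```python
from typing import Optional

# Commonly cited default initial TTLs. These are assumptions: they can be
# changed by configuration and differ on some systems.
TYPICAL_INITIAL_TTL = {
    "Windows": 128,
    "Linux": 64,
    "macOS": 64,
    "iOS": 64,
    "Android": 64,
}


def infer_initial_ttl(observed_ttl: int) -> Optional[int]:
    """Round the TTL seen at the server up to the nearest common initial value.

    Each router hop decrements the TTL by one, so the observed value is
    slightly below the value the sender started with.
    """
    for candidate in (64, 128, 255):
        if observed_ttl <= candidate:
            return candidate
    return None


def ttl_contradicts_claimed_os(claimed_os: str, observed_ttl: int) -> bool:
    """True if the inferred initial TTL does not match the claimed OS."""
    expected = TYPICAL_INITIAL_TTL.get(claimed_os)
    if expected is None:
        return False  # unknown OS: no opinion
    return infer_initial_ttl(observed_ttl) != expected
```

For example, a visitor whose headers claim Windows but whose packets arrive with a TTL of 52 would be flagged, since 52 suggests an initial TTL of 64.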
Contextual rules are based on the specific context surrounding the traffic: User data and geolocation.
Geolocation rules are based on the origin of the traffic (commercial ISP or data center, blacklisted IPs) as well as any discrepancies between location and user telemetry (for example, users coming from the United States with a Chinese timezone and keyboard).
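The geolocation checks described above can be sketched as a function that returns a list of flags for a visit. The `Visit` fields and the two lookup tables are hypothetical and deliberately tiny; in practice they would come from an IP intelligence source and a much fuller mapping.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Visit:
    # Hypothetical fields, for illustration only.
    ip_country: str          # country resolved from the IP address, e.g. "US"
    ip_is_datacenter: bool   # True if the IP belongs to a data center
    ip_is_blacklisted: bool  # True if the IP is on a blacklist
    device_timezone: str     # IANA time zone reported by the device
    keyboard_language: str   # e.g. "en", "zh"


# Illustrative expectations per country (assumptions, not real reference data).
EXPECTED_TIMEZONE_PREFIXES = {"US": ("America/", "Pacific/Honolulu")}
EXPECTED_KEYBOARD_LANGUAGES = {"US": {"en", "es"}}


def geolocation_flags(visit: Visit) -> List[str]:
    """Return the geolocation signals raised by a visit."""
    flags: List[str] = []
    if visit.ip_is_datacenter:
        flags.append("datacenter_ip")
    if visit.ip_is_blacklisted:
        flags.append("blacklisted_ip")

    prefixes = EXPECTED_TIMEZONE_PREFIXES.get(visit.ip_country)
    if prefixes is not None and not visit.device_timezone.startswith(prefixes):
        flags.append("timezone_mismatch")

    languages = EXPECTED_KEYBOARD_LANGUAGES.get(visit.ip_country)
    if languages is not None and visit.keyboard_language not in languages:
        flags.append("keyboard_mismatch")
    return flags
```

Returning flags rather than a block decision leaves room to weigh signals: a single keyboard mismatch may be a legitimate traveler, while several flags together are harder to explain.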
Advantages of using heuristic rules
- They’re reliable and usually hard to bypass
Disadvantages of using heuristic rules
- They require constant reviews and updates due to several factors (browser and device changes, fraudsters continually improving bots, etc)
2. Statistical analysis
Anti-fraud solutions employ a great deal of statistical analysis both to investigate clusters of traffic and to automatically detect traffic with unusual patterns or distributions.
We categorize statistical-based rules into two kinds: Threshold rules and statistical anomalies.
Threshold rules trigger when certain criteria -- usually defined by the customer -- reach a defined threshold. Metrics that are susceptible to fraud, such as paid clicks, conversions, and leads, are used to set these thresholds.
For example, when a single IP address repeatedly clicks on a paid ad, it might cross the threshold and be flagged as invalid traffic.
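The repeated-clicks example can be sketched as a sliding-window counter per IP address. The class name, the window length, and the click limit are all assumptions; the sketch also assumes timestamps arrive in non-decreasing order for each IP.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class ClickThreshold:
    """Flag an IP once it exceeds max_clicks within window_seconds."""

    def __init__(self, max_clicks: int, window_seconds: float) -> None:
        self.max_clicks = max_clicks
        self.window_seconds = window_seconds
        self._clicks: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, ip: str, timestamp: float) -> bool:
        """Record a click and return True if the IP is over the threshold."""
        window = self._clicks[ip]
        window.append(timestamp)
        # Drop clicks that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_clicks
```

With `ClickThreshold(max_clicks=3, window_seconds=60)`, the fourth click from the same IP inside one minute returns `True`, while the same IP clicking again a few minutes later starts from a clean window.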
Statistical anomaly rules require more complex calculations and usually involve several data frames in order to draw the line between expected values and invalid ones.
Examples of statistical anomalies include unusual distributions of traffic across parameters such as device model or OS, and unusual times between clicks or leads.
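One simple way to test a traffic distribution is to compare each category's observed share against a baseline share and compute a z-score for the difference. This is only one of many possible tests, and the baseline shares, the function name, and the threshold of 3 are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def share_anomalies(
    observed: Iterable[str],
    baseline_share: Dict[str, float],
    z_threshold: float = 3.0,
) -> Dict[str, float]:
    """Return categories whose observed share deviates from the baseline.

    The z-score uses the standard error of a proportion under the baseline:
    z = (observed_share - p) / sqrt(p * (1 - p) / n)
    """
    counts = Counter(observed)
    n = sum(counts.values())
    anomalies: Dict[str, float] = {}
    if n == 0:
        return anomalies
    for category, p in baseline_share.items():
        if not 0.0 < p < 1.0:
            continue  # the standard error is undefined at 0 or 1
        std_err = math.sqrt(p * (1.0 - p) / n)
        z = (counts.get(category, 0) / n - p) / std_err
        if abs(z) > z_threshold:
            anomalies[category] = round(z, 2)
    return anomalies
```

A positive score means the category is over-represented compared with the baseline, and a negative score means it is under-represented. Small samples produce noisy scores, which is one reason this kind of rule tends to need manual fine-tuning.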
Advantages of using statistical analysis
- It can help with identifying new fraud based on repetition or manual fraud methods
- Thresholds can be adapted to fit each business case
Disadvantages of using statistical analysis
- Outliers can trigger false positives
- Usually requires manual fine-tuning
3. Behavioral analysis
Although most device telemetry can be captured when the visit occurs, user-based events need to be recorded anonymously and analyzed asynchronously. This data provides crucial information regarding how a user is interacting with a page or product and can be used to decide whether that interaction is human-like or not.
Behavioral analysis focuses on assessing whether the user’s behavior is that of a real human.
This includes extracting and analyzing data such as scrolling, keystrokes, and clicks to decide whether they are executed in a human-like manner rather than an automated one. This is performed by checking that the patterns follow an expected distribution and examining the behavior to detect too many repetitions or outliers.
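As one illustration of a repetition check, the sketch below measures how regular the gaps between a user's events are. Human input tends to be irregular, so a very low coefficient of variation (standard deviation divided by mean) can hint at automation. The function names and both thresholds are assumptions, and a real system would combine many such signals.

```python
import statistics
from typing import List, Sequence


def inter_event_intervals(timestamps: Sequence[float]) -> List[float]:
    """Return the gaps, in seconds, between consecutive events."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_automated(
    timestamps: Sequence[float],
    min_events: int = 5,
    min_cv: float = 0.1,
) -> bool:
    """True if event timing is suspiciously regular.

    Returns False when there are too few events to judge.
    """
    intervals = inter_event_intervals(timestamps)
    if len(intervals) < min_events - 1:
        return False
    mean = statistics.mean(intervals)
    if mean == 0:
        return True  # every event carries the same timestamp
    coefficient_of_variation = statistics.pstdev(intervals) / mean
    return coefficient_of_variation < min_cv
```

Clicks spaced exactly half a second apart would be flagged, while the uneven rhythm of a person reading and clicking would not. Bots that add random delays can defeat this particular check, so it is best seen as one input among several.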
Advantages of using behavioral analysis
- Can help detect sophisticated bots that bypass heuristic rules
Disadvantages of using behavioral analysis
- Outliers can trigger false positives
- Requires more time to perform
- Data and privacy concerns
4. Machine learning
Sophisticated ad fraud techniques can easily hide within large volumes of traffic, and machine learning techniques help identify relationships and patterns within these large datasets.
To classify new data and predict the likelihood of ad fraud, machine learning methodologies use historical data that has already been classified to detect patterns and build models. With little human intervention, these models continue to train themselves as they are fed more and more data. This is vital since most sophisticated fraud schemes today are undetectable by the human eye.
We categorize these techniques into three machine learning paradigms: Supervised learning, unsupervised learning, and semi-supervised learning.
The supervised learning technique involves labeling the data to train the models. As a click fraud detection technique, it typically takes the form of classification algorithms.
Classification algorithms can separate invalid and valid clicks by building models and training them with labeled clicks (invalid traffic/valid traffic). The algorithm’s accuracy is directly related to the quality and quantity of labeled data used for training the models.
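To make the idea concrete, here is a minimal sketch of one of the simplest classifiers, a nearest-centroid model, trained on labeled clicks. It is chosen for brevity and is not a claim about which algorithm any particular product uses. The two features (clicks per minute, seconds on page) and the labels are assumptions, and real features would need scaling so that no single one dominates the distance.

```python
from typing import Dict, List, Sequence, Tuple

Features = Sequence[float]


def train_centroids(
    samples: List[Tuple[Features, str]],
) -> Dict[str, List[float]]:
    """Average the feature vectors of each label into one centroid."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def squared_distance(a: Features, b: Features) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(features: Features, centroids: Dict[str, List[float]]) -> str:
    """Assign the label whose centroid is closest to the new click."""
    return min(
        centroids,
        key=lambda label: squared_distance(features, centroids[label]),
    )


# Hypothetical labeled clicks: (clicks_per_minute, seconds_on_page), label
TRAINING = [
    ((30.0, 2.0), "invalid"), ((40.0, 1.0), "invalid"), ((35.0, 3.0), "invalid"),
    ((1.0, 45.0), "valid"), ((2.0, 60.0), "valid"), ((3.0, 30.0), "valid"),
]
```

After `centroids = train_centroids(TRAINING)`, calling `classify((28.0, 4.0), centroids)` assigns the click to whichever label's centroid is nearest. With only six training examples the model is a toy, which echoes the point above about the quality and quantity of labeled data.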
The unsupervised learning technique involves using models that are not supplied with labeled data. The models discover clusters and patterns by themselves without knowing which label applies. This technique can be broken down into clustering algorithms and outlier detection.
Clustering algorithms group data that has similar characteristics and features without labeling them. This means that an expert can review the data and decide which label should apply to each identified group.
Outlier detection algorithms find data points that differ from the rest. From here, an expert reviews the detected anomalies to decide how to approach them.
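A small example of outlier detection is the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). It is one of many possible methods; the function name and the sample metric (clicks per session) are assumptions, while 0.6745 and 3.5 are the constants conventionally used with this score.

```python
import statistics
from typing import List, Sequence


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices of values whose modified z-score exceeds threshold.

    modified z = 0.6745 * |value - median| / MAD
    """
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # At least half the values equal the median; the score is undefined.
        return []
    return [
        index
        for index, deviation in enumerate(deviations)
        if 0.6745 * deviation / mad > threshold
    ]
```

The function returns positions rather than a verdict, so the flagged sessions can be handed to an expert for review, as described above.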
The semi-supervised learning technique involves using partially labeled data. Unsupervised learning is applied and guided with the labeled data.
Active learning methodologies involve first applying unsupervised methods and then applying supervised methods to a small subset of the results (i.e. a representative of each identified cluster). The selected data is given to an expert to classify.
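The selection step can be sketched as follows, assuming the clustering has already been done: pick the point closest to each cluster's centroid, ask an expert to label it, and apply that label to the whole cluster. The function names and the `ask_expert` callback are hypothetical, and choosing the point nearest the centroid is just one way to define a representative.

```python
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]


def pick_representatives(clusters: Dict[int, List[Point]]) -> Dict[int, Point]:
    """For each cluster, return the member closest to the cluster centroid."""
    representatives: Dict[int, Point] = {}
    for cluster_id, points in clusters.items():
        dims = len(points[0])
        centroid = [
            sum(p[d] for p in points) / len(points) for d in range(dims)
        ]
        representatives[cluster_id] = min(
            points,
            key=lambda p: sum((x - c) ** 2 for x, c in zip(p, centroid)),
        )
    return representatives


def label_clusters(
    clusters: Dict[int, List[Point]],
    ask_expert: Callable[[Point], str],
) -> Dict[int, str]:
    """Ask the expert about one point per cluster and reuse the answer."""
    return {
        cluster_id: ask_expert(representative)
        for cluster_id, representative in pick_representatives(clusters).items()
    }
```

The expert only has to review one click per cluster instead of every click, which is where the saving in labeling effort comes from. The resulting labels can then feed a supervised model.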
Advantages of using machine learning
- Discovery factor (can help identify new fraud clusters)
- Can help in challenging false positives and newly discovered fraud
- Updates and training can be automated
Disadvantages of using machine learning
- Computing costs
- Slower than other methods
- Usually not reliable with previously unseen data
Click fraud detection techniques: Which one best protects your campaigns from ad fraud?
Ad fraud is expected to cost the digital advertising industry $120 billion yearly. Click fraud comprises a large portion of ad fraud. Sophisticated ad fraud techniques drain your budgets and are evolving every day, which means it’s more important than ever to apply the most robust click fraud detection techniques.
A proactive solution that combines rules-based programs and machine learning is the only way to truly address and stem the tide of ad fraud. To protect your organization, anti-ad fraud solution Opticks applies its novel machine-learning algorithms and expert rules to separate valid traffic from invalid traffic.
To find out how Opticks combines the accuracy of human expertise with the scalability of machine learning to prevent the damage caused by click fraud, contact us today and schedule a free demo.