Search Rank Fraud Detection and Prevention


Search rank fraud in commercial peer-opinion services (e.g., those hosted by Google, Amazon, and Apple) consists of posting large numbers of fake reviews for hosted products, crafted to emulate realistic, spontaneous activity from unrelated people. It is used to increase product rank and financial gain, and even to promote malware and censorship. Workers who specialize in search rank fraud exploit vulnerabilities in the peer-opinion system to artificially boost the reputation of the products they are hired to promote. In this project, we propose a vertical approach to (1) dissect and document fraudulent job markets and investigate strategies that can disrupt their equilibrium, (2) study and model the behaviors that differentiate fraudsters from honest users in online services, then design search rank fraud detection algorithms that leverage these findings, and (3) develop techniques to help users assess identified fraud.

Study professional raters and site dynamics. We have designed a questionnaire to study the capabilities, behaviors, and detection avoidance strategies of fraud workers, and conducted qualitative and quantitative studies with fraud workers recruited from freelancing sites. We have further collected data from, and monitored, more than 247,000 Google Play apps for over 6 months, and used this data to understand the dynamics of the market from an application and developer perspective [4, 8].

Search rank fraud detection in Google Play and Yelp. We have developed FairPlay, a system that correlates review activities and uniquely combines the detected review relations with linguistic and behavioral signals gleaned from longitudinal Google Play app data to detect fake reviews and apps promoted through search rank fraud [5, 6]. FairPlay discovered hundreds of fraudulent apps that currently evade Google Bouncer's detection technology. FairPlay also enabled us to discover a novel "coercive campaign" attack type, where app users are harassed into writing a positive review for the app and into installing and reviewing other apps. We further developed a system that leverages the wealth of spatial, temporal, and social information provided by Yelp to detect venues that have been promoted through search rank fraud [2, 3].
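One way to picture "correlating review activities" is to link accounts that reviewed many of the same apps. The sketch below is a hypothetical illustration, not FairPlay's actual algorithm: it assumes reviews arrive as (account, app) pairs, builds a weighted co-review graph (edge weight = number of distinct apps both accounts reviewed), and greedily grows a dense group of accounts seeded by the heaviest edge. The function names and the `min_density` threshold are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def co_review_graph(reviews):
    """Map each (sorted) pair of accounts to the number of distinct apps both reviewed.

    Assumption: `reviews` is an iterable of (account_id, app_id) pairs.
    """
    reviewers_by_app = defaultdict(set)
    for account, app in reviews:
        reviewers_by_app[app].add(account)
    weights = defaultdict(int)
    for accounts in reviewers_by_app.values():
        for a, b in combinations(sorted(accounts), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weight(weights, a, b):
    """Edge weight between two accounts, independent of argument order."""
    return weights.get((min(a, b), max(a, b)), 0)


def dense_group(weights, min_density):
    """Hypothetical greedy heuristic: start from the heaviest edge and keep adding
    the account that maximizes the average pairwise co-review weight, stopping
    once that average would drop below `min_density`."""
    if not weights:
        return set()
    (a, b), _ = max(weights.items(), key=lambda kv: kv[1])
    group = {a, b}
    nodes = {n for pair in weights for n in pair}
    while True:
        best, best_density = None, -1.0
        for n in nodes - group:
            pairs = list(combinations(group | {n}, 2))
            density = sum(weight(weights, x, y) for x, y in pairs) / len(pairs)
            if density > best_density:
                best, best_density = n, density
        if best is None or best_density < min_density:
            return group
        group.add(best)
```

With three accounts that each reviewed the same three apps, plus an honest account that overlaps on only one app, `dense_group(co_review_graph(reviews), 2.5)` returns just the three tightly linked accounts. A real detector would combine such relational evidence with the linguistic and behavioral signals mentioned above.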

Fraud de-anonymization. We introduced the fraud de-anonymization problem: attributing user accounts flagged by fraud detection algorithms in online peer-opinion systems to the human workers in crowdsourcing sites who control them [9, 11]. We modeled fraud de-anonymization as a maximum likelihood estimation problem and introduced an unconstrained optimization solution [11].
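To make the maximum likelihood framing concrete, here is a deliberately simple sketch that is not the formulation or the optimization solution of [11]. It assumes each worker is modeled as a smoothed categorical distribution over the products reviewed by accounts already known to belong to that worker, and attributes a flagged account to the worker under whose model the account's reviewed products are most likely. The names `fit_workers`, `attribute`, and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def fit_workers(training, vocabulary, alpha=1.0):
    """Estimate a Laplace-smoothed categorical distribution over products per worker.

    Assumption: `training` maps worker_id -> list of product_ids reviewed by
    accounts known to be controlled by that worker; `vocabulary` is the set of
    all product_ids under consideration.
    """
    models = {}
    for worker, products in training.items():
        counts = Counter(products)
        total = len(products) + alpha * len(vocabulary)
        models[worker] = {p: (counts[p] + alpha) / total for p in vocabulary}
    return models


def attribute(account_products, models):
    """Return the worker whose model maximizes the log-likelihood of the
    products reviewed by a flagged account (products outside the vocabulary
    are ignored)."""
    def log_likelihood(model):
        return sum(math.log(model[p]) for p in account_products if p in model)

    return max(models, key=lambda worker: log_likelihood(models[worker]))
```

For example, if worker `w1`'s accounts mostly reviewed products `A` and `B` while `w2`'s reviewed `C` and `D`, a flagged account that reviewed `A` and `B` is attributed to `w1`. Richer per-worker features (timing, rating patterns, text) could be plugged into the same argmax-of-likelihood structure.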

Fraud preemption through Bitcoin-like puzzles. We have introduced the concept of fraud preemption systems, which, instead of reacting to fraud posted in the past, discourage workers from posting fraud in the first place [7]. Specifically, we introduced and developed stateless, verifiable, Bitcoin-inspired computational puzzles to be computed on the devices from which users post activities in peer-opinion sites. These puzzles impose minimal performance overhead on devices associated with honest accounts, but increasing penalties on devices associated with accounts detected to be involved in search rank fraud campaigns.
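The general idea of a stateless, verifiable, Bitcoin-style puzzle can be sketched with standard hashing primitives. This is an illustrative assumption, not the construction from [7]: the server derives the challenge as an HMAC over the account, a timestamp, and the difficulty, so it can recompute the challenge at verification time instead of storing it; the client brute-forces a nonce whose SHA-256 hash has enough leading zero bits; and the difficulty grows with a hypothetical per-account fraud score in [0, 1]. `SERVER_KEY`, `difficulty`, and the other names and parameters are hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret known only to the server.
SERVER_KEY = b"hypothetical-server-secret"


def difficulty(fraud_score, base_bits=4, max_extra_bits=16):
    """Required leading zero bits: cheap for honest accounts (score near 0),
    increasingly expensive as the (assumed) fraud score approaches 1."""
    return base_bits + round(max_extra_bits * fraud_score)


def issue_challenge(account_id, timestamp, bits):
    """Stateless challenge: recomputable by the server from public inputs plus its key."""
    msg = f"{account_id}|{timestamp}|{bits}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()


def leading_zero_bits(digest):
    """Number of leading zero bits in a byte string."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(challenge, bits):
    """Client side: proof of work by brute-forcing an 8-byte nonce."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce
        nonce += 1


def verify(account_id, timestamp, fraud_score, nonce, now, max_age=300):
    """Server side: reject stale challenges, recompute the challenge from the
    server's own view of the account, and check a single hash."""
    if not 0 <= now - timestamp <= max_age:
        return False
    bits = difficulty(fraud_score)
    challenge = issue_challenge(account_id, timestamp, bits)
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= bits
```

The asymmetry is the point of the design: solving takes about 2^bits hash evaluations on the posting device, while verifying takes one HMAC and one hash on the server. Because the server derives the difficulty from its own fraud score for the account, a client cannot claim an easier puzzle.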

People

Publications


[11] [ACM CCS] "Fraud De-Anonymization for Fun and Profit"
Nestor Hernandez, Mizanur Rahman, Ruben Recabarren, Bogdan Carbunar.
In Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS), October 2018. [pdf] [acceptance rate = 16.6%]

[10] [AAAI ICWSM] "AbuSniff: Automatic Detection and Defenses Against Abusive Facebook Friends"
Sajedul Talukder and Bogdan Carbunar.
In Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), Stanford, June 2018. [pdf] [full paper acceptance rate = 16%]

[9] [ACM Hypertext] "Search Rank Fraud De-Anonymization in Online Systems"
Mizanur Rahman, Nestor Hernandez, Bogdan Carbunar, Duen Horng Chau.
In Proceedings of the 29th ACM Conference on Hypertext and Social Media (HT), Baltimore, July 2018. [pdf]

[8] [IEEE TCSS] "A Longitudinal Study of Google Play"
Rahul Potharaju, Mizanur Rahman, Bogdan Carbunar.
IEEE Transactions on Computational Social Systems (TCSS), Volume 4, Issue 3, September 2017. [pdf]

[7] [ACM WebSci] "Stateless Puzzles for Real Time Online Fraud Preemption"
Mizanur Rahman, Ruben Recabarren, Bogdan Carbunar, Dongwon Lee.
In Proceedings of the ACM Web Science Conference (WebSci), Troy, NY, June 2017. [pdf]

[6] [IEEE TKDE] "Search Rank Fraud and Malware Detection in Google Play"
Mahmudur Rahman, Mizanur Rahman, Bogdan Carbunar, Duen Horng Chau.
IEEE Transactions on Knowledge and Data Engineering (TKDE), Volume 29, Issue 6, June 2017. [pdf]

[5] [SIAM SDM] "Fraud and Malware Detection in Google Play"
Mahmudur Rahman, Mizanur Rahman, Bogdan Carbunar, Duen Horng Chau.
In Proceedings of the SIAM International Conference on Data Mining (SDM), May 2016. [pdf]

[4] [IEEE/ACM ASONAM] "A Longitudinal Study of the Google App Market"
Bogdan Carbunar, Rahul Potharaju.
In Proceedings of the IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), Paris, August 2015. [pdf] [full paper acceptance rate = 18%]

[3] [Wiley SAM] "To Catch a Fake: Curbing Deceptive Yelp Ratings and Venues"
Mahmudur Rahman, Bogdan Carbunar, Jaime Ballesteros, Duen Horng (Polo) Chau.
Statistical Analysis and Data Mining, Wiley, Pang-Ning Tan and Arindam Banerjee, editors (invited), 2015. [Preliminary version]

[2] [SIAM SDM] "Turning the Tide: Curbing Deceptive Yelp Behaviors" [Best Student Paper Award]
Mahmudur Rahman, Bogdan Carbunar, Jaime Ballesteros, George Burri, Duen Horng (Polo) Chau.
In Proceedings of the SIAM International Conference on Data Mining (SDM), Philadelphia, April 2014. [pdf]

[1] [HotPOST@ICDCS] "Yelp Events: Building Bricks Without Clay?" [Best Paper Award]
Jaime Ballesteros, Bogdan Carbunar, Mahmudur Rahman, Naphtali Rishe.
In Proceedings of the 5th International Workshop on Hot Topics in Peer-to-peer Computing and Online Social Networks (HotPOST), July 2013. [pdf]

Data Sets


  • Periodic snapshots of the data of 87,000 Google Play apps, taken between October 2014 and May 2015 [zip archive, 8 GB uncompressed]. The archive includes details of the reviews received by each app, the app's permissions, and information collected from the apps' reviewers. It also includes the reviews, reviewer information, and permissions of the gold-standard fraudulent, malware, and benign apps.
  • For easy access: reviews of the gold-standard fraudulent, malware, and benign apps [zip archive].
  • For easy access: data of the 1,600 "wild" apps [zip archive].

Funding

This work has been partially funded through generous support from: