Search rank fraud in commercial peer-opinion services (e.g., those hosted by Google, Amazon, and Apple) involves posting large numbers of fake reviews for hosted products. The fake reviews are written to emulate realistic, spontaneous activity from unrelated people, in order to increase product rank and financial gain, and even to promote malware and censorship. Workers who specialize in search rank fraud exploit vulnerabilities in the peer-opinion system to artificially boost the reputation of the products they are hired to promote. In this project we proposed a vertical approach to (1) dissect and document fraudulent job markets and investigate strategies that can disrupt their equilibrium, (2) study and model behaviors that differentiate fraudsters from honest users in online services, then design search rank fraud detection algorithms that leverage these findings, and (3) develop techniques to help users assess identified fraud.

Study professional raters and site dynamics. We have designed a questionnaire to study the capabilities, behaviors and detection avoidance strategies of fraud workers, and conducted qualitative and quantitative studies with fraud workers whom we recruited from freelancing sites. We have further collected data from, and monitored, more than 247,000 Google Play apps over a period of more than 6 months, and used the data to understand the dynamics of the market from an application and developer perspective [4, 8].

Search rank fraud detection in Google Play and Yelp. We have developed FairPlay, a system that correlates review activities and uniquely combines detected review relations with linguistic and behavioral signals gleaned from longitudinal Google Play app data, to detect fake reviews and apps promoted through search rank fraud [5, 6]. FairPlay discovered hundreds of fraudulent apps that currently evade Google Bouncer's detection technology. FairPlay also enabled us to discover a novel "coercive campaign" attack type, where app users are harassed into writing a positive review for the app, and into installing and reviewing other apps. We further developed a system that leverages the wealth of spatial, temporal and social information provided by Yelp to detect venues that have been promoted through search rank fraud [2, 3].
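
As an illustration of what correlating review activities can look like, the sketch below builds a co-review graph and extracts groups of connected reviewers. It is a minimal sketch under our own assumptions and not FairPlay's actual algorithm: the function names, the `min_common` threshold, and the use of connected components as candidate groups are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


def co_review_graph(reviews, min_common=2):
    """Link two reviewers when they reviewed at least `min_common` apps in common.

    `reviews` is an iterable of (user_id, app_id) pairs (hypothetical input format).
    """
    apps_by_user = defaultdict(set)
    for user, app in reviews:
        apps_by_user[user].add(app)

    graph = {user: set() for user in apps_by_user}
    # Quadratic in the number of users; acceptable for a small illustration.
    for u, v in combinations(sorted(apps_by_user), 2):
        if len(apps_by_user[u] & apps_by_user[v]) >= min_common:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_groups(graph, min_size=2):
    """Return connected components with at least `min_size` reviewers."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) >= min_size:
            groups.append(component)
    return groups
```

Groups found this way are only candidates: honest users can also share interests, which is why such relational signals would be combined with other evidence, such as the linguistic and behavioral signals mentioned above.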

Fraud de-anonymization. We introduced the fraud de-anonymization problem: attributing user accounts that were flagged by fraud detection algorithms in online peer-opinion systems to the human workers in crowdsourcing sites who control them [9, 11]. We modeled fraud de-anonymization as a maximum likelihood estimation problem, and introduced an unconstrained optimization solution [11].
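
To make the maximum likelihood view concrete, the sketch below attributes a flagged account to the candidate worker whose known activity best explains the products the account reviewed. It is a simplified illustration, not the unconstrained optimization solution of [11]: it assumes each worker is described by a smoothed categorical distribution over products, and the names `fit_worker_models`, `log_likelihood`, `attribute` and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def fit_worker_models(worker_to_products, vocabulary, alpha=1.0):
    """Estimate, per worker, a smoothed probability for each product in `vocabulary`.

    `worker_to_products` maps a worker id to the list of products that the
    worker is known to have promoted (hypothetical ground-truth input).
    """
    models = {}
    for worker, products in worker_to_products.items():
        counts = Counter(products)
        total = len(products) + alpha * len(vocabulary)
        # Additive (Laplace) smoothing keeps every probability strictly positive.
        models[worker] = {p: (counts[p] + alpha) / total for p in vocabulary}
    return models


def log_likelihood(model, account_products):
    """Log-likelihood of an account's reviewed products under one worker model."""
    return sum(math.log(model[p]) for p in account_products)


def attribute(models, account_products):
    """Return the worker that maximizes the likelihood of the account's activity."""
    return max(models, key=lambda w: log_likelihood(models[w], account_products))


if __name__ == "__main__":
    workers = {"w1": ["a", "a", "b"], "w2": ["c", "c", "b"]}
    models = fit_worker_models(workers, vocabulary={"a", "b", "c"})
    print(attribute(models, ["a", "b"]))
```

The sketch assumes that every product reviewed by the account appears in `vocabulary`; an unseen product would need to be added to the vocabulary before fitting.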

Fraud preemption through Bitcoin-like puzzles. We have introduced the concept of fraud preemption systems that, instead of reacting to fraud posted in the past, discourage workers from posting fraud in the first place [7]. Specifically, we introduced and developed stateless, verifiable, Bitcoin-inspired computational puzzles, to be computed on the devices from which users post activities in peer-opinion sites. These puzzles impose minimal performance overhead on devices associated with honest accounts, but impose increasing penalties on devices associated with accounts detected to be involved in search rank fraud campaigns.
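
The idea can be illustrated with a generic hash-based proof-of-work puzzle whose difficulty grows with an account's fraud score. This is a minimal sketch under our own assumptions, not the construction of [7]: the fraud score in [0, 1], the difficulty mapping, the HMAC-derived challenge (used here so that the server keeps no per-puzzle state), and all names such as `SERVER_KEY` and `difficulty_for` are hypothetical.

```python
import hashlib
import hmac

SERVER_KEY = b"demo-secret"  # hypothetical server-side key, for illustration only


def difficulty_for(fraud_score, base_bits=8, max_extra_bits=12):
    """Map a fraud score in [0, 1] to a required number of leading zero bits."""
    score = min(max(fraud_score, 0.0), 1.0)
    return base_bits + round(score * max_extra_bits)


def make_challenge(account_id, timestamp, bits):
    """Derive the challenge from the request itself, so no server state is stored."""
    message = f"{account_id}|{timestamp}|{bits}".encode()
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def leading_zero_bits(digest):
    """Count the leading zero bits of a byte string."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def solve(challenge, bits):
    """Client side: search for a nonce whose hash has enough leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce
        nonce += 1


def verify(account_id, timestamp, bits, nonce):
    """Server side: recompute the challenge and check the solution with one hash."""
    challenge = make_challenge(account_id, timestamp, bits)
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


if __name__ == "__main__":
    bits = difficulty_for(0.0)  # an account with no fraud signal
    nonce = solve(make_challenge("account-42", 1700000000, bits), bits)
    print(verify("account-42", 1700000000, bits, nonce))
```

Each additional required zero bit doubles the expected number of hashes the client must compute, while verification stays at a single hash. In this sketch that asymmetry is what keeps the cost low for accounts with a low score and makes it grow quickly for accounts flagged as fraudulent.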



Data Sets

  • Periodic snapshots of the data of 87,000 Google Play apps, taken between October 2014 and May 2015 [zip archive, 8 GB uncompressed]. The archive includes details of the reviews received by each app, its permissions, and information collected from the reviewers of the apps. It also includes the reviews, reviewer info and permissions of the gold standard fraudulent, malware and benign apps.
  • [For easy access]. Reviews of the gold standard fraudulent, malware and benign apps [zip archive].
  • [For easy access]. Data of the 1,600 "wild" apps [zip archive].


