ArXiv Pre-Print
Asmit Nayak, Shirley Zhang*, Yash Wani*, Rishabh Khandelwal, Kassem Fawaz (* Equal Contribution)
November 12, 2024
Deceptive patterns (DPs) in digital interfaces manipulate users into making unintended decisions, exploiting cognitive biases and psychological vulnerabilities, and have become ubiquitous across digital platforms. While efforts to mitigate DPs have emerged from legal and technical perspectives, a significant gap remains in usable solutions that empower users to identify DPs and make informed decisions about them in real time. In this work, we introduce AutoBot, an automated deceptive-pattern detector that analyzes websites’ visual appearance using machine learning to identify and notify users of DPs in real time. AutoBot employs a two-stage pipeline that processes website screenshots, identifying interactable elements and extracting textual features without relying on HTML structure. By leveraging a custom language model, AutoBot understands the context surrounding these elements to determine whether deceptive patterns are present. We implement AutoBot as a lightweight Chrome browser extension that performs all analyses locally, minimizing latency and preserving user privacy. Through extensive evaluation, we demonstrate AutoBot’s effectiveness in enhancing users’ ability to navigate digital environments safely, while also providing regulators with a valuable tool to assess and enforce compliance with DP regulations.
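The two-stage pipeline described in the abstract can be sketched as follows. This is a hypothetical, heavily simplified illustration, not the authors' actual implementation: the function names, the toy element data, the cue list, and the "small text" heuristic are all illustrative stand-ins for the paper's object detector, OCR, and custom language model.

```python
# Hypothetical sketch of a two-stage deceptive-pattern pipeline.
# Stage 1 localizes interactable elements in a screenshot; stage 2
# classifies each element's text and visual prominence. All names
# and data here are illustrative, not AutoBot's real API.

from dataclasses import dataclass

@dataclass
class Element:
    bbox: tuple   # (x, y, w, h) in screenshot coordinates
    text: str     # text extracted from the element (e.g., via OCR)

def detect_elements(screenshot):
    """Stage 1 stand-in: a real system would run a detector + OCR."""
    # Toy output for illustration only.
    return [Element((10, 400, 120, 40), "Accept all"),
            Element((10, 450, 120, 12), "Continue without accepting")]

# Illustrative phrases often associated with manipulative dismiss links.
DECEPTIVE_CUES = {"continue without accepting"}

def classify(elements):
    """Stage 2 stand-in: a real system would use a fine-tuned language model."""
    flags = []
    for el in elements:
        h = el.bbox[3]
        visually_deemphasized = h < 20              # much smaller than siblings
        cue = el.text.lower() in DECEPTIVE_CUES     # known manipulative phrasing
        if visually_deemphasized or cue:
            flags.append((el.text, "possible deceptive pattern"))
    return flags

print(classify(detect_elements(None)))
```

The key design point the sketch mirrors is that both stages operate on the rendered screenshot and extracted text, never on the page's HTML structure.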
USENIX Security 2024
Rishabh Khandelwal*, Asmit Nayak*, Paul Chung, Kassem Fawaz (* Equal Contribution)
August 13, 2024
Google has mandated that developers use Data Safety Sections (DSS) to increase transparency in data collection and sharing practices. In this paper, we present a comprehensive analysis of Google’s Data Safety Section using both quantitative and qualitative methods. We conduct the first large-scale measurement study of DSS using apps from the Android Play Store (n=1.1M). We find internal inconsistencies within the reported practices, as well as trends of both over- and under-reporting in the DSSs. Next, we conduct a longitudinal study of DSS to explore how the reported practices evolve over time, finding that developers are still adjusting their practices. To contextualize these findings, we conduct a developer study, uncovering the process that app developers undergo when working with DSS. We highlight the challenges developers face, the strategies they employ for DSS submission, and the factors contributing to changes in the DSS. Our research contributes valuable insights into the complexities of implementing and maintaining privacy labels, underlining the need for better resources, tools, and guidelines to aid developers. This understanding is crucial, as the accuracy and reliability of privacy labels directly impact their effectiveness.
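One kind of internal-inconsistency check mentioned above can be illustrated with a small sketch. This is a hypothetical example of such a check, not the paper's actual methodology: the field names and the specific rule (a data type declared as shared should normally also be declared as collected) are assumptions for illustration.

```python
# Hypothetical internal-consistency check on a Data Safety Section.
# Field names ("collected", "shared") are illustrative, not Google's schema.

def inconsistencies(dss):
    """Return data types declared as shared but not declared as collected."""
    shared = set(dss.get("shared", []))
    collected = set(dss.get("collected", []))
    return shared - collected

dss = {"collected": ["Location"], "shared": ["Location", "Contacts"]}
print(inconsistencies(dss))  # "Contacts" is shared but never declared collected
```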
The Web Conference 2024
Asmit Nayak*, Rishabh Khandelwal*, Earlence Fernandes, Kassem Fawaz (* Equal Contribution)
May 13, 2024
Browser extensions offer a variety of valuable features and functionalities. They also pose a significant security risk if not properly designed or reviewed. Prior works have shown that browser extensions can access and manipulate data fields, including sensitive data such as passwords, credit card numbers, and Social Security numbers. In this paper, we present an empirical study of the security risks posed by browser extensions. Specifically, we first build a proof-of-concept extension that can steal sensitive user information. We find that the extension passes the Chrome Web Store review process. We then perform a measurement study on the login pages of the top 10K websites to check whether extensions can access password fields via JavaScript. We find that none of the password fields are actively protected, and all can be accessed using JavaScript. Moreover, we find that 1K websites store passwords in plaintext in their page source, including popular websites like Google.com and Cloudflare.com. We also analyze over 160K Chrome Web Store extensions for malicious behavior, finding that 28K have permission to access sensitive fields and 190 store password fields in variables. To analyze the behavioral workflow of potentially malicious extensions, we propose an LLM-driven framework, Extension Reviewer. Finally, we discuss two countermeasures to address these risks: a bolt-on JavaScript package, for immediate adoption by website developers, that allows them to protect sensitive input fields; and a browser-level solution that alerts users when an extension accesses sensitive input fields. Our research highlights the urgent need for improved security measures to protect sensitive user information online.
WPES@CCS 2023
Rishabh Khandelwal, Asmit Nayak, Paul Chung, Kassem Fawaz
October 26, 2023
The increasing concern for privacy protection in mobile apps has prompted the development of tools such as privacy labels to assist users in understanding the privacy practices of applications. Both Google and Apple have mandated developers to use privacy labels to increase transparency in data collection and sharing practices. These privacy labels provide detailed information about apps’ data practices, including the types of data collected and the purposes associated with each data type. This offers a unique opportunity to understand apps’ data practices at scale. In this study, we conduct a large-scale measurement study of privacy labels using apps from the Android Play Store (n=2.4M) and the Apple App Store (n=1.38M). We establish a common mapping between iOS and Android labels, enabling a direct comparison of disclosed practices and data types between the two platforms. By studying over 100K apps, we identify discrepancies and inconsistencies in self-reported privacy practices across platforms. Our findings reveal that at least 60% of all apps have different practices on the two platforms. Additionally, we explore factors contributing to these discrepancies and provide valuable insights for developers, users, and policymakers. Our analysis suggests that while privacy labels have the potential to provide useful information concisely, in their current state, it is not clear whether the information provided is accurate. Without robust consistency checks by the distribution platforms, privacy labels may not be as effective and can even create a false sense of security for users. Our study highlights the need for further research and improved mechanisms to ensure the accuracy and consistency of privacy labels.
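The cross-platform comparison described above rests on mapping each store's label vocabulary into a common set of data types and then diffing the disclosed practices. The sketch below is purely illustrative: the mapping entries and type names are hypothetical examples, not the paper's actual mapping.

```python
# Illustrative cross-platform label comparison. The mapping entries below
# are hypothetical examples, not the paper's actual iOS/Android mapping.

IOS_TO_COMMON = {"Precise Location": "location",
                 "Coarse Location": "location",
                 "Email Address": "contact_info"}
ANDROID_TO_COMMON = {"Precise location": "location",
                     "Approximate location": "location",
                     "Email address": "contact_info"}

def common_types(labels, mapping):
    """Project a platform-specific label list onto the common type set."""
    return {mapping[l] for l in labels if l in mapping}

def discrepancy(ios_labels, android_labels):
    """Common data types disclosed on one platform but not the other."""
    ios = common_types(ios_labels, IOS_TO_COMMON)
    android = common_types(android_labels, ANDROID_TO_COMMON)
    return ios.symmetric_difference(android)

# Same app, different disclosures on each store:
print(discrepancy(["Precise Location"], ["Email address"]))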
ArXiv Pre-Print
Asmit Nayak*, Rishabh Khandelwal*, Kassem Fawaz (* Equal Contribution)
August 30, 2023
In this work, we perform a comprehensive analysis of the security of text input fields in web browsers. We find that browsers’ coarse-grained permission model violates two security design principles: least privilege and complete mediation. We further uncover two vulnerabilities in input fields, including the alarming discovery of passwords in plaintext within the HTML source code of the web page. To demonstrate the real-world impact of these vulnerabilities, we design a proof-of-concept extension, leveraging techniques from static and dynamic code injection attacks to bypass the web store review process. Our measurements and case studies reveal that these vulnerabilities are prevalent across various websites, with sensitive user information, such as passwords, exposed in the HTML source code of even high-traffic sites like Google and Cloudflare. We find that a significant percentage (12.5%) of extensions possess the necessary permissions to exploit these vulnerabilities and identify 190 extensions that directly access password fields. Finally, we propose two countermeasures to address these risks: a bolt-on JavaScript package for immediate adoption by website developers allowing them to protect sensitive input fields, and a browser-level solution that alerts users when an extension accesses sensitive input fields. Our research highlights the urgent need for improved security measures to protect sensitive user information online.
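The plaintext-password exposure described above can be illustrated with a minimal scan of a page's HTML source for password inputs that carry a `value` attribute. This is a hypothetical, simplified check using a canned sample page; the paper's measurements were performed on live websites with a full extension-based methodology.

```python
# Minimal, hypothetical check for plaintext passwords in HTML source:
# flag password inputs whose value is present in the page markup itself.
# Uses only the standard library; the sample page is illustrative.

from html.parser import HTMLParser

class PasswordFieldScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.exposed = []   # names of password fields with an inline value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "password" and a.get("value"):
            self.exposed.append(a.get("name", "<unnamed>"))

sample = '<form><input type="password" name="pw" value="hunter2"></form>'
scanner = PasswordFieldScanner()
scanner.feed(sample)
print(scanner.exposed)  # the password literally appears in the page source
```

Any field reported here is readable by anything that can see the page source, including extensions with ordinary content-script access.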
ArXiv Pre-Print
Asmit Nayak*, Rishabh Khandelwal*, Paul Chung, Kassem Fawaz (* Equal Contribution)
March 14, 2023
Privacy nutrition labels provide a way to understand an app’s key data practices without reading long, hard-to-read privacy policies. Recently, the app distribution platforms for iOS (Apple) and Android (Google) have implemented mandates requiring app developers to fill out privacy nutrition labels highlighting their privacy practices, such as data collection, data sharing, and security practices. These privacy labels contain very fine-grained information about apps’ data practices, such as the data types collected and the purposes associated with each data type. This provides us with a unique vantage point from which to understand apps’ data practices at scale.
USENIX Security 2023
Rishabh Khandelwal, Asmit Nayak, Hamza Harkous, Kassem Fawaz
April 02, 2022
Online websites use cookie notices to elicit consent from users, as required by recent privacy regulations like the GDPR and the CCPA. Prior work has shown that these notices use dark patterns to manipulate users into making website-friendly choices that put users’ privacy at risk. In this work, we develop CookieEnforcer, a new system for automatically discovering cookie notices and deciding on the options that result in disabling all non-essential cookies. To achieve this, we first build an automatic cookie notice detector that utilizes the rendering pattern of the HTML elements to identify cookie notices. Next, CookieEnforcer analyzes the cookie notices and predicts the set of actions required to disable all unnecessary cookies. This is done by modeling the problem as a sequence-to-sequence task, where the input is a machine-readable cookie notice and the output is the set of clicks to make. We demonstrate the efficacy of CookieEnforcer via an end-to-end accuracy evaluation, showing that it can generate the required steps in 91% of the cases. Via a user study, we show that CookieEnforcer can significantly reduce user effort. Finally, we use our system to perform several measurements on the top 5k websites from the Tranco list (as accessed from the US and the UK), drawing comparisons and observations at scale.
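The notice-to-clicks task described above can be sketched with a rule-based stand-in. The paper uses a learned sequence-to-sequence model; the notice format, option names, and action tuples below are hypothetical, chosen only to make the input/output shape of the task concrete.

```python
# Rule-based stand-in for CookieEnforcer's click prediction. The real
# system learns this mapping with a sequence-to-sequence model; the
# notice schema and action names here are illustrative assumptions.

def clicks_to_disable(notice):
    """Given a machine-readable cookie notice, emit the click sequence
    that disables every non-essential cookie category."""
    actions = []
    if "settings_button" in notice:
        actions.append(("click", notice["settings_button"]))
    for opt in notice["options"]:
        if not opt["essential"] and opt["enabled"]:
            actions.append(("toggle_off", opt["name"]))
    actions.append(("click", notice["save_button"]))
    return actions

notice = {
    "settings_button": "Manage preferences",
    "save_button": "Save choices",
    "options": [
        {"name": "Strictly necessary", "essential": True,  "enabled": True},
        {"name": "Analytics",          "essential": False, "enabled": True},
        {"name": "Advertising",        "essential": False, "enabled": True},
    ],
}
print(clicks_to_disable(notice))
```

The learned model's advantage over rules like these is generalizing across the many ad-hoc notice layouts found in the wild.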
PrivateNLP@EMNLP 2020
Rishabh Khandelwal, Asmit Nayak, Yao Yao, Kassem Fawaz
November 19, 2020
Online services utilize privacy settings to provide users with control over their data. However, these privacy settings are often hard to locate, causing users to rely on provider-chosen default values. In this work, we train privacy-settings-centric encoders and leverage them to create an interface that allows users to search for privacy settings using free-form queries. To achieve this, we create a custom semantic-similarity dataset consisting of real user queries covering various privacy settings, and use it to fine-tune state-of-the-art encoders. Using these fine-tuned encoders, we perform semantic matching between user queries and privacy settings to retrieve the most relevant setting. Finally, we also use these encoders to generate embeddings of privacy settings from the top 100 websites and perform unsupervised clustering to learn about the types of online privacy settings.
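The query-to-setting matching step described above can be sketched with a toy stand-in. The paper fine-tunes neural encoders; the sketch below substitutes a bag-of-words cosine similarity (and an invented settings list) purely to illustrate embedding the query, embedding each setting, and retrieving the nearest one.

```python
# Toy stand-in for encoder-based retrieval of privacy settings. The real
# system uses fine-tuned neural encoders; bag-of-words cosine similarity
# and the settings list below are illustrative assumptions.

import math
from collections import Counter

def embed(text):
    """Stand-in 'embedding': a sparse bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

SETTINGS = ["Ad personalization", "Location history", "Two-factor authentication"]

def best_setting(query):
    """Retrieve the setting most similar to a free-form user query."""
    q = embed(query)
    return max(SETTINGS, key=lambda s: cosine(q, embed(s)))

print(best_setting("turn off location history tracking"))
```

Swapping `embed` for a fine-tuned sentence encoder turns this sketch into the retrieval scheme the abstract describes.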