AI assistants, like Siri and Alexa, handle encrypted communications, but researchers have discovered a vulnerability that allows hackers to decipher AI assistant responses with surprising accuracy.
Recent research reveals a concerning vulnerability in AI assistants (Google Gemini being a notable exception) that can be exploited through token-length side-channel analysis. The technique allows attackers to decipher encrypted responses, jeopardizing user privacy and security, and it underscores the need for stronger transmission-security measures and practical mitigations. AI assistants such as ChatGPT and Copilot have transformed digital interaction, offering services that range from query resolution to complex decision support, but their widespread adoption also raises significant data privacy and security concerns. Addressing this vulnerability is essential to preserving user trust and to the safe use of AI technologies in the evolving digital landscape.
The Research
The paper, “What Was Your Prompt? A Remote Keylogging Attack on AI Assistants,” was published by Roy Weiss, Daniel Ayzenshteyn, Guy Amit, and Yisroel Mirsky of the Offensive AI Research Lab at Ben-Gurion University, Israel.
In this work, the researchers introduce a new side-channel vulnerability that permits reading encrypted AI assistant responses over the web: the token-length side-channel. Their investigation reveals that numerous providers, including major names like OpenAI and Microsoft, are susceptible to it. Deciphering the precise content of a response purely from the lengths of its tokens is a significant challenge, because many different grammatically correct sentences can be built from tokens (which function similarly to words) whose lengths match the observed sequence.
To address this issue, several strategies were employed:
- Leveraging the capabilities of a large language model (LLM) to interpret these sequences,
- Enhancing the LLM’s context between sentences to refine the search parameters, and
- Implementing a known-plaintext attack approach by adapting the model to mimic the target model’s stylistic nuances.
These techniques enabled the researchers to accurately reconstruct 29% of the responses from AI assistants and to correctly determine the subject matter of 55% of them. They demonstrated the significance of this vulnerability by executing the attack against OpenAI’s ChatGPT-4 and Microsoft’s Copilot, analyzing both browser and API-generated traffic.
(See also Microsoft’s response: “Mitigating a token-length side-channel attack in our AI products.”)
Tokens and Tokenizers
Tokens are the smallest units of text that carry meaning in natural language processing (NLP). When AI models, like ChatGPT, process language, they break down sentences into these tokens, which can include words, punctuation, and spaces. The process, known as tokenization, is crucial for understanding and generating language. However, it also introduces a vulnerability when the length of these tokens becomes predictable.
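To make this concrete, here is a minimal sketch using the open-source tiktoken tokenizer (which OpenAI publishes for its models) showing how a sentence breaks into tokens and what the resulting token-length sequence looks like; the example sentence is arbitrary.

```python
# Minimal sketch: tokenizing a sentence and printing the token-length sequence.
# Assumes the open-source `tiktoken` package (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family used by GPT-4-class models

text = "Sure, here is some advice."
token_ids = enc.encode(text)
tokens = [enc.decode([tid]) for tid in token_ids]

print(tokens)                    # e.g. ['Sure', ',', ' here', ' is', ' some', ' advice', '.']
print([len(t) for t in tokens])  # the token-length sequence, e.g. [4, 1, 5, 3, 5, 7, 1]
```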
Figure 1: Overview of the attack
- A packet capture of an AI assistant’s real-time response reveals a token-sequence side-channel.
- The side-channel is parsed to find text segments, which are then reconstructed using sentence-level context and knowledge of the target LLM’s writing style.
Large Language Models (LLMs) in AI Assistants
AI assistants rely on LLMs to process user prompts and generate responses. These models use tokens to understand and respond to queries in a coherent and contextually relevant manner. However, the sequential transmission of these tokens, necessary for the real-time functionality of AI assistants, exposes a side-channel through which the length of the tokens can be inferred.
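The sketch below illustrates why streaming matters: if each token is sent in its own frame, the size of each encrypted record tracks the size of the token inside it. The SSE-style framing and the overhead constant are assumptions for illustration, not a vendor's actual protocol.

```python
# Sketch: token-by-token streaming leaks lengths even under encryption,
# because TLS preserves plaintext length plus a roughly constant overhead.
# The SSE-style framing and overhead value below are illustrative assumptions.
tokens = ["Sure", ",", " here", " is", " some", " advice", "."]

TLS_OVERHEAD = 29  # assumed constant per-record overhead (illustrative only)

for tok in tokens:
    event = f'data: {{"content": "{tok}"}}\n\n'          # hypothetical streaming frame
    print(f"token={tok!r:>10}  plaintext={len(event):2} bytes  "
          f"encrypted~{len(event) + TLS_OVERHEAD} bytes")
```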
The Token-length Side-channel Vulnerability
This vulnerability arises from the way LLMs, such as GPT-4, handle data transmission. Responses are generated and sent as a series of tokens. While the communication is encrypted, the sequential transmission of tokens allows an observer to infer the length of these tokens, providing a means to guess the content of the communication.
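As a rough illustration of how an observer turns ciphertext sizes back into token lengths, the sketch below assumes one token per encrypted record with a constant framing overhead, plus a variant for services that resend the cumulative response with each new token; real services differ in the details the paper measures.

```python
# Sketch: recovering a token-length sequence from observed encrypted record sizes.
# Assumes one token per record and constant framing overhead -- simplifications
# of what the researchers measure for real services.
def lengths_from_records(record_sizes, overhead):
    return [size - overhead for size in record_sizes]

def lengths_from_cumulative(record_sizes):
    # Variant: if each record carries the whole response so far, the newest
    # token's length is the growth between consecutive records.
    return [b - a for a, b in zip(record_sizes, record_sizes[1:])]

observed = [33, 30, 34, 32, 34, 36, 30]                  # hypothetical record sizes
print(lengths_from_records(observed, overhead=29))       # -> [4, 1, 5, 3, 5, 7, 1]
```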
Attack Model
The attack model considers three entities: the user (Bob), the AI assistant (Alice), and the attacker (Eve). In this scenario, Eve captures encrypted traffic between Bob and Alice and extracts the token-length sequence from the response. Despite encryption, the size of the packets can reveal the length of the tokens, thereby allowing Eve to infer sensitive information shared in the conversation.
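Eve's position is entirely passive: all she needs are the sizes (and ordering) of the encrypted records flowing from Alice to Bob. A minimal sketch of that collection step with the scapy packet library is shown below; the capture file name and server address are placeholders.

```python
# Sketch: a passive observer (Eve) only needs ciphertext sizes and ordering.
# Assumes the `scapy` package; the capture file and server IP are placeholders.
from scapy.all import rdpcap, IP, TCP, Raw

ASSISTANT_IP = "203.0.113.10"   # hypothetical AI-assistant server address

sizes = []
for pkt in rdpcap("assistant.pcap"):
    if IP in pkt and TCP in pkt and Raw in pkt and pkt[IP].src == ASSISTANT_IP:
        sizes.append(len(pkt[Raw].load))   # size of the encrypted payload

print(sizes)   # the raw size sequence from which token lengths are inferred
```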
Token Inference Attack
The token inference attack methodology consists of several steps, including traffic interception, message identification, sequence extraction, and response inference. By utilizing a combination of heuristic analysis and the power of another LLM, attackers can reconstruct the AI assistant’s responses. This process involves predicting with LLMs, ranking options based on probability, and resolving the predicted response by concatenating the best segments.
Figure 2: An overview of the attack framework:
- Encrypted traffic is intercepted and then
- The start of the response is identified, then
- The token-length sequence T is extracted and
- A heuristic is used to partition T into ordered segments (T0,T1,…).
- Each segment is used to infer the text of the response.
This is done by (A) using two specialized LLMs to predict each segment sequentially based on prior outputs, (B) generating multiple options for each segment and selecting the best (most confident) result, and (C) resolving the predicted response R̂ by concatenating the best segments together.
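A high-level sketch of this loop is shown below. The predictor and confidence callables stand in for the paper's two specialized (fine-tuned) LLMs and its ranking heuristic; they are placeholders, not the authors' implementation.

```python
# High-level sketch of the inference framework described above.
# `predict_first` and `predict_next` stand in for the paper's two specialized
# LLMs, and `confidence` for its ranking heuristic -- all hypothetical placeholders.
def infer_response(segments, predict_first, predict_next, confidence, n_candidates=8):
    """Reconstruct a response from its token-length sequence T, already
    partitioned into ordered segments (T0, T1, ...)."""
    context = ""
    best_segments = []
    for i, segment in enumerate(segments):
        # (A)/(B): generate several candidate texts whose token lengths match
        # this segment, conditioned on the text reconstructed so far.
        if i == 0:
            candidates = [predict_first(segment) for _ in range(n_candidates)]
        else:
            candidates = [predict_next(segment, context) for _ in range(n_candidates)]
        best = max(candidates, key=confidence)   # keep the most confident option
        best_segments.append(best)
        context += best
    # (C): the predicted response R-hat is the concatenation of best segments.
    return "".join(best_segments)
```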
Technical Parameters:
- Setup of Baseline Model (GPT-4): The baseline model uses GPT-4 to infer responses based on the lengths of the tokens in a sequence (an illustrative sketch of the idea follows this list).
- Grouping Tokens Analysis: Analysis was conducted on how many tokens are grouped by OpenAI’s ChatGPT in-browser service over a 24-hour period, showing percentages for grouped tokens at different times of the day.
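The sketch below conveys the idea behind that baseline: ask a general-purpose GPT-4 endpoint to propose text matching a given token-length sequence. The prompt wording and the token-length example are purely illustrative, not the paper's actual prompting or fine-tuning setup; it assumes the openai Python SDK with an API key in the environment.

```python
# Illustrative only: asking GPT-4 to guess a plausible sentence from a
# token-length sequence. NOT the paper's prompt or setup; assumes the
# `openai` Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
token_lengths = [4, 1, 5, 3, 5, 7, 1]   # hypothetical sequence recovered from traffic

prompt = (
    "An AI assistant produced a response whose tokens have these character "
    f"lengths, in order: {token_lengths}. "
    "Suggest the most plausible sentence that matches this pattern."
)

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```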
Technical Output:
- Attack Success Rate on First Segment: Among 10k test-set responses from GPT-4, the attack had a success rate of over 54.5% for inferring the first segment of the responses. For segments inferred with very high accuracy (φ>0.9), the success rate was 28.9%.
- Performance Metrics: The evaluation involved several metrics, such as cosine similarity, Rouge metrics, and Edit Distance (ED), which showed varied results on the reconstructed responses. Cosine similarity and Rouge often indicated successful reconstructions, while ED provided a more conservative evaluation (a sketch of how such metrics can be computed appears below).
- Inference Performance on ChatGPT-4 Responses: The paper also details performance for the different segment lengths considered in the attack, reporting overall success rates and specific metrics such as φ values, Rouge-1 scores, and Edit Distances, both for segments up to 20 and for all segments combined.
- Vendor Traffic Evaluation: The performance was evaluated for different services including OpenAI’s GPT-4 in-browser, marketplace, API, and Microsoft’s Copilot. This included measurements like Attack Success Rate (ASR) under conditions with and without token grouping, showcasing the transferability of the attack model between different AI assistants.
These technical details underscore the comprehensive analysis undertaken to understand and quantify the vulnerability and its impact. The study evaluates the ability to infer encrypted AI assistant responses based on token-length sequences, illustrating the potential for significant privacy implications.
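For readers who want to reproduce such measurements on their own data, the sketch below computes the three kinds of metrics named above for one reference/reconstruction pair using common open-source packages; the paper's exact metric configuration may differ, and the example strings are invented.

```python
# Sketch: computing Rouge-1, embedding cosine similarity, and edit distance
# for a reconstructed response versus the true one. Uses the open-source
# `rouge-score` and `sentence-transformers` packages; the paper's exact
# configuration may differ, and the example strings are invented.
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

def edit_distance(a: str, b: str) -> int:
    # Classic Levenshtein distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

reference = "You should consult a doctor about these symptoms."   # true response
predicted = "You should see a doctor about those symptoms."       # reconstruction

rouge1 = rouge_scorer.RougeScorer(["rouge1"]).score(reference, predicted)["rouge1"].fmeasure
embedder = SentenceTransformer("all-MiniLM-L6-v2")
cosine = util.cos_sim(embedder.encode(reference), embedder.encode(predicted)).item()

print(f"Rouge-1 F1: {rouge1:.2f}  cosine: {cosine:.2f}  "
      f"edit distance: {edit_distance(reference, predicted)}")
```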
Ethical Disclosure and Mitigation
Upon discovering this vulnerability, the researchers responsibly disclosed their findings to the affected vendors, including OpenAI and Microsoft, and engaged in discussions to help mitigate the risks. Several countermeasures are proposed, such as adding random padding to messages, grouping tokens before transmission, and batching responses to obscure the token-length information.
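As one example of what the padding countermeasure can look like in practice, here is a minimal sketch that frames each streamed chunk with a length prefix and pads it with random bytes up to a fixed block size, so every chunk has the same size on the wire; the block size and framing are illustrative choices, not a vendor's specification.

```python
# Sketch of the random-padding countermeasure: pad every streamed chunk so the
# encrypted record size no longer tracks the token length. Block size and
# framing are illustrative choices, not any vendor's actual scheme.
import secrets
import struct

BLOCK = 32  # every chunk is padded up to a multiple of this size

def pad_chunk(token: str) -> bytes:
    data = token.encode("utf-8")
    framed = struct.pack(">H", len(data)) + data      # 2-byte length prefix
    pad_len = (-len(framed)) % BLOCK
    return framed + secrets.token_bytes(pad_len)      # random filler hides length

def unpad_chunk(blob: bytes) -> str:
    (n,) = struct.unpack(">H", blob[:2])
    return blob[2:2 + n].decode("utf-8")

for tok in ["Sure", ",", " here", " is", " some", " advice", "."]:
    padded = pad_chunk(tok)
    assert unpad_chunk(padded) == tok
    print(f"{tok!r:>10} -> {len(padded)} bytes on the wire")   # all the same size
```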
The discovery of the token-length side-channel vulnerability in AI assistants like ChatGPT and Copilot underscores the complex challenges at the intersection of AI, privacy, and security. It highlights the need for ongoing vigilance, responsible disclosure, and collaboration between researchers and industry to safeguard user privacy in the face of evolving cyber threats. As AI technologies continue to permeate various aspects of society, addressing these vulnerabilities becomes paramount to maintaining trust and ensuring the secure deployment of AI assistants.