ICML 2026

PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

The Hong Kong University of Science and Technology (Guangzhou)
*Corresponding author
PASA Teaser

Left: Illustration of PASA

PASA is a principled watermarking approach operating in the latent embedding space over semantic clusters. By anchoring shared randomness to semantic clusters via a secret key, PASA remains robust against semantic-invariant attacks, such as paraphrasing, while ensuring distortion-free generation.

Right: Robustness under Paraphrasing

Quantitative results demonstrate that PASA outperforms standard vocabulary-space watermarking baselines across varying paraphrase strengths in both AUC-ROC and TPR@1%FPR.

Abstract

Watermarking large language models (LLMs) has emerged as a promising approach for identifying LLM-generated text and supporting responsible AI deployment. However, existing watermarking methods are often vulnerable to semantic-invariant attacks, such as paraphrasing, which can substantially weaken or remove watermark signals while preserving the original meaning.

We propose PASA, a principled, robust, and distortion-free watermarking algorithm that embeds and detects watermark signals at the semantic level. PASA operates over semantic clusters in a latent embedding space and establishes a distributional dependency between token sequences and auxiliary sequences through shared randomness synchronized by a secret key and semantic history. This design is grounded in our theoretical framework, which characterizes a jointly optimal embedding-detection pair and captures the fundamental trade-offs among detection accuracy, robustness, and distortion.

Extensive evaluations across multiple LLMs and semantic-invariant attacks demonstrate that PASA remains robust even under strong paraphrasing attacks while preserving high text quality. Compared with standard vocabulary-space watermarking baselines, PASA achieves stronger robustness and detection performance. Ablation studies further validate the effectiveness of our key design choices and hyperparameter settings.

Overview of PASA

PASA Method Overview

Left: Semantic Mapping

PASA first constructs a semantic mapping function f, which partitions the latent token embedding space into K semantic clusters.
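One way to realize such a mapping is to cluster the model's token embeddings, so that each token id maps to the id of its nearest cluster centroid. The sketch below uses plain k-means as an illustrative choice; the paper's actual construction of f may differ, and `build_semantic_mapping` is a hypothetical name.

```python
import numpy as np

def build_semantic_mapping(embeddings: np.ndarray, K: int, n_iters: int = 20, seed: int = 0):
    """Partition token embeddings (vocab_size, dim) into K clusters with
    plain k-means. Illustrative sketch only: the paper's mapping f may be
    built differently. Returns centroids, per-token labels, and f."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from K distinct token embeddings.
    centroids = embeddings[rng.choice(len(embeddings), K, replace=False)]
    for _ in range(n_iters):
        # Assign every token embedding to its nearest centroid.
        dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        for k in range(K):
            members = embeddings[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    f = lambda token_id: labels[token_id]  # f: token id -> semantic cluster id
    return centroids, labels, f
```

With f fixed and shared between generator and detector, any paraphrase that keeps a token inside its semantic cluster leaves the watermark signal intact.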

Top: Generation

At each step t, the next-token prediction distribution Q_t is transformed into the cluster distribution Q_t^f. The auxiliary distribution P_{ζ,t} is truncated at a threshold α and augmented with an overflow state ζ̃ to keep the false-alarm error controlled. The auxiliary variable ζ_t is sampled using a seed produced by a PRF that takes the secret key and the last w semantic clusters (the semantic history) as input. The sampled ζ_t then guides the sampling of the next token x_t within the selected semantic cluster.
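The seeding and truncation steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: SHA-256 stands in for the PRF, the function names are hypothetical, and the exact form of P_{ζ,t} and the overflow handling are simplified.

```python
import hashlib
import numpy as np

def prf_seed(secret_key: bytes, history: list) -> int:
    """Derive a deterministic seed from the secret key and the semantic
    history (SHA-256 as a stand-in PRF; illustrative only)."""
    msg = secret_key + b"|" + ",".join(map(str, history)).encode()
    return int.from_bytes(hashlib.sha256(msg).digest()[:8], "big")

def truncate_with_overflow(cluster_probs: np.ndarray, alpha: float) -> np.ndarray:
    """Zero out cluster probabilities below alpha and route the removed
    mass to an extra overflow state, appended as the last entry."""
    kept = np.where(cluster_probs >= alpha, cluster_probs, 0.0)
    overflow = cluster_probs.sum() - kept.sum()
    return np.append(kept, overflow)

def sample_auxiliary(cluster_probs, alpha, secret_key, history, w: int = 4) -> int:
    """Sample the auxiliary variable zeta_t from the truncated distribution,
    with randomness synchronized through the PRF seed."""
    p = truncate_with_overflow(np.asarray(cluster_probs, dtype=float), alpha)
    p = p / p.sum()
    rng = np.random.default_rng(prf_seed(secret_key, history[-w:]))
    # Index len(p) - 1 is the overflow state.
    return int(rng.choice(len(p), p=p))
```

Because the seed depends only on the key and the cluster-level history, generator and detector draw the same ζ_t whenever they agree on the recent semantic clusters, even if the exact tokens differ.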

Bottom: Detection

For a potentially modified observed token sequence, the detector approximates the generation distribution with a small surrogate language model. The detection score accumulates according to how well the re-derived auxiliary variable ζ_t aligns with the observed semantic cluster f(x_t).
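A stripped-down version of this score can be written as an alignment count plus a z-score against the no-watermark null. This sketch replaces the surrogate-LM-based auxiliary distribution with a uniform placeholder and reuses the SHA-256 stand-in PRF, so it only illustrates the accumulation logic, not the paper's actual detector.

```python
import hashlib
import numpy as np

def detect_score(observed_clusters, secret_key: bytes, num_clusters: int, w: int = 4):
    """Count alignments between re-derived auxiliary variables and observed
    semantic clusters, then report a z-score against the no-watermark null.
    Uniform P_zeta is a placeholder for the surrogate-LM approximation."""
    hits = 0
    for t in range(w, len(observed_clusters)):
        history = observed_clusters[t - w:t]
        msg = secret_key + b"|" + ",".join(map(str, history)).encode()
        seed = int.from_bytes(hashlib.sha256(msg).digest()[:8], "big")
        rng = np.random.default_rng(seed)
        zeta = int(rng.integers(num_clusters))  # placeholder for P_{zeta,t}
        hits += int(zeta == observed_clusters[t])
    n = len(observed_clusters) - w
    # Under H0 (no watermark), hits ~ Binomial(n, 1/K).
    p0 = 1.0 / num_clusters
    z = (hits - n * p0) / np.sqrt(n * p0 * (1 - p0))
    return hits, float(z)
```

Flagging a text then reduces to thresholding the z-score, with the threshold chosen for the desired false-positive rate.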

Experiment Results

We evaluate PASA under standard detection settings and challenging semantic-invariant attacks. The results show that PASA achieves robust detection while preserving text quality and maintaining practical detection cost.

How to Use PASA?

PASA provides a simple reproduction pipeline for generating watermarked text. To reproduce the main experiment, follow these steps:

  1. Clone the PASA repository from GitHub.

    git clone https://github.com/ai-kunkun/PASA.git
    cd PASA
  2. Set up the conda environment and install dependencies.

    conda create -n pasa python=3.10 -y
    conda activate pasa
    
    pip install torch transformers accelerate datasets pandas tqdm nltk tokenizers
    python -c "import nltk; nltk.download('punkt')"
  3. If you encounter any issues, please open an issue at PASA Issues, and we will assist you as soon as possible.

BibTeX

If you find our work helpful, please cite our paper:

@misc{ai2026pasa,
  title         = {PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks},
  author        = {Ai, Zhenxin and He, Haiyun},
  year          = {2026},
  eprint        = {2605.10977},
  archivePrefix = {arXiv}
}