Validating the Semantic Manifold Hypothesis: Token vs Sentence Embeddings

A critical question for our Interstella project centers on whether large language model semantic spaces can be reasonably treated as geometric manifolds—a foundational assumption for our work on controllable emergence paths. Recent research challenges this assumption at the token level, but our analysis reveals a crucial distinction: sentence-level embeddings are significantly more manifold-like than their token-level counterparts.

This post presents a comprehensive analysis of the manifold hypothesis in LLM embeddings, drawing from recent academic work and our own empirical validation. We’ll explore why sentence-level representations provide a much more solid foundation for geometric approaches to semantic traversal.

The Manifold Hypothesis Challenge

The recent paper “Token Embeddings Violate the Manifold Hypothesis” by Robinson, Dey, and Chiang presents a statistical testing framework that systematically challenges the assumption that high-dimensional data (like token embeddings) concentrates on low-curvature, boundary-free manifolds.

Key Findings from the Paper

The authors propose a statistical test based on volume scaling analysis in local neighborhoods:

  • Manifold Hypothesis: Data concentrates on a low-dimensional manifold with constant intrinsic dimension
  • Fiber Bundle Hypothesis: A generalization allowing for “base-fiber splits” where local regions (small radius) behave like higher-dimensional fibers, while global regions (large radius) behave like lower-dimensional base spaces

Their algorithm tests for slope changes in log-volume versus log-radius plots (a minimal sketch follows the list below):

  • Constant slope → satisfies manifold hypothesis
  • Decreasing slope → satisfies fiber bundle hypothesis
  • Irregular slope changes → violates both
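
To make the volume-scaling idea concrete, here is a minimal NumPy sketch (our own illustration with hypothetical function and parameter names, not the authors' reference implementation): count the neighbors that fall inside balls of growing radius around one embedding and inspect how the slope of log-volume versus log-radius evolves.

```python
import numpy as np

def volume_scaling_slopes(X, query_idx, n_radii=20):
    """Finite-difference slopes of log-volume vs log-radius around one point.

    Roughly constant slopes are consistent with the manifold hypothesis,
    monotonically decreasing slopes with the fiber bundle hypothesis,
    and slope *increases* flag a candidate violation of both.
    """
    # Distances from the query embedding to every other embedding.
    d = np.linalg.norm(X - X[query_idx], axis=1)
    d = np.sort(d[d > 0])
    # Log-spaced radii from the nearest neighbor out to a mid-range neighbor.
    radii = np.geomspace(d[1], d[len(d) // 2], n_radii)
    # "Volume" proxy: number of points falling inside each ball.
    counts = np.maximum(np.searchsorted(d, radii), 1)
    log_r, log_v = np.log(radii), np.log(counts)
    # Slopes between consecutive radii.
    return np.diff(log_v) / np.diff(log_r)

# Toy usage: Gaussian data stands in for an embedding matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 64))
slopes = volume_scaling_slopes(X, query_idx=0)
print("max slope increase:", np.max(np.diff(slopes)))
```

The paper's Algorithm 1 additionally wraps these slope estimates in a formal statistical test; the sketch only exposes the quantity being tested.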

Results across major LLMs (GPT-2, Llemma7B, Mistral7B, Pythia6.9B):

  • Manifold rejection rate: 33-66% of tokens fail the test
  • Fiber bundle rejection rate: 0-19% depending on model
  • Common violations from polysemous words, morphological variants, and tokenization artifacts

The paper concludes that token embeddings contain significant singularities (cusps, pinch points, boundaries) that make them unsuitable for geometric analysis.

Token-Level vs Sentence-Level: A Crucial Distinction

While the paper focuses on static token embeddings, our work operates at the sentence level—mean-pooled hidden states from transformer layers. This distinction is mathematically significant.

Mathematical Framework: Riemannian Submersion Model

We propose modeling the sentence-embedding map as a Riemannian submersion from token space: a smooth projection that preserves local geometry while smoothing out global irregularities.

Formal Definition:

  • Token Space: Non-manifold \((T, g_T)\) with singularities (cusps, pinch points)
  • Sentence Space: Projected manifold \((M, g_M)\) via context aggregation
  • Projection Mapping: \(\pi: E^k \to M\), where \(E\) is the token embedding space and \(k\) the sentence length

The key insight: sentence embeddings represent contextual aggregations of token vectors:

\[ \pi(e_1, \dots, e_k) = \sum_{i=1}^{k} \alpha_i \cdot f\big(e_i \mid \{e_j\}_{j \neq i}\big) \]

where \(f\) denotes the transformer attention mechanism, acting as an “Ehresmann connection” that smooths token-level irregularities, and \(\alpha_i\) are aggregation weights.
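
Mean pooling is the simplest instance of this projection, taking \(\alpha_i = 1/k\) and \(f\) to be the contextualized last-layer hidden state of token \(i\). A minimal PyTorch sketch of that pooling step (a helper of our own, not a library function):

```python
import torch

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mean-pool contextualized token states into one sentence vector.

    hidden_states: (batch, seq_len, dim) last-layer states, i.e. f(e_i | context)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    This is the projection above with alpha_i = 1/k over the k real tokens.
    """
    mask = attention_mask.unsqueeze(-1).float()   # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)    # sum of contextual token states
    counts = mask.sum(dim=1).clamp(min=1.0)       # k per sentence
    return summed / counts                        # (batch, dim)
```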

Why Sentence-Level is More Manifold-Like

  1. Context Disambiguation: Polysemous tokens get resolved through surrounding context
  2. Attention as Geometric Smoothing: Transformer layers act as parallel transport operators
  3. Aggregation Reduces Noise: Mean-pooling averages out local singularities
  4. Higher-Dimensional Integration: Sentences as trajectories rather than isolated points

Empirical Validation: Colab Experiments

We implemented the paper’s statistical testing framework to directly compare token-level and sentence-level embeddings from Qwen2-1.5B-Instruct.

Experimental Setup

  • Token-Level: First 5000 static vocabulary embeddings
  • Sentence-Level: 100 generated sentence embeddings (mean-pooled last hidden states)
  • Test Framework: Reproduced Algorithm 1 with volume scaling analysis (embedding extraction is sketched below)
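
The sketch below shows how the two embedding populations can be extracted with the Hugging Face transformers API, reusing the mean_pool helper from the previous section; the two sentences are placeholders, and the full notebook differs in details such as the generated sentence set and batching.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "Qwen/Qwen2-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
if tokenizer.pad_token is None:                      # ensure padding is defined
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(MODEL).eval()

# Token-level: the first 5000 rows of the static input-embedding matrix.
token_embs = model.get_input_embeddings().weight[:5000].detach()

# Sentence-level: mean-pooled last hidden states for a batch of sentences.
sentences = ["The bank raised interest rates.", "She sat by the river bank."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, dim)
sentence_embs = mean_pool(hidden, batch["attention_mask"])

print(token_embs.shape, sentence_embs.shape)
```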

Results Summary

| Embedding Level | Manifold Rejection | Fiber Bundle Rejection | Slope Stability |
|-----------------|--------------------|------------------------|-----------------|
| Token-Level     | 100.00%            | 100.00%                | Highly Variable |
| Sentence-Level  | 8.00%              | 10.00%                 | Mostly Stable   |

Visual Analysis of Log-Volume vs Log-Radius Plots (a plotting sketch follows this list):

  • Token-Level: Irregular curves with slope increases and kinks, indicating singularities and heteroscedastic noise
  • Sentence-Level: Smooth curves with decreasing slopes, showing two-regime behavior (local high-dim → global low-dim)
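
For reference, curves of this kind can be reproduced with a few lines of matplotlib. The sketch below assumes token_embs and sentence_embs hold the full experimental populations (5,000 and 100 vectors respectively), not the toy batch from the extraction snippet above.

```python
import numpy as np
import matplotlib.pyplot as plt

def log_volume_curve(X, query_idx=0, n_radii=20):
    """Return (log radius, log neighbor count) around one embedding."""
    d = np.sort(np.linalg.norm(X - X[query_idx], axis=1))
    d = d[d > 0]
    radii = np.geomspace(d[1], d[len(d) // 2], n_radii)
    counts = np.maximum(np.searchsorted(d, radii), 1)
    return np.log(radii), np.log(counts)

fig, ax = plt.subplots()
for label, X in [("token", token_embs.numpy()), ("sentence", sentence_embs.numpy())]:
    log_r, log_v = log_volume_curve(X)
    ax.plot(log_r, log_v, label=label)
ax.set_xlabel("log radius")
ax.set_ylabel("log volume (neighbor count)")
ax.legend()
plt.show()
```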

Quantitative Insights

  • 90%+ reduction in rejection rates moving from token to sentence level
  • Sentence embeddings show stable slope decreases consistent with fiber bundle structure
  • Token embeddings exhibit slope increases violating geometric assumptions
  • Context aggregation effectively “smooths” singularities

Implications for Interstella Project

Validating Our Geometric Approach

These results strongly support our semantic manifold hypothesis at the sentence level:

  1. Reliability of Proxy Geodesics: Isomap and t-SNE visualizations capture real geometric structure (see the Isomap sketch after this list)
  2. Chain Formation Validity: Extreme prompt “wormholes” connect actual manifold regions
  3. CoT Trajectory Grounding: Reasoning paths follow legitimate geometric gradients
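
As a concrete illustration of proxy geodesics, the sketch below fits scikit-learn's Isomap to the sentence embeddings and reads off graph-based geodesic distances; it assumes sentence_embs holds the 100 experimental vectors, and n_neighbors=10 is an untuned, hypothetical choice.

```python
from sklearn.manifold import Isomap

# Build a k-nearest-neighbor graph over the sentence embeddings and
# approximate geodesic distances by shortest paths on that graph.
iso = Isomap(n_neighbors=10, n_components=2)
coords_2d = iso.fit_transform(sentence_embs.numpy())   # 2-D coordinates for plotting
geodesic_dists = iso.dist_matrix_                      # pairwise graph geodesic distances

# A proxy geodesic between two sentences is then the shortest path on the
# neighborhood graph, rather than a straight line in the ambient space.
print(coords_2d.shape, geodesic_dists.shape)
```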

Mathematical Foundation for Emergence Engineering

The manifold structure enables:

  • Predictable Path Planning: Geodesic calculations on well-behaved spaces
  • Freidlin-Wentzell Feasibility: Smooth dynamics required for rare event prediction
  • Fisher Metric Stability: More reliable information geometry computations

Practical Guidelines

  • Work at Sentence Level: Semantic analysis should use contextual embeddings
  • Accept Fiber Bundle Structure: Allow for local-global dimension transitions
  • Robust Geometric Methods: Use techniques tolerant of residual irregularities

Future Directions

While sentence-level embeddings are significantly more manifold-like, the rejection rates do not vanish entirely. Next steps include:

  • Larger Dataset Validation: Scale to 1000+ sentences for statistical robustness
  • Model-Specific Analysis: Compare across more LLM architectures
  • Advanced Smoothing Techniques: Explore optimal context aggregation methods

Colab Notebook

The complete experimental code and interactive visualizations are available in our Google Colab notebook.


Conclusion

This analysis provides crucial validation for our Interstella approach: semantic spaces at the sentence level largely satisfy geometric manifold assumptions, unlike their token-level counterparts. The contextual aggregation performed by transformer layers effectively smooths singularities, creating navigable geometric landscapes suitable for emergence engineering.

The distinction between token and sentence embeddings is not merely technical—it’s foundational for geometric approaches to LLM understanding. Our work demonstrates that with appropriate level selection, the manifold hypothesis provides a powerful framework for controllable AI development.


This research validates a key assumption underlying our work on Computable Emergence Engineering. By establishing sentence-level embeddings as properly manifold-like, we gain confidence in geometric methods for semantic traversal and emergence prediction.

References: Robinson, Dey, and Chiang (2025), “Token Embeddings Violate the Manifold Hypothesis,” arXiv:2504.01002v3.