A triptych of the myth of Atlas. The first panel shows Atlas kneeling and holding the world on his shoulders, with roots or vines beneath him. The second panel depicts a man with a muscular build, sitting on rocks, holding the world on his shoulders, with a sunset in the background. The third panel is a grayscale diagram illustrating the rotation of the man and the forces involved, labeled with arrows indicating weight pushing downward, torque bearing the shift, and strength lifting upward.

Don’t Pretend AI “Knows”

A Recursive Lab for Visual Intelligence

AI generates semantic infinity with geometric poverty. Text-to-image models can produce any subject - figures to butterflies, cityscapes, portraits, abstracts - but arrange them using identical spatial strategies. This work is a deterministic diagnostics that measure structural behavior in generative systems, enabling evaluation against concrete metrics rather than subjective similarity.

Empirical evidence across 1,000+ images:

  • 27 - 34% of horizontal space used

  • 100% within 0.15 radius of center for prompts without steering

  • Semantic categories explain 6%-10% of spatial variance

  • Clustering: CV 4.09% - 5.46%

  • Radial density distributions dominate regardless of prompt

  • Mass centralization: coefficient of variation = 0.0409

The gap: Current benchmarks (CLIP, FID, IS) measure semantic correctness, whether this figure looks like a figure. They cannot measure compositional geometry, whether the figure could be standing differently.

Visual Thinking Lens fills this gap.

A side-by-side comparison of two black and white sketches of women. The left sketch depicts a young woman with straight hair, standing straight with a neutral expression, wearing a short-sleeved top and shorts. The right sketch shows a woman with short, wavy hair, leaning slightly to the side, wearing a long-sleeved dress and in a more dynamic pose.
Comparison table showing metrics (FID, IS, CLIP, SSIM, LPIPS) with figures and interpretations for two figures in an analysis, such as Dataset faithfulness, confidence levels, keyword matching, pixel baseline closeness, and perceptual proximity.

The Measurement System:

Billions of images living in compositional monoculture. No shared vocabulary of structure.

This quantifies compositional bias through geometric primitives and multi-channel analysis, choose 1 or all 4:


Kernel Primitives (7 measurements)

Core (always measured):

  • Δx,y — Placement offset: centroid distance from frame center

  • rᵥ — Void ratio: negative-space proportion

  • ρᵣ — Packing density: material compression / mass adjacency

  • μ — Cohesion: structural continuity between marks or regions

  • xₚ — Peripheral pull: force exerted toward/away from frame boundaries

  • θ — Orientation stability: gravity alignment, architectural compositions

  • dₛ — Structural thickness: layering, mark-weight, material permeability

These form coordinate systems. Not aesthetic judgments. Not style preferences. Geometric measurements of how AI distributes mass, void, and pressure. Analyze an image here

LSI: Lens Structural Index

Compositional stability analysis through kernel roll-up:

  • S (Stability): Do primitives settle or jitter under recursion?

  • K (Consequence): Does the image occupy productive tension zones (strain bands in barycentric space)?

  • R (Recursion Coherence): Does structure converge or scatter across iterations?

Maps primitives to barycentric space (λA, λB, λV) for trajectory analysis. Each iteration = point in simplex space. Path behavior analyzed for contraction vs drift.

The LSI measures decision, tension and consequence over iterations, to compare and contrast compositional options.

VCLI-G: Visual Cognitive Load Index

Four-channel measurement of geometric complexity:

  • G1 (Centroid Wander): Does the compositional center hold or drift?

  • G2 (Void Topology): Figure/ground ambiguity, is negative space structured?

  • G3 (Curvature Torque): Directional tension, where does visual pressure accumulate?

  • G4 (Occlusion Entropy): How many competing spatial layers exist?

Paired with SCI (Structural Coherence Index) to form 2D analysis space distinguishing:

  • Earned tension from chaotic noise

  • Intentional simplicity from lazy defaults

  • Organized complexity from entropic scatter

This measures image complexity and not aesthetic quality. Try it here.

RCA-2: Radial Compliance Analyzer

Detects radial density distributions and measures compression severity:

  • Coefficient of Variation: Measures clustering tightness

  • Radial bands: 8-sector analysis of mass distribution

  • Attractor detection: Identifies stable geometric territories

  • Forbidden zones: Maps regions models systematically avoid

  • Measures: Follows radial decay and if it is aligned with mass or the prior

Cross-platform fingerprinting reveals model-specific priors.

Designed to answer a single, precise question: Where does radial structure actually exist in the image, the center or with the composition?


It's a Spatial Reasoning System.

Consumer tools chase style. Research metrics chase numbers. The Lens chases authorship.


A collage of six images analyzing the poses and compositions of two women: one standing in a classic centered pose and the other seated with diagonal tension, with graphical data and annotations included.

How They Connect

Measurement Pipeline: Image → Kernels (geometric coordinates) → VCLI-G (cognitive load threshold) → LSI (compositional consequence) → RCA-2 (radial priors & collapse detection) → Interpretation + Steering Coordinates → spatial reasoning

What each does:

  • Kernels: Measure geometry (the coordinates)

  • VCLI-G: Evaluate load (add compositional complexity)

  • LSI: Score consequence (structure earns tension)

  • RCA-2: Detect patterns (moving the center to the composition)

Individually or together leads from measurement to interpretation to actionable steering coordinates.

  • Change the distribution of mass

  • Steer the structural state of the image inside the xₚ field

  • Avoid AI familiar patterns

  • Build prompts that act as forces within the field.

The VTL couples how images behave with a multi-engine critique OS to transform and interrogate that mass to form new iterations based on authorship, not priors.

Implementation

Platform: Runs in top-tier conversational AI (Claude, GPT, Gemini) through linguistic constraint architecture. No training, no fine-tuning. Portable cognitive framework instantiated through role-structured prompting.

Code for Drift Free Control: Jupyter notebooks on GitHub provide reproducible measurement protocols. Python implementations of kernel calculations, VCLI-G analysis, LSI scoring, RCA-2 detection.

Validation: 5,000+ images across Sora, MidJourney, GPT, SDXL, Gemini, Stable Diffusion, Firefly, OpenArt, Canva, Grok, and Leonardo. Systematic variation and statistical validation against various platforms.


A collage of images analyzing a butterfly on a pink flower. The collage includes original photo, gradient magnitude, directional quiver map, heatmap, contours, cluster coloring, ridge detection, individual attention fields, torque wheel, compositional fingerprint, and gradient mass distribution.

Research Applications

Jupyter notebooks and python provide:

Fingerprinting: Cross-engine compositional signatures reveal platform-specific spatial priors. Each model has measurable geometric tendencies (MidJourney's left-dense compression, Sora's extreme radial clustering, GPT's broader but still centered distributions).

Steering: Coordinates for navigating to stable geometric territories ("artist basins") where AI maintains compositional integrity under constraint. Off-center coordinates, peripheral anchors, compressed mass zones.

Detection: Pre-failure metrics showing degradation 3-4 inference steps before semantic breakdown. Δx drift, void compression, peripheral dissolution signal trouble while image still looks coherent.

Archaeology: Reverse-engineering learned priors from attractor behavior and forbidden zones without training data access. What did the model learn to reward? What did it learn to avoid?


Explore the System

Organized entry points to Visual Thinking Lens documentation, methods, and research.


A split image with a red background on the left and a green background on the right. The text on the left says 'FAILURE ≠ COLLAPSE' and the text on the right says 'COLLAPSE = CHOICE'.

Foundational Concepts

Kernel Primitives
The geometric measurements underlying all VTL analysis: Δx, rᵥ, ρᵣ, μ, xₚ, θ, ds

LSI: Lens Structural Index
Compositional stability analysis through S/K/R scoring and barycentric mapping

VCLI-G: Visual Cognitive Load Index
Four-channel geometric complexity measurement distinguishing earned tension from noise

A collage of eight images featuring nature, cityscapes, and abstract art, overlaid with concentric red circles and blue rectangles overlaying each image.

Platform Studies: Evidence of Compositional Monoculture

MidJourney Geometric Collapse
400 images, 75% space compression, 34% horizontal usage, radial density dominance

Sora Compositional Clustering
200 images, 100% within 0.15 radius, tightest measured clustering, extreme centralization

OpenArt Compositional Clustering
200 images, 94.0% of OpenArt images remain centered across extended void band

Behavioral Drift Detection
cross Sora, MidJourney, and OpenAI revealed two universal constraints

A young girl with dark hair in a bun, wearing a beige dress, with her head slightly bowed and her eyes closed. Her right arm is extended outward but appears unnaturally elongated.

Practical Methods: Techniques & Protocols

Deformation Operator Playbook
Hands-on techniques for intentional figure warps and constraint architecture

Foreshortening Recipe Book
Structured prompting methodology for depth and spatial reasoning across 6-8 layers

Off-Center Fidelity
Protocol for navigating constraint basins and measuring drift toward stable territories

Reverse Image Decomposition (RIDP)
Reverse-engineer completed imagery into process steps and construction order

A detailed pencil sketch of five women with expressive faces, each with distinct poses and expressions, on textured paper.

Case Studies: VTL in Action

Sketcher Portrait: Painterly Consequence
Demonstration of Sketcher Lens taking portrait through Internal Resonance to earned tension

The Teardown: Ontological Gravity Protocol
5-step image transformation showing VTL methodology applied to systematic deformation

Centaur Mode: Human-AI Collaboration
Artist sketch → AI exploration workflow via Centaur collaborative generation


Quick Validation & Results

Dataset Scale:
1,500+ images with systematic variation across platforms, 5,000+ images developed and/or analyzed.

Key Findings:

  • 75% compositional space compression (MidJourney)

  • 100% radial clustering within 0.15 radius (Sora)

  • CV 4.09% (Sora) vs 5.46% (MidJourney)—both severe

  • ~25% lateral field utilization (OpenArt)

  • Semantic diversity masks geometric uniformity across all platforms

  • Pre-failure detection possible 3-4 inference steps before visible collapse

  • AI depth ceiling: ~6-8 layers before spatial logic breaks down

Cross-Platform Coverage:
Sora, MidJourney, GPT, SDXL, Gemini, Stable Diffusion, Firefly, OpenArt, Canva, Leonardo

Reproducible Implementation:
All measurement protocols available via Jupyter notebooks on GitHub. Deterministic, platform-agnostic, no black-box scoring.

A table comparing features of generators, research metrics, and the lens used, including Goal, Prompt Use, Failure Handling, Fix Method, Scoring, Output, and Built For, with specific descriptions for each category.

It measures the delta from the default.

Most systems fall into one of two camps:

  • Consumer generators (Midjourney, DALL·E, Runway, Sora): optimized for style, polish, and speed. The metric: aesthetics on output.

  • Research metrics (FID, CLIPScore, precision/recall tools, or “LSI-like” industry models): optimized for reproducibility, dataset fidelity, and benchmark math. The metric: statistical alignment.

The Lens does neither. It interrogates structural consequence.

  • Not a generator: Not defaults, it pressures through recursive loops for better alternatives.

  • Not just a metric: Scoring doesn’t flatten into benchmarks, it fuses language, math and design with symbolic categories to produce insights..

  • Not an optimizer: Instead of fighting drift, it names, scores, and pressures into fidelity.

The geometric measurements underlying all VTL analysis

Intentional figure warps and constraint architecture

MidJourney: 400 images, 75% space compression, 34% horizontal usage

Structured prompting for depth and spatial reasoning

Sora: 200 images, 100% within 0.15 radius, tightest clustering measured

Centaur Mode: Human-AI Collaboration