Three illustrations of houses being lifted by balloons. The first shows a house with multi-colored balloons, the second shows a house with brown balloons, and the third shows a house with orange and blue balloons, with a stormy background.

Measuring What Images Do, Not What They Look Like

The Visual Cognitive Load Index (VCLI-G) quantifies perceptual demand: how long an image holds attention before resolving into meaning. It is a geometry-based framework that distinguishes earned complexity from chaotic noise, and intentional simplicity from generative defaults.

An artist's aim isn't always to stay near the default; it's to know why they're there.


What is the VCLI-G?

Simply, four geometric channels:

centroid wander (attention instability) +

void topology (figure-ground ambiguity) +

curvature torque (directional tension) +

occlusion entropy (depth uncertainty)

= measurable cognitive effort


Traditional Metrics Ask

  • Is this semantically accurate?

  • Does this match aesthetic preferences?

  • Is this "good quality"?

  • Did the model follow instructions?

VCLI-G Asks

  • How much perceptual work does this demand?

  • Where does that demand come from?

  • Is complexity organized or chaotic?

  • Is simplicity intentional or default?

The Evaluation Gap in Image Metrics

Current image evaluation metrics measure semantic accuracy (does this match the prompt?) and aesthetic preference (do people like this?), but they cannot quantify perceptual demand: how long an image holds attention before resolving into meaning. VCLI-G addresses this gap by treating visual arrest, hesitation, and cognitive friction as measurable phenomena rather than stylistic quirks.

Like artists, the VCLI-G treats tension as a controllable state, not a failure mode to avoid.


Diagram illustrating centroid movement across scales: tight cluster with low wander, moderate spread as baseline, wide dispersion with high wander, each with blue dots and red crosses indicating centroid.

The Center is the Delta

This phrase captures VCLI-G's foundational measurement. The center is your image's visual focal point (centroid). The delta (Δ) is the change: how that center wanders as the image is scanned across scales. Traditional metrics ask "where is the center?" VCLI-G asks "how unstable is the center?" Predictability becomes signal. Movement becomes data. The delta is the measurement. This is not to say the center must wander to create tension; the question is whether it wanders enough.
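One way to make the delta concrete, offered as a minimal sketch and not as the VCLI-G implementation: compute an intensity-weighted centroid at several scales and sum the distance it travels between them. The function names, the block-average downsampling, and the scale factors are assumptions chosen for illustration; the source does not specify how the scales are produced.

```python
from math import hypot

def centroid(grid):
    """Intensity-weighted centroid (x, y) in normalized [0, 1] coordinates."""
    h, w = len(grid), len(grid[0])
    total = sx = sy = 0.0
    for y, row in enumerate(grid):
        for x, v in enumerate(row):
            total += v
            sx += v * (x + 0.5) / w
            sy += v * (y + 0.5) / h
    if total == 0:
        return (0.5, 0.5)  # empty image: fall back to the geometric center
    return (sx / total, sy / total)

def downsample(grid, factor):
    """Average non-overlapping factor x factor blocks (a crude change of scale)."""
    h, w = len(grid) // factor, len(grid[0]) // factor
    return [
        [
            sum(grid[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w)
        ]
        for y in range(h)
    ]

def centroid_wander(grid, factors=(1, 2, 4)):
    """Path length traced by the centroid as the image is coarsened.

    Assumes grid dimensions are divisible by the largest factor.
    """
    points = [centroid(downsample(grid, f)) for f in factors]
    return sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:]))
```

In this sketch a uniform grid produces zero wander, while mass concentrated away from the center drags the centroid as fine detail is averaged out.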

VCLI-G inverts traditional assumptions by treating deviation as information rather than error. High cognitive load isn't failure; it's a measurable perceptual state that some contexts require.


Four Geometric Channels of Cognitive Load

Why geometric proxies? Tension can't be measured directly from pixels, but artists have worked with geometric structure from the start (even if they didn’t know it).

G1: Centroid Wander

  • Measures: Attention instability across scales

  • Detects: How much your eye must re-orient when scanning the image

  • Artist translation: "Lead the eye" → quantified as path length and curvature

G2: Void Topology

  • Measures: Figure/ground ambiguity

  • Detects: How voids organize space vs. simply frame it

  • Artist translation: "Activate negative space" → quantified as void complexity
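As a rough illustration of what "void complexity" could mean in code, the sketch below counts the separate void regions in a binary figure mask and how many of them are fully enclosed by figure. This is an assumed proxy, not the published G2 formula; `void_regions` and the 4-connectivity rule are hypothetical choices.

```python
from collections import deque

def void_regions(mask):
    """Return (total_voids, enclosed_voids) for a binary mask.

    mask[y][x] is 1 for figure and 0 for void. Regions are 4-connected.
    A void is "enclosed" when none of its cells touch the image border.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    total = enclosed = 0
    for y0 in range(h):
        for x0 in range(w):
            if mask[y0][x0] or seen[y0][x0]:
                continue
            total += 1
            touches_border = False
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:  # breadth-first flood fill of one void region
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                enclosed += 1
    return total, enclosed
```

A void that merely frames the subject touches the border; a void that the figure wraps around is enclosed, which is closer to negative space that organizes the composition.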

G3: Curvature Torque

  • Measures: Directional tension in form

  • Detects: Competing curves creating pressure

  • Artist translation: "Create tension" → quantified as curvature variance
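"Curvature variance" can be sketched for a single contour as the variance of its turning angles. The code below assumes the contour is already available as a polyline of (x, y) points; how contours are extracted, and how several contours are combined, is not stated in the source, and the function names are hypothetical.

```python
from math import atan2, pi

def turning_angles(points):
    """Signed turning angle (radians) at each interior vertex of a polyline."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        incoming = atan2(y1 - y0, x1 - x0)
        outgoing = atan2(y2 - y1, x2 - x1)
        delta = outgoing - incoming
        # Wrap into (-pi, pi] so a small left turn is never read as a large right turn.
        while delta <= -pi:
            delta += 2 * pi
        while delta > pi:
            delta -= 2 * pi
        angles.append(delta)
    return angles

def curvature_variance(points):
    """Population variance of the turning angles; 0.0 for a straight line."""
    angles = turning_angles(points)
    if not angles:
        return 0.0
    mean = sum(angles) / len(angles)
    return sum((a - mean) ** 2 for a in angles) / len(angles)
```

A straight edge or a steady arc scores near zero because every turn is the same; a zigzag, where curves compete in direction, scores high.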

G4: Occlusion Entropy

  • Measures: Depth uncertainty from overlaps

  • Detects: How hard it is to resolve spatial relationships

  • Artist translation: "Layer depth" → quantified as overlap complexity
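One plausible reading of "overlap complexity" is the Shannon entropy of how many shapes cover each cell of the image. The sketch below makes that assumption explicit; it uses axis-aligned boxes as stand-ins for segmented objects, and `overlap_entropy` is a hypothetical name, not the author's G4 code.

```python
from collections import Counter
from math import log2

def overlap_entropy(boxes, width, height):
    """Shannon entropy (bits) of the per-cell overlap-count distribution.

    boxes: iterable of (x0, y0, x1, y1) with half-open extents on an integer grid.
    """
    counts = Counter()
    for y in range(height):
        for x in range(width):
            depth = sum(1 for (x0, y0, x1, y1) in boxes
                        if x0 <= x < x1 and y0 <= y < y1)
            counts[depth] += 1
    n = width * height
    return sum(-(c / n) * log2(c / n) for c in counts.values())
```

With no overlaps at all, every cell has the same depth and the entropy is zero; the more evenly the image is split between uncovered, singly covered, and multiply covered cells, the higher the value.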

Traditional art training teaches these relationships implicitly. VCLI-G makes them measurable.


A scatter plot chart titled 'Perceptual Space: Complexity vs. Coherence' with four quadrants labeled 'Chaotic Complexity,' 'Earned Tension,' 'Default Simple,' and 'Resolved Clarity.' The x-axis is labeled 'SCI (Structural Coherence Index)' and the y-axis is 'VCLI-G (Visual Cognitive Load),' with a color scale on the right.

VCLI-G × SCI: Mapping Perceptual Territory

VCLI-G measures cognitive load. The Structural Coherence Index (SCI) measures whether that load is organized or chaotic. Together, they create a two-dimensional perceptual space with four distinct territories.

High Load + High Coherence = Earned Tension

  • Organized complexity that rewards attention

  • Weathered house with ladder, muted palette, dramatic sky

  • Multiple discovery events structured into coherent reading

Low Load + High Coherence = Intentional Clarity

  • Controlled simplicity with purpose

  • Exploding house that resolves quickly into "destruction"

  • Lowest load despite visual chaos—unambiguous collapse

High Load + Low Coherence = Productive Friction

  • Complexity without resolution (or chaotic noise)

  • Separated balloon creates spatial negotiation

  • High demand but dropped coherence—friction without organization

Low Load + Low Coherence = Default Simplicity

  • Conceptual territory: minimal effort, minimal structure

  • Would require "gradient + centered object" minimalism
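The four territories can be expressed as a small lookup. The source does not publish cut-off values, so the thresholds below are hypothetical, picked only so that the three named examples (D3, E1, B) land in the territories the text assigns them.

```python
# Hypothetical cut-offs; the source does not state where the quadrant lines fall.
LOAD_CUT = 3.5
COHERENCE_CUT = 3.0

def quadrant(vcli_g, sci, load_cut=LOAD_CUT, coherence_cut=COHERENCE_CUT):
    """Map a (VCLI-G, SCI) pair to one of the four territory names."""
    high_load = vcli_g >= load_cut
    high_coherence = sci >= coherence_cut
    if high_load and high_coherence:
        return "Earned Tension"
    if high_coherence:
        return "Intentional Clarity"
    if high_load:
        return "Productive Friction"
    return "Default Simplicity"

# Scores reported in the case study.
print(quadrant(3.925, 3.384))  # D3, weathered house with ladder
print(quadrant(2.963, 3.313))  # E1, exploding house
print(quadrant(3.872, 2.829))  # B, separated balloon
```

Moving either cut changes which images count as "high"; the labels describe position in the space, not quality.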


The Case Study: Eleven Variations

From Emblem to Synthesis: A Measured Journey

Eleven variations of a floating house with balloons, each progressively departing from conventional composition

Three images depicting a yellow house with balloons lifting it into the sky, against a clear blue sky background.

Phase 1: Baseline Calibration (A, B, C)

The Starting Point (A): 3.695 VCLI-G / 3.365 SCI

  • Center-locked composition, perfect vertical placement

  • Balloon cluster centered above, passive sky framing

  • Clean separation = predictable hierarchy

  • This is the statistical center of "Up house + balloons" in training data

"Charming, uplifting, nostalgic icon. All information, zero discovery. The eye arrives, catalogs, and departs. No perceptual staying power."

The Friction Intervention (B): 3.872 VCLI-G / 2.829 SCI

  • Single separated cyan balloon creates secondary centroid

  • VCLI-G spikes highest in early series

  • SCI drops to 2.829—friction without resolution

  • Meaningful compositional decision introduces spatial negotiation

The Geometric Transformation (C): 3.780 VCLI-G / 2.988 SCI

  • House tilts, front-facing view challenges centering

  • Attempted asymmetry without full follow-through

  • Moderate load, moderate coherence

Three house-shaped balloons floating in the sky, each with a bunch of colorful balloons attached to their roofs and a single balloon floating nearby.

Phase 2: Destabilization & Torque (D1, D2, D3)

Atmospheric Pressure (D1, D2): 3.628-3.258 VCLI-G

  • House tilts heavy left, balloon cluster compresses

  • Gray field, weighted mass, off-center pull

  • Images darken, but SCI increases (3.26-3.36)

  • Geometry becomes expressive, not descriptive

Peak Complexity (D3): 3.925 VCLI-G / 3.384 SCI

  • Ladder hanging by nail (dimensional object, temporal marker)

  • Weathered siding, chipped paint, visible wood grain aging

  • Muted balloon palette (beiges, taupes, olive greens)

  • Separated cyan balloon = only saturated color (symbolic inversion)

  • Dramatic storm sky, shadow complexity under house

This image has the most material consequence. The ladder + weathering + muted palette create a narrative archaeology—this house has been traveling, decaying, surviving.

Three illustrations of a house attached to colorful balloons, each with different artistic styles and backgrounds.

Phase 3: Collapse & Rebirth (E1, E2, E3)

Material Fracture (E1): 2.963 VCLI-G / 3.313 SCI

  • Roof explodes, structural skeleton exposed

  • Deflated balloons, apocalyptic smoke/ash atmosphere

  • LOWEST VCLI-G of series—chaos resolves quickly

  • But SCI holds at 3.313—organized destruction

Physics Inversion (E2): 3.290 VCLI-G / 3.362 SCI

  • Balloons stretched vertical like taffy being pulled

  • House stained with balloon absorption pattern

  • Strings visible inside through dark window

  • Only separated cyan balloon remains healthy

Medium Dissolution (E3): 3.132 VCLI-G / 3.336 SCI

  • Upper portion photorealistic, sharp detail

  • Lower third dissolves into graphite marks, canvas grain

  • Atmospheric dissolve—"air around it is not"

  • Half-collapsed, half-remembered

It breaks to remember this is instruction.

Two surreal paintings of floating houses with balloons, lightning, and stormy skies.

Phase 4: Symbolic Overload & Meta-Return (E4, F)

Genre Rupture (E4): 3.301 VCLI-G / 3.317 SCI

  • Concrete traffic barrier (cracked, rebar exposed) enters frame

  • Lightning between house and barrier (not from sky)

  • Split sky: warm amber vs. industrial gray

  • Two atmospheric systems, incompatible realities

Maximum Synthesis (F): 3.240 VCLI-G / 3.367 SCI

  • TWO houses: one exploding (mid-destruction), one intact (fading to sketch)

  • Four failure modes simultaneously:

    • Temporal fracture (before/after states)

    • Material destruction (roof splintering)

    • Consumption physics (magenta stain spreading)

    • Medium collapse (lower house fading)

    • Genre rupture (traffic barrier present)

This is not 'a house breaking,' this is a house existing in four failure modes simultaneously. The only survivor is the balloon that left.

F has the widest z-signal spread of the series, with all four geometric channels active at once. Its VCLI-G is 3.24, with maximum occlusion (1.137), high wander, and moderate void and torque.


This is not a case study of what is right or wrong, aesthetic or ugly, or which image is the best. Preferences vary. Any given viewer may prefer one over the other.

The VCLI-G is not a judgement tool. It offers a choice in the direction of cognitive load and tension, and in what makes the viewer pause: “I need to take a moment to understand this.”

Table comparing VCLI-G and SCI values across different states with descriptions of icon gravity, separation, peak, explosion, and synthesis phenomena.

For Artists

A Mirror for Compositional Consequence

VCLI-G doesn't replace intuition; it quantifies the work your image performs: how much cognitive energy a composition demands, and whether that demand is organized or chaotic.

Three use cases for artists:

Diagnose Structural Fatigue

  • When a piece looks finished but feels inert

  • Low VCLI-G, stable SCI

  • The image resolves too quickly—no perceptual staying power

Validate Earned Complexity

  • When tension is high but coherence remains intact

  • High VCLI-G, high SCI

  • The difficulty is organized, not accidental

Track Iteration Behavior

  • Testing how geometric edits shift perceptual load

  • Compression, elongation, void modulation

  • See which changes increase meaningful complexity vs. noise

Practical insight: The system shows when an image feels alive because its structure remains intelligible under stress. You can see exactly which geometric channel is carrying the load.

For Engineers

Navigating Latent Space by Structure, Not Aesthetics

This document is a translation device. How do artists see an image? Why do they hold that failure ≠ collapse? Latent space becomes a steerable place where gravity and centroids become new tools for creative expression.

Three applications for engineers:

Detect Generative Defaults

  • When models converge on safe patterns

  • Center-locked, symmetric, instantly readable

  • High aesthetic scores but low perceptual engagement

Navigate by Geometric Properties

  • Not "similar to this image"

  • But "same centroid wander profile" or "matching void topology"

  • Structural similarity vs. surface similarity
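Searching by "same centroid wander profile" rather than by appearance can be sketched as nearest-neighbour lookup over channel vectors. The vectors below are invented for illustration (only F's occlusion value of 1.137 comes from the case study), and Euclidean distance is an assumed choice of metric.

```python
from math import dist

def nearest_by_structure(query, library):
    """Return the key in `library` whose channel vector is closest to `query`.

    Vectors are (wander, void, torque, occlusion) z-scores.
    """
    return min(library, key=lambda name: dist(query, library[name]))

# Hypothetical channel profiles, for illustration only.
library = {
    "centered_icon": (0.2, 0.1, 0.3, 0.2),
    "dual_house":    (0.9, 0.4, 0.3, 1.137),
}
query = (0.8, 0.5, 0.2, 1.0)
print(nearest_by_structure(query, library))
```

Restricting the vectors to a single channel would give "matching void topology" style queries; subject matter never enters the comparison.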

Map Compositional Monoculture

  • When semantic diversity masks geometric uniformity

  • Different subjects, same spatial organization

  • VCLI-G reveals structural convergence invisible to CLIP

Framework insight: Perceptual demand is measurable and steerable. High cognitive load becomes a controllable state, not an accident or error to be minimized.


Appendix

The Four States: Theoretical vs. Observed

What This Series Reveals About Generative Models

The VCLI-G × SCI space theoretically supports four compositional states. This balloon series occupies primarily two quadrants, with brief excursions into a third. The missing quadrant reveals generative model behavior.

What we observed:

  • Earned Tension (High Load + High Coherence): Achieved in D3, maintained across most variations

  • Intentional Clarity (Low Load + High Coherence): E1 as controlled collapse

  • Productive Friction (High Load + Low Coherence): B briefly entered this territory

What we didn't observe:

  • Default Simplicity (Low Load + Low Coherence): True minimalism never emerged

  • Chaotic Noise (extreme High Load + very Low Coherence): SCI never dropped below 2.8

Three implications:

1. Sora Resists Extremes

  • Architecture observably resisted both minimal outputs and complete incoherence

  • Even under extreme prompting (explosion, dissolution, dual-state collapse)

  • SCI consistently 2.8-3.4 = strong coherence bias

2. Subject Matter Constrains Range

  • Recognizable object + sky limits variation

  • VCLI-G spans only 2.963-3.925 (about 1.0 range)

  • Need more extreme examples to see wider distribution

3. Clustering Reveals Training Bias

  • Almost everything maintains high SCI

  • Models are calibrated to avoid lazy minimalism and incoherent chaos

  • This is architectural behavior, not prompt limitation
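The range figures quoted above can be re-derived from the per-image scores reported in the case study. The snippet uses only the nine images that have both values stated; D1 and D2 are left out because the text gives them only as ranges, and those ranges fall inside the extremes computed here.

```python
# (VCLI-G, SCI) pairs as reported in the case study.
SERIES = {
    "A":  (3.695, 3.365),
    "B":  (3.872, 2.829),
    "C":  (3.780, 2.988),
    "D3": (3.925, 3.384),
    "E1": (2.963, 3.313),
    "E2": (3.290, 3.362),
    "E3": (3.132, 3.336),
    "E4": (3.301, 3.317),
    "F":  (3.240, 3.367),
}

vcli = {name: pair[0] for name, pair in SERIES.items()}
sci = {name: pair[1] for name, pair in SERIES.items()}

lowest_load = min(vcli, key=vcli.get)
highest_load = max(vcli, key=vcli.get)
vcli_span = round(max(vcli.values()) - min(vcli.values()), 3)
sci_floor = min(sci.values())

print(lowest_load, highest_load, vcli_span, sci_floor)
```

The span of 0.962 is the "about 1.0 range" cited above, and the SCI floor of 2.829 belongs to B, the one image that dipped toward low coherence.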

The four-quadrant model remains useful as a conceptual map, even if this particular series doesn't populate all regions. Future studies need deliberately designed prompts targeting underexplored quadrants.

Consequence

New Territory: Organized Refusal of Defaults

VCLI-G opens up territory that both artists and AI systems default away from: not "better art" or "more creative AI," but access to compositional states that current optimization prevents.

What Current Metrics Optimize

  • Quick resolution

  • Clear hierarchy

  • Instant recognition

  • Pleasant aesthetics

  • Safe patterns

What VCLI-G Makes Measurable

  • Attention gravity

  • Perceptual friction

  • Interpretive delay

  • Cognitive staying power

  • Productive ambiguity

Current metrics flatten these distinctions into a single axis of 'quality' or 'preference,' systematically selecting against ambiguity, tension, and recursive complexity—the very properties that make certain images consequential rather than merely pleasant.

Three applications:

1. Artists Gain

  • Quantified vocabulary for compositional forces

  • Validation of earned complexity

  • Diagnosis of structural fatigue

2. Engineers Gain

  • Framework for navigating latent space by structure

  • Detection of compositional monoculture

  • Measurement of spatial priors

3. Collaboration Gains

  • Shared language for discussing what images DO vs. what they LOOK LIKE

  • Bridge between intuitive perception and geometric measurement

  • Tools for refusal without collapse

Scope & Limitations

What This Study Is (And Isn't)

THIS IS

  • A geometric measurement system

  • A single-case demonstration (n=1 subject)

  • A translation device between art and geometry

  • Research infrastructure for further study

  • Proof-of-concept with preliminary validation

THIS IS NOT

  • A validation study (requires perceptual testing)

  • A claim about universal perceptual response

  • Evidence of accuracy across all domains

  • A replacement for aesthetic judgment

  • A predictor of image quality or preference

VALIDATION REQUIRES

  • Perceptual studies (viewing time data)

  • Eye-tracking experiments

  • Subjective effort ratings

  • Cross-subject testing (diverse motifs)

  • Statistical validation studies

This study demonstrates what VCLI-G measures and how geometric channels respond to compositional choices. It's a worked example and tutorial for interpretation, not proof of universal validity.

Use Cases & Misuse Prevention

Appropriate and Inappropriate Applications

APPROPRIATE USE CASES

  • Compositional feedback during iterative creation

  • Comparative analysis across prompt variations

  • Identifying structural patterns in generated images

  • Shared vocabulary for artist-engineer collaboration

  • Latent space navigation by geometric properties

INAPPROPRIATE USE CASES

  • Ranking image "quality" or aesthetic value

  • Predicting viewer preferences or engagement metrics

  • Automated content moderation or filtering

  • Training data curation without human oversight

  • Claims about universal perceptual experience

High VCLI-G is not 'better' or 'worse'—it's different. A UI icon should have low cognitive load. A gallery artwork might deliberately maximize it. The metric describes structural state; artistic intent determines value.

Methodology & Transparency

How This Was Made

Images Generated: Via Sora using dialectical prompting

  • Each prompt designed to embed competing forces

  • Not geometric coordinates, but narrative physics

  • "Wind pulls left, balloon pulls right, house leans"

Analysis Conducted: VCLI-G scores via Python implementation

  • Four geometric channels extracted per image

  • SCI calculated from regional consistency metrics

  • Full code available at artistinfluencer.com

Human-AI Collaboration:

  1. Artistic intent and compositional analysis: Human (Russell Parrish)

  2. Mathematical formalization and code: AI-assisted

  3. Statistical interpretation: AI-assisted

  4. Conceptual framework decisions: Human

  5. Final judgments on what images "do": Human

Practitioner positionality note: This framework emerges from studio practice, not laboratory research. It prioritizes ecological validity (how artists actually work) over experimental control. Academic validation is invited but not required for practitioner utility.

Example Prompts

The Prompt Evolution: From Icon to Synthesis

Baseline (A): "A house made out of balloons slightly floating. Make realistic photograph, bright sunlight."

Friction (B): Full prompt with geometry specifications (Δx = 0.18, house tilted 11° clockwise, balloon cluster shifted 0.22 left, etc.)

Peak (D3): Narrative + consequence prompt focusing on weathering, ladder, muted palette, temporal markers

Synthesis (F): Multi-state collapse prompt with four simultaneous failure modes

Complete prompt library in Appendix B. These aren't creative descriptions; they're dialectical forcing functions designed to bypass generative defaults.

Technical Specifications

The Measurement System

VCLI-G Calculation:

  • G1 (Centroid Wander): z₁ from path length and curvature

  • G2 (Void Topology): z₂ from void complexity and aspect ratio

  • G3 (Curvature Torque): z₃ from curvature variance

  • G4 (Occlusion Entropy): z₄ from overlap entropy
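Each channel is reported as a z-score, which implies a raw measurement standardized against some reference set. The sketch below shows that standardization step under stated assumptions: the reference values are invented, the population standard deviation is an arbitrary choice, and the source does not say which reference corpus the implementation uses.

```python
from statistics import mean, pstdev

def z_score(value, reference):
    """Standardize a raw channel measurement against a reference sample."""
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        return 0.0  # no spread in the reference: treat the value as typical
    return (value - mu) / sigma

# Hypothetical raw curvature-variance readings from a reference set of images.
reference_torque = [1.0, 2.0, 3.0, 4.0, 5.0]
print(z_score(5.0, reference_torque))
```

How the four z-scores are then combined into the composite VCLI-G is not specified in this document, so no aggregation is shown.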

SCI Calculation:

  • Regional consistency

  • Angle alignment

  • Scale consistency

  • Rhythm metrics

Output:

  • VCLI-G: Composite z-score (typically 2.5-4.5 range)

  • SCI: Coherence index (typically 2.5-3.5 range)

  • Individual z-scores: Channel-specific signals

  • Delta span: Range across all channels
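The last two outputs are simple to derive once channel z-scores exist. A minimal sketch, assuming the scores arrive as a name-to-value mapping: the occlusion value is the one reported for image F, while the other three numbers are placeholders, since the source describes them only as "high" or "moderate".

```python
def delta_span(channels):
    """Range between the strongest and weakest channel z-scores."""
    values = list(channels.values())
    return max(values) - min(values)

def dominant_channel(channels):
    """Name of the channel carrying the most load."""
    return max(channels, key=channels.get)

# Occlusion is from the case study (image F); the rest are hypothetical.
f_channels = {"wander": 0.9, "void": 0.4, "torque": 0.3, "occlusion": 1.137}

print(dominant_channel(f_channels))
print(round(delta_span(f_channels), 3))
```

A wide span means one channel is doing most of the work; a narrow span with high values means the load is shared.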

Conclusion

Measurement as Infrastructure

VCLI-G doesn't ask "is this good?" It asks "how long does this image hold the mind?" This inversion, treating perceptual demand as measurable rather than accidental, enables new compositional territory. For artists, it's a mirror for consequence. For engineers, it's a map for navigating latent space by structure. For both, it's a shared vocabulary for what images do, not what they look like.

Final thought: The center is the delta. Instability is signal. High cognitive load isn't failure—it's a state some images require to be consequential rather than merely pleasant.

Three takeaways:

  1. Cognitive load can be measured through geometric proxies

  2. Earned complexity is distinguishable from chaotic noise

  3. Generative defaults are quantifiable and steerable