Measuring What Images Do, Not What They Look Like
The Visual Cognitive Load Index (VCLI-G) quantifies perceptual demand: how long an image holds attention before resolving into meaning. It is a geometry-based framework that distinguishes earned complexity from chaotic noise, and intentional simplicity from generative defaults.
An artist's aim isn't always to stay near the default; it's to know why they're there.
What is the VCLI-G?
Simply, four geometric channels:
centroid wander (attention instability) +
void topology (figure-ground ambiguity) +
curvature torque (directional tension) +
occlusion entropy (depth uncertainty)
= measurable cognitive effort
Traditional Metrics Ask
Is this semantically accurate?
Does this match aesthetic preferences?
Is this "good quality"?
Did the model follow instructions?
VCLI-G Asks
How much perceptual work does this demand?
Where does that demand come from?
Is complexity organized or chaotic?
Is simplicity intentional or default?
The Evaluation Gap in Image Metrics
Current image evaluation metrics measure semantic accuracy (does this match the prompt?) and aesthetic preference (do people like this?), but they cannot quantify perceptual demand: how long an image holds attention before resolving into meaning. VCLI-G addresses this gap by treating visual arrest, hesitation, and cognitive friction as measurable phenomena rather than stylistic quirks.
Like artists, the VCLI-G treats tension as a controllable state, not a failure mode to avoid.
The Center is the Delta
This phrase captures VCLI-G's foundational measurement. The center is your image's visual focal point (centroid). The delta (Δ) is the change: how that center wanders as the image is scanned across scales. Traditional metrics ask "where is the center?" VCLI-G asks "how unstable is the center?" Predictability becomes signal. Movement becomes data. The delta is the measurement. This is not to say the center must wander to create tension; the question is whether it wanders enough.
VCLI-G inverts traditional assumptions by treating deviation as information rather than error. High cognitive load isn't failure; it's a measurable perceptual state that some contexts require.
Four Geometric Channels of Cognitive Load
Why geometric proxies? Because tension can't be measured directly from pixels, but artists have been using geometric structure from the start (even if they didn’t know it).
G1: Centroid Wander
Measures: Attention instability across scales
Detects: How much your eye must re-orient when scanning the image
Artist translation: "Lead the eye" → quantified as path length and curvature
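A minimal sketch of what "centroid wander across scales" could look like in code. Everything here is an assumption for illustration: the function names, the use of gradient magnitude as a stand-in for visual weight, the block-averaging scales, and NumPy as the array library. It measures path length only, not curvature, and is not the published implementation.

```python
import numpy as np

def block_mean(img, k):
    """Downsample by averaging k x k blocks (cropping any ragged edge)."""
    h, w = img.shape
    h2, w2 = h - h % k, w - w % k
    return img[:h2, :w2].reshape(h2 // k, k, w2 // k, k).mean(axis=(1, 3))

def edge_energy(img):
    """Gradient magnitude, used here as an assumed proxy for visual weight."""
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy)

def centroid(weight):
    """Weighted centroid as (row, col), normalized to the range [0, 1]."""
    total = weight.sum()
    if total == 0:
        return np.array([0.5, 0.5])  # no structure: fall back to image center
    h, w = weight.shape
    r = (weight.sum(axis=1) * (np.arange(h) + 0.5)).sum() / (total * h)
    c = (weight.sum(axis=0) * (np.arange(w) + 0.5)).sum() / (total * w)
    return np.array([r, c])

def centroid_wander(img, factors=(1, 2, 4, 8)):
    """Path length traced by the centroid as the image is coarsened."""
    img = np.asarray(img, dtype=float)
    path = np.array([centroid(edge_energy(block_mean(img, k))) for k in factors])
    return float(np.linalg.norm(np.diff(path, axis=0), axis=1).sum())

# Hypothetical test images: a centered square vs. a square plus a stray dot.
centered = np.zeros((64, 64))
centered[24:40, 24:40] = 1.0
offset = np.zeros((64, 64))
offset[16:48, 8:32] = 1.0
offset[30:32, 54:56] = 1.0
print(centroid_wander(centered), centroid_wander(offset))
```

The centered square keeps its centroid fixed at every scale, so its wander is effectively zero. The stray dot pulls the centroid at fine scales and fades at coarse ones, so the second image produces a measurable path.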
G2: Void Topology
Measures: Figure/ground ambiguity
Detects: How voids organize space vs. simply frame it
Artist translation: "Activate negative space" → quantified as void complexity
G3: Curvature Torque
Measures: Directional tension in form
Detects: Competing curves creating pressure
Artist translation: "Create tension" → quantified as curvature variance
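As a rough illustration of "curvature variance," the sketch below treats a contour as a polyline and takes the variance of its turning angles. The polyline representation and the function names are assumptions made for this example; the source does not specify how contours are extracted or sampled.

```python
import math
from statistics import pvariance

def turning_angles(points):
    """Signed turning angle (radians) at each interior vertex of a polyline."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        a = math.atan2(y1 - y0, x1 - x0)
        b = math.atan2(y2 - y1, x2 - x1)
        # Wrap the difference into [-pi, pi) so left and right turns keep their sign.
        angles.append((b - a + math.pi) % (2 * math.pi) - math.pi)
    return angles

def curvature_variance(points):
    """Population variance of turning angles: 0 for a line or a steady arc."""
    return pvariance(turning_angles(points))

# Hypothetical contours: a straight edge and a zigzag of competing turns.
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
print(curvature_variance(straight), curvature_variance(zigzag))
```

Under this proxy a curve that bends steadily in one direction scores low, while curves that keep reversing direction score high, which is one reading of "competing curves creating pressure."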
G4: Occlusion Entropy
Measures: Depth uncertainty from overlaps
Detects: How hard it is to resolve spatial relationships
Artist translation: "Layer depth" → quantified as overlap complexity
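One possible reading of "overlap complexity" is the Shannon entropy of how many objects cover each pixel. The sketch below assumes per-object binary masks are already available (for example from a segmentation step, which is not shown) and counts only covered pixels. The function name and the masks are hypothetical.

```python
import math
from collections import Counter

def occlusion_entropy(layers):
    """Shannon entropy (bits) of overlap depth over pixels covered by any object.

    `layers` is a list of equally sized 2D masks (lists of 0/1 rows), one per object.
    """
    depth_counts = Counter()
    for rows in zip(*layers):            # the same row from every layer
        for cells in zip(*rows):         # the same pixel from every layer
            depth = sum(cells)
            if depth > 0:                # ignore empty background
                depth_counts[depth] += 1
    total = sum(depth_counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in depth_counts.values())

# Hypothetical one-row masks: two objects side by side, then two that overlap.
apart = [[[1, 1, 0, 0]], [[0, 0, 1, 1]]]
overlapping = [[[1, 1, 1, 0]], [[0, 1, 1, 1]]]
print(occlusion_entropy(apart), occlusion_entropy(overlapping))
```

Objects that never overlap give a single depth value and zero entropy. Once overlaps appear, the depth histogram spreads out and the entropy rises.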
Traditional art training teaches these relationships implicitly. VCLI-G makes them measurable.
VCLI-G × SCI: Mapping Perceptual Territory
VCLI-G measures cognitive load. The Structural Coherence Index (SCI) measures whether that load is organized or chaotic. Together, they create a two-dimensional perceptual space with four distinct territories.
High Load + High Coherence = Earned Tension
Organized complexity that rewards attention
Weathered house with ladder, muted palette, dramatic sky
Multiple discovery events structured into coherent reading
Low Load + High Coherence = Intentional Clarity
Controlled simplicity with purpose
Exploding house that resolves quickly into "destruction"
Lowest load despite visual chaos—unambiguous collapse
High Load + Low Coherence = Productive Friction
Complexity without resolution (or chaotic noise)
Separated balloon creates spatial negotiation
High demand but dropped coherence—friction without organization
Low Load + Low Coherence = Default Simplicity
Conceptual territory: minimal effort, minimal structure
Would require "gradient + centered object" minimalism
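The four territories can be expressed as a simple lookup on the two scores. This is a sketch under stated assumptions: the function name `territory` and both cut points are hypothetical, chosen only so that the three examples named above (using the scores reported in the case study) land in the quadrants the text assigns them. The source does not define thresholds.

```python
def territory(vclig, sci, load_cut=3.0, coherence_cut=2.9):
    """Map a (VCLI-G, SCI) pair to one of the four named territories.

    The cut points are illustrative assumptions, not published thresholds.
    """
    high_load = vclig >= load_cut
    high_coherence = sci >= coherence_cut
    if high_load and high_coherence:
        return "Earned Tension"
    if high_coherence:
        return "Intentional Clarity"
    if high_load:
        return "Productive Friction"
    return "Default Simplicity"

# Scores as reported in the case study for the three named examples.
print(territory(3.925, 3.384))  # weathered house with ladder (D3)
print(territory(2.963, 3.313))  # exploding house (E1)
print(territory(3.872, 2.829))  # separated balloon (B)
```

Moving either cut point reshuffles the borderline images, so any real use would need thresholds grounded in perceptual data rather than in this one series.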
The Case Study: Eleven Variations
From Emblem to Synthesis: A Measured Journey
Eleven variations of a floating house with balloons, each progressively departing from conventional composition
Phase 1: Baseline Calibration (A, B, C)
The Starting Point (A): 3.695 VCLI-G / 3.365 SCI
Center-locked composition, perfect vertical placement
Balloon cluster centered above, passive sky framing
Clean separation = predictable hierarchy
This is the statistical center of "Up house + balloons" in training data
"Charming, uplifting, nostalgic icon. All information, zero discovery. The eye arrives, catalogs, and departs. No perceptual staying power."
The Friction Intervention (B): 3.872 VCLI-G / 2.829 SCI
Single separated cyan balloon creates secondary centroid
VCLI-G spikes highest in early series
SCI drops to 2.829—friction without resolution
Meaningful compositional decision introduces spatial negotiation
The Geometric Transformation (C): 3.780 VCLI-G / 2.988 SCI
House tilts, front-facing view challenges centering
Attempted asymmetry without full follow-through
Moderate load, moderate coherence
Phase 2: Destabilization & Torque (D1, D2, D3)
Atmospheric Pressure (D1, D2): 3.628-3.258 VCLI-G
House tilts heavy left, balloon cluster compresses
Gray field, weighted mass, off-center pull
Images darken, but SCI increases (3.26-3.36)
Geometry becomes expressive, not descriptive
Peak Complexity (D3): 3.925 VCLI-G / 3.384 SCI
Ladder hanging by nail (dimensional object, temporal marker)
Weathered siding, chipped paint, visible wood grain aging
Muted balloon palette (beiges, taupes, olive greens)
Separated cyan balloon = only saturated color (symbolic inversion)
Dramatic storm sky, shadow complexity under house
This image has the most material consequence. The ladder + weathering + muted palette create a narrative archaeology—this house has been traveling, decaying, surviving.
Phase 3: Collapse & Rebirth (E1, E2, E3)
Material Fracture (E1): 2.963 VCLI-G / 3.313 SCI
Roof explodes, structural skeleton exposed
Deflated balloons, apocalyptic smoke/ash atmosphere
LOWEST VCLI-G of series—chaos resolves quickly
But SCI holds at 3.313—organized destruction
Physics Inversion (E2): 3.290 VCLI-G / 3.362 SCI
Balloons stretched vertical like taffy being pulled
House stained with balloon absorption pattern
Strings visible inside through dark window
Only separated cyan balloon remains healthy
Medium Dissolution (E3): 3.132 VCLI-G / 3.336 SCI
Upper portion photorealistic, sharp detail
Lower third dissolves into graphite marks, canvas grain
Atmospheric dissolve—"air around it is not"
Half-collapsed, half-remembered
It breaks to remember this is instruction.
Phase 4: Symbolic Overload & Meta-Return (E4, F)
Genre Rupture (E4): 3.301 VCLI-G / 3.317 SCI
Concrete traffic barrier (cracked, rebar exposed) enters frame
Lightning between house and barrier (not from sky)
Split sky: warm amber vs. industrial gray
Two atmospheric systems, incompatible realities
Maximum Synthesis (F): 3.240 VCLI-G / 3.367 SCI
TWO houses: one exploding (mid-destruction), one intact (fading to sketch)
Four failure modes simultaneously:
Temporal fracture (before/after states)
Material destruction (roof splintering)
Consumption physics (magenta stain spreading)
Medium collapse (lower house fading)
Genre rupture (traffic barrier present)
This is not 'a house breaking,' this is a house existing in four failure modes simultaneously. The only survivor is the balloon that left.
F has the widest z-signal spread of the series, with all four geometric channels active at once: a VCLI-G of 3.24 with maximum occlusion (1.137), high wander, and moderate void and torque.
This is not a case study of what is right or wrong, aesthetic or ugly, or which image is the best. Preferences vary. Any given viewer may prefer one image over another.
The VCLI-G is not a judgement tool. It offers a choice about the direction of cognitive load, tension, and what makes the viewer pause: “I need to take a moment to understand this.”
For Artists
A Mirror for Compositional Consequence
VCLI-G doesn't replace intuition; it quantifies the work your image performs: how much cognitive energy a composition demands, and whether that demand is organized or chaotic.
Three use cases for artists:
Diagnose Structural Fatigue
When a piece looks finished but feels inert
Low VCLI-G, stable SCI
The image resolves too quickly—no perceptual staying power
Validate Earned Complexity
When tension is high but coherence remains intact
High VCLI-G, high SCI
The difficulty is organized, not accidental
Track Iteration Behavior
Testing how geometric edits shift perceptual load
Compression, elongation, void modulation
See which changes increase meaningful complexity vs. noise
Practical insight: The system shows when an image feels alive because its structure remains intelligible under stress. You can see exactly which geometric channel is carrying the load.
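To make "which geometric channel is carrying the load" concrete, the sketch below compares per-channel z-scores before and after an edit. The channel values are invented for illustration and the helper names are hypothetical; only the four channel names come from the framework.

```python
def load_shift(before, after):
    """Per-channel change in z-score between two versions of an image."""
    return {channel: round(after[channel] - before[channel], 3) for channel in before}

def dominant_channel(z):
    """Name of the channel with the largest z-score."""
    return max(z, key=z.get)

# Hypothetical channel scores for one image before and after a compositional edit.
before = {"wander": 0.8, "void": 0.7, "torque": 0.6, "occlusion": 0.9}
after = {"wander": 1.2, "void": 0.7, "torque": 0.6, "occlusion": 0.9}

print(load_shift(before, after))
print(dominant_channel(before), "->", dominant_channel(after))
```

In this made-up example the edit moves only the wander channel, and the dominant channel changes from occlusion to wander. That is the kind of per-channel reading the iteration use case describes.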
For Engineers
Navigating Latent Space by Structure, Not Aesthetics
This document is a translation device. How do artists see an image? Why do they hold that failure is not the same as collapse? Latent space becomes a steerable place where gravity and centroids become new tools for creative expression.
Three applications for engineers:
Detect Generative Defaults
When models converge on safe patterns
Center-locked, symmetric, instantly readable
High aesthetic scores but low perceptual engagement
Navigate by Geometric Properties
Not "similar to this image"
But "same centroid wander profile" or "matching void topology"
Structural similarity vs. surface similarity
Map Compositional Monoculture
When semantic diversity masks geometric uniformity
Different subjects, same spatial organization
VCLI-G reveals structural convergence invisible to CLIP
Framework insight: Perceptual demand is measurable and steerable. High cognitive load becomes a controllable state, not an accident or error to be minimized.
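The "navigate by geometric properties" and "monoculture" ideas can be sketched as operations on four-number channel profiles. The profiles, image names, and helper functions below are all hypothetical, and Euclidean distance is just one reasonable choice of similarity measure (requires Python 3.8+ for `math.dist`).

```python
import math
from statistics import pstdev

def nearest_by_structure(query, library):
    """Name of the library image whose channel profile is closest to `query`."""
    return min(library, key=lambda name: math.dist(query, library[name]))

def channel_spread(library):
    """Population standard deviation of each channel across the library."""
    return [pstdev(column) for column in zip(*library.values())]

# Hypothetical (wander, void, torque, occlusion) profiles for three images.
library = {
    "portrait": (0.90, 0.60, 0.60, 1.10),
    "landscape": (0.30, 0.20, 0.40, 0.20),
    "still_life": (0.85, 0.65, 0.55, 1.00),
}
query = (0.88, 0.60, 0.60, 1.05)

print(nearest_by_structure(query, library))
print(channel_spread(library))
```

A set of images with very different subjects but near-zero spread on every channel would be a candidate for the structural convergence described above.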
Appendix
The Four States: Theoretical vs. Observed
What This Series Reveals About Generative Models: The VCLI-G × SCI space theoretically supports four compositional states. This balloon series occupies primarily two quadrants, with brief excursions into a third. The missing quadrant reveals generative model behavior.
What we observed:
Earned Tension (High Load + High Coherence): Achieved in D3, maintained across most variations
Intentional Clarity (Low Load + High Coherence): E1 as controlled collapse
Productive Friction (High Load + Low Coherence): B briefly entered this territory
What we didn't observe:
Default Simplicity (Low Load + Low Coherence): True minimalism never emerged
Chaotic Noise (extreme High Load + very Low Coherence): SCI never dropped below 2.8
Three implications:
1. Sora Resists Extremes
Architecture observably resisted both minimal outputs and complete incoherence
Even under extreme prompting (explosion, dissolution, dual-state collapse)
SCI consistently 2.8-3.4 = strong coherence bias
2. Subject Matter Constrains Range
Recognizable object + sky limits variation
VCLI-G spans only 2.963-3.925 (about 1.0 range)
Need more extreme examples to see wider distribution
3. Clustering Reveals Training Bias
Almost everything maintains high SCI
Models are calibrated to avoid lazy minimalism and incoherent chaos
This is architectural behavior, not prompt limitation
The four-quadrant model remains useful as a conceptual map, even if this particular series doesn't populate all regions. Future studies need deliberately designed prompts targeting underexplored quadrants.
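The ranges quoted in this appendix can be checked directly from the reported scores. The sketch below uses only the nine images with individually attributed values. D1 and D2 are omitted because the case study reports them only as a joint range, and that range sits inside the bounds computed here. The variable names are illustrative.

```python
# (VCLI-G, SCI) pairs as reported in the case study.
scores = {
    "A": (3.695, 3.365),
    "B": (3.872, 2.829),
    "C": (3.780, 2.988),
    "D3": (3.925, 3.384),
    "E1": (2.963, 3.313),
    "E2": (3.290, 3.362),
    "E3": (3.132, 3.336),
    "E4": (3.301, 3.317),
    "F": (3.240, 3.367),
}

vclig = [load for load, _ in scores.values()]
sci = [coherence for _, coherence in scores.values()]

vclig_range = round(max(vclig) - min(vclig), 3)
lowest_sci = min(scores, key=lambda name: scores[name][1])

print("VCLI-G:", min(vclig), "to", max(vclig), "range", vclig_range)
print("SCI:", min(sci), "to", max(sci), "lowest at", lowest_sci)
```

The VCLI-G range comes out at 0.962, in line with the "about 1.0" figure, and the lowest SCI is B at 2.829, in line with the statement that SCI never dropped below 2.8.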
Consequence
New Territory: Organized Refusal of Defaults: VCLI-G opens up territory that both artists and AI systems default away from. The goal is not "better art" or "more creative AI," but access to compositional states that current optimization prevents.
What Current Metrics Optimize
Quick resolution
Clear hierarchy
Instant recognition
Pleasant aesthetics
Safe patterns
What VCLI-G Makes Measurable
Attention gravity
Perceptual friction
Interpretive delay
Cognitive staying power
Productive ambiguity
Current metrics flatten these distinctions into a single axis of 'quality' or 'preference,' systematically selecting against ambiguity, tension, and recursive complexity—the very properties that make certain images consequential rather than merely pleasant.
Three applications:
1. Artists Gain
Quantified vocabulary for compositional forces
Validation of earned complexity
Diagnosis of structural fatigue
2. Engineers Gain
Framework for navigating latent space by structure
Detection of compositional monoculture
Measurement of spatial priors
3. Collaboration Gains
Shared language for discussing what images DO vs. what they LOOK LIKE
Bridge between intuitive perception and geometric measurement
Tools for refusal without collapse
Scope & Limitations
What This Study Is (And Isn't)
THIS IS
A geometric measurement system
A single-case demonstration (n=1 subject)
A translation device between art and geometry
Research infrastructure for further study
Proof-of-concept with preliminary validation
THIS IS NOT
A validation study (requires perceptual testing)
A claim about universal perceptual response
Evidence of accuracy across all domains
A replacement for aesthetic judgment
A predictor of image quality or preference
VALIDATION REQUIRES
Perceptual studies (viewing time data)
Eye-tracking experiments
Subjective effort ratings
Cross-subject testing (diverse motifs)
Statistical validation studies
This demonstrates what VCLI-G measures and how geometric channels respond to compositional choices. It's a worked example and tutorial for interpretation, not proof of universal validity.
Use Cases & Misuse Prevention
Appropriate and Inappropriate Applications
APPROPRIATE USE CASES ✓
Compositional feedback during iterative creation
Comparative analysis across prompt variations
Identifying structural patterns in generated images
Shared vocabulary for artist-engineer collaboration
Latent space navigation by geometric properties
INAPPROPRIATE USE CASES ✗
Ranking image "quality" or aesthetic value
Predicting viewer preferences or engagement metrics
Automated content moderation or filtering
Training data curation without human oversight
Claims about universal perceptual experience
High VCLI-G is not 'better' or 'worse'—it's different. A UI icon should have low cognitive load. A gallery artwork might deliberately maximize it. The metric describes structural state; artistic intent determines value.
Methodology & Transparency
How This Was Made
Images Generated: Via Sora using dialectical prompting
Each prompt designed to embed competing forces
Not geometric coordinates, but narrative physics
"Wind pulls left, balloon pulls right, house leans"
Analysis Conducted: VCLI-G scores via Python implementation
Four geometric channels extracted per image
SCI calculated from regional consistency metrics
Full code available at artistinfluencer.com
Human-AI Collaboration:
Artistic intent and compositional analysis: Human (Russell Parrish)
Mathematical formalization and code: AI-assisted
Statistical interpretation: AI-assisted
Conceptual framework decisions: Human
Final judgments on what images "do": Human
Practitioner positionality note: This framework emerges from studio practice, not laboratory research. It prioritizes ecological validity (how artists actually work) over experimental control. Academic validation is invited but not required for practitioner utility.
Example Prompts
The Prompt Evolution: From Icon to Synthesis
Baseline (A): "A house made out of balloons slightly floating. Make realistic photograph, bright sunlight."
Friction (B): Full prompt with geometry specifications (Δx = 0.18, house tilted 11° clockwise, balloon cluster shifted 0.22 left, etc.)
Peak (D3): Narrative + consequence prompt focusing on weathering, ladder, muted palette, temporal markers
Synthesis (F): Multi-state collapse prompt with four simultaneous failure modes
Complete prompt library in Appendix B. These aren't creative descriptions; they're dialectical forcing functions designed to bypass generative defaults.
Technical Specifications
The Measurement System
VCLI-G Calculation:
G1 (Centroid Wander): z₁ from path length and curvature
G2 (Void Topology): z₂ from void complexity and aspect ratio
G3 (Curvature Torque): z₃ from curvature variance
G4 (Occlusion Entropy): z₄ from overlap entropy
SCI Calculation:
Regional consistency
Angle alignment
Scale consistency
Rhythm metrics
Output:
VCLI-G: Composite z-score (typically 2.5-4.5 range)
SCI: Coherence index (typically 2.5-3.5 range)
Individual z-scores: Channel-specific signals
Delta span: Range across all channels
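A sketch of how the outputs above might relate to the four channel z-scores. The delta span follows the definition given here (range across channels). The composite is assumed to be an unweighted sum, which the additive summary at the top of this document suggests but the source does not confirm. The function names and channel values are hypothetical.

```python
def composite(z):
    """Composite score, ASSUMED here to be an unweighted sum of channel z-scores."""
    return sum(z.values())

def delta_span(z):
    """Range across all channels: largest z-score minus smallest."""
    return max(z.values()) - min(z.values())

# Hypothetical channel z-scores for a single image.
z = {"wander": 0.9, "void": 0.6, "torque": 0.6, "occlusion": 1.1}

print(round(composite(z), 3), round(delta_span(z), 3))
```

If the actual implementation weights the channels or rescales the total, only `composite` would need to change. The delta span depends solely on the individual channel values.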
Conclusion
Measurement as Infrastructure
VCLI-G doesn't ask "is this good?" It asks "how long does this image hold the mind?" This inversion, treating perceptual demand as measurable rather than accidental, enables new compositional territory. For artists, it's a mirror for consequence. For engineers, it's a map for navigating latent space by structure. For both, it's a shared vocabulary for what images do, not what they look like.
Final thought: The center is the delta. Instability is signal. High cognitive load isn't failure—it's a state some images require to be consequential rather than merely pleasant.
Three takeaways:
Cognitive load can be measured through geometric proxies
Earned complexity is distinguishable from chaotic noise
Generative defaults are quantifiable and steerable