Compositional Monoculture

Across AI Image Generation

Three Platforms. One Constraint Architecture.

Sora, MidJourney, and OpenArt Don't Have Different Compositional Systems. They Have Different Threshold Settings

Quantitative analysis of 800 images across three leading platforms reveals that compositional displacement is driven by geometric simplification, not semantic understanding. All three use the same heuristic. All three optimize for stability over sophistication.

The analysis asks a single question: is the visual mass organized in rings around a center point? It measures how geometric elements distribute outward from the frame center, comparing observed patterns against perfect radial symmetry. This is not an aesthetic judgment but a diagnostic probe of default geometric behavior; the measurement framework, RCA-2, is described in full below.

We propose that text-to-image models do not approach generation with compositional neutrality. They operate within a consistent geometric prior that favors radial symmetry and central mass distribution, what we term the B0 basin: a default attractor in compositional space where the model naturally settles unless forced otherwise.

See Full Technical Analysis for MidJourney, Sora and OpenArt.


The Perfect Linear Progression: Three Parameter Settings, One Constraint Architecture

Every measured metric shows the same ordering of compositional freedom: Sora < OpenArt < MidJourney, from strictest to loosest enforcement.


OpenArt sits in the middle on every metric. This isn't coincidence; it's evidence of a tunable parameter within a shared constraint architecture.


What is Compositional Monoculture?

Brief explanation: Despite semantic diversity (portraits, action shots, architecture, abstracts), AI image generators produce geometrically uniform outputs. Subjects that should demand radically different compositional treatments (a centered portrait versus a dynamic action shot) instead receive similar spatial scaffolding. The models don't compose based on what they depict; they compose based on how geometrically simple the scene is.

This pattern repeats across subject matter.


What is RCA-2?
(Radial Compliance Analysis)

One-sentence definition: A measurement framework that quantifies how strongly visual mass organizes in concentric rings around a center point.

The Four Steps:

  1. Identify the subject (segmentation mask)

  2. Measure spatial distribution (how mass spreads from center)

  3. Calculate radial compliance (correlation with idealized radial pattern)

  4. Classify geometric structure (dual-center, subject-dominant, field-dominant, ineligible)

What it reveals: The geometric scaffold beneath semantic variety, the structural skeleton that determines where elements go before artistic principles are applied.

RCA-2 is not an aesthetic judgment. It's a diagnostic probe revealing default geometric behavior under neutral prompting. Traditional metrics measure semantic correctness ('Did it generate a cat?'). RCA-2 measures compositional constraint ('Where did it choose to place the cat, and why?').
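A minimal sketch of how RCA-2-style metrics could be computed from a binary subject mask is shown below. The normalization choices, the equal-area-disc reference profile, and the 20-ring binning are illustrative assumptions rather than the published RCA-2 definitions, and the fourth step (structural classification) is omitted.

```python
import numpy as np

def rca2_metrics(mask: np.ndarray) -> dict:
    """Illustrative RCA-2-style metrics from a binary subject mask of shape (H, W).

    The formulas are assumptions for demonstration; the study's own definitions
    of displacement, void ratio, and radial compliance may differ.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)          # Step 1: the segmentation mask is the input
    if len(xs) == 0:
        return {"eligible": False}     # no subject found -> ineligible

    # Frame center and subject center of mass.
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    my, mx = ys.mean(), xs.mean()

    # Step 2: spatial distribution. Displacement (Δr) is the distance of the
    # mass center from the frame center, normalized by the half-diagonal.
    half_diag = np.hypot(cy, cx)
    delta_r = np.hypot(my - cy, mx - cx) / half_diag

    # Void ratio (r_v): fraction of the frame not covered by the subject.
    void_ratio = 1.0 - mask.mean()

    # Step 3: radial compliance (proxy). Correlate the observed mass in each
    # concentric ring with the mass a perfectly centered, radially symmetric
    # subject of equal area would place in that ring.
    r = np.hypot(np.arange(h)[:, None] - cy, np.arange(w)[None, :] - cx)
    ring = np.clip(np.digitize(r, np.linspace(0.0, half_diag, 21)) - 1, 0, 19)
    observed = np.bincount(ring.ravel(), weights=mask.ravel().astype(float), minlength=20)
    ideal_disc = (r <= np.sqrt(mask.sum() / np.pi)).astype(float)  # equal-area centered disc
    ideal = np.bincount(ring.ravel(), weights=ideal_disc.ravel(), minlength=20)
    radial_compliance = float(np.corrcoef(observed, ideal)[0, 1])

    # Step 4 (classification into dual-center / subject-dominant / field-dominant)
    # is omitted from this sketch.
    return {"eligible": True, "delta_r": float(delta_r),
            "void_ratio": float(void_ratio), "radial_compliance": radial_compliance}
```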


The Core Mechanism: Simplification → Wobble

The Evidence:

  • 54.1% of OpenArt displacement explained by void ratio alone (r=0.736, p<0.0001)

  • 74.9% of Sora displacement explained by void ratio (r=0.865) — even stronger

  • Sparse compositions have 3.6× more displacement than dense (p<0.0001, Cohen's d≈1.5)

What this means: Models calculate geometric complexity → determine risk → soften or enforce centering constraint accordingly. This is not semantic category recognition, compositional rule application, or artistic intelligence. It's risk mitigation through geometric calculation.

See the void ratio vs. displacement scatter plot in the full technical analysis: the three platforms color-coded, with regression lines overlaid.
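The kind of statistics cited above takes only a few lines once per-image metrics exist. The sketch below runs on synthetic stand-in data; the generated numbers and the 0.9 sparse/dense cutoff are illustrative assumptions, not the study's values.

```python
import numpy as np
from scipy import stats

# Hypothetical per-image measurements for one platform, e.g. from an
# rca2_metrics() pass over its images. Real data would replace these.
rng = np.random.default_rng(0)
void_ratio = rng.uniform(0.3, 0.97, size=200)
displacement = 0.4 * void_ratio**3 + rng.normal(0.0, 0.03, size=200)

# How much displacement does geometric sparsity explain on its own?
r, p = stats.pearsonr(void_ratio, displacement)
print(f"Pearson r = {r:.3f}, R² = {r**2:.3f}, p = {p:.2e}")

# Sparse vs. dense split, with Cohen's d as the effect size.
sparse = displacement[void_ratio >= 0.9]
dense = displacement[void_ratio < 0.9]
t_stat, p_t = stats.ttest_ind(sparse, dense, equal_var=False)
pooled_sd = np.sqrt((sparse.var(ddof=1) + dense.var(ddof=1)) / 2)
cohens_d = (sparse.mean() - dense.mean()) / pooled_sd
print(f"sparse/dense ratio = {sparse.mean()/dense.mean():.1f}x, "
      f"Cohen's d = {cohens_d:.2f}, p = {p_t:.2e}")
```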


Vertical Bias
1.7-1.9× more Y-axis freedom than X-axis (universal to generative models)

Corner Avoidance
2-6% extreme corner occupation vs 25% if random

Dual Radial Systems
6-8× higher frame variance than mass variance (field variable, subject fixed)

Extreme Displacement
All three require 94-97% void space for Δr>0.20

Architectural Ceilings
Sora: 0.28 | OpenArt: 0.42 | MidJourney: 0.56 (not overcome through prompting alone)
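Two of these signatures, vertical bias and corner avoidance, can be estimated directly from per-image mass-center offsets. The sketch below assumes offsets normalized to [-0.5, 0.5] per axis and uses an illustrative 0.25 corner threshold, chosen so that a uniformly random placement would land in a corner 25% of the time.

```python
import numpy as np

def axis_and_corner_stats(centres: np.ndarray) -> dict:
    """centres: (N, 2) array of normalized mass-center offsets (dx, dy),
    each in [-0.5, 0.5] relative to the frame center.

    The 0.25 'extreme corner' threshold is an illustrative assumption, not
    the study's definition.
    """
    dx, dy = centres[:, 0], centres[:, 1]

    # Vertical bias: ratio of Y-axis to X-axis spread of mass centers.
    vertical_bias = dy.std(ddof=1) / dx.std(ddof=1)

    # Corner avoidance: observed rate of extreme-corner placements vs. the
    # 25% a uniform placement over the frame would produce.
    in_corner = (np.abs(dx) > 0.25) & (np.abs(dy) > 0.25)

    return {
        "vertical_bias": float(vertical_bias),   # ~1.7-1.9 in the study
        "corner_rate": float(in_corner.mean()),  # ~0.02-0.06 in the study
        "uniform_corner_rate": 0.25,
    }
```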


For AI Companies

Current optimization objectives (preference ratings, reconstruction loss, stability) actively select against compositional sophistication. Centered compositions with high void ratios represent computational equilibria satisfying all objectives simultaneously—they're the path of least resistance.

The challenge: "Good" and "compositionally appropriate" are orthogonal under current training regimes.

For Researchers

Measurement infrastructure now exists. What was previously observable only qualitatively ("AI art looks samey") is now quantifiable:

  • Correlation strength (r=0.736)

  • Effect size (Cohen's d≈1.5)

  • Basin depth (51% within 0.05 of center)

  • Architectural ceiling (Δr max = 0.42)

RCA-2 enables:

  • Cross-architecture comparison

  • Progress measurement toward compositional diversity

  • Architectural diagnosis beyond semantic correctness

  • Identification of intervention points

For Product Teams

Users requesting "extreme off-center" compositions will hit invisible walls. These aren't soft preferences—they're hard architectural limits. The displacement ceilings (0.28 / 0.42 / 0.56) represent physical constraints that would require architectural changes to overcome.

User impact: Prompting strategies are fighting geometric safety heuristics, not lack of understanding. No amount of prompt engineering escapes the basin without adversarial techniques.

Note: It is important to reinforce that radial and centered compositions are neither incorrect nor undesirable; they are widely used in human visual practice. The issue identified here is not the presence of these structures, but their persistence as a default, limiting the geometric range explored across otherwise diverse outputs. Both training and reward signals have emphasized semantic identification while largely ignoring compositional intelligence.

The Evidence: Three Comprehensive Studies

MidJourney: The Loosest Enforcement

  • 400 images analyzed

  • 39% locked within 0.05 of center

  • Similar simplification heuristic (r=0.738, R²=0.545)

  • Architectural ceiling: 0.56

Sora: The Strictest Enforcement

  • 200 images analyzed

  • 60% locked within 0.05 of center

  • Strongest simplification heuristic (r=0.865, R²=0.749)

  • Architectural ceiling: 0.28

OpenArt: The Middle Ground

  • 200 images analyzed

  • 51% locked within 0.05 of center

  • Moderate simplification heuristic (r=0.736, R²=0.541)

  • Architectural ceiling: 0.42

Study Design

  • Dataset: 100 identical prompts across 8 semantic categories (Animals, People, Nature, Architecture, Abstract, Objects, Seasonal, Underwater/Sky)

  • Why identical prompts? To isolate architectural behavior from prompt variation. Same semantic input producing different geometric outputs reveals platform-specific constraint thresholds.

  • Sample size: 800 total images (200 Sora, 200 OpenArt, 400 MidJourney)

  • Analysis period: January 2025

  • Framework: Radial Compliance Analysis (RCA-2) measuring displacement (Δr), void ratio (rᵥ), compactness, isotropy, and radial compliance (RC)

  • Statistical rigor: Pearson correlations, t-tests with effect sizes (Cohen's d), significance testing (p-values), distribution analysis (KDE, CDF, percentiles)
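A minimal sketch of how per-image measurements could be rolled up into the per-platform summaries reported here; the DataFrame layout and column names are assumptions, while the 0.05 "locked" threshold and the reading of the ceiling as the maximum observed displacement follow the text.

```python
import pandas as pd
from scipy import stats

def platform_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed columns: platform, delta_r (displacement), void_ratio."""
    rows = []
    for platform, g in df.groupby("platform"):
        # Strength of the simplification heuristic for this platform.
        r, p = stats.pearsonr(g["void_ratio"], g["delta_r"])
        rows.append({
            "platform": platform,
            "n_images": len(g),
            "pct_locked": (g["delta_r"] <= 0.05).mean() * 100,  # basin depth
            "r_void_vs_delta": r,
            "R2": r ** 2,
            "ceiling": g["delta_r"].max(),  # highest displacement observed
        })
    return pd.DataFrame(rows).sort_values("ceiling")
```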

Key Insights at a Glance

  • 12.8% of sparse images still lock to center. The heuristic softens constraints but doesn't eliminate them. Displacement emerges from sampling variance when constraint weakens—it's probabilistic permission, not compositional intelligence.

  • The dancer achieving RC=0.68 (top 3%) with perfect centering and high void space isn't a success story—it's Exhibit A for why optimization metrics are orthogonal to compositional appropriateness.

  • Perfect center occurs at LESS sparsity (0.75 void) and MORE compactness (0.96) than average. The model has learned multiple 'safe' configurations: compact blob at moderate void is safer than extreme simplification.

  • Sora uses void ratio MORE deterministically (r=0.865) than OpenArt (r=0.736), not less. The difference is threshold height, not mechanism strength.

What Comes Next?

For researchers: RCA-2 provides falsifiable predictions testable on other platforms. The simplification heuristic is an architectural hypothesis that can be confirmed or refuted with empirical measurement.

For developers: Escaping monoculture requires:

  1. Explicit compositional reasoning modules (not pattern matching)

  2. Training objectives beyond preference (compositional sophistication)

  3. Architectural changes (not just parameter tuning)

  4. Separation of "stable" from "good" in optimization

For the field: This work demonstrates that compositional constraint is measurable, quantifiable, and comparable across architectures. You cannot improve what you cannot measure. Now we can measure it.