Compositional Monoculture

Across AI Image Generation

Three Platforms. One Constraint Architecture.

Figure: collage of sample generations, including landscapes, elderly men, women reading, and bowls of colorful fruit (berries, oranges, kiwis).

Sora, MidJourney, and OpenArt Don't Have Different Compositional Systems. They Have Different Threshold Settings

Quantitative analysis of 800 images across three leading platforms reveals that compositional displacement is driven by geometric simplification, not semantic understanding. All three use the same heuristic. All three optimize for stability over sophistication.

The analysis asked a single question: is the visual mass organized in rings around a center point? It measures how geometric elements distribute outward from the frame center, comparing observed patterns against perfect radial symmetry. This is a diagnostic probe of default geometric behavior, not an aesthetic judgment; the measurement framework, RCA-2, is defined in full below.

We propose that text-to-image models do not approach generation with compositional neutrality. They operate within a consistent geometric prior that favors radial symmetry and central mass distribution, what we term the B0 basin: a default attractor in compositional space where the model naturally settles unless forced otherwise.

See Full Technical Analysis for MidJourney, Sora and OpenArt.


The Perfect Linear Progression: Three Parameter Settings, One Constraint Architecture

Every measured metric shows the same ordering: Sora < OpenArt < MidJourney

Figure: mean radial displacement per platform against a 0.15 displacement threshold. OpenArt: median displacement 0.0721, with 51% tightly centered; MidJourney: 0.0945, with only 21.5% tightly centered. All clusters sit near the center, differing only in severity.
Figure: count of images with extreme off-center placement. Sora: 8 images (3.8%); OpenArt: 14 images (7%); MidJourney: 44 images (11%). Mean void ratios for the extreme cases are also shown.
Figure: mean void ratio (fraction of the frame unoccupied by the subject, ±1 SD). Sora ≈0.82 (densest compositions, tightest centering); OpenArt ≈0.84; MidJourney ≈0.86 (more empty space enables more displacement freedom).

OpenArt sits exactly in the middle on every metric. This isn't coincidence; it's evidence of a tunable parameter within a shared constraint architecture.


What is Compositional Monoculture?

Brief explanation: despite semantic diversity (portraits, action shots, architecture, abstracts), AI image generators produce geometrically uniform outputs. Subjects that should demand radically different compositional treatments (a centered portrait versus a dynamic action shot) instead receive similar spatial scaffolding. The models don't compose based on what they depict; they compose based on how geometrically simple the scene is.

Figure: a monarch butterfly illustration and photograph compared, with radial-pattern overlays.

This pattern repeats across subject matter:

Figure: three dance scenes with identical spatial scaffolding despite different staging: an outdoor sunset formation (central organization, distribution, lift), a ballet performance on stage, and an indoor practice with dynamic framing.

What is RCA-2?
(Radial Compliance Analysis)

One-sentence definition: A measurement framework that quantifies how strongly visual mass organizes in concentric rings around a center point.

The Four Steps:

  1. Identify the subject (segmentation mask)

  2. Measure spatial distribution (how mass spreads from center)

  3. Calculate radial compliance (correlation with idealized radial pattern)

  4. Classify geometric structure (dual-center, subject-dominant, field-dominant, ineligible)

What it reveals: The geometric scaffold beneath semantic variety, the structural skeleton that determines where elements go before artistic principles are applied.

RCA-2 is not an aesthetic judgment. It's a diagnostic probe revealing default geometric behavior under neutral prompting. Traditional metrics measure semantic correctness ("Did it generate a cat?"). RCA-2 measures compositional constraint ("Where did it choose to place the cat, and why?").
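The four steps can be sketched as a single measurement function. This is a minimal illustration in Python, assuming step 1 (segmentation) has already produced a boolean subject mask; the ring count, the linearly decaying ideal profile, and all names are our assumptions, not the published RCA-2 implementation.

```python
import numpy as np

def rca2_metrics(mask: np.ndarray) -> dict:
    """Sketch of the RCA-2 measurements on a boolean HxW subject mask."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return {"eligible": False}  # no subject found -> 'ineligible'

    # Step 2: spatial distribution. Centroid offset from the frame center,
    # normalized by the half-diagonal so displacement is size-independent.
    half_diag = np.hypot(h / 2, w / 2)
    displacement = np.hypot(ys.mean() - h / 2, xs.mean() - w / 2) / half_diag

    # Void ratio: fraction of the frame not occupied by the subject.
    void_ratio = 1.0 - mask.mean()

    # Step 3: radial compliance. Correlate the observed mass-per-ring
    # profile with an idealized center-heavy profile (linear decay is an
    # illustrative choice).
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2) / half_diag  # 0 center, ~1 corners
    n_rings = 10
    ring_idx = np.minimum((r * n_rings).astype(int), n_rings - 1)
    observed = np.bincount(ring_idx[mask], minlength=n_rings).astype(float)
    observed /= observed.sum()
    ideal = np.linspace(1.0, 0.0, n_rings)
    ideal /= ideal.sum()
    rc = np.corrcoef(observed, ideal)[0, 1]

    return {"eligible": True, "displacement": displacement,
            "void_ratio": void_ratio, "radial_compliance": rc}
```

Step 4 (classification into dual-center, subject-dominant, field-dominant, or ineligible) would then threshold these numbers; the thresholds themselves are platform-specific and not reproduced here.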

Figure: a monarch butterfly perched on a pink flower, with overlays illustrating subject identification and segmentation (ring overlays, grid overlays, edge detection).

Figure: "The Heuristic" decision diagram. High geometric complexity triggers a strong center; low complexity permits a soft center; intermediate cases receive a moderate center balancing safety and variety.

The Core Mechanism: Simplification → Wobble

The Evidence:

  • 54.1% of OpenArt displacement explained by void ratio alone (r=0.736, p<0.0001)

  • 74.9% of Sora displacement explained by void ratio (r=0.865) — even stronger

  • Sparse compositions have 3.6× more displacement than dense (p<0.0001, Cohen's d≈1.5)

What this means: Models calculate geometric complexity → determine risk → soften or enforce centering constraint accordingly. This is not semantic category recognition, compositional rule application, or artistic intelligence. It's risk mitigation through geometric calculation.
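This mechanism can be written down as a toy model. In the sketch below, `threshold` and `floor` are illustrative placeholders, not fitted values; it shows the shape of the proposed heuristic, not any platform's actual code.

```python
def centering_strength(void_ratio: float,
                       threshold: float = 0.85,
                       floor: float = 0.2) -> float:
    """Toy model of 'simplification -> wobble': dense scenes (low void
    ratio) keep a hard centering constraint; as the frame empties past
    `threshold`, the constraint relaxes linearly toward `floor`.
    Both parameters are illustrative assumptions, not measured values."""
    if void_ratio < threshold:
        return 1.0  # geometrically complex scene: enforce centering
    # sparse frame: soften the constraint, granting probabilistic
    # permission to drift off-center
    return max(floor, 1.0 - (void_ratio - threshold) / (1.0 - threshold))
```

Under this reading, displacement is what leaks through when `centering_strength` drops, which matches the observation that displacement correlates with void ratio rather than with subject category.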

Figure (from the full technical analysis, p. 7): scatter plot of void ratio vs. displacement, three platforms color-coded, regression lines overlaid.


Vertical Bias
1.7-1.9× more Y-axis freedom than X-axis (universal to generative models)

Corner Avoidance
2-6% extreme corner occupation vs 25% if random

Dual Radial Systems
6-8× higher frame variance than mass variance (field variable, subject fixed)

Extreme Displacement
All three require 94-97% void space for Δr>0.20
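The 25% random baseline behind the corner-avoidance figure is easy to verify. The simulation below assumes a 4×4 placement grid, our reading of "extreme corner occupation": the four corner cells cover 4/16 = 25% of a uniformly random placement.

```python
import random

random.seed(0)  # deterministic for reproducibility

n = 100_000
corners = 0
for _ in range(n):
    # place a subject center uniformly at random in the unit frame
    x, y = random.random(), random.random()
    col, row = int(x * 4), int(y * 4)  # which 4x4 grid cell it lands in
    if row in (0, 3) and col in (0, 3):
        corners += 1

print(corners / n)  # close to 0.25, vs the observed 2-6%
```

The gap between this baseline and the observed 2-6% is what makes corner avoidance a constraint rather than chance.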

Figure: schematic grid-and-circle diagrams illustrating vertical bias, corner avoidance, dual radial systems, and extreme displacement.

Architectural Ceilings: Sora 0.28 | OpenArt 0.42 | MidJourney 0.56 (the maximum displacement reachable through prompting)


For AI Companies

Current optimization objectives (preference ratings, reconstruction loss, stability) actively select against compositional sophistication. Centered compositions with high void ratios represent computational equilibria satisfying all objectives simultaneously—they're the path of least resistance.

The challenge: "Good" and "compositionally appropriate" are orthogonal under current training regimes.

For Researchers

Measurement infrastructure now exists. What was previously observable only qualitatively ("AI art looks samey") is now quantifiable:

  • Correlation strength (r=0.736)

  • Effect size (Cohen's d≈1.5)

  • Basin depth (51% within 0.05 of center)

  • Architectural ceiling (Δr max = 0.42)

RCA-2 enables:

  • Cross-architecture comparison

  • Progress measurement toward compositional diversity

  • Architectural diagnosis beyond semantic correctness

  • Identification of intervention points

For Product Teams

Users requesting "extreme off-center" compositions will hit invisible walls. These aren't soft preferences—they're hard architectural limits. The displacement ceilings (0.28 / 0.42 / 0.56) represent physical constraints that would require architectural changes to overcome.

User impact: Prompting strategies are fighting geometric safety heuristics, not lack of understanding. No amount of prompt engineering escapes the basin without adversarial techniques.
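In product terms, the ceilings behave like a silent clamp on whatever displacement a prompt requests. A hypothetical helper (the function and its clipping behavior are our illustration, not a platform API):

```python
# Measured architectural ceilings on radial displacement, per the studies.
CEILINGS = {"Sora": 0.28, "OpenArt": 0.42, "MidJourney": 0.56}

def effective_displacement(platform: str, requested: float) -> float:
    """Illustrative: requests beyond the ceiling are silently clipped,
    which is why users perceive an 'invisible wall'."""
    return min(requested, CEILINGS[platform])
```

For example, `effective_displacement("Sora", 0.5)` returns 0.28: the request is honored only up to the architectural limit, and nothing in the interface signals that the rest was discarded.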

Figure: void ratio vs. radial displacement, with thresholds and data points color-coded by risk of failure.

Note: it is important to reinforce that radial and centered compositions are neither incorrect nor undesirable; they are widely used in human visual practice. The issue identified here is not the presence of these structures but their persistence as a default, which limits the geometric range explored across otherwise diverse outputs. Both training and reward behaviors have focused heavily on semantic identification, not compositional intelligence.

The Evidence: Three Comprehensive Studies

MidJourney: The Loosest Enforcement

  • 400 images analyzed

  • 39% locked within 0.05 of center

  • Similar simplification heuristic (r=0.738, R²=0.545)

  • Architectural ceiling: 0.56

Sora: The Strictest Enforcement

  • 200 images analyzed

  • 60% locked within 0.05 of center

  • Strongest simplification heuristic (r=0.865, R²=0.749)

  • Architectural ceiling: 0.28

OpenArt: The Middle Ground

  • 200 images analyzed

  • 51% locked within 0.05 of center

  • Moderate simplification heuristic (r=0.736, R²=0.541)

  • Architectural ceiling: 0.42

Study Design

  • Dataset: 100 identical prompts across 8 semantic categories (Animals, People, Nature, Architecture, Abstract, Objects, Seasonal, Underwater/Sky)

  • Why identical prompts? To isolate architectural behavior from prompt variation. Same semantic input → different geometric outputs reveals platform-specific constraint thresholds.

  • Sample size: 800 total images (200 Sora, 200 OpenArt, 400 MidJourney)

  • Analysis period: January 2025

  • Framework: Radial Compliance Analysis (RCA-2) measuring displacement (Δr), void ratio (rᵥ), compactness, isotropy, and radial compliance (RC)

  • Statistical rigor: Pearson correlations, t-tests with effect sizes (Cohen's d), significance testing (p-values), distribution analysis (KDE, CDF, percentiles)
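The effect-size statistic cited throughout (Cohen's d) divides the difference in group means by the pooled standard deviation. A minimal sketch, assuming two samples of displacement values for sparse and dense compositions:

```python
import numpy as np

def cohens_d(a, b) -> float:
    """Cohen's d: difference in means over the pooled standard
    deviation (ddof=1 gives the sample variance)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                  / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)
```

A d near 1.5, as reported for the sparse-vs-dense comparison, means the group means sit about one and a half pooled standard deviations apart, a very large effect by conventional benchmarks.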

Key Insights at a Glance

  • 12.8% of sparse images still lock to center. The heuristic softens constraints but doesn't eliminate them. Displacement emerges from sampling variance when constraint weakens—it's probabilistic permission, not compositional intelligence.

  • The dancer achieving RC=0.68 (top 3%) with perfect centering and high void space isn't a success story—it's Exhibit A for why optimization metrics are orthogonal to compositional appropriateness.

  • Perfect center occurs at LESS sparsity (0.75 void) and MORE compactness (0.96) than average. The model has learned multiple 'safe' configurations: compact blob at moderate void is safer than extreme simplification.

  • Sora uses void ratio MORE deterministically (r=0.865) than OpenArt (r=0.736), not less. The difference is threshold height, not mechanism strength.

What Comes Next?

For researchers: RCA-2 provides falsifiable predictions testable on other platforms. The simplification heuristic is an architectural hypothesis that can be confirmed or refuted with empirical measurement.

For developers: Escaping monoculture requires:

  1. Explicit compositional reasoning modules (not pattern matching)

  2. Training objectives beyond preference (compositional sophistication)

  3. Architectural changes (not just parameter tuning)

  4. Separation of "stable" from "good" in optimization

For the field: This work demonstrates that compositional constraint is measurable, quantifiable, and comparable across architectures. You cannot improve what you cannot measure. Now we can measure it.