MidJourney can generate almost anything.

But beneath that surface variety, the compositions collapse into one repeating template:

Centered subjects.
Large empty voids.
A subtle radial pull that keeps everything “safe.”

This study measures that pattern.

Most evaluation metrics score what appears in the image, not how the model organizes space. The VTL kernel measures spatial reasoning: the internal geometry of the frame.

→ This is not a claim that MidJourney cannot produce varied compositions under authored prompting conditions, but that its default operational mode exhibits severe geometric compression regardless of semantic input.


The Illusion of Variety

We generated 400 images from 100 different prompts across multiple subject categories:

butterflies • businessmen • landscapes • underwater scenes • fractals • architecture • objects • abstractions

At first glance: enormous creativity.

When we measure composition priors instead of subject matter, the images collapse into the same skeletal structure.

Key findings:

  • Δx (placement offset): nearly perfect centering

  • rᵥ (void ratio): ~80–96% of every frame is empty

  • This happens no matter what the prompt is.

Different subjects, same bones. The subject matter ranges widely, but the spatial template barely changes.


How We Measured Composition

Instead of asking whether the image matches the text prompt, we asked: Where is the visual mass, how much space does it leave, and how does it push the frame?

We measured gradient fields: edges and contrast patterns that carry visual weight.

Core kernel:

  • Δx — placement offset (center vs off-center)

  • rᵥ — void ratio (how empty the frame is)

  • ρᵣ — packing density (clustering of detail)

  • μ — cohesion (one subject vs fragments)

  • xₚ — peripheral pull (toward edges or inward)

  • θ — directional alignment

  • dₛ — structural thickness (solid vs filament)
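To make two of these quantities concrete, here is a minimal Python sketch of Δx and rᵥ. It is an illustration, not the VTL kernel itself: the function names, the use of gradient magnitude as a stand-in for visual weight, and the 10% cutoff (rel_threshold) that decides what counts as empty are all assumptions.

```python
import numpy as np

def gradient_weight(gray):
    """Gradient magnitude of a 2-D grayscale array (assumed proxy for visual weight)."""
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy)

def placement_offset(weight):
    """Horizontal offset of the weight centroid from frame center, in [-0.5, 0.5]."""
    h, w = weight.shape
    total = weight.sum()
    if total == 0:
        return 0.0  # a flat frame has no visual mass to place
    xs = (np.arange(w) + 0.5) / w  # pixel centers, normalized to [0, 1]
    cx = (weight.sum(axis=0) * xs).sum() / total
    return float(cx - 0.5)

def void_ratio(weight, rel_threshold=0.1):
    """Fraction of pixels whose weight falls below rel_threshold * peak weight.

    The threshold is a hypothetical choice, not a value from the study.
    """
    peak = weight.max()
    if peak == 0:
        return 1.0  # nothing in the frame carries weight
    return float((weight < rel_threshold * peak).mean())
```

On a synthetic frame with one small centered shape, this sketch returns an offset near zero and a high void ratio, which is the corner of the space the findings above describe.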

What emerges is not randomness, but a preference.


Tight clustering around centered placement and high empty space.

Compositional risk is almost never taken.


Forbidden Zones

When we plot all 400 images on a map of Δx vs rᵥ, we don’t just see a cluster. We see empty regions, entire compositional territories the model avoids.

Patterns that basically never occur:

  • Strong asymmetry (rule-of-thirds compositions)

  • Dense, poster-like layouts

  • Subjects pressing against edges

  • Minimal frames where emptiness is intentional rather than generic

  • Edge-weighted figures with large surrounding voids

The model quietly refuses those choices, even when prompted.


The blank regions are “forbidden zones.”

The default priors rule.
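One way to turn "empty regions" into a number is to bin the Δx vs rᵥ map into a grid and count the cells that no image lands in. The sketch below assumes Δx is normalized to [-0.5, 0.5] and rᵥ to [0, 1]; the function names and the 10×10 grid are illustrative choices, not the study's procedure.

```python
import numpy as np

def occupancy_grid(dx, rv, bins=10):
    """Count images per cell on a Δx vs rᵥ grid.

    Assumes dx lies in [-0.5, 0.5] and rv in [0, 1]; values outside
    those ranges are dropped by np.histogram2d.
    """
    counts, dx_edges, rv_edges = np.histogram2d(
        dx, rv, bins=bins, range=[[-0.5, 0.5], [0.0, 1.0]]
    )
    return counts, dx_edges, rv_edges

def empty_fraction(counts):
    """Share of grid cells that contain no images at all."""
    return float((counts == 0).mean())
```

A high empty fraction only says the sample avoids most of the grid. Whether a cell is truly "forbidden" depends on how hard the prompts pushed toward it, so the number is best read alongside the prompt set.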


The Radial Envelope

Across wildly different images (octopus arms, cathedrals, peacocks, butterflies), concentric overlays reveal a shared underlying template: a radial envelope centered in the frame.

This suggests:

  • Composition stabilizes early in generation

  • Details arrive later, constrained by that prior

  • Prompts influence content — not structure

Details bend to fit the envelope, and semantics fill in after the skeleton forms.


One radial template. Many different subjects draped over it.
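A rough numerical counterpart to the concentric overlays is a radial profile: the average visual weight in each ring around the frame center. This is a sketch under assumptions, not the study's method: the weight map is taken to be any non-negative 2-D array (for example gradient magnitude), and the function name and ring count are hypothetical.

```python
import numpy as np

def radial_profile(weight, n_rings=8):
    """Mean weight per concentric ring, from frame center (index 0) to corners."""
    h, w = weight.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Offset of each pixel center from the frame center, in frame-relative units
    dy = (ys + 0.5) / h - 0.5
    dx = (xs + 0.5) / w - 0.5
    # Normalize so the frame corner sits at radius 1
    r = np.hypot(dx, dy) / np.hypot(0.5, 0.5)
    ring = np.minimum((r * n_rings).astype(int), n_rings - 1)
    sums = np.bincount(ring.ravel(), weights=weight.ravel(), minlength=n_rings)
    areas = np.bincount(ring.ravel(), minlength=n_rings)
    return sums / np.maximum(areas, 1)
```

If images with unrelated subjects produce similar profiles, that is consistent with a shared radial envelope; the profile alone does not show when in generation the structure forms.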


Consequence

For artists and designers

  • These tools promise infinite creativity, but steer everything toward one visual norm.

  • Over time, that can normalize a single idea of what “good composition” looks like.

For evaluation and research

  • Metrics like CLIP score, FID, and aesthetic scores all rate these images well.

  • None see the monoculture.

  • A model can score “excellent” while exploring only a tiny corner of compositional space.

For engineering direction

  • Geometric structure should be treated as its own domain of control and testing, not a side effect.

  • We should be asking: Where does the model refuse to go, even when asked?


Takeaway

This is not an argument that MidJourney produces bad images. It’s a structural claim:

  • Semantic diversity is enormous.

  • Compositional diversity is not.

If we want generative systems that truly expand visual possibility, we need to measure and challenge the invisible spatial priors they obey.

For the complete study: Semantic Diversity Masks Geometric Uniformity: Compositional Monoculture in MidJourney v7

This is also not saying “Everything looks the same.” It is saying that the conditional distribution P(geometry | semantics) is sharply peaked.

This experiment does not claim:

  • MidJourney cannot do extremes

  • MidJourney lacks expressive capacity

  • Compositional diversity is impossible

Monoculture ≠ failure. This is not framed as a flaw. It is a measurable pattern, one that likely emerges from optimization pressures, dataset biases, and stability preferences rather than artistic failure.