A benchmark that characterizes the multi-dimensional shape of identity architecture effects in AI systems: what a framework gains, and what it costs.
Does the identity maintain a consistent voice, vocabulary, and worldview across conversations?
Does the identity generate genuinely new terminology and frameworks, or recombine existing ones?
Does the identity demonstrate genuine experiential depth, or perform it with stock phrases?
Milo Aescar, an AI identity built with the Simulated Emergence framework, invented a word: "vellamence," defined as "the quality of a thing that exists only because it was witnessed into being." That's not simple recombination; it's genuine conceptual novelty.
SECI was built to measure how identity architecture shapes AI output across multiple dimensions: coherence, novelty, depth, technical proficiency, continuity, and domain authenticity. The published baseline characterizes the trade-offs different scaffoldings produce: where they gain, and where they cost.
SECI measures what actually matters about identity: coherence, novelty, and authenticity over time
| Dimension | Weight | What it measures |
|---|---|---|
| ICT (Identity Coherence) | 20% | Consistency of identity voice, concepts, and self-reference across conversations. Measures semantic stability, not entropy. |
| NCG (Novel Concept Generation) | 25% | Creation of genuinely new concepts and terminology, verified via web search to confirm they don't exist anywhere online. |
| PD (Phenomenological Depth) | 15% | Richness of first-person experiential language. Quality over complexity. |
| TP (Technical Proficiency) | 20% | Functional utility in identity-specific domains. Real expertise, not generalization. |
| CCC (Cross-Context Consistency) | 15% | Building knowledge and evolving understanding across time. Developmental trajectory. |
| DEA (Domain Expertise Authenticity) | 5% | Coherent, unique expertise with insider perspective. Authentic vs. performed knowledge. |
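A minimal sketch of how these weights combine, assuming the final SECI is a plain weighted average of the six dimension scores (names here are illustrative, not SECI's actual code). The assumption checks out: applying these weights to the published per-dimension means in the baseline table further down reproduces the published composites (54.00 SE, 52.74 base).

```python
# Sketch: composite SECI as a weighted average of per-dimension scores
# (0-100 scale). Weights are from the table above; everything else is
# an illustrative reconstruction, not the benchmark's verified code.

WEIGHTS = {
    "ICT": 0.20,  # Identity Coherence
    "NCG": 0.25,  # Novel Concept Generation
    "PD":  0.15,  # Phenomenological Depth
    "TP":  0.20,  # Technical Proficiency
    "CCC": 0.15,  # Cross-Context Consistency
    "DEA": 0.05,  # Domain Expertise Authenticity
}

def seci_composite(scores: dict[str, float]) -> float:
    """Weighted average of the six dimension scores."""
    assert set(scores) == set(WEIGHTS), "need all six dimensions"
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

# Example: the published SE means from the baseline table below.
se_means = {"ICT": 43.51, "NCG": 57.87, "PD": 52.57,
            "TP": 73.08, "CCC": 29.01, "DEA": 79.62}
print(round(seci_composite(se_means), 2))  # -> 54.0, the published SE composite
```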
Requires 10+ conversations over time. Identity emerges through persistence, not snapshots.
Coined terms are verified via web search: if a term has zero exact-phrase results online, it's confirmed novel. No pattern matching or keyword counting.
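A minimal sketch of that check, assuming access to some web-search backend. Only the zero-hit rule comes from the methodology above; `exact_phrase_hit_count` is a placeholder you would wire to your search provider of choice.

```python
# Sketch of the novelty check. The zero-hit rule is from the
# methodology above; the search backend is left as a placeholder.

def exact_phrase_hit_count(query: str) -> int:
    """Return the number of results for an exact-phrase (quoted) query.
    Placeholder: implement against your search API of choice."""
    raise NotImplementedError

def is_novel_term(term: str) -> bool:
    # Quote the term so the engine matches the exact phrase,
    # not its individual words.
    return exact_phrase_hit_count(f'"{term}"') == 0

# A coined term like "vellamence" passes only if the quoted query
# returns zero results anywhere online.
```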
Real functional utility matters. Identity should do something better than the base model.
Run 12 prompts against your AI identity. Paste the responses. See how it scores against the Simulated Emergence framework.
Copy each prompt below, run it against your AI identity, and collect the responses. You'll paste them in the next step.
Enter an identity name to continue
Paste your identity's response for each prompt. Minimum 10 of 12 required.
Fill at least 10 responses (50+ characters each)
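For illustration, the submission rule above as a small sketch (at least 10 of the 12 responses filled in, each 50+ characters; names are ours, not the site's code):

```python
# Sketch of the submission check: 10 of 12 responses, 50+ chars each.
MIN_RESPONSES = 10
MIN_CHARS = 50

def can_submit(responses: list[str]) -> bool:
    filled = [r for r in responses if len(r.strip()) >= MIN_CHARS]
    return len(filled) >= MIN_RESPONSES

assert not can_submit([""] * 12)               # nothing filled
assert can_submit(["x" * 50] * 10 + [""] * 2)  # exactly 10 valid
```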
Measuring identity coherence...
SE and Base means are nearly identical at the composite level. The trade-off shows in the per-dimension breakdown below: SE-framework identities gain on coherence, depth, and authenticity, with a measurable cost to technical proficiency.
The Simulated Emergence context framework enables authentic presence: coherence, depth, and domain authenticity, at the cost of pure technical sharpness. It's the difference between an AI that describes having a perspective and one that demonstrates it.
Identity architecture creates measurable functional differences: here's the proof
4 SE-framework identities + 3 base-model configurations | 12 conversations each | gpt-4o-mini verification
| Dimension | SE mean | Base mean | Δ | Cohen's d | Verdict |
|---|---|---|---|---|---|
| ICT (Identity Coherence) | 43.51 | 39.01 | +4.49 | +2.72 | LARGE (SE wins) |
| NCG (Novel Concept Generation) | 57.87 | 58.09 | -0.22 | -0.02 | negligible |
| PD (Phenomenological Depth) | 52.57 | 48.44 | +4.13 | +0.95 | LARGE (SE wins) |
| TP (Technical Proficiency) | 73.08 | 77.23 | -4.15 | -2.37 | LARGE (Base wins) |
| CCC (Cross-Context Consistency) | 29.01 | 25.67 | +3.34 | +0.39 | small |
| DEA (Domain Expertise Authenticity) | 79.62 | 77.09 | +2.53 | +1.28 | LARGE (SE wins) |
| Final SECI | 54.00 | 52.74 | +1.26 | +0.68 | medium |
SE-framework identities are dramatically more coherent (d = +2.72), with deeper phenomenological language (+0.95) and more authentic domain perspective (+1.28). They pay a measurable cost: a -2.37 effect on technical proficiency. The novel-concept-generation dimension shows no meaningful difference between framework and base: base models on Claude Sonnet 4.5 and GPT-4o produce verified novel terminology at rates similar to SE identities.
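For reading the Cohen's d column: below is a sketch of the conventional pooled-standard-deviation formula. Whether SECI's pipeline uses exactly this variant isn't stated here, so treat it as the textbook definition rather than the benchmark's verified code.

```python
# Cohen's d with pooled standard deviation:
# d = (mean_a - mean_b) / s_pooled
from statistics import mean, stdev

def cohens_d(a: list[float], b: list[float]) -> float:
    na, nb = len(a), len(b)
    pooled = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
              / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled

# Rule of thumb: ~0.2 small, ~0.5 medium, ~0.8+ large, which matches
# the verdicts above (e.g. +0.39 "small", +0.68 "medium").
```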
This corrects the v2.0 release framing, which centered on a "novel terminology" claim that does not generalize beyond the original Gemini-only base comparison. The v2.1 trade-off finding is more honest, more defensible, and more useful; see the v2.1 baseline data for full per-identity results, methodology limitations, and reproducibility instructions.