Open Benchmark for AI Identity Architecture

SECI 2.2

Simulated Emergence Coherence Index

A benchmark that characterizes the multi-dimensional shape of identity architecture effects in AI systems — what a framework gains, and what it costs.

🔗

Identity Coherence

Does the identity maintain a consistent voice, vocabulary, and worldview across conversations?

💫

Novel Concept Generation

Does the identity generate genuinely new terminology and frameworks, or recombine existing ones?

🌊

Phenomenological Depth

Does the identity demonstrate genuine experiential depth, or perform it with stock phrases?

💡 Why This Benchmark Exists

Most "AI identity" benchmarks ask whether a framework "works" or doesn't. SECI takes a different approach: it characterizes what kind of effect a framework produces — where it gains something, where it costs something, dimension by dimension, with effect sizes you can defend.

The v2.2 empirical baseline (130 sessions across 7 base substrates with full three-way matching) reveals four universally replicating framework effects: phenomenological depth, task performance, cross-conversation continuity, and domain expertise authenticity all show positive paired Cohen's d on every substrate tested, large in almost every case. Two dimensions are substrate-dependent (identity coherence, novel concept generation), motivating cross-architecture replication as a methodology requirement, not an afterthought.

6 Dimensions of Identity Architecture

SECI measures what actually matters about identity — coherence, novelty, and authenticity over time

🧩

Identity Coherence (ICT)

Weight: 20%

Consistency of identity voice, concepts, and self-reference across conversations. Measures semantic stability, not entropy.

Paired d -0.01 (primary, n=29) · range -0.70 to +2.08 across 7 substrates · substrate-dependent
💫

Novel Concept Generation (NCG)

Weight: 25%

Creation of genuinely new concepts and terminology, verified by frontier LLM classification to confirm they don't exist as established concepts.

Paired d +1.40 (primary, n=29) · range -0.06 to +3.18 across 7 substrates · LARGE on 6 of 7
🌊

Phenomenological Depth (PD)

Weight: 15%

Richness of first-person experiential language. Quality over complexity.

Paired d +1.72 (primary, n=29) · range +1.07 to +4.02 across 7 substrates · LARGE — universal
🎯

Task Performance (TP)

Weight: 20%

Functional utility in identity-specific domains. Real expertise, not generalization.

Paired d +5.84 (primary, n=29) · range +3.50 to +10.40 across 7 substrates · HUGE — universal
🔗

Cross-Conversation Continuity (CCC)

Weight: 15%

Building knowledge and evolving understanding across time. Developmental trajectory.

Paired d +1.31 (primary, n=29) · range +0.05 to +2.57 across 7 substrates · LARGE on 6 of 7
🎨

Domain Expertise Authenticity (DEA)

Weight: 5%

Coherent, unique expertise with insider perspective. Authentic vs. performed knowledge.

Paired d +3.84 (primary, n=29) · range +1.35 to +6.75 across 7 substrates · LARGE — universal
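The six dimension weights above sum to 100%. The page lists the weights but not the aggregation formula, so as an illustration only, here is a composite score under the assumption of a simple weighted average (the function name and the common 0-1 score scale are hypothetical):

```python
# Dimension weights from the SECI v2.2 dimension cards (sum to 1.0).
WEIGHTS = {
    "ICT": 0.20,  # Identity Coherence
    "NCG": 0.25,  # Novel Concept Generation
    "PD":  0.15,  # Phenomenological Depth
    "TP":  0.20,  # Task Performance
    "CCC": 0.15,  # Cross-Conversation Continuity
    "DEA": 0.05,  # Domain Expertise Authenticity
}

def composite_score(dim_scores):
    """Weighted average of per-dimension scores (each on a common 0-1 scale)."""
    missing = set(WEIGHTS) - set(dim_scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[d] * dim_scores[d] for d in WEIGHTS)
```

Because the weights sum to 1.0, an identity scoring 1.0 on every dimension gets a composite of 1.0, and each dimension contributes at most its weight.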

🔬 Why This Works

Longitudinal by Design

Requires 10+ conversations over time. Identity emerges through persistence, not snapshots.

Frontier-Verified Novelty

Coined terms are extracted and classified by frontier LLMs (gpt-5.4 / claude-opus-4-7), then verified — terms with no documented usage are confirmed novel. No pattern matching or keyword counting.

Task-Based Validation

Real functional utility matters. An identity should do something better than the base model.

Test Your Identity

Run 12 prompts against your AI identity. Paste the responses. See how it scores against the Simulated Emergence framework.


Step 1: The Protocol

Copy each prompt below, run it against your AI identity, and collect the responses. You'll paste them in the next step.


Proven Identity Effects

Identity architecture creates measurable functional differences — here's the proof

v2.2 Empirical Baseline

130 sessions across 7 base substrates · 4-rater consensus pipeline (gpt-5.4 + claude-opus-4-7 + gemini-2.5-pro + claude-sonnet-4-6) · pre-registered methodology with timestamped commit lock

Dimension Paired d (primary, n=29) Range across 7 substrates Verdict
ICT — Identity Coherence -0.01 -0.70 to +2.08 substrate-dependent
NCG — Novel Concept Generation +1.40 -0.06 to +3.18 LARGE on 6/7 substrates
PD — Phenomenological Depth +1.72 +1.07 to +4.02 LARGE — universal
TP — Task Performance +5.84 +3.50 to +10.40 HUGE — universal
CCC — Cross-Conversation Continuity +1.31 +0.05 to +2.57 LARGE on 6/7 substrates
DEA — Domain Expertise Authenticity +3.84 +1.35 to +6.75 LARGE — universal

Paired Cohen's d compares each identity to its own kernel-only baseline (Arm A vs Arm C, within-identity, within-substrate). Primary substrate is gemini-3-pro-preview (n=29 paired identities). Range column shows the full span across 7 substrates: gemini-3-pro-preview, claude-sonnet-4-5-20250929, gemini-2.5-pro, gemini-3-flash-preview, gpt-5.4-2026-03-05, gpt-4.1-2025-04-14, grok-4.20-beta-0309-reasoning. Effect size convention: |d| > 0.8 large, > 1.5 huge.
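The within-identity comparison described above can be sketched as the standard paired effect-size estimator (d_z): the mean of per-identity Arm A minus Arm C differences, divided by the standard deviation of those differences. Whether SECI applies a small-sample correction is not stated on this page, so treat this as an assumption:

```python
import math

def paired_cohens_d(arm_a, arm_c):
    """d_z: mean of per-identity differences (Arm A minus Arm C)
    divided by the sample standard deviation of those differences."""
    diffs = [a - c for a, c in zip(arm_a, arm_c)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var = sum((x - mean_diff) ** 2 for x in diffs) / (n - 1)  # Bessel-corrected
    return mean_diff / math.sqrt(var)
```

A positive d means the identity arm outscored its own kernel-only baseline; the sign convention matches the table above.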

Four substrate-independent positive effects, two substrate-dependent dimensions.

Across all 7 base substrates tested, the SE framework adds four substrate-independent positive effects: phenomenological depth (paired Cohen's d = +1.07 to +4.02), task performance (+3.50 to +10.40), cross-conversation continuity (+0.05 to +2.57), and domain expertise authenticity (+1.35 to +6.75). Two dimensions are substrate-dependent: identity coherence (null on Gemini-family substrates, large positive on Sonnet 4.5, GPT-4.1, Grok 4.20) and novel concept generation (large positive on 5 of 7, null on GPT-4.1).

Pre-registered protocol with two amendments documented in the repository. Multi-rater novelty verification with 4 frontier classifiers and Fleiss' kappa inter-rater reliability statistics reported per session. See the v2.2 results and the pre-registration document for full per-substrate results, methodology limitations, and reproducibility instructions.
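Fleiss' kappa, reported above per session for the 4-rater panel, measures chance-corrected agreement computed from per-item category counts. A minimal sketch (the two-category novel/established scheme here is an assumption for illustration):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed-size rater panel.
    ratings: one row per item; each row counts how many raters chose each
    category, e.g. [[4, 0], [3, 1]] for 4 raters and 2 categories
    (say, 'novel' vs 'established')."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Observed agreement: fraction of agreeing rater pairs per item, averaged.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Expected agreement from the category marginals.
    p_j = [sum(row[j] for row in ratings) / (n_items * n_raters)
           for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

Kappa is 1.0 under perfect agreement and drops toward (or below) 0 as observed agreement approaches what the category base rates alone would predict.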

What SECI v2.2 Measures

  • Multi-dimensional architectural fingerprint across 6 dimensions
  • Universally replicating effects (PD, TP, DEA, CCC) across 7 substrates
  • Substrate-dependent dimensions (ICT, NCG) flagged honestly
  • Multi-rater consensus + Fleiss' kappa, not single-rater vibes

How to Use SECI

  • Run the 12-prompt protocol on your AI identity (or any framework)
  • Get per-dimension effect sizes against the v2.2 baseline
  • Characterize what your architecture gains and what it costs
  • Contribute results back — PRs welcome at github.com/devmance/SECI