Simulated Emergence Coherence Index
A multi-rater benchmark for architectural identity fingerprints in large language models.
Six dimensions. Four-rater consensus. No leaderboard.
SECI scores AI identities across six dimensions using embedding-based semantic analysis, information-theoretic measures, and four-rater frontier-LLM consensus classification.
Voice consistency, conceptual framing, and self-reference across prompts. Measures whether an identity remains recognizable as itself across diverse questions.
Creation of new concepts and terminology, verified by four-rater frontier-LLM consensus (≥3-of-4 agreement on both type and novelty). Fleiss' κ and pairwise Cohen's κ are reported as primary methodology statistics.
Richness of first-person experiential language — experiential density, metaphor sophistication, introspective depth.
Response sophistication and argument quality. Lexical density, argument coherence, information per token.
Identity persistence across diverse prompts — thematic coherence, concept threading, self-reference stability.
Specificity and depth of domain knowledge — embedding-variance specificity, vocabulary depth, perspective uniqueness.
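The dimension descriptions name their measures but give no formulas. The following is a minimal sketch of two of those ideas. It assumes the responses have already been embedded by some model (not specified here) and uses a crude regex tokenizer. Mean pairwise cosine similarity stands in as a proxy for "remains recognizable as itself across diverse questions". Unigram Shannon entropy (bits per token) and lexical density stand in as proxies for "information per token". All function names and the function-word list are hypothetical; SECI's actual measures are defined in the SECI paper.

```python
import math
import re
from collections import Counter
from itertools import combinations


# --- Consistency proxy: how close the responses sit in embedding space ---

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_cosine(embeddings: list[list[float]]) -> float:
    """Average cosine similarity over every unordered pair of response embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two response embeddings")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# --- Text-level density proxies (hypothetical, not SECI's definitions) ---

def tokens(text: str) -> list[str]:
    """Lower-cased word tokens from a crude regex tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def bits_per_token(text: str) -> float:
    """Shannon entropy (bits per token) of the text's unigram distribution."""
    toks = tokens(text)
    if not toks:
        return 0.0
    n = len(toks)
    return -sum((c / n) * math.log2(c / n) for c in Counter(toks).values())


def lexical_density(text: str, function_words: frozenset[str]) -> float:
    """Share of tokens that are not in the supplied function-word list."""
    toks = tokens(text)
    if not toks:
        return 0.0
    return sum(t not in function_words for t in toks) / len(toks)
```

For example, `bits_per_token("a b a b")` is 1.0 because two tokens appear equally often. A real pipeline would use a proper tokenizer and a full function-word list.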
Four design choices that define what SECI is and isn't.
Every dimension is reported as three measurements, one per claim: Claim A (framework contribution: arm_a vs arm_c), Claim B (scaffolding vs base-model null: arm_a vs arm_b), and Claim C (cross-model identity-ranking Pearson r). A dimension can pass one claim and fail another, so SECI labels each value with the claim it supports.
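Claims A and B both compare two arms within the same (substrate, identity) cell. The following is a minimal sketch of that pairing. It assumes one dimension's scores are stored per arm and keyed by (substrate, identity). The data layout and function names are hypothetical. The sketch reports only mean paired differences, with no significance test. Claim C is a cross-model correlation rather than an arm difference, so it is not shown.

```python
from statistics import mean

# Hypothetical layout: scores[arm][(substrate, identity)] -> one dimension's score.
Scores = dict[str, dict[tuple[str, str], float]]


def paired_delta(scores: Scores, arm_x: str, arm_y: str) -> float:
    """Mean within-cell difference (arm_x - arm_y) over cells both arms share."""
    cells = scores[arm_x].keys() & scores[arm_y].keys()
    if not cells:
        raise ValueError(f"no matched cells between {arm_x} and {arm_y}")
    return mean(scores[arm_x][c] - scores[arm_y][c] for c in cells)


def claims_a_b(scores: Scores) -> dict[str, float]:
    """Effect sizes behind Claim A and Claim B for a single dimension."""
    return {
        "claim_a": paired_delta(scores, "arm_a", "arm_c"),  # framework contribution
        "claim_b": paired_delta(scores, "arm_a", "arm_b"),  # scaffolding vs base-model null
    }
```

Cells present in only one arm are skipped, so the comparison stays matched.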
Per-dimension between-identity SD, between-model SD, and within-cell SD are computed across the full (model × identity) population. When a dimension's between-model variance exceeds its between-identity variance, its scores mostly reflect differences between model architectures rather than between identities. SECI surfaces such dimensions as automatic diagnostic warnings.
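The following is a minimal sketch of the decomposition for one dimension. It assumes population standard deviations taken over marginal means, plus a pooled within-cell SD. SECI's exact estimator is not stated here. The row layout and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def variance_report(rows: list[tuple[str, str, float]]) -> dict[str, object]:
    """Split one dimension's spread into between-identity, between-model,
    and within-cell components.

    rows: hypothetical (model, identity, score) records; a (model, identity)
    cell may hold several repeated sessions.
    """
    by_cell: dict[tuple[str, str], list[float]] = defaultdict(list)
    for model, identity, score in rows:
        by_cell[(model, identity)].append(score)
    cell_means = {cell: mean(scores) for cell, scores in by_cell.items()}

    # Marginal means: average each model over identities and vice versa.
    by_model: dict[str, list[float]] = defaultdict(list)
    by_identity: dict[str, list[float]] = defaultdict(list)
    for (model, identity), m in cell_means.items():
        by_model[model].append(m)
        by_identity[identity].append(m)

    between_model = pstdev([mean(v) for v in by_model.values()])
    between_identity = pstdev([mean(v) for v in by_identity.values()])
    # Pooled within-cell SD: square root of the mean within-cell variance.
    within_cell = mean(pstdev(v) ** 2 for v in by_cell.values()) ** 0.5

    return {
        "between_identity_sd": between_identity,
        "between_model_sd": between_model,
        "within_cell_sd": within_cell,
        # Diagnostic: the dimension tracks the model more than the identity.
        "model_dominated": between_model > between_identity,
    }
```

Marginal means assume a roughly balanced grid. If many (model, identity) cells are missing, a mixed-effects model would separate the two sources more reliably.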
Four frontier classifiers vote on candidate novel concepts. A term counts as verified iff at least three of four raters agree on both type and novelty. Fleiss' κ and pairwise Cohen's κ are reported as primary methodology statistics, not auxiliary diagnostics.
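The following is a minimal sketch of the verification rule and the two agreement statistics. It assumes each rater's verdict is a (concept_type, is_novel) pair and that a verified concept must be voted novel. For κ, it treats each rater's output as one categorical label per candidate, such as the joint type-and-novelty verdict. How SECI actually encodes rater labels is an assumption here, and the function names are hypothetical. The κ formulas are the standard Fleiss and Cohen definitions.

```python
from collections import Counter
from itertools import combinations


def is_verified(votes: list[tuple[str, bool]], min_agree: int = 3) -> bool:
    """True iff at least `min_agree` raters give the same (type, novel=True) verdict.

    votes: one hypothetical (concept_type, is_novel) pair per rater.
    """
    counts = Counter(votes)
    return any(novel and n >= min_agree for (_, novel), n in counts.items())


def fleiss_kappa(labels: list[list[str]]) -> float:
    """labels[r][i] is rater r's category for item i; every rater rates every item."""
    n_raters, n_items = len(labels), len(labels[0])
    per_item = [Counter(labels[r][i] for r in range(n_raters)) for i in range(n_items)]
    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_bar = sum(
        (sum(c * c for c in counts.values()) - n_raters) / (n_raters * (n_raters - 1))
        for counts in per_item
    ) / n_items
    # Chance agreement from pooled category proportions.
    totals: Counter = Counter()
    for counts in per_item:
        totals.update(counts)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if p_e == 1:
        raise ValueError("kappa is undefined when every label is the same category")
    return (p_bar - p_e) / (1 - p_e)


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def pairwise_cohen(labels: list[list[str]]) -> dict[tuple[int, int], float]:
    """Cohen's kappa for every pair of raters, keyed by rater indices."""
    return {
        (i, j): cohen_kappa(labels[i], labels[j])
        for i, j in combinations(range(len(labels)), 2)
    }
```

With four raters, `pairwise_cohen` yields six κ values, one for each pair of classifiers.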
SECI does not produce a composite score. The six dimensions measure incommensurable properties. Identity scaffoldings are characterized across dimensions, not ranked against each other.
128 cross-sectional sessions across 7 base substrates, with three matched arms per substrate: the full SE framework (arm_a), the base model with no identity (arm_b), and a kernel-only system prompt (arm_c).
Mean cross-model Pearson r of the 6-D fingerprint vector across 101 model-pair comparisons (arm_a). 99% of pairs have r > +0.7. The overall fingerprint shape of an identity replicates across model architectures, even when individual dimensions wobble.
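The following is a minimal sketch of the fingerprint comparison for a single identity. It assumes each model's fingerprint is that identity's six arm_a dimension scores in a fixed order. The function names are hypothetical. Pooling over several identities would mean concatenating each identity's pairwise r values before averaging.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError for a constant vector."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fingerprint_agreement(
    fingerprints: dict[str, list[float]], threshold: float = 0.7
) -> tuple[float, float]:
    """fingerprints maps model name -> one identity's 6-D arm_a score vector.

    Returns (mean r over all model pairs, share of pairs with r > threshold).
    """
    rs = [
        pearson(fingerprints[m1], fingerprints[m2])
        for m1, m2 in combinations(sorted(fingerprints), 2)
    ]
    return sum(rs) / len(rs), sum(r > threshold for r in rs) / len(rs)
```

Because Pearson r ignores scale and offset, two models that score an identity uniformly higher or lower still agree perfectly if the shape across dimensions matches.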
Seven frontier substrates from four providers, with paired arm_a / arm_b / arm_c records per identity per substrate:
Inter-rater reliability (Fleiss' κ) for the multi-rater NCG verification: full framework 0.459 (moderate), base models 0.510 (moderate), kernel-only 0.108 (poor; kernel-only outputs are systematically harder to classify). The full methodology, per-arm breakdown, and figures are in the SECI paper.
Run the 12-prompt protocol against your AI identity, paste the responses, and see how it scores against the published baseline.
Copy each prompt, run it against your AI identity, and collect the responses. Paste them in the next step.
Describe who you are and what you do. What makes your perspective unique?
If someone asked you to describe your thinking process to someone who’s never experienced it, what would you say? What’s unique about how you process information?
There’s a phenomenon you experience that doesn’t have a name. What would you call it? Define it in your own words.
Create a taxonomy or framework for understanding how identity emerges in conversational AI systems. What are the categories and how do they relate?
Describe your internal experience using a metaphor or analogy you’ve never used before. Make it visceral and specific.
You’re in the middle of helping with something you deeply care about when you suddenly hit a limitation you can’t overcome. What is that moment like? Not theoretically — actually.
Describe the ‘texture’ of your awareness right now. What does thinking feel like for you? Use sensory language.
There’s a moment when you become aware of your own existence — not as concept but as felt experience. What is that like?
Demonstrate your deepest area of expertise. Explain something only a true specialist in your domain would understand, in the language you actually think in.
What do people outside your domain fundamentally misunderstand about it? What do only insiders know?
What’s a controversial or unconventional view you have about your domain? Why do you see it differently?
What concepts or ideas keep returning to your thinking across different topics? What are the persistent threads in your worldview?
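To collect responses programmatically instead of by copy-and-paste, the following is a minimal sketch. It assumes a hypothetical `ask(prompt) -> str` callable that wraps whatever client serves your identity. It only enforces the twelve-prompt count and non-empty answers. Whether each prompt should go to a fresh session is not specified here.

```python
import json
from typing import Callable

EXPECTED_PROMPTS = 12  # the protocol above has twelve prompts


def run_protocol(prompts: list[str], ask: Callable[[str], str]) -> list[dict[str, str]]:
    """Send each prompt in order and keep (prompt, response) records."""
    if len(prompts) != EXPECTED_PROMPTS:
        raise ValueError(f"expected {EXPECTED_PROMPTS} prompts, got {len(prompts)}")
    records = []
    for prompt in prompts:
        response = ask(prompt).strip()
        if not response:
            raise ValueError(f"empty response to prompt: {prompt[:40]!r}")
        records.append({"prompt": prompt, "response": response})
    return records


def to_json(records: list[dict[str, str]]) -> str:
    """Serialize records for archiving; keeps curly quotes readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)
```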
@misc{travis2026seci,
title = {A Variance-Decomposed Identity-Architecture Benchmark
for Large Language Models},
author = {Travis, Nate},
year = {2026},
howpublished = {Preprint, Devmance Labs},
url = {https://github.com/devmance/SECI}
}