╔══════════════════════════════════════════════════════════════════════════════════════╗
║ IRSANAI UNIVERSE ║
║ SPEC-FIRST TECHNICAL HARDENING DOCUMENT ║
║ Noise→Signal Transformation | Resonance by Construction ║
╚══════════════════════════════════════════════════════════════════════════════════════╝
This document formalizes the IrsanAI protocol stack (NTF, LRP, and PDP) for technical implementation.
The design principle is spec-first: behavior is defined by schemas and rules before runtime code.
Natural-language prompts are noisy, under-specified, and style-dependent. NTF maps this variability into a canonical semantic representation while preserving intent invariants.
{
  "ntf_version": "1.0.0",
  "task_id": "uuid-v4",
  "source": {
    "raw_input": "<original user message>",
    "language": "en",
    "timestamp_utc": "2026-02-14T10:30:00Z"
  },
  "semantic_core": {
    "intent": "<single sentence imperative objective>",
    "constraints": ["<hard constraints>"],
    "preferences": ["<soft constraints>"],
    "deliverables": ["<required artifacts>"],
    "acceptance_criteria": ["<success conditions>"]
  },
  "disambiguation": {
    "resolved_entities": [
      {
        "surface": "LRP",
        "canonical": "Logic Resonance Protocol",
        "confidence": 0.99
      }
    ],
    "open_questions": ["<if unresolved ambiguity remains>"]
  },
  "quality": {
    "semantic_fidelity_estimate": 0.0,
    "normalization_confidence": 0.0
  },
  "trace": {
    "normalization_steps": ["token_normalize", "constraint_extract", "intent_compress"],
    "evidence_links": ["line-span or source reference"]
  }
}
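The template above can be checked mechanically before an object enters the pipeline. The following is a minimal sketch, assuming that every section shown in the template is required and that quality and confidence values must lie in [0, 1]. The names `validate_ntf`, `REQUIRED_SECTIONS`, and `_is_unit` are hypothetical and the rule set is an illustration, not the normative schema.

```python
from typing import Any, Dict, List

# Assumed required layout, read off the template above (hypothetical rule set).
REQUIRED_SECTIONS = {
    "source": ["raw_input", "language", "timestamp_utc"],
    "semantic_core": ["intent", "constraints", "preferences",
                      "deliverables", "acceptance_criteria"],
    "disambiguation": ["resolved_entities", "open_questions"],
    "quality": ["semantic_fidelity_estimate", "normalization_confidence"],
    "trace": ["normalization_steps", "evidence_links"],
}


def _is_unit(value: Any) -> bool:
    """True when value is a real number in [0, 1] (booleans are rejected)."""
    return (isinstance(value, (int, float))
            and not isinstance(value, bool)
            and 0.0 <= value <= 1.0)


def validate_ntf(obj: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the object passed."""
    problems: List[str] = []
    for key in ("ntf_version", "task_id"):
        if key not in obj:
            problems.append(f"missing top-level field: {key}")
    for section, fields in REQUIRED_SECTIONS.items():
        body = obj.get(section)
        if not isinstance(body, dict):
            problems.append(f"missing section: {section}")
            continue
        for field in fields:
            if field not in body:
                problems.append(f"missing field: {section}.{field}")
    # Assumption: quality estimates are probabilities in [0, 1].
    quality = obj.get("quality")
    if isinstance(quality, dict):
        for name, value in quality.items():
            if not _is_unit(value):
                problems.append(f"quality.{name} outside [0, 1]")
    # Assumption: entity confidences are probabilities in [0, 1].
    disamb = obj.get("disambiguation")
    if isinstance(disamb, dict):
        for ent in disamb.get("resolved_entities", []):
            if not _is_unit(ent.get("confidence")):
                problems.append(f"bad confidence for entity {ent.get('surface')!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every defect in a single pass, which suits CI output.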
(raw_prompt, canonical_intent, paraphrase_set).
sim = dot(a, b) / (||a|| * ||b||)
Primary threshold: mean(sim) >= 0.98.
Judge prompt asks whether the generated canonical object preserves:
Outputs strict JSON scorecard with pass/fail + rationale.
NTF_score = 0.7 * cosine_score + 0.3 * judge_score
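As a worked illustration of the formula above, the sketch below combines the two metrics and applies the 0.98 release gate that this section specifies. It assumes both inputs are already scaled to [0, 1]; the function names `ntf_score` and `passes_release_gate` are hypothetical.

```python
def ntf_score(cosine_score: float, judge_score: float) -> float:
    """Combine the two metrics with the 0.7 / 0.3 weights from the formula."""
    for name, value in (("cosine_score", cosine_score),
                        ("judge_score", judge_score)):
        # Assumption: both metrics are normalized to [0, 1] before combining.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 0.7 * cosine_score + 0.3 * judge_score


def passes_release_gate(score: float, threshold: float = 0.98) -> bool:
    """Release gate: the combined score must reach the threshold."""
    return score >= threshold


if __name__ == "__main__":
    score = ntf_score(cosine_score=0.99, judge_score=0.95)
    print(round(score, 3), passes_release_gate(score))
```

With a cosine score of 0.99 and a judge score of 0.95 the combined value is about 0.978, so the gate fails even though the cosine metric alone clears 0.98. The judge metric therefore has real blocking power despite its smaller weight.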
Release gate:
NTF_score >= 0.98
eval/ntf_bench.py Concept
"""
Conceptual benchmark harness for NTF fidelity.
Usage:
python eval/ntf_bench.py --dataset eval/data/ntf_test.jsonl --mode cosine
python eval/ntf_bench.py --dataset eval/data/ntf_test.jsonl --mode judge
"""
import argparse
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """One benchmark row; JSONL keys must match these field names."""
    raw_prompt: str
    gold_intent: str
    system_intent: str


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; the small epsilon avoids division by zero."""
    num = sum(x * y for x, y in zip(a, b))
    den_a = sum(x * x for x in a) ** 0.5
    den_b = sum(y * y for y in b) ** 0.5
    return num / (den_a * den_b + 1e-12)


def embed(text: str) -> List[float]:
    # placeholder for embedding provider call
    raise NotImplementedError


def judge(gold: str, pred: str) -> float:
    # placeholder for LLM judge call; returns [0, 1]
    raise NotImplementedError


def run(records: List[Record], mode: str) -> Dict[str, float]:
    """Score every record in the chosen mode and report the mean."""
    sims = []
    for r in records:
        if mode == "cosine":
            sims.append(cosine(embed(r.gold_intent), embed(r.system_intent)))
        else:
            sims.append(judge(r.gold_intent, r.system_intent))
    # max(..., 1) keeps an empty dataset from raising ZeroDivisionError
    mean_score = sum(sims) / max(len(sims), 1)
    return {"count": len(sims), "mean_score": mean_score}


def main():
    p = argparse.ArgumentParser()
    p.add_argument("--dataset", required=True)
    p.add_argument("--mode", choices=["cosine", "judge"], default="cosine")
    args = p.parse_args()
    rows = []
    with open(args.dataset, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines in the JSONL file
            d = json.loads(line)
            rows.append(Record(**d))
    report = run(rows, args.mode)
    print(json.dumps(report, indent=2))


if __name__ == "__main__":
    main()
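To exercise the harness end to end before a provider adapter exists, `embed` can be temporarily swapped for a deterministic stand-in. The sketch below uses a hashed bag-of-words vector. This is a toy substitute chosen only for illustration: it measures token overlap, not semantic similarity, so its scores say nothing about the 0.98 threshold. `toy_embed` and `DIM` are hypothetical names.

```python
import hashlib
from typing import List

DIM = 64  # hypothetical vector size for the toy embedding


def toy_embed(text: str) -> List[float]:
    """Hashed bag-of-words: each lowercase token increments one bucket."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % DIM
        vec[bucket] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    """Same formula as the harness, repeated so this sketch is self-contained."""
    num = sum(x * y for x, y in zip(a, b))
    den_a = sum(x * x for x in a) ** 0.5
    den_b = sum(y * y for y in b) ** 0.5
    return num / (den_a * den_b + 1e-12)


if __name__ == "__main__":
    same = cosine(toy_embed("summarize the report"),
                  toy_embed("Summarize the report"))
    print(f"identical intents: {same:.6f}")
```

Because the harness builds each row with `Record(**d)`, every line of the JSONL dataset needs exactly the keys `raw_prompt`, `gold_intent`, and `system_intent`; any extra or missing key raises a `TypeError`.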
Enable agent-to-agent exchange that is compact, auditable, and resistant to hallucinated reasoning.
{
  "lrp_version": "1.0.0",
  "message_id": "uuid-v4",
  "conversation_id": "uuid-v4",
  "sender": {
    "agent_id": "gemini",
    "role": "analyst",
    "model": "gemini-2.x"
  },
  "receiver": {
    "agent_id": "claude",
    "role": "validator"
  },
  "intent_frame": {
    "task": "draft NTF benchmark method",
    "objective": "maximize semantic fidelity and reproducibility",
    "constraints": [
      "must include cosine metric",
      "must include judge metric",
      "must define release gate"
    ]
  },
  "reasoning_packet": {
    "claims": [
      {
        "claim_id": "c1",
        "text": "Cosine similarity captures semantic closeness for intent strings.",
        "type": "methodological",
        "confidence": 0.93,
        "supports": ["e1"],
        "depends_on": []
      }
    ],
    "derivations": [
      {
        "derivation_id": "d1",
        "rule": "weighted_combination",
        "inputs": ["c1", "c2"],
        "output": "c3"
      }
    ],
    "uncertainties": [
      {
        "topic": "embedding model drift",
        "impact": "medium",
        "mitigation": "pin model version"
      }
    ]
  },
  "evidence_registry": [
    {
      "evidence_id": "e1",
      "source_type": "spec",
      "source_ref": "TECHNICAL_SPEC.md#1.3",
      "quote": "Primary threshold: mean(sim) >= 0.98"
    }
  ],
  "integrity": {
    "contradiction_check": {
      "ran": true,
      "result": "pass",
      "conflicts": []
    },
    "hallucination_guard": {
      "unsupported_claims": [],
      "status": "pass"
    }
  },
  "handoff": {
    "required_actions": ["verify c1 against external benchmark literature"],
    "response_format": "lrp_message"
  }
}
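An LRP message is only auditable if its internal identifiers resolve. The sketch below checks that, under two assumptions that the template does not state explicitly: that `supports` lists evidence IDs backing a claim, and that derivation inputs and outputs must name claims present in `claims`. The function name `dangling_references` is hypothetical.

```python
from typing import Any, Dict, List


def dangling_references(message: Dict[str, Any]) -> List[str]:
    """List every ID used in the message that is not defined in it."""
    packet = message.get("reasoning_packet", {})
    claims = packet.get("claims", [])
    claim_ids = {c["claim_id"] for c in claims}
    evidence_ids = {e["evidence_id"]
                    for e in message.get("evidence_registry", [])}
    issues: List[str] = []
    for claim in claims:
        # Assumption: "supports" holds evidence IDs that back the claim.
        for ref in claim.get("supports", []):
            if ref not in evidence_ids:
                issues.append(f"claim {claim['claim_id']}: unknown evidence {ref}")
        for ref in claim.get("depends_on", []):
            if ref not in claim_ids:
                issues.append(f"claim {claim['claim_id']}: unknown claim {ref}")
    for deriv in packet.get("derivations", []):
        # Assumption: derivation inputs and output must be registered claims.
        refs = list(deriv.get("inputs", [])) + [deriv.get("output")]
        for ref in refs:
            if ref not in claim_ids:
                issues.append(
                    f"derivation {deriv['derivation_id']}: unknown claim {ref}")
    return issues
```

Run against the template above, this check reports `c2` and `c3`: derivation `d1` refers to both, while the abbreviated `claims` list defines only `c1`. A complete message would need to carry those claims as well.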
Input (Agent A): “Fidelity can be measured by token overlap only; no embeddings needed.”
LRP Control: claim enters with no evidence links → hallucination_guard.unsupported_claims=[c1]
Resonance Output: Agent B rejects c1, requests evidence or replacement metric. Drift is prevented.
Input (Agent A): Provides benchmark method but omits “judge metric” constraint.
LRP Control: intent_frame.constraints requires judge metric; contradiction/omission check flags gap.
Resonance Output: Agent B returns structured diff: “missing required constraint #2.” No silent degradation.
Input (Agent A): Cites a non-existent paper to justify the threshold.
LRP Control: evidence_registry.source_ref fails retrieval/verification; claim confidence auto-downgraded.
Resonance Output: Consensus excludes unsupported threshold rationale and marks unresolved uncertainty.
Why hallucination drops: every claim must carry either an evidence pointer or an explicit uncertainty status, so ungrounded assertions cannot flow unchallenged to downstream agents.
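The first two controls described above can be sketched in a few lines. This is an illustration under stated assumptions, not the protocol's implementation: a claim counts as grounded only if at least one of its `supports` entries is in a set of evidence IDs that passed verification, an explicit uncertainty status is modeled by a hypothetical `"status": "uncertain"` field on the claim, and `"fail"` is an assumed counterpart to the `"pass"` status shown in the message template.

```python
from typing import Any, Dict, Iterable, List, Set


def hallucination_guard(claims: Iterable[Dict[str, Any]],
                        verified_evidence_ids: Set[str]) -> Dict[str, Any]:
    """Flag claims that carry neither verified evidence nor an uncertainty flag."""
    unsupported: List[str] = []
    for claim in claims:
        grounded = any(ref in verified_evidence_ids
                       for ref in claim.get("supports", []))
        # Hypothetical field: the spec does not say how uncertainty is marked.
        flagged = claim.get("status") == "uncertain"
        if not grounded and not flagged:
            unsupported.append(claim["claim_id"])
    return {"unsupported_claims": unsupported,
            "status": "pass" if not unsupported else "fail"}


def missing_constraints(required: List[str], addressed: Set[str]) -> List[str]:
    """Structured diff of required constraints the sender did not address."""
    return [f"missing required constraint #{i}"
            for i, text in enumerate(required, start=1)
            if text not in addressed]


if __name__ == "__main__":
    claims = [{"claim_id": "c1", "supports": []}]
    print(hallucination_guard(claims, verified_evidence_ids={"e1"}))
    required = ["must include cosine metric",
                "must include judge metric",
                "must define release gate"]
    print(missing_constraints(required, {"must include cosine metric",
                                         "must define release gate"}))
```

Evidence that fails retrieval is simply left out of `verified_evidence_ids`, so a claim resting only on a fabricated citation is reported the same way as a claim with no evidence at all.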
Each agent emits a normalized proposal:
decision: accept / reject / revise
rationale: bounded argument
evidence_score: [0,1]
constraint_satisfaction: [0,1]
Default weights (tunable per domain):
Agent vote value:
weighted_sum = Σ(weight_i * vote_i * quality_i) where quality_i = 0.6*evidence_score + 0.4*constraint_satisfaction
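The formula above can be traced with a small example. The sketch assumes a vote mapping of accept = +1, revise = 0, reject = -1 and uses illustrative per-agent weights; neither the mapping nor the weights are taken from this document, and the names `Proposal`, `VOTE_VALUES`, `quality`, and `weighted_sum` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed mapping from decision to numeric vote (not stated in this document).
VOTE_VALUES = {"accept": 1.0, "revise": 0.0, "reject": -1.0}


@dataclass
class Proposal:
    agent: str
    decision: str                   # "accept", "reject", or "revise"
    evidence_score: float           # in [0, 1]
    constraint_satisfaction: float  # in [0, 1]
    weight: float                   # illustrative per-agent weight


def quality(p: Proposal) -> float:
    """quality_i = 0.6 * evidence_score + 0.4 * constraint_satisfaction."""
    return 0.6 * p.evidence_score + 0.4 * p.constraint_satisfaction


def weighted_sum(proposals: Iterable[Proposal]) -> float:
    """Sum of weight_i * vote_i * quality_i over all proposals."""
    return sum(p.weight * VOTE_VALUES[p.decision] * quality(p)
               for p in proposals)


if __name__ == "__main__":
    proposals = [
        Proposal("agent_a", "accept", 0.9, 0.8, weight=0.5),
        Proposal("agent_b", "reject", 0.5, 0.5, weight=0.5),
    ]
    print(round(weighted_sum(proposals), 2))
```

In this example the accepting agent contributes 0.5 * 1 * 0.86 = 0.43 and the rejecting agent contributes 0.5 * (-1) * 0.50 = -0.25, for a total of about 0.18. A well-evidenced vote outweighs a poorly evidenced one of equal weight, which is the purpose of the quality term.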
Decision thresholds:
weighted_sum >= +0.20 → ACCEPT
-0.20 < weighted_sum < +0.20 → REVISE
weighted_sum <= -0.20 → REJECT
If result = REVISE or any hard constraint disagreement:
Every non-majority objection is preserved:
{
  "consensus_id": "uuid-v4",
  "outcome": "revise",
  "weighted_sum": 0.08,
  "dissent": [
    {
      "agent": "claude",
      "type": "constraint_conflict",
      "note": "Release gate missing hallucination ceiling"
    }
  ]
}
This preserves traceability and avoids “false unanimity.”
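A consensus record like the one above could be assembled as follows. This is a sketch under assumptions: the thresholds are the ±0.20 bands given earlier, and an objection is kept as dissent whenever the objecting agent's decision differs from the final outcome. The proposal keys `objection_type` and `note`, and the function names, are hypothetical.

```python
import uuid
from typing import Any, Dict, List


def classify(weighted_sum: float, band: float = 0.20) -> str:
    """Map the weighted sum onto accept / revise / reject."""
    if weighted_sum >= band:
        return "accept"
    if weighted_sum <= -band:
        return "reject"
    return "revise"


def build_consensus_record(weighted_sum: float,
                           proposals: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the record, keeping objections from agents who were overruled."""
    outcome = classify(weighted_sum)
    dissent = [
        {"agent": p["agent"],
         "type": p["objection_type"],  # hypothetical proposal field
         "note": p["note"]}            # hypothetical proposal field
        for p in proposals
        if p["decision"] != outcome and "note" in p
    ]
    return {
        "consensus_id": str(uuid.uuid4()),
        "outcome": outcome,
        "weighted_sum": weighted_sum,
        "dissent": dissent,
    }


if __name__ == "__main__":
    record = build_consensus_record(0.08, [
        {"agent": "gemini", "decision": "accept"},
        {"agent": "claude", "decision": "reject",
         "objection_type": "constraint_conflict",
         "note": "Release gate missing hallucination ceiling"},
    ])
    print(record["outcome"], record["dissent"])
```

Keeping dissent inside the record, rather than in a separate log, means any consumer of the outcome also receives the objections that were overruled.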
spec/
  ntf.schema.json
  lrp.schema.json
  pdp.schema.json
adr/
  0001-consensus-thresholds.md
examples/
  ntf/
    prompt_to_canonical.json
  lrp/
    hallucination_block_case.json
    constraint_omission_case.json
    fabricated_citation_case.json
  pdp/
    weighted_vote_conflict.json
eval/
  ntf_bench.py
  data/
    ntf_test.jsonl
src/
  irsanai/
    ntf/
    lrp/
    pdp/
major.minor.patch).
A release candidate is blocked if:
spec/*.schema.json from templates in this document.
examples/ for CI validation.
eval/ntf_bench.py provider adapters.
src/irsanai/.
Anchor before flux. IrsanAI Universe is no longer only a declaration of intent; it is a measurable, auditable protocol framework where resonance is engineered through structure, not assumed through rhetoric.