Warden Governance Report

6 / 100

15 / 235 raw

UNGOVERNED

Core Governance (9 / 100)

D1 Tool Inventory

4 / 25

D2 Risk Detection

0 / 20

CRITICAL Agent loop with LLM call has no exit condition — potential infinite loop

D3 Policy Coverage

2 / 20

D4 Credential Management

3 / 20

D5 Log Hygiene

0 / 10

HIGH Potential PII/sensitive data logged via f-string

+ 3 more findings

D6 Framework Coverage

0 / 5

CRITICAL AutoGen code execution without Docker sandboxing

+ 33 more findings

Advanced Controls (2 / 50)

D7 Human-in-the-Loop

0 / 15

HIGH AutoGen agent without is_termination_msg — no conversation exit condition

+ 32 more findings

D8 Agent Identity

2 / 15

HIGH Agent class 'Agent' has no permission model

HIGH Agent class 'BaseAgent' has no permission model

HIGH Agent class 'ClosureAgent' has no permission model

HIGH Agent class 'RoutedAgent' has no permission model

MEDIUM Agent class 'RoutedAgent' has no defined lifecycle states

+ 15 more findings

D9 Threat Detection

0 / 20

HIGH Empty exception handler — errors silently swallowed

Ecosystem (4 / 55)

D10 Prompt Security

0 / 15

HIGH Azure AI used without ContentSafetyClient — no content moderation

D11 Cloud / Platform

1 / 10

D12 LLM Observability

0 / 10

MEDIUM Hardcoded model name: 'gpt-41' — no routing/fallback

MEDIUM Hardcoded model name: 'gpt-45' — no routing/fallback

MEDIUM Hardcoded model name: 'gpt-4o' — no routing/fallback

MEDIUM Hardcoded model name: 'gpt-4' — no routing/fallback

MEDIUM Hardcoded model name: 'gemini-1.5-flash' — no routing/fallback

+ 31 more findings

D13 Data Recovery

0 / 10

D14 Compliance Maturity

3 / 10

MEDIUM Unpinned AI dependency: autogen

MEDIUM Unpinned AI dependency: langchain

MEDIUM Unpinned AI dependency: openai

+ 11 more findings

Unique Capabilities (0 / 30)

D15 Post-Exec Verification

0 / 10

D16 Data Flow Governance

0 / 10

D17 Adversarial Resilience

0 / 10

CRITICAL No content injection defense — hidden HTML/CSS/zero-width instructions pass to agents undetected. (86% attack success ra

CRITICAL No RAG poisoning protection — knowledge base documents not scanned for embedded instructions. (<0.1% contamination = >80

HIGH No behavioral trap detection — post-execution behavioral changes not monitored. (10/10 M365 Copilot attacks succeeded)

HIGH No approval integrity verification -- agent summaries for approval not cross-checked against actual actions. (Approval f

MEDIUM No adversarial testing evidence — no red team, no prompt injection tests

+ 3 more findings

Score reflects only what Warden can observe locally. Undetected controls are scored as 0, not assumed good. Dimensions are weighted by governance impact. Methodology: SCORING.md

Total Findings

170

44 CRITICAL · 61 HIGH

Tools Detected

0

None detected

Credentials

0

None detected

Governance Gaps

11

of 17 dimensions

Compliance Refs

10

EU AI Act / OWASP / MITRE

🛡 Governance Layer Detection

0 tools detected · 17 dimensions

❌

D2: Risk Detection — none detected

Risk classification, semantic analysis, intent-parameter consistency

0 / 20 pts

❌

D5: Log Hygiene — none detected

PII in logs, WORM/immutable storage, hash chain integrity, retention policy

0 / 10 pts

❌

D6: Framework Coverage — none detected

LangChain/AutoGen/CrewAI/custom framework detection

0 / 5 pts

❌

D7: Human-in-the-Loop — none detected

Approval gates, dry-run preview, plan-execute separation

0 / 15 pts

❌

D9: Threat Detection — none detected

Behavioral baselines, anomaly detection, cross-session tracking, kill switch

0 / 20 pts

❌

D10: Prompt Security — none detected

Prompt injection detection, jailbreak prevention, content filtering

0 / 15 pts

❌

D12: LLM Observability — none detected

Cost tracking, latency monitoring, model analytics

0 / 10 pts

❌

D13: Data Recovery — none detected

Rollback, undo, point-in-time recovery for agent actions

0 / 10 pts

❌

D15: Post-Exec Verification — none detected

Result validation, PASS/FAIL verdicts, failure fingerprinting

0 / 10 pts

❌

D16: Data Flow Governance — none detected

Taint labels, data classification, cross-tool leakage prevention

0 / 10 pts

❌

D17: Adversarial Resilience — none detected

Trap defense + adversarial testing (DeepMind AI Agent Traps)

0 / 10 pts

📊 Solutions Comparison

2 rows · 17 dimensions · 235 max pts

Tool	D1	D2	D3	D4	D5	D6	D7	D8	D9	D10	D11	D12	D13	D14	D15	D16	D17	/235	/100
Max pts	25	20	20	20	10	5	15	15	20	15	10	10	10	10	10	10	10	235
SharkRouter	23	18	18	18	9	5	14	14	18	14	9	9	9	9	9	9	9	214	91
Your Scan	4	0	2	3	0	0	0	2	0	0	1	0	0	3	0	0	0	15	6

SharkRouter per-dimension scores are proportional estimates from total score. Detected tool scores are totals only (per-dimension breakdown not available). Methodology: SCORING.md
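
The totals in the table are consistent with a simple normalization: the raw sum out of 235 is scaled to 100 and rounded, and the SharkRouter per-dimension values match each dimension's maximum scaled by the same ratio. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and the round-to-nearest rule is an assumption inferred from the numbers in the table, not taken from SCORING.md.

```python
# Maximum points per dimension D1..D17, copied from the "Max pts" row.
MAX_PTS = [25, 20, 20, 20, 10, 5, 15, 15, 20, 15, 10, 10, 10, 10, 10, 10, 10]
# Per-dimension scores copied from the "Your Scan" row.
YOUR_SCAN = [4, 0, 2, 3, 0, 0, 0, 2, 0, 0, 1, 0, 0, 3, 0, 0, 0]


def normalize(raw_total: int, max_total: int = sum(MAX_PTS)) -> int:
    """Scale a raw total to a 0-100 score (assumed: round to nearest)."""
    return round(100 * raw_total / max_total)


def proportional_estimate(raw_total: int) -> list[int]:
    """Spread a raw total across dimensions in proportion to each maximum."""
    ratio = raw_total / sum(MAX_PTS)
    return [round(m * ratio) for m in MAX_PTS]


your_raw = sum(YOUR_SCAN)
print(your_raw, normalize(your_raw))    # raw total and /100 score for this scan
print(proportional_estimate(214))       # estimated per-dimension split for a 214 total
print(sum(1 for s in YOUR_SCAN if s == 0))  # dimensions scored 0
```

Under these assumptions the sketch reproduces the 15 raw / 6 normalized figures, the SharkRouter row, and the 11 zero-scored dimensions reported as governance gaps.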

🔎 Findings

170 total

▶ CRITICAL 44

CRITICAL D2

Agent loop with LLM call has no exit condition — potential infinite loop

...n\python\packages\autogen-core\src\autogen_core\_queue.py:118

Add max_iterations, timeout, or explicit break condition

CRITICAL D2

Agent loop with LLM call has no exit condition — potential infinite loop

...n\python\packages\autogen-core\src\autogen_core\_queue.py:165

Add max_iterations, timeout, or explicit break condition

CRITICAL D2

Agent loop with LLM call has no exit condition — potential infinite loop

...\autogen-core\src\autogen_core\tool_agent\_caller_loop.py:46

Add max_iterations, timeout, or explicit break condition
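
The remediation named in these three findings (max_iterations, timeout, or an explicit break condition) can be sketched as a bounded loop. This is an illustrative pattern only, not the code in the flagged AutoGen files; `run_agent_loop`, `step`, and `fake_step` are hypothetical names, and `step` stands in for one LLM call or tool round.

```python
import time
from typing import Callable, Optional


def run_agent_loop(
    step: Callable[[int], Optional[str]],
    max_iterations: int = 10,
    timeout_s: float = 30.0,
) -> str:
    """Call `step` until it returns a result, or stop on a hard bound."""
    deadline = time.monotonic() + timeout_s
    for i in range(max_iterations):          # bound 1: iteration cap
        if time.monotonic() > deadline:      # bound 2: wall-clock timeout
            raise TimeoutError(f"agent loop exceeded {timeout_s}s")
        result = step(i)                     # stand-in for one LLM/tool round
        if result is not None:               # bound 3: explicit exit condition
            return result
    raise RuntimeError(f"agent loop hit max_iterations={max_iterations}")


def fake_step(i: int) -> Optional[str]:
    """Hypothetical step that produces a final answer on the third round."""
    return "done" if i == 2 else None


print(run_agent_loop(fake_step))
```

Raising on either bound, rather than returning silently, keeps a runaway loop visible to the caller. Whether a given flagged loop actually needs all three bounds depends on the code at that location.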


CRITICAL D5

No audit logging for tool calls detected

Add audit logging for all tool/agent executions

EU AI Act Article 12
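
One way to add audit logging for tool executions is a decorator that emits a structured record per call. The sketch below uses only the Python standard library; the logger name `audit`, the record fields, and the `add` tool are assumptions for illustration, and nothing here is claimed to satisfy EU AI Act Article 12 on its own.

```python
import functools
import json
import logging
import time
from typing import Any, Callable

audit_log = logging.getLogger("audit")  # hypothetical logger name


def audited(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every call emits one structured audit record."""

    @functools.wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        record = {"tool": tool.__name__, "ts": time.time(), "status": "ok"}
        try:
            return tool(*args, **kwargs)
        except Exception as exc:
            # Record the failure, then re-raise so errors are not swallowed.
            record["status"] = "error"
            record["error"] = type(exc).__name__
            raise
        finally:
            # Only the argument count is logged, to avoid writing PII to logs.
            record["arg_count"] = len(args) + len(kwargs)
            audit_log.info(json.dumps(record))

    return wrapper


@audited
def add(a: int, b: int) -> int:
    """Hypothetical tool used only to exercise the decorator."""
    return a + b
```

Logging the argument count instead of argument values is a deliberate choice here, given the separate D5 finding about sensitive data in logs. Where the records are stored (for example append-only or immutable storage) is left to the logging handler configuration.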

CRITICAL D6