- test harness for voice agents.
- agent configs from multiple platforms (Retell, VAPI, Bland, LiveKit) compile down to a unified AgentGraph IR (rough sketch of the idea after the list)
- import from one platform, test locally, export to another
- models are configured through LiteLLM and DSPy; if you're on a subscription, Claude Code can be used as a runner so you're not paying per API call
- metric judges produce continuous 0-1 scores instead of binary pass/fail, since a 0.65 and a 0.35 both fail a 0.7 threshold but represent very different agent behaviors (judge sketch after the list)
- results persist to DuckDB for querying across test history (example query after the list)
- currently adding auto-healing graph mutations, where a failed test proposes structural and prompt changes to the agent graph and those changes are validated against a regression suite (loop sketched after the list)
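The post doesn't spell out the AgentGraph IR, so here's a rough sketch of what a unified node/edge representation with per-platform import/export might look like. The field names and the from_retell/to_vapi methods are placeholders for illustration, not the project's actual API:

    # Hypothetical sketch of a unified agent-graph IR; field names and the
    # importer/exporter methods are illustrative, not the real voicetest API.
    from dataclasses import dataclass, field

    @dataclass
    class AgentNode:
        id: str
        prompt: str                          # node-level instructions
        tools: list[str] = field(default_factory=list)

    @dataclass
    class AgentEdge:
        source: str
        target: str
        condition: str                       # e.g. "caller asks for billing"

    @dataclass
    class AgentGraph:
        nodes: list[AgentNode]
        edges: list[AgentEdge]

        @classmethod
        def from_retell(cls, config: dict) -> "AgentGraph":
            # map one platform's flow JSON onto the shared node/edge shape
            nodes = [AgentNode(n["id"], n["instruction"]) for n in config["nodes"]]
            edges = [AgentEdge(e["from"], e["to"], e.get("condition", "")) for e in config["edges"]]
            return cls(nodes, edges)

        def to_vapi(self) -> dict:
            # emit the same graph in another platform's export format
            return {
                "nodes": [{"id": n.id, "prompt": n.prompt, "tools": n.tools} for n in self.nodes],
                "edges": [{"from": e.source, "to": e.target, "condition": e.condition} for e in self.edges],
            }

Once everything round-trips through one IR, "import from Retell, test locally, export to VAPI" is just a pair of adapter calls.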
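Here's a minimal sketch of the continuous-score judge idea, using LiteLLM directly (the prompt, model string, and threshold are placeholders; swapping the model string is all LiteLLM needs to route to a different provider, and the real project also wires this through DSPy):

    # Sketch of a metric judge that returns a continuous 0-1 score via LiteLLM.
    import litellm

    def judge_score(transcript: str, criterion: str, model: str = "gpt-4o-mini") -> float:
        prompt = (
            f"Criterion: {criterion}\n\nTranscript:\n{transcript}\n\n"
            "Rate how well the agent satisfied the criterion. "
            "Reply with a single number between 0 and 1."
        )
        response = litellm.completion(model=model, messages=[{"role": "user", "content": prompt}])
        # a real judge would parse this more defensively
        return float(response.choices[0].message.content.strip())

    # A 0.65 and a 0.35 both fail a 0.7 threshold, but the score tells you
    # whether the agent was close or completely off the rails.
    score = judge_score("...call transcript...", "agent confirms the callback number")
    passed = score >= 0.7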
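And a sketch of the DuckDB side; the table name and columns are made up, but this is the kind of query-across-history the continuous scores enable:

    # Persist judge scores to DuckDB so you can query across runs.
    import duckdb

    con = duckdb.connect("voicetest_results.duckdb")
    con.execute("""
        CREATE TABLE IF NOT EXISTS results (
            run_id TEXT, test_name TEXT, metric TEXT,
            score DOUBLE, ran_at TIMESTAMP DEFAULT current_timestamp
        )
    """)
    con.execute(
        "INSERT INTO results (run_id, test_name, metric, score) VALUES (?, ?, ?, ?)",
        ["run-42", "billing_handoff", "confirms_callback_number", 0.65],
    )

    # e.g. spot metrics that are drifting downward across test history
    con.sql("""
        SELECT test_name, metric, avg(score) AS avg_score, count(*) AS runs
        FROM results
        GROUP BY test_name, metric
        ORDER BY avg_score
    """).show()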
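The auto-healing part is still in progress; the loop I have in mind is roughly the following (all function names here are hypothetical placeholders):

    # A failed test proposes a change to the agent graph; the change is only
    # kept if it fixes the failure and the full regression suite still passes.
    def auto_heal(graph, failed_test, regression_suite, propose_mutation, run_test):
        candidate = propose_mutation(graph, failed_test)   # prompt tweak or structural edit
        if not run_test(candidate, failed_test):
            return graph                                   # mutation didn't fix the failure
        if all(run_test(candidate, t) for t in regression_suite):
            return candidate                               # fix accepted, no regressions
        return graph                                       # fix caused regressions; keep original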
https://github.com/voicetestdev/voicetest
Wrote up the architecture here https://peet.ldee.org/general/2026/02/03/testing-voice-ai-ag...