Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

They all are. And once the context has rotted or been poisoned enough, it is unsalvageable.

Claude is now actually one of the better ones at instruction following I daresay.



In my tests it's worst with adding extra formatting or output: https://aibenchy.com/compare/anthropic-claude-opus-4-6-mediu...

For example, sometimes it outputs in markdown, without being asked to (e.g. "**13**" instead of "13"), even when asked to respond with a number only.

This might be fine in a chat-environment, but not in a workflow, agentic use-case or tool usage.

Yes, it can be enforced via structured output, but in a string field from a structured output you might still want to enforce a specific natural-language response format, which can't be defined by a schema.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: