For example, it sometimes outputs Markdown without being asked to (e.g. "**13**" instead of "13"), even when told to respond with a number only.
This might be fine in a chat environment, but not in a workflow, an agentic use case, or tool use.
Yes, the format can be enforced via structured output, but a string field inside that structured output may still need a specific natural-language response format, which a schema can't define.
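
To make that concrete, here's a rough sketch of the kind of post-check you end up writing yourself. The `answer` field name and the "bare integer only" rule are assumptions I made up for illustration, not part of any particular API.

```python
import json
import re

# Hypothetical rule: the "answer" field must be a bare integer,
# with no Markdown emphasis, units, or surrounding words.
BARE_INTEGER = re.compile(r"-?\d+")


def is_bare_integer(text: str) -> bool:
    """Return True only if the whole string is an optionally signed integer."""
    return BARE_INTEGER.fullmatch(text) is not None


def check_answer(raw_json: str) -> bool:
    """Parse a structured response and check the format of its string field."""
    payload = json.loads(raw_json)
    if not isinstance(payload, dict):
        return False
    answer = payload.get("answer")  # "answer" is an assumed field name
    return isinstance(answer, str) and is_bare_integer(answer)
```

Both `{"answer": "13"}` and `{"answer": "**13**"}` would pass a schema that only says "answer is a string", but only the first passes this check, so the caller can retry or strip the formatting.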
Claude is now actually one of the better ones at instruction following, I daresay.