Sleeping moves your memories from short-term storage in your hippocampus to long-term storage in your neocortex. If you were an LLM, sleeping would basically move the contents of your adaptive system/memory prompt into the underlying model weights. It's weird that no one has really done that yet, but I can understand why the big AI chat corpos don't do it: you'd have to store a new set of weights for each user if you don't want to risk private info spilling to others. If you have a billion users, you simply can't do that (at least not without charging obscene amounts of money that would prevent you from having a billion users in the first place). Current LLM architectures that start with a clean slate for every conversation are really well suited to serving billions of people via cloud GPUs, because everyone can run the exact same model and get all their customization purely from the input. So if we ever get this, it'll probably be in smaller, local, open models.
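A quick back-of-envelope shows the scale problem (illustrative numbers, not from any actual provider): even a modest 7B-parameter model at fp16 precision, copied once per user, adds up fast.

```python
# Rough cost of storing per-user model weights (assumed figures:
# a 7B-parameter model, 2 bytes per parameter in fp16, 1e9 users).
params = 7e9
bytes_per_param = 2          # fp16
users = 1e9

per_user_gb = params * bytes_per_param / 1e9   # GB of weights per user
total_eb = per_user_gb * users / 1e9           # total, in exabytes
print(f"{per_user_gb:.0f} GB per user, {total_eb:.0f} EB for a billion users")
# → 14 GB per user, 14 EB for a billion users
```

Parameter-efficient tricks like per-user adapters would shrink the per-user cost by orders of magnitude, but the storage and serving overhead still grows linearly with users, unlike a single shared model.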
On a much simpler level, LLM frameworks could re-summarize their context to keep relevant, use-case-specific facts, clean up and organize long- and short-term memory in some local storage, and so on. So, kind of like sleep. I think these are low-hanging fruit for improving the perceived intelligence of LLM systems (so they're probably already used somewhere).
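A minimal sketch of what such a "sleep pass" could look like, assuming a framework that keeps chat turns as strings: durable facts get extracted, deduplicated, and persisted to local storage, while the working context is pruned. The `summarize` stub stands in for a real LLM call and the `FACT:` tagging convention is purely hypothetical, not any specific framework's API.

```python
import json
import tempfile
from pathlib import Path

def summarize(messages: list[str]) -> list[str]:
    # Placeholder for an LLM call that extracts use-case-specific
    # facts; here we just keep lines tagged with a "FACT: " prefix.
    return [m.removeprefix("FACT: ") for m in messages if m.startswith("FACT: ")]

def sleep_pass(context: list[str], store: Path) -> list[str]:
    """Consolidate memory: move stable facts to long-term local
    storage, then return a pruned short-term context."""
    facts = summarize(context)
    memory = json.loads(store.read_text()) if store.exists() else []
    # Deduplicate while preserving insertion order.
    for fact in facts:
        if fact not in memory:
            memory.append(fact)
    store.write_text(json.dumps(memory, indent=2))
    # Keep only the most recent turns as the new working context.
    return context[-4:]

store = Path(tempfile.mkdtemp()) / "long_term.json"
ctx = ["FACT: user prefers metric units", "hi", "how are you",
       "FACT: user's project uses Rust", "tell me a joke", "ok"]
ctx = sleep_pass(ctx, store)
print(json.loads(store.read_text()))  # persisted facts
print(ctx)                            # pruned context, last 4 turns
```

Running the pass periodically (or between sessions) keeps the prompt short while the fact store grows, which is the "organize long and short term memory" part.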
We've already had that for a while. It works to some degree, but context tokens simply don't offer the level of compression that model weights do, at least not with current approaches that keep the context human-readable.