Yeah, I have never had good results refining ideas with models, or really any interactions with them outside of rote tasks like coding or analyzing document structures. I don't know why I was ever surprised by this, since it's obvious that LLMs just aren't capable of original thinking. I think part of the problem is that these things were originally marketed as chatbots, when that is honestly their weakest use-case. Even when I was expressly trying not to anthropomorphize LLMs, I still sorta did in the early days, but the less I do so, the more utility I get from them.