I’m of a similar opinion. I keep seeing claims that smooth AI video at the level of existing image generation is ‘coming soon’, but I haven’t seen any evidence that it’s even close. A few years? Sure.
I expect short video generation to be at the level of DALL-E 2 in 1 year, i.e. able to get the gist of a prompt, but with lots of artifacts, requiring a lot of compute, and frequently ignoring large parts of the prompt.