My guess is it uses the same vocabulary size as llama 3.1 which is 128,000 different tokens (words) to support many languages. Parameter count is less of an indicator of fitness than previously thought.
That doesn't address the thing they're skeptical about, which is how much knowledge can be encoded in 3B parameters.
3B models are great for text manipulation, but I've found them to be pretty bad at having a broad understanding of pragmatics or any given subject. The larger models encode a lot more than just language in those 70B+ parameters.
Ok, but what we are probably debating is knowledge versus wisdom. Like, if I know 1+1 = 2, and I know the numbers 1 through 10, my knowledge is just 11, but my wisdom is infinite in the scope of integer addition. I can find any number, given enough time.
I'm pretty sure the AI guys are well aware of which types of models they want to produce. Models that can intake knowledge and intelligently manipulate it would mean general intelligence.
Models that can intake knowledge and only produce subsets of it's training data have a use but wouldn't be general intelligence.
Usually the problem is much simpler with small models: they have less factual information, period.
So they'll do great at manipulating text, like extraction and summarization... but they'll get factual questions wrong.
And to add to the concern above, the more coherent the smaller models are, the more likely they very competently tell you wrong information. Without the usual telltale degraded output of a smaller model it might be harder to pick out the inaccuracies.