We can’t even fully explain how our own brains work, never mind a system that’s completely alien to us and that would have to be more complex. We can’t even explain how current LLMs work internally. Maybe we’ll make some breakthrough if we put enough resources into it, but if people keep denying the problem, there will never be enough resources put into it.


> We can’t even explain how current LLMs work internally.

You sure can; the explanations just aren’t simple yet. But that’s the common course of inventions: mind-bogglingly complex in foresight, pretty straightforward in hindsight.


You can explain the high-level concepts, but it’s really difficult to say “this group of neurons does this specific thing, and that’s why this output was produced”. OpenAI did make some progress in getting GPT-4 to explain what each neuron in GPT-2 is correlated to. But we can also find what human brain regions are correlated to, and that doesn’t necessarily explain the system as a whole or how everything interacts.
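
To make “correlated to” concrete, it’s roughly this kind of check (a toy sketch with made-up activations and a hypothetical “place name” feature, not OpenAI’s actual code):

    # Toy sketch: is this neuron correlated with some hypothesized feature?
    import numpy as np

    tokens      = ["the", "cat", "sat", "on", "the", "mat", "Paris", "France"]
    activations = np.array([0.1, 0.2, 0.1, 0.0, 0.1, 0.2, 3.1, 2.8])  # made up

    # Hypothesize a feature ("token is a place name") and measure correlation.
    is_place = np.array([0, 0, 0, 0, 0, 0, 1, 1], dtype=float)
    r = np.corrcoef(activations, is_place)[0, 1]
    print(f"correlation with 'place name' feature: {r:.2f}")

The hard part is the step this sketch skips: coming up with the candidate feature in the first place, for hundreds of thousands of neurons.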


> but it’s really difficult to say “this group of neurons does this specific thing and that’s why this output was produced”,

That's because that's not how brains work.

> though OpenAI did make some progress in getting GPT-4 to explain what each neuron in GPT-2 is correlated to

The work contained novel-to-me, somewhat impressive accomplishments, but this presentation of it was pure hype. They could have done the same thing without GPT-4 involved at all (and, in fact, they basically did… then they plugged it into GPT-4 to get a less-accurate-but-Englishy output instead).


When I mentioned a group of neurons I was talking about LLMs, but some of the same ideas probably apply. Yes, it’s probably not as simple as that, and that’s why we can’t understand them.

I think they just used GPT-4 to help automate it on a large scale, which could be important for understanding the whole system, especially for larger models.


> I think they just used GPT-4 to help automate it on a large scale,

No, they used it as a crude description language for Solomonoff–Kolmogorov–Chaitin complexity analysis. They could have used a proper description language and gotten more penetrable results – it would also have raised questions about the choice of description language, and perhaps led to further research on the nature of conceptual embeddings. Instead, they used GPT-4 to make the description language "English" (but not really – since GPT-4 doesn't interpret it the same way humans do), and it's unclear how much that has affected the results.
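
Concretely, the explain/simulate/score loop looks something like this (my reading of the paper, with a checkable predicate standing in for the English explanation – every name here is made up):

    # Sketch of the explain/simulate/score loop with a programmatic
    # "description language" instead of English-via-GPT-4.
    import numpy as np

    def score(explanation, tokens, real_acts):
        # "Simulate" the neuron from the explanation, then compare.
        simulated = np.array([explanation(t) for t in tokens], dtype=float)
        return np.corrcoef(simulated, real_acts)[0, 1]

    tokens    = ["the", "cat", "sat", "Paris", "France"]
    real_acts = np.array([0.1, 0.2, 0.1, 3.1, 2.8])  # recorded from the model

    # Candidate explanation: "fires on place names", as a checkable predicate.
    places = {"Paris", "France"}
    print(score(lambda t: t in places, tokens, real_acts))

Swap the predicate for GPT-4-simulating-an-English-sentence and you have their pipeline, along with all the ambiguity about what the sentence actually denotes.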

Here's the paper, if you want to read it again: https://openaipublic.blob.core.windows.net/neuron-explainer/... Some excellent ideas, but implemented ridiculously. It's a puff piece for GPT-4, the Universal Hammer. They claim that "language models" can do this explaining, but the paper only really shows that the authors can do explaining (which they're pretty good at, mind: it's still an entertaining read).


While I agree with the other comment, I'd like to add one thing to help you see the false equivalency being made here: we didn't make the human brain.

Now, with that being understood, why wouldn't we understand a brain that we made? Don't say "emergent properties", because we understand the emergent properties of ant colonies without having made them.


You seem to misunderstand the size of the complexity space we are dealing with here. We take simple algorithms, feed them massive amounts of data, and the algorithm builds a network connecting it all together. Humans did not choose anything about how that network is connected... the data and feedback loop did.
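
The division of labor is stark even in a toy version (plain SGD on a linear model – nothing LLM-scale, just an illustration):

    # Humans write the tiny update rule below; the data decides the weights.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8))
    y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=1000)  # stand-in "data"

    w = np.zeros(8)                      # we pick the shape, not the values
    for _ in range(500):                 # the feedback loop
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= 0.01 * grad                 # simple rule, chosen by humans

    print(w)  # where these numbers land was decided by the data

Scale that up by ten orders of magnitude and "why is this weight 0.73?" stops having a human-sized answer.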

> because we understand the emergent properties of ant colonies without having made them.

I would disagree. We observe ants and can classify their behavior relatively well, but we still have many issues predicting their behavior, even though they are relatively simple agents with a far smaller choice space than an LLM+plugin setup could/does provide.

This is the key: we could not predict ant behavior before observing it, and the same will be true for AI agents of large complexity. At least at this point of understanding, we could still ask one an unexpected question and get an unexpectedly dangerous response.


> You seem to misunderstand the size of the complexity space we are dealing with here.

I don't. The whole point of my job is to understand this exact degree of complexity. The rest of your comment rides on that assumption, and is moot. As I said, emergent properties are often unexpected, but once the observation has been made, it is possible to work back through the system to understand how the property came to be (because the system is still deterministic, even if that isn't obvious at a glance). Having a billion input features does not change this.

Frankly, understanding the oddness of an ML model is not half as difficult as people make it out to be, and might even be easier than predicting ant colonies. Our inability to predict colony behavior has nothing to do with our ability to understand which features influence which behaviors; it's that measuring the current state of an ant colony is hard, because the ants aren't all housed within an array of memory. LLMs are, so we'd understand them better if the most common implementations weren't hidden away. Case in point: all the optimization techniques that have emerged from having LLaMa available.
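
That "housed within an array of memory" point is literal: you can dump any intermediate state of a model mid-forward-pass. A minimal PyTorch sketch (toy model here, but register_forward_hook works the same way on any open-weights model):

    # Capture a model's full internal state at a chosen layer, on demand.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))

    captured = {}
    def grab(module, inputs, output):
        captured["hidden"] = output.detach()  # the layer's entire state

    model[1].register_forward_hook(grab)
    model(torch.randn(1, 16))
    print(captured["hidden"].shape)  # torch.Size([1, 64])

No such probe exists for an ant colony.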



