François Chollet, author of Keras, lays it out pretty clearly that LLMs are just...

astrange · on Feb 25, 2024

We can prove that transformers can do computation beyond memorization, and that they can learn it from examples, unless you think learning an algorithm is memorization.

https://arxiv.org/abs/2301.05217

https://arxiv.org/abs/2310.16028

saberience · on Feb 25, 2024

Surely learning an algorithm is memorization? I still remember the physics equations from when I was at school (a long time ago) because they were drilled into us repeatedly. Stuff like v = u + at, or s = ut + 1/2 at^2, etc.

I don't really recall how they were derived, but I did memorize them. It definitely doesn't mean I'm good at physics or understand the deeper meaning of these equations. I also had to memorize bubble sort, merge sort, heapsort, quicksort etc. when I was in university, but I don't think I could invent these sorts from first principles without looking them up.

Memorization doesn't really equal understanding, it's just memorization.

sushibowl · on Feb 25, 2024

> Surely learning an algorithm is memorization?

Not in this context, no. An LLM is never given any algorithm to memorise. It is given only input->output sets, and it "learns" what the algorithm is from those sets.

We know that it does this because it is able to generalize that algorithm to inputs and outputs outside of its set of training examples. So we know that it doesn't only memorise which input connects to which output, and regurgitate that information. It has come to "understand" the formula that connects the input to the output, without ever being given that formula.

quantum_state · on Feb 25, 2024

It’s fitting a function to data and guessing the value of the function for a new input. It does not know what is under the hood of the function, nor does it see the implications when the function produces a wrong value.
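To make "fitting a function to data and guessing the value for a new input" concrete, here is a minimal sketch using ordinary least squares on a straight line. It is only an illustration of the idea, not a claim about how an LLM is trained; the names `fit_line` and `predict` and the hidden rule y = 3x + 2 are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to the given points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training data: input -> output pairs only. The rule y = 3x + 2 is an
# assumption of this example and is never shown to the fitting code.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [3.0 * x + 2.0 for x in train_x]

slope, intercept = fit_line(train_x, train_y)


def predict(x):
    """Guess the output for an input, including ones outside the training set."""
    return slope * x + intercept


# x = 10 was never in the training data.
print(predict(10.0))
```

The fitted line extrapolates correctly here only because the hidden rule happens to be a line. Nothing in the code checks whether a prediction is sensible, which is the gap the comment above is pointing at.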

astrange · on Feb 26, 2024

It can do that. Verifying an answer is just another algorithm it can learn.

LLMs mostly can't do math but that, like most of their other flaws, is because of the tokenizer.

retrac · on Feb 25, 2024

> Surely learning an algorithm is memorization?

Yes. In a sense, every algorithm can be reduced to memorization, where you precalculate all possible inputs to all possible outputs and store them in a giant lookup table.
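A small sketch of that reduction, assuming a finite input domain (addition of single-digit numbers). The names `add`, `table` and `add_by_lookup` are hypothetical and chosen just for this example.

```python
def add(a, b):
    """The algorithm itself: works for any pair of integers."""
    return a + b


# Precompute every output for a finite domain and store it in a lookup table.
LIMIT = 10
table = {(a, b): add(a, b) for a in range(LIMIT) for b in range(LIMIT)}


def add_by_lookup(a, b):
    """Pure memorization: no arithmetic happens here, only retrieval."""
    return table[(a, b)]


print(add_by_lookup(3, 4))  # inside the table

try:
    add_by_lookup(10, 1)    # outside the table
except KeyError:
    print("no entry for (10, 1)")
```

The table agrees with the algorithm on every input it was built from and has nothing to say about any other input, so the reduction only goes through when the set of possible inputs is finite and fully enumerated.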

bamboozled · on Feb 25, 2024

Maybe they’re doing something we don’t really have a name for yet? This is why it causes so much controversy.

Not really generalizing, not memorizing, maybe approximating?

spuz · on Feb 25, 2024

There is certainly a lot of discussion around semantics to be had before we can agree on what exactly an LLM is capable of.

We know that LLMs tend not to be good at math. Some people will say that the fact that an LLM cannot sum two numbers demonstrates they cannot generalise. Yet others would say the way LLMs calculate sums is generalisation because it mirrors our own process of addition which works mostly by memorisation and a lot of double checking.

My perspective is that the "thinking" that LLMs do is a lot closer to the kind that humans do which is to say a lot of pattern matching but with no fundamentally precise logic underlying it. If LLMs are flawed in some way then humans are also flawed in the same way.

ein0p · on Feb 25, 2024

There is a name for this already: associative memory. That’s how you catch a ball: you condition your memory of catching a ball with proprioceptive and visual input, much like a multimodal transformer. There’s no thinking involved - you wouldn’t be able to catch it if you had to think.

vidarh · on Feb 25, 2024

An associative memory can model truth tables of logic circuits, and so going from associative memory to arbitrary computation pretty much only requires adding state and a loop. And we do provide state and a loop. As such, while that does not prove that any given model is capable of reasoning, describing what goes on at any step as associative memory says nothing meaningful about the computational power of the wider system.

It's very possible the way we train current models will turn out to place fundamental limits on the abilities the models will have, but that will not be because the models act as associative memories.

ein0p · on Feb 25, 2024

I think the current models will be a fundamental part of larger, more comprehensive models, and they will handle the “easy” parts, much like we, humans, use our associative memory to be able to do things in realtime, “without thinking”. Yann LeCun wrote a paper on this which more people should read: https://openreview.net/pdf?id=BZ5a1r-kVsf. Moreover, Meta’s research is starting to move in that direction https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-jo...

crotchfire · on Feb 25, 2024

> Not really generalizing, not memorizing, maybe approximating ?

The term you are searching for is confabulating.

dataflow · on Feb 25, 2024

"Vibing"?

mirekrusin · on Feb 25, 2024

No need to come up with any new name, we already have it:

1. autocompletion or

2. next token prediction and

3. (reverse) diffusion

rurban · on Feb 26, 2024

They might do "computation" and seem to be "intelligent", however they lack logic and rigor, and cannot explain their reasoning, in contrast to old-school AI.

Such "intelligent" monkeys might be able to write reports, lead governments or lead wars, but in engineering you need more skills than that. Which leads us to the lack of a proper definition of AGI. (https://en.wikipedia.org/wiki/Artificial_general_intelligenc...)

dataflow · on Feb 25, 2024

I guess what constitutes memorization depends on what you consider "learning an algorithm". Memorizing to humans doesn't really mean learning the exact input/output pairs per se. Like a student might "learn" an algorithm for differentiation (d/dx x^n = n x^(n-1)) and then differentiate 732638 x^2 just fine despite never seeing it before, but then tell you the derivative of yx^2 with respect to y is 2yx, or something. Did they really learn how to differentiate or did they just learn a common vibe around it? When teachers see that sort of regurgitation, they call it memorization, despite the input being different from what the student had seen in the past.
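For reference, the three derivatives in that example written out. The first two follow the memorized power rule; the third is where pattern-matching on "x^2" gives the wrong answer, because x is a constant when differentiating with respect to y.

```latex
\frac{d}{dx}\,x^{n} = n\,x^{n-1},
\qquad
\frac{d}{dx}\left(732638\,x^{2}\right) = 1465276\,x,
\qquad
\frac{\partial}{\partial y}\left(y\,x^{2}\right) = x^{2} \neq 2yx
```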

danielmarkbruce · on Feb 25, 2024

A lot of the words he uses in that tweet aren't well defined. e.g. memorization, dataset (does he mean the literal words/tokens or any token that is close in space after embedding?), pattern, category, program. The tweet is practically meaningless. I'm not criticizing him because his blog post is nuanced and he clearly understands what he's talking about, but that tweet almost certainly means something quite specific to him and he's communicating quite poorly.

As you mention, there is a sophisticated representation of the tokens. It's so sophisticated that one may reasonably stop calling them tokens (or, even data) and start calling them "concepts". Now, if someone (or something) has memorized how all the concepts go together... that's pretty darn intelligent.

dartos · on Feb 25, 2024

I think there’s a lot of people who got into tech in 2020. They are new programmers and technologists and made a life change in 2020.

I think they were a big part of the crypto bubble. Lots of talent, hungry for that sweet startup gold, but without the technical background to really know what’s going on.

I believe these same groups are operating in the same way with AI. Recklessly bashing together APIs and cloud services to create MVPs.

It’s all the worst parts of startup culture concentrated.

Anyway, that’s why i think most of the AI space rn is just people calling APIs and acting like they discovered fire.

</salty rant>

ein0p · on Feb 25, 2024

Francois is not a researcher, however. LLMs aren’t just plain “memories”. They are very explicitly _associative_ memories. And it just so happens that this is mostly what our own brains do, too.

Actual cognition is slow and expensive for us, and we try to use it as little as possible, filling in what we can with easy, associative, low energy, near instant stuff.

Therein lies the reason why AI can be considered a boon for us humans. If machines took over the mundane work that just drains our energy and doesn't add much value, we could finally have the time to actually do what they can’t - think deeply about stuff, with their help where we find our faculties lacking. Rocket for the mind, if you will, rather than a bicycle.

vidarh · on Feb 25, 2024

I think even "actual cognition" is highly unlikely to need anything much more than associative memories plus some state and a loop. The expense being having to "execute" a large number of steps rather than "just" effectively pulling learned results from "cache".

To be very reductive, an associative memory can hold a truth table. Put minimal state, IO and a loop around that and you have a universal Turing machine. Which is why the "it's just memory" or "it's just Markov chains" is so tedious - it says near nothing about the computational power of the system including the model.
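A rough sketch of "truth table plus minimal state, IO and a loop", under some loud assumptions: the table below encodes one fixed Turing machine (binary increment), not a universal one, and the names `RULES` and `run` are invented for this example. The point is only that every step is a pure lookup, and the computation comes from wrapping that lookup in state and a loop.

```python
# The "associative memory": (state, symbol) -> (symbol to write, head move, next state).
# "_" stands for a blank tape cell. This particular table is an assumption of
# the example: it adds 1 to a binary number written on the tape.
RULES = {
    ("right", "0"): ("0", +1, "right"),  # scan right to the end of the number
    ("right", "1"): ("1", +1, "right"),
    ("right", "_"): ("_", -1, "carry"),  # passed the last digit, start adding
    ("carry", "1"): ("0", -1, "carry"),  # 1 + 1 = 0, carry moves left
    ("carry", "0"): ("1", 0, "halt"),    # absorb the carry and stop
    ("carry", "_"): ("1", 0, "halt"),    # carry past the leftmost digit
}


def run(tape_input, max_steps=1000):
    """Run the machine: the loop and the (state, head, tape) variables are the
    only things added around the lookup table."""
    tape = {i: s for i, s in enumerate(tape_input)}
    state, head = "right", 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, "_")
        write, move, state = RULES[(state, symbol)]
        tape[head] = write
        head += move
    cells = "".join(tape[i] for i in sorted(tape))
    return cells.strip("_")


print(run("1011"))  # binary 11 plus 1
print(run("111"))   # binary 7 plus 1, carry runs off the left edge
```

Swapping in a different table changes what gets computed without touching the loop, which is why describing a single step as "just a lookup" says little about what the whole system can do.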

There's plenty of reason, of course, to question to what extent we know the abilities of the models, but when people dismiss it as "memory" or "just statistics" or "just Markov chains" I usually take it as a signal they don't understand how few limitations that imposes.

csomar · on Feb 25, 2024

I wonder if cognition or intelligence is actually "brain damage". Your bad memory is leading to erroneous paths, but some of these will lead to a eureka moment. Genetic evolution and natural selection are essentially that.

If your memory is too bad, then you are either insane or in an advanced state of Alzheimer's. If you have enough stable paths to lead a quasi-normal life, then you become an inventor or an artist; or something atypical.

Hallucination is the feature, not the bug.

PheonixPharts · on Feb 25, 2024

You must have a pretty impressive resume if you don’t consider François a researcher! Wikipedia would disagree with you [0], as would anyone that has had any interaction with him on the subject.

0. https://en.wikipedia.org/wiki/Fran%C3%A7ois_Chollet

ein0p · on Feb 25, 2024

Try to find a single paper on Transformers or LLMs in general in Francois’ scientific output: https://dblp.org/pid/116/8242.html

Don’t get me wrong, Keras is impressive, and Francois is impressive as well. But for insight on LLMs you should probably listen to people who specialize in them.

kmac_ · on Feb 25, 2024

If you measure knowledge by the number of publications, then LLMs know nothing.

vidarh · on Feb 25, 2024

They made the argument he's not a researcher in this field, not that he doesn't know anything. And an LLM is indeed not a researcher in this field either.

kmac_ · on Feb 25, 2024

It appears that my previous comment lacked clarity. I assumed the inherent absurdity and illogicality of the statement would be self-evident. Dismissing the opinion of François, a seasoned professional with extensive expertise in model development and the creator of ARC, solely because he asserts that LLMs are far from achieving AGI, and using publication count as the metric for evaluating him, is not a well-reasoned argument. While LLMs and transformers are indeed remarkable achievements and will undoubtedly reveal more properties, they have yet to exhibit any true signs of "intelligence."

vidarh · on Feb 25, 2024

Acknowledging that they have not yet shown signs of "true intelligence" is a vastly more moderate claim than the person above ascribed to him.

djmips · on Feb 25, 2024

An AGI might be something that can harness the LLM but also self learn.

djmips · on Feb 25, 2024

More like AI will finally be the bicycle for the mind.

ein0p · on Feb 25, 2024

IDK, credit where credit is due: traditional computers got us much further than we’d be able to go on our own. So they’re a fine “bicycle” in my view.

VirusNewbie · on Feb 25, 2024

But we can trivially show that the larger models can generalize for some questions even if we verify the answer isn’t in the training set.

alecco · on Feb 25, 2024

About [0], Phi-2 is sort of a proof of concept that, with a very narrow and high-quality dataset, normal transformer models can achieve 8x to 10x better results. Of course, if you add messy prompts they will fail! What a dumb view.

And the Yi-* models are suspected of being trained on the test set, or at least of being contaminated. All the other models barely move and if they do, it's probably an artifact of the test being multiple choice. There were papers showing most models improve if they are allowed to reason toward the answer by giving them more tokens in the answer.

The chat elo-like rankings are much more interesting:

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...

For completeness, here's the paper linked in the tweet https://arxiv.org/pdf/2402.01781.pdf

mirekrusin · on Feb 25, 2024

What is intelligence or generalization if not "sophisticated, complex latent space" navigation?

K0balt · on Feb 25, 2024

I think this is exactly what most people are misunderstanding. LLMs are wonderful tools for navigating the sociolinguistic matrix, which embeds the majority of human knowledge both explicitly and covertly with a huge amount of contextual nuance.

Transformers are capable of decoding and operating on this covert knowledge as easily as the encyclopedic knowledge that we tend to assume is the “main part” when we write something down.

What LLMs demonstrate is that language covertly encodes logic and algorithmic knowledge that is at least as rich as the “factual” encoding that written content seems to be at face value.

LLMs just make this knowledge accessible for computation. What is amazing is that this function alone is capable of producing a simulacrum of agency and intelligence all by itself. This suggests that the human cultural component is a pretty huge part of what we consider to be “human” and without it we’d just be clever apes.

It’s the clever ape part that LLMs don’t have, but it’s possible that the transformer model might be able to “ape” most of that as well if applied to the task of existing as an embodied entity in the world (see transformer use in robotic task completion)

I would not be terribly surprised if comprehensive multimodality, encompassing the entire spectrum of sensory experience as well as physical environment interactivity, gets us really close to something we could consider AGI.

Just as LLMs extract the embedded relationships in written knowledge, the physical and sensory spaces encode an enormous quantity of information that can similarly be generalised to extrapolate a cornucopia of additional concepts (gravity, object permanence, physics in general, relativity, etc). Meaningfully “tokenizing” these spaces will likely be the key to making this work effectively.

mirekrusin · on Feb 25, 2024

I find this human-supremacy point of view simply false.

Humans are just "aping" more (with a large spread within the population, by the way).

There is nothing special about humans, we're just "aping" slightly more than other animals - computers will arrive and immediately surpass us at our "aping" = what we call "intelligence".

This old, false argument with constant goalpost moving will run out of space sooner or later.

K0balt · on Feb 26, 2024

I think you misinterpret my statement, as I too believe that human intelligence is based on the same principles as are being exploited in generative AI. I think there is ample evidence to support that viewpoint, even adversarial images that affect humans in the same way (to a lesser extent) as they do classifiers.

I do not believe there is anything special about human intelligence, mostly just that it embodies more complexity than we currently have access to in our training data, and perhaps the hardware required might be expensive, or maybe not really.

Interactive / bootstrapping learning is still something we will need to figure out.

eigenket · on Feb 25, 2024

As far as I know we don't have much real understanding of how human or even animal intelligence works. It may be that it is entirely based on "sophisticated, complex latent space" navigation or it may be that it has nothing whatsoever to do with that (or anything in-between).

andy_ppp · on Feb 25, 2024

Our brains can reuse patterns they learned in one area in another area. LLMs need specific examples and can combine these in weird, unpredictable and often inhuman ways, losing context and meaning, the bits humans care about. They often aren’t even a good starting point compared to just thinking deeply for 5 minutes.

CamperBob2 · on Feb 25, 2024

Show me a Markov chain that can thrash a 9-dan Go master.

"But that wasn't an LLM."

OK, show me a Markov chain that can write a Python program that can play Go at all.

vidarh · on Feb 25, 2024

Put a loop around a Markov chain where you provide a 'tape interface' taking instructions from, and feeding input back to, the state, and you have a Markov decision process with a hard-wired decision maker acting as the tape. Provide the right Markov chain, and you have a universal Turing machine. So the extension needed from a Markov chain to something that could be programmed to do what you describe - say by running an ML model - is only very slight. And we do provide loops and state when we run inference, just not infinite ones.

I'm agreeing with your overall point, to be clear - my point is that calling something a Markov chain is effectively calling it trivially extendable to something that can in principle compute everything any physical entity confined to the known laws of physics can, and so what it boils down to is whether or not the model is trained in a way that gives it those abilities, and not the put-down of the potential ability of such a system that people usually intend the "just a Markov chain" as.

xapata · on Feb 25, 2024

Semantics, I suppose. Those ANNs were Markov chains, in a sense.

vidarh · on Feb 25, 2024

Pop a loop around a Markov chain that provides a "tape interface" and you have something capable of simulating a Turing machine. So when people bring up the Markov chains argument, they're saying next to nothing about the potential computational abilities of the system, even though they usually intend to dismiss it.

I tend to see people bringing that up in a dismissive way (not suggesting you are) as a clear indication they either haven't thought the argument through or do not understand how little it takes for a system to be Turing complete, and so for that argument to be meaningless.

xanderlewis · on Feb 25, 2024

I mean, I’m not sure what you’re trying to prove by asking for a Markov chain model like that. It’s trivially true that you can have a Markov chain output whatever you like (somewhat artificially, but we are talking about memorisation here) if you pick your training data carefully.

CamperBob2 · on Feb 25, 2024

So "pick the training data carefully," and show me what I'm asking for, given that it is "trivial."

xanderlewis · on Feb 28, 2024

The data consists of a graph whose vertices correspond to the output sequence you desire and where there is an edge (with probability 1) from string x to string y if and only if x precedes y.

This model will arise from the desired sequence as a single training example (learning the probability of each pair of consecutive tokens), provided none are repeated.

Now run your Markov chain with initial input {first token in your sequence}.
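The construction can be sketched in a few lines, assuming a first-order chain over tokens and a training sequence with no repeated tokens. The function names `train` and `generate` and the sample sequence are made up for illustration.

```python
from collections import Counter, defaultdict


def train(tokens):
    """Estimate transition probabilities from consecutive token pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for current, ctr in counts.items()
    }


def generate(model, first, max_len=1000):
    """Walk the chain from `first`, always taking the most probable edge.

    With no repeated tokens in the training sequence every edge has
    probability 1, so this walk is the only possible one. `max_len` is a
    guard for inputs that break that assumption and would otherwise loop.
    """
    out = [first]
    while out[-1] in model and len(out) < max_len:
        successors = model[out[-1]]
        out.append(max(successors, key=successors.get))
    return out


# A single training example; the tokens are arbitrary and all distinct.
sequence = ["print", "(", "'hello'", ")"]
model = train(sequence)
print(generate(model, sequence[0]))
```

This reproduces the training sequence exactly and nothing else, so it is memorisation in the most literal sense: the chain has one path and no ability to produce any output it was not given.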

tymscar · on Feb 25, 2024

I'm curious, what sort of area do you think needs more research when it comes to these models?