Hacker News

I don’t think LLMs can easily find errors in their output.

There was a recent meme about asking LLMs to draw a wineglass full to the brim with wine.

Most really struggle with that instruction. No matter how much you ask them to correct themselves, they can't.

I’m sure they’ll get better with more input, but what it reveals is that right now they definitely do not understand their own output.

I’ve seen no evidence that they are better with code than they are with images.

For instance, if the time to complete a response scales only with its length in tokens and not with the complexity of its contents, then it’s probably safe to assume those contents aren’t being comprehended.
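That intuition can be sketched with the standard back-of-envelope estimate of transformer decode cost. This is a hypothetical illustration, not the internals of any particular model: the function name and the dimensions (`n_layers`, `d_model`, `n_ctx`) are assumed, and the point is simply that per-token compute is a function of model shape and context length, with no term for what the tokens mean.

```python
def decode_flops_per_token(n_layers: int, d_model: int, n_ctx: int) -> int:
    """Rough FLOPs to generate one token with a standard transformer decoder.

    Illustrative estimate only: ~12*d_model^2 matmul FLOPs per layer
    plus attention over the cached context. No input anywhere for the
    semantic difficulty of the text being generated.
    """
    attn_proj = 4 * d_model * d_model  # Q, K, V, and output projections
    attn_ctx = 2 * n_ctx * d_model     # attending over the cached context
    mlp = 8 * d_model * d_model        # two 4x-expansion matmuls
    return n_layers * (attn_proj + attn_ctx + mlp)

# Same model shape, same context length -> identical cost per token,
# whether the next token continues doggerel or a subtle proof.
easy = decode_flops_per_token(n_layers=32, d_model=4096, n_ctx=1000)
hard = decode_flops_per_token(n_layers=32, d_model=4096, n_ctx=1000)
assert easy == hard
```

If models were doing something like comprehension, you might expect harder content to cost more compute per token; in this standard formulation it can't, by construction.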


