
Good Q: this is my technically-unlaunched app's site; full details are here: https://telosnex.com/compare/ (excuse the marketing, scroll down to the technical details)

Context / tl;dr:

- I'm making a cross-platform app. The easiest way to think about it is "what if Perplexity had scripts, and search was just a `script` that could be customized?" The AI provider is an abstraction you can pick: either the bigs via API, or local models via a llama.cpp integration.

- I left my FAANG job, where my last project was search x LLM x UI. I really, really want to avoid wasting a couple of years building a shadow of what the bigs already have. I don't want to be delusional; I want to make sure I'm building something that's at least good, even if it never succeeds in the market.

- I could test providers via API with standard benchmark questions, but that leaves out my biggest competitors, Perplexity and SearchGPT. Also, Claude's hidden prompt has gotten long enough (6K+ tokens) that I think Claude.ai counts as a distinct provider.

- So, I hunt down the best two QA sets I can find for legal and medical questions, and calculate the sample size needed to say with 95% confidence that score differences are meaningful (sketch after this list).

- Tediously copy and paste all ~180 questions into Gemini, Claude, Perplexity, Perplexity Pro with GPT-4o, and SearchGPT.
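
For reference, here's roughly what that sample-size calculation looks like; the 70% vs. 85% accuracy gap and 80% power are illustrative assumptions of mine, not the numbers from the actual eval:

    from scipy.stats import norm

    # Two-proportion sample-size estimate: how many questions per provider are
    # needed to distinguish a hypothetical 70% vs. 85% accuracy gap at 95% confidence?
    p1, p2 = 0.70, 0.85          # assumed accuracies of two providers
    alpha, power = 0.05, 0.80    # 95% confidence, 80% power

    z_a = norm.ppf(1 - alpha / 2)   # ~1.96
    z_b = norm.ppf(power)           # ~0.84

    p_bar = (p1 + p2) / 2
    n = ((z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2) / (p2 - p1) ** 2
    print(round(n))  # ~120 questions per provider under these assumptions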

There are some things that aren't well understood and have been constant for 6 months now:

- Llama 3.1 8B x Search is indistinguishable from Gemini Advanced (Google's $20/month Gemini frontend)

- The Perplexity baseline is absolutely horrid; Llama 3.1 8B x search kicks its ass. Perplexity Pro isn't very good either. If you switch Perplexity Pro to use GPT-4o, it's slightly worse than SearchGPT.

- Regular RAG kicks everything's ass. That's the only explanation I can come up with for why Telosnex x GPT-4o beats SearchGPT and Perplexity Pro using 4o. All I'm doing is bog-standard RAG with a nice long prompt of instructions: search results from API => render in webview => get HTML => embeddings => pick top N tokens => attach instructions and run inference (rough sketch below). I get the vibe Perplexity has especially crappy instructions and input formatting, and both are too optimized for latency over actually "reading" the web sites, SearchGPT more so.
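
That flow, as a rough sketch (the search_api / fetch_rendered_html / embed / llm callables, the chunk size, and the context budget are illustrative assumptions, not the actual Telosnex implementation):

    import math

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    def answer(query, search_api, fetch_rendered_html, embed, llm, budget_chars=16000):
        results = search_api(query)                                # 1. search results from API
        pages = [fetch_rendered_html(r["url"]) for r in results]   # 2. render in webview, grab HTML as text
        chunks = [p[i:i + 1200] for p in pages                     # 3. naive fixed-size chunks
                  for i in range(0, len(p), 1200)]
        q_vec = embed(query)
        ranked = sorted(chunks, key=lambda c: cosine(embed(c), q_vec), reverse=True)
        context, used = [], 0                                      # 4. keep top chunks under the budget
        for c in ranked:
            if used + len(c) > budget_chars:
                break
            context.append(c)
            used += len(c)
        prompt = ("Answer using only the sources below, and cite them.\n\n"   # 5. instructions + inference
                  + "\n\n".join(context) + "\n\nQuestion: " + query)
        return llm(prompt)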



That's an interesting benchmark; have you tested QwQ with it yet? It would be interesting to see how well it stacks up, since RAG analysis should be right up its alley. It might actually do better than 4o.


Ty for the reminder; I've been so busy dealing with last-minute polish for text selection that I hadn't played with it yet.

Sadly, even with a 64 GB M2 Max running it at q4, it takes like 3-5 minutes to answer a question. I'd have to use an API for a full eval.

It got the first med question wrong. Tl;dr: a woman was in an accident and is likely brain-dead; what do we do to confirm? The model lands on EEG, but the answer is corneal reflex. A meaningless sample, but I figured I'd share the one answer I got, at least :p

In general the o1 series is really, really _really_ nice for RAG, and I imagine this is too, at least with the approach where you have a Reasoner think out loud and a Summarizer give the output to the user (rough sketch below).
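
Something like this two-pass pattern, as a sketch (the prompts and the llm callable are assumptions, not anyone's actual implementation):

    def reason_then_summarize(question, context, llm):
        # Pass 1: the "Reasoner" thinks out loud over the retrieved sources.
        reasoning = llm(
            "Think step by step about the question using the sources below. "
            "Do not give a final answer yet.\n\n"
            "Sources:\n" + context + "\n\nQuestion: " + question
        )
        # Pass 2: the "Summarizer" turns that reasoning into the user-facing answer.
        return llm(
            "Using the reasoning below, give the user a concise, sourced answer.\n\n"
            "Reasoning:\n" + reasoning + "\n\nQuestion: " + question
        )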

Fun to see a full-on, real reasoning trace too: https://docs.google.com/document/d/1pMUO1XuFCr0nBmWNyOMp8ky4...


Ha, as a layman I'd probably say EEG to that too; how can eyes reliably show the state of the entire brain? But I guess it's standard practice.

It would be more interesting if everything related to "diagnosing brain death" from several textbooks were retrieved and thrown into the context; I'd imagine it might even get it right.

I've found its thought process really interesting while throwing it at fairly meaningless stuff like code optimization or drawing conclusions from unstructured data, but its size and slowness, coupled with the way it works, are a real problem. Maybe you can try it with Qwen-2.5-1.5B as a draft predictor to speed it up (sketch of the idea below), but I think that'll have limited gains on a Mac.
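
For anyone unfamiliar, the draft-predictor idea (speculative decoding) in toy form; draft_model and target_model here are assumed callables over token lists, not llama.cpp's actual API:

    def speculative_step(prefix, draft_model, target_model, k=4):
        # The small draft model cheaply proposes k greedy tokens.
        draft = []
        for _ in range(k):
            draft.append(draft_model(prefix + draft))
        # The big target model checks all k positions in a single forward pass
        # (assumed to return its own greedy token for each drafted position).
        verified = target_model(prefix, draft)
        accepted = []
        for proposed, wanted in zip(draft, verified):
            if proposed == wanted:
                accepted.append(proposed)   # agreement: the token comes almost for free
            else:
                accepted.append(wanted)     # first mismatch: take the target's token and stop
                break
        return accepted

The gains depend heavily on how often the draft model's guesses match the target's.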



