
In my mind, the pure reinforcement learning approach of DeepSeek is the most practical way to do this. Essentially it needs to continually refine and find more sound(?) subspaces of the latent (embedding) space. Now this could be the subspace which is just Python code (or some other human-invented subspace), but I don't think that would be optimal for the overall architecture.

It seems the most reasonable path because when you create restrictions like this you hamper search viability (and in a high-dimensional space that's a massive loss, because you can arrive at a result from many directions). It's like regular genetic programming vs. typed genetic programming. When you discard useful intermediate results just because they violate the constraints, you can't go anywhere near as fast. There will be a threshold where constructivist, generative schemes (e.g. reasoning with automata and all kinds of fun we've neglected) will be the way forward, but I don't think we've hit that point yet. It seems to me that such a point does exist, because if you have fast heuristics for when types unify, you no longer hamper the search speed but gain many benefits in soundness.
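
To make the GP analogy a bit more concrete, here is a rough toy sketch. The primitive set and the helper names (`enumerate_trees`, `count`) are made up for illustration and don't come from any real GP library. It only counts how many candidate expression trees exist with and without a type check; it says nothing about whether the extra untyped candidates are useful stepping stones, which is the part I'm actually guessing at.

```python
from itertools import product

# Hypothetical primitive set, invented for this sketch.
TERMINALS = {"x": "int", "1": "int", "True": "bool"}
# name -> (argument types, return type)
FUNCTIONS = {
    "add": (("int", "int"), "int"),
    "lt": (("int", "int"), "bool"),
    "if_": (("bool", "int", "int"), "int"),
    "not_": (("bool",), "bool"),
}


def enumerate_trees(max_depth, typed):
    """Return {root type: [expression strings]} for all trees up to max_depth.

    With typed=True a child may only fill a slot of its own type.
    With typed=False any subtree may fill any slot.
    """
    by_type = {"int": [], "bool": []}
    for name, t in TERMINALS.items():
        by_type[t].append(name)
    for _ in range(max_depth):
        everything = by_type["int"] + by_type["bool"]
        new = {t: set(exprs) for t, exprs in by_type.items()}
        for name, (arg_types, ret) in FUNCTIONS.items():
            pools = [by_type[a] if typed else everything for a in arg_types]
            for args in product(*pools):
                new[ret].add(f"{name}({', '.join(args)})")
        by_type = {t: sorted(exprs) for t, exprs in new.items()}
    return by_type


def count(max_depth, typed):
    """Total number of distinct trees up to max_depth."""
    return sum(len(v) for v in enumerate_trees(max_depth, typed).values())


if __name__ == "__main__":
    for depth in (1, 2):
        print(depth, count(depth, typed=False), count(depth, typed=True))
```

The gap between the two counts grows quickly with depth. Whether that pruning helps or hurts depends on how cheaply you can decide that types unify, which is the threshold I'm talking about.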

One of the greatest human achievements of all time is probably this latent embedding space -- one that we can actually interface with. It's a new lingua franca.

These are just my cloudy current thoughts.



DeepSeek's approach with R1 wasn't pure RL. They used pure RL only to develop R1-Zero from their V3 base model, but then went through two iterations of using the current model to generate synthetic reasoning data, SFT on that, then RL fine-tuning, and repeat.


fwiw, most people don't really grok the power of latent space wrt language models. Like, you say it, I believe it, but most people don't really grasp it.


Image generation models also have an insanely rich latent space. People will be squeezing value out of SDXL for many years to come.



