In my mind, the pure reinforcement learning approach of DeepSeek is the most pra...

HarHarVeryFunny · on Feb 7, 2025

DeepSeek's approach with R1 wasn't pure RL: they used RL alone only to develop R1-Zero from their V3 base model, then went through two rounds of using the current model to generate synthetic reasoning data, SFT on that, then RL fine-tuning, and repeat.
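To make the shape of that loop concrete, here's a toy schematic in Python. It is not DeepSeek's code. `Checkpoint`, `rl_finetune`, `generate_reasoning_data`, `sft` and `train` are hypothetical stand-ins that only record which stages produced a model. The comment doesn't say which checkpoint each SFT round starts from, so that is exposed as a hypothetical `restart_from_base` flag. The default (restart from the base model) is my reading of the R1 report and should be treated as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical stand-in for a model: just records the stages that produced it."""
    stages: tuple[str, ...]


def rl_finetune(ckpt: Checkpoint) -> Checkpoint:
    # Placeholder for an RL stage (reward-driven policy optimization).
    return Checkpoint(ckpt.stages + ("RL",))


def generate_reasoning_data(ckpt: Checkpoint, n: int) -> list[str]:
    # Placeholder: sample n synthetic reasoning traces from the given model.
    return [f"trace-{len(ckpt.stages)}-{i}" for i in range(n)]


def sft(ckpt: Checkpoint, data: list[str]) -> Checkpoint:
    # Placeholder for supervised fine-tuning on the synthetic traces.
    return Checkpoint(ckpt.stages + (f"SFT[{len(data)}]",))


def train(base: Checkpoint, rounds: int = 2, n_samples: int = 4,
          restart_from_base: bool = True) -> tuple[Checkpoint, Checkpoint]:
    # Step 1: RL alone on the base model gives the "pure RL" model (R1-Zero).
    zero = rl_finetune(base)
    current = zero
    for _ in range(rounds):
        # The latest model generates the synthetic reasoning data...
        data = generate_reasoning_data(current, n_samples)
        # ...SFT on it (assumption: restart from base unless told otherwise)...
        start = base if restart_from_base else current
        # ...then RL fine-tune; the result feeds the next round.
        current = rl_finetune(sft(start, data))
    return zero, current


base = Checkpoint(("V3-base",))
zero, final = train(base)
print(zero.stages)   # ('V3-base', 'RL')
print(final.stages)  # ('V3-base', 'SFT[4]', 'RL')
```

With `restart_from_base=False` the stages instead accumulate across rounds (`RL`, `SFT`, `RL`, `SFT`, `RL` on top of the base), which is the more literal reading of "and repeat".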

danielmarkbruce · on Feb 6, 2025

FWIW, most people don't really grok the power of latent space in language models. You say it and I believe it, but most people don't grasp it.

ttul · on Feb 7, 2025

Image generation models also have an insanely rich latent space. People will be squeezing value out of SDXL for many years to come.