
Swiss AI Initiative | https://www.swiss-ai.org/ | Hybrid/ONSITE (in Europe)

We are a young team and the creators of the Apertus LLM, currently the leading open-data, open-weights AI model.

Join us to work on cutting-edge LLM training in the open. We do pretraining, alignment, reasoning, multilinguality and multimodality - all at the intersection of engineering and research.

This is a joint team between ETH Zurich and EPFL in Lausanne, running on the Alps supercomputer (one of the largest GPU clusters at a public institution). Visa sponsorship is possible; the working language is English.

https://careers.epfl.ch/job/Lausanne-AI-Research-Engineers-S...


The pretraining (so 99% of training) is fully global, in over 1000 languages without special weighting. The posttraining (see Section 4 of the paper) also included as many languages as we could get, and did upweight some of them. The posttraining can easily be customized to other target languages.


common crawl already respects the CCBot opt-out every time they do a crawl.

we went a step further because back in the old days (2013 is our oldest training data) LLMs did not exist, so website owners opting out of AI crawlers today might like the option to also remove their past content.

arguments can be made either way, but we tried to remain on the cautious side at this point.

we also wrote a paper on how this additional removal affects downstream performance of the LLM: https://arxiv.org/abs/2504.06219 (it affects it surprisingly little)


"I didn't know to withdraw consent" isn't the same as "I consent". Thank you for doing the right thing.


Ah good points, thanks for the clarification.


martin here from the apertus team, happy to answer any questions if i can.

the full collection of models is here: https://huggingface.co/collections/swiss-ai/apertus-llm-68b6...

PS: you can run this locally on your Mac with these two commands:

pip install mlx-lm

mlx_lm.generate --model mlx-community/Apertus-8B-Instruct-2509-8bit --prompt "who are you?"
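
if you prefer the Python API, something like this should also work - a minimal sketch assuming a recent mlx-lm version, with names per the mlx-lm docs (adjust max_tokens as you like):

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Apertus-8B-Instruct-2509-8bit")

# wrap the question in the model's chat template before generating
messages = [{"role": "user", "content": "who are you?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

print(generate(model, tokenizer, prompt=prompt, max_tokens=200))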


Hi, your "truly open" model is "gated" on Huggingface, restricting downloads unless we agree to "hold you harmless" and share our contact info. Can you fix this please, either by removing the restriction, or removing the "truly open" claim?


We hear you; nevertheless, this is one of the very few open-weights and open-data LLMs, and the license is still very permissive (compare, for example, to Llama). Personally, of course, I'd like to remove the additional click, but the universities also have a say in this.


This project looks awesome!

In the US, many states have anti-indemnification laws that restrict state government agencies (including state universities) from agreeing to contracts with such language. I'd love to make this available to researchers at my university, but I'm not sure I can click through such an agreement (similar problems exist with other LLMs).

It is Apache 2, and I don't see anything that prohibits another contracting party from agreeing to the Apertus LLM Acceptable Use Policy and then redistributing under just Apache 2, without the AUP. Maybe this provides a solution? Unless I'm missing something?


yes, this seems like a good way to go. for example, you can already find many quantized versions under https://huggingface.co/models?search=apertus%20mlx and elsewhere.


Ok so why keep calling it "truly open" then? It's an obvious lie and nobody is forcing you to say it. It benefits your marketing, sure, but it harms everyone else by diluting the meaning of the term "open". So stop doing that please.


Great job! Would it be possible to know the cost of training such a model?


From their report:

> Once a production environment has been set up, we estimate that the model can be realistically trained in approximately 90 days on 4096 GPUs, accounting for overheads. If we assume 560 W power usage per Grace-Hopper module in this period, below the set power limit of 660 W, we can estimate 5 GWh power usage for the compute of the pretraining run.
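
A quick back-of-the-envelope check of that figure, using only the numbers quoted above:

# sanity check of the ~5 GWh estimate from the report's own numbers
gpus = 4096
watts_per_module = 560      # assumed average draw per Grace-Hopper module
hours = 90 * 24             # ~90 days of training
energy_gwh = gpus * watts_per_module * hours / 1e9
print(f"{energy_gwh:.2f} GWh")  # -> 4.95 GWh, i.e. roughly the 5 GWh quoted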


we haven't had time to write one yet, but there is the tech report, which already has a lot of details


The report is packed with interesting details. The engineering challenges and solutions chapter especially shows how things that are supposed and expected to work break at massive scale. Really difficult bugs. Great writeup.


thank you!


posttraining codebase is here: https://github.com/swiss-ai/posttraining


we released 81 intermediate checkpoints of the whole pretraining phase, plus the code and data to reproduce it. so a full audit is certainly possible - still, it depends on what you consider 'practical' here.
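
if you want to poke at one of those intermediate checkpoints, something along these lines should work with Hugging Face transformers - a minimal sketch, assuming the checkpoints are published as revisions of the model repo on the Hub. The repo id and revision tag below are placeholders; check the collection page for the real names, and note the repos are gated, so you may need to accept the terms and log in first:

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "swiss-ai/Apertus-8B-2509"            # assumed repo id - see the collection linked above
revision = "<intermediate-checkpoint-tag>"   # placeholder - pick a real checkpoint revision from the Hub

tokenizer = AutoTokenizer.from_pretrained(repo, revision=revision)
model = AutoModelForCausalLM.from_pretrained(repo, revision=revision)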


benchmarks: we provide plenty in the over 100 page tech report here https://github.com/swiss-ai/apertus-tech-report/blob/main/Ap...

quantizations: available now in MLX https://github.com/ml-explore/mlx-lm (GGUF coming soon; not trivial due to the new architecture)

model sizes: many good dense models today still lie in the range between our chosen small and large sizes


Thank you! Why are the comparisons to Llama 3.1-era models?


we compared to GPT-OSS-20B, Llama 4, and Qwen 3, among many others. Which models do you think are missing, among open-weights and fully open models?

Note that we have a specific focus on multilinguality (over 1000 languages supported), not only on English.


How did it compare with the Gemma 3 models? I've been impressed with Gemma 27B - but I try out local models frequently, and I'm excited to boot up your 70B model on my 128 GB MacBook Pro when I get home!


ah, I'm sorry, I missed that - I'm not usually that blind...


this is what this paper tries to answer: https://arxiv.org/abs/2504.06219 - the quality gap between compliant and non-compliant data is surprisingly small


Yes, this is an interesting question. In our arXiv paper [1] we studied this for news articles, and also removed duplicate articles (decontamination). We did not observe an impact on the downstream accuracy of the LLM in the case of news data.

[1] https://arxiv.org/abs/2504.06219

