MiniJinja: Learnings from Building a Template Engine in Rust (pocoo.org)
115 points by todsacerdoti on Aug 27, 2024 | 66 comments


Thank you for this. I just used it to build a Rust-based CLI tool that generates YAML from j2 templates. The motivation was that we want to move away from Ansible's j2 YAML generation to a standalone binary.

Now it feels like Helm, but it's j2 and outputs a directory of YAMLs with base and overlays, the kustomization way, so they can be overridden in GitOps by SREs if needed.


> The first thing is that Rust is just not particularly amazing at tree structures.

As someone who doesn't write any Rust I'd like to understand this more. Is it the ownership model that makes trees weird? Is there something you guys do instead? I was working on an R-tree yesterday for something and I'm slightly surprised to read this but obviously Armin knows what he's talking about :)


Actually, Rust's rules really want your data to look like trees. Requiring each value to have exactly one owner is not really different from requiring each node of a tree to have exactly one parent.

The problem however comes when you want to reference the parent from a child. In the ownership/reference graph that would also count as an edge, and hence creates a cycle with the parent-to-child edge. There are workarounds like using weak reference counted pointers or indexes into arrays, but ultimately they don't offer the same ergonomics as other languages and it's better to design with only parent-to-child links.


Why would you want to create a cyclic reference from child to parent instead of a bare pointer?


Because dereferencing a raw pointer in Rust requires unsafe.


Everything in Rust requires unsafe code at some point. Encapsulate the unsafe code behind a safe and sound façade, and hide it in a library. Or use an existing library for memory management: arenas, RC, or a Mark&Sweep GC (rust-gc, pgc), etc.


Encapsulating unsafe code behind a safe interface is not so easy (especially when cycles like these are involved) and there's always the danger of getting it wrong, so you generally want to avoid doing that.

In this case using `Rc` for the children and `Weak` for references to the parent does exactly this and is included in the stdlib, so I guess I'm missing your previous point of using "bare pointer". Did you mean that the safe interface should be built around the whole tree rather than just the parent reference?
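For readers following along, here is a minimal sketch of that `Rc`/`Weak` arrangement using only the standard library. The `Node` type and the helper names `add_child` and `parent_name` are made up for illustration; the point is only that the strong edge goes down and the weak edge goes up, so no ownership cycle forms.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// A tree node that owns its children and only weakly points at its parent.
/// (Hypothetical type, for illustration only.)
struct Node {
    name: String,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

impl Node {
    fn new(name: &str) -> Rc<Node> {
        Rc::new(Node {
            name: name.to_string(),
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(Vec::new()),
        })
    }
}

/// Attach `child` under `parent`: strong edge down, weak edge up.
fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

/// Name of the parent, or `None` if the node is a root or its parent is gone.
fn parent_name(node: &Node) -> Option<String> {
    node.parent.borrow().upgrade().map(|p| p.name.clone())
}
```

Because the child holds only a `Weak`, dropping the last `Rc` to the parent frees it, and a later `upgrade()` from the child simply returns `None`. The cost is the `RefCell` bookkeeping, which is the ergonomics complaint raised upthread.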


> Encapsulating unsafe code behind a safe interface is not so easy (especially when cycles like these are involved) and there's always the danger of getting it wrong, so you generally want to avoid doing that.

Yes, it's true. Most of the time I had logic errors only in safe code and segfaults caused by unsafe code. But it's not a problem after C, where segfaults and memory leaks are the norm.

> Did you mean that the safe interface should be built around the whole tree rather than just the parent reference?

Safe and sound (bulletproof) façade for the pointer and the tree. Safety is an overloaded term in Rust. For example, when indices are used instead of references, it is easy to make memory management mistakes that are invisible to the Rust compiler. Such code is "safe" in Rust terms, but unsound.


> Safe and sound (bulletproof) façade for the pointer and the tree

That doesn't really answer my question, the façade should be either:

- at the level of the pointer, meaning it offers the user all the functionality they need to code the tree data structure;
- at the level of the tree, meaning the tree externally offers the user all the functionality they wanted from the tree, but the tree internally uses `unsafe` and maintains invariants that make its use sound.

> Such code is "safe" in Rust terms, but unsound.

You might not want to use unsound here, since that's overloaded too. The way I usually see and use those terms in a Rust context is:

- safe: code that does not use `unsafe`
- sound: a safe interface (i.e. callable from safe code) that internally uses `unsafe`, but there is no way for the safe side to produce UB when calling it.

I would refer to the index code as "incorrect", "erroneous" or "buggy" instead to avoid the overload in this context.


UB and unsoundness are different things. "If it compiles, then it works (properly)" is true for sound code only. Direct indexing of arrays with a `for` loop is unsound, because of typical errors with indexes, such as off-by-one. Iterators are sound.

I want to debug my code less, so I use a sound interface to arrays: iterators. Other than that, both interfaces are equal.

If there is no ready-to-use method for my usage pattern, I can write unsound code directly, or I can write a sound interface and then write sound code using it, so the compiler will not allow me to make stupid mistakes. The unsound version will be faster to write, but may require more time to debug and support.
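A small sketch of the contrast being drawn here, with the caveat that "unsound" is this comment's own usage and not the usual Rust meaning. Both functions below are safe Rust and both compile; the function names are made up for illustration.

```rust
/// Differences between neighbouring elements, written with manual indexing.
/// Compiles fine, but panics on an empty slice: `xs.len() - 1` underflows
/// in debug builds, and in release builds the loop indexes past the end.
fn diffs_indexed(xs: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    for i in 0..xs.len() - 1 {
        out.push(xs[i + 1] - xs[i]);
    }
    out
}

/// The same computation through the iterator API. `windows(2)` yields
/// nothing for slices shorter than two elements, so the empty case
/// needs no special handling.
fn diffs_iter(xs: &[i32]) -> Vec<i32> {
    xs.windows(2).map(|w| w[1] - w[0]).collect()
}
```

The indexed version is the kind of code that passes the compiler and fails at run time on an edge case; the iterator version pushes the bounds logic into the library.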


> UB and unsoundness are different things. "If it compiles, then it works (properly)" is true for sound code only.

In programming language theory, soundness is a property of a type system for which the safety theorem holds. In other words, if some program is well-typed according to such a type system, then its execution following the rules of the language's abstract machine reduces to some terminal value. "Undefined behaviour" is the opposite situation, where some program reaches a non-terminal state where no abstract machine rule applies to make progress, meaning the abstract machine does not describe or define the behaviour of that program and state pair.

In Rust's case the claim is that this holds only as long as you use its safe subset, in other words "rust's type system is sound for `unsafe`-less programs". This can then be extended to some particular `unsafe` code, making the claim "rust's type system is sound for `unsafe`-less programs plus this particular function/code", which is what is generally meant with "this unsafe code is sound".

Most (all?) type systems don't try to define what it means for a program to "work (properly)"; that is generally left to the programmer. For those that do, then yes, unsoundness would mean that an accepted program does not work properly. I find it hard to imagine such a programming language, though I would certainly expect it to be parameterized over some specification that the type system assumes to be the final desired property. In any case this is far from what Rust promises.


I agree completely, but I should point out a problem: it's easy to implement an unsound virtual machine on top of a sound language. When developers use indices to simulate references, to circumvent the borrow checker, they are doing exactly that. Then again, a sound façade can be built around unsound code, as the `indextree` crate does:

> This arena tree structure is using just a single `Vec` and numerical identifiers (indices in the vector) instead of reference counted pointers. This means there is no `RefCell` and mutability is handled in a way much more idiomatic to Rust through unique (`&mut`) access to the arena. The tree can be sent or shared across threads like a `Vec`. This enables general multiprocessing support like parallel tree traversals.


it's a pain if you want children to reference their parents while their parents reference them

Similarly getting a mutable reference to a leaf while parents have references to children is awkward


I know just Rust basics, what specifically makes it awkward? I'm not a heavy Rust user, so I'm having a hard time following, do you have any simple pseudo-Rust code you could share? I'm also a basic CRUD app dev by day, hobbyist by night.


I'm not a Rust expert by any means, but I believe there's a problem with lifetimes. There are many ways to implement doubly linked lists (think C++ smart pointers), but when you try to squeeze out performance and use references, it's a fight against the borrow checker, which ends with the need to use unsafe Rust.

smarter sources: https://rust-unofficial.github.io/too-many-lists/ https://softsilverwind.github.io/rust/2019/01/11/Rust-Linked...


The borrow checker is just a checker, which ensures that the programmer doesn't write obviously wrong programs. It's obviously wrong to create cyclic references, even in a language with a garbage collector, such as JavaScript. Why do you need to fight the checker?


Regarding the borrow checker, what you say is not correct. The borrow checker's rules are stricter than absolutely necessary. It is designed to reject all invalid programs, but it also rejects some valid programs.


Yes, the Rust compiler team improves the borrow checker from time to time, to allow more valid programs to pass the checker. However, in practice, it's always possible to use unsafe code to do the job, and then build a safe façade for it.

In this particular case, the borrow checker does its job as designed. So why fight it?


garbage collectors detect cycles. you may be thinking of reference counting. CPython includes a gc for cycle detection


Some GCs detect cycles, some do not (reference counting is an example of a GC that does not). But even those that detect cycles may not be able to detect all of them, or detection can be expensive, as in a typical Mark&Sweep GC, because it may require stopping the world to perform GC, which is unacceptable for systems programming languages like C, C++, and Rust.


You can have completely pauseless GC tracking, also in Rust.


Why would it be obviously wrong?


When a tree goes out of scope, its nodes' memory is reclaimed automatically ("dropping" in Rust terms) by the automatic garbage collector built into the compiler: `drop(root)->drop(leaf_a),drop(leaf_b)`. However, if a leaf has a reference to its parent, then an infinite loop occurs: `drop(root)->drop(leaf_a)->drop(root)->drop(leaf_a)...`

The solution is to use an alternative automatic garbage collector instead of the compiler's built-in one: arenas, RC with weak references, or a Mark&Sweep GC. Arenas have better performance, because they drop all nodes at once. The easiest way to quickly implement an arena in safe Rust is to use a vector (array) of nodes and use indices instead of direct references.
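A minimal sketch of that vector-of-nodes arena, in safe Rust and with no external crates. `NodeId`, `Node`, and `Arena` are illustrative names, not taken from any particular library, and the sketch does not support removing nodes.

```rust
/// Index of a node inside the arena; stands in for a reference.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

struct Node<T> {
    value: T,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

/// All nodes live in one Vec and are freed together when the arena is dropped.
struct Arena<T> {
    nodes: Vec<Node<T>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    /// Add a node, linking it both ways if a parent is given.
    fn add(&mut self, value: T, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node { value, parent, children: Vec::new() });
        if let Some(p) = parent {
            self.nodes[p.0].children.push(id);
        }
        id
    }

    fn parent(&self, id: NodeId) -> Option<NodeId> {
        self.nodes[id.0].parent
    }

    fn children(&self, id: NodeId) -> &[NodeId] {
        &self.nodes[id.0].children
    }

    fn value(&self, id: NodeId) -> &T {
        &self.nodes[id.0].value
    }

    /// Mutable access goes through `&mut Arena`, so the borrow checker
    /// still guarantees exclusive access.
    fn value_mut(&mut self, id: NodeId) -> &mut T {
        &mut self.nodes[id.0].value
    }
}
```

Child-to-parent links are plain indices here, so there is no ownership cycle. The trade-off discussed upthread still applies: an index from one arena used with another compiles fine and either panics or silently points at the wrong node.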


You've said it would be obviously wrong "even in a language with garbage collector, such as JavaScript." But, obviously, most modern GCs can handle cyclic references just fine.


But even in languages with a tricolour Mark&Sweep GC, which handles cyclic references just fine, it's still possible to leak memory via a dangling reference to a node in a complex cross-linked graph, because the language and the GC allow that.

Rust by default forbids cyclic references by forcing you to use trees, which completely avoids the problem.


You can leak memory in any complex project, even if you only use safe Rust.
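For example, a reference cycle built from `Rc` is never freed, and it needs no `unsafe`. A minimal sketch (the `Node` type and `make_cycle` are illustrative names):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// A node with one strong link to another node.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

/// Builds two nodes that point at each other, then lets both local
/// handles go out of scope. Returns a weak handle so the caller can
/// observe that the first node is still alive afterwards.
fn make_cycle() -> Weak<Node> {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    Rc::downgrade(&a)
    // `a` and `b` are dropped here, but each node is still held by the
    // other, so neither strong count reaches zero.
}
```

After `make_cycle()` returns, nothing in the program owns either node, yet upgrading the returned `Weak` still succeeds: the memory is leaked for the rest of the process.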

Linux kernel uses doubly linked lists, Redis uses doubly linked lists, V8 JS engine uses doubly linked lists. Have their authors chosen something obviously wrong?


Rust uses doubly linked lists; they are no harder to implement in Rust than in any other language. Moreover, the built-in borrow checker will help to implement them properly, without memory leaks or use-after-free. What is your point?


My point is that it's not obviously wrong to create cyclic references.


OK, it was obviously wrong to create cyclic references a few years ago.


But doubly linked lists use cyclic references…


Double linked lists use double links.


Some people try to use references everywhere, every time, instead of (safely wrapped) pointers. It's a mental lock.


A templating language would be a great case for using an arena allocator since objects will only live as long as it takes to fulfill the request.


Yes! I did in fact try this but I was unable to find a good way to make the API work.


Interesting! I suspect a templating engine that uses arenas would be a holy grail for rust web perf, so I'm curious about your experience trying to use them. As for making the API work, do you mean that the arena API was difficult to use, or the object interface you built on top of it was difficult to implement in a way that was accessible to template execution?


It depends a bit on what you want to do. I made one attempt where I needed to carry an `'arena` lifetime everywhere, which was really tricky. A second attempt was to tag the Value internally with an ever-increasing arena ID. However, the latter becomes dangerous very quickly because nothing binds them together; you end up with a lot of unsafe code, and I had no trust in my creation.


Yeah that makes sense. Maybe arenas will be viable once the working group makes progress on allocators. https://github.com/rust-lang/wg-allocators


To add some balance to everyone slating Jinja in the comments, I've personally found it great to use.

Sure you CAN write unmaintainable business logic spaghetti in your templates, doesn't mean you SHOULD (Most criticism appears to come from this angle).


I really don't like Jinja. The syntax is not for me and there's often three different ways of building the same thing. I find it a bit confusing.

Recently I started using JinjaX (https://jinjax.scaletti.dev/), which is weirdly amazing. I think the comparison on the homepage explains it quite well. It feels like what Jinja always should have been, and the multiple ways of doing the same thing converge into one syntax in JinjaX.

I really hope the project gains more traction.


I don’t think that Jinja is the right way to build HTML heavy anything. Look at JSX for that. However I do find it interesting how this is a constant back and forth.

I remember when the template engine that everybody in Python used was kid (precursor to genshi) which was XML based and looked a lot like vue/jsx does today. Back in the day people did not like it at all because designers found it hard to work with.

Now the situation is a bit different but what do I know where we land in 10 years. It feels like a constant cycle ;)


I think JinjaX was inspired by JSX. Obviously, there's a huge delta, but my "HTML heavy" problems are nicely solved with JinjaX


IDK. I always thought Jinja was the best templating engine and was surprised by your complaints and the Jinja sample on that website. What doesn't he like? Looks totally legit...

After reading the JinjaX sample, I see the point. I would agree that it's much cleaner... if not for the fact that he uses capitalized tags to distinguish between templates and HTML. I hate it. First off, it just throws me off: I cannot tell at a glance anymore where the real text is and where the templating voodoo-magic is. How do I find the source files for layout and pagination? Right, I just have to know the convention. Second, it's remotely justifiable, and can be attributed to a matter of getting used to it, only if it produces HTML. Templating engines are not restricted to HTML. It could be producing Markdown, it could be producing XML. If only the author of this library had restrained himself to be less funky and not introduced any ambiguous idioms, it would be pretty perfect. But as it is, I don't think I could use it.


Different tastes, I guess. It's the same syntax that many JavaScript frontend frameworks like React and Svelte use. I find it much cleaner.

When using Jinja with HTMX you also run into trouble that it doesn't nicely support partials. You could use the jinja_partials library but I just find JinjaX so much more pleasant.


For what it's worth MiniJinja lets you render individual blocks if you so desire: https://docs.rs/minijinja/latest/minijinja/struct.State.html...


> JinjaX components are simple Jinja templates. You use them as if they were HTML tags without having to import them: easy to use and easy to read.

…and impossible to understand and keep track of what goes where.

No, thanks. Components are a good idea, and no imports is a horrible idea.


Do I understand JinjaX correctly, that it's specifically for HTML templates, not text templates in general?


Yes


I wanted to make a cli tool for rendering templates with mini jinja and also for users to write their own little library of functions/filters etc in Lua using mlua and then be able to use those in their templates. Unfortunately didn't work. All kinds of issues with mini jinja and mlua interactions when trying to add_filter, add_function, etc.

This was a while ago though. Maybe something changed and I should try it again.


You cannot register filters with a non 'static lifetime. However it’s not super hard to work around. Usually an Arc<Mutex> is all you need.
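A rough sketch of that workaround, deliberately not using MiniJinja's real API: `Registry` below is a hypothetical stand-in for any engine whose callbacks must be `Send + Sync + 'static`, and `Counter` stands in for whatever state the filter would otherwise borrow. The closure takes ownership of an `Arc<Mutex<..>>` clone instead of a reference, which is what satisfies the `'static` bound.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical filter table; NOT MiniJinja's API, only the same kind of bound.
struct Registry {
    filters: HashMap<String, Box<dyn Fn(&str) -> String + Send + Sync + 'static>>,
}

impl Registry {
    fn new() -> Self {
        Registry { filters: HashMap::new() }
    }

    fn add_filter<F>(&mut self, name: &str, f: F)
    where
        F: Fn(&str) -> String + Send + Sync + 'static,
    {
        self.filters.insert(name.to_string(), Box::new(f));
    }

    fn apply(&self, name: &str, input: &str) -> Option<String> {
        self.filters.get(name).map(|f| f(input))
    }
}

/// Shared state the filter needs (illustrative).
struct Counter {
    calls: u32,
}

/// Registers a filter that owns a handle to the shared state.
fn register_counting_upper(reg: &mut Registry, state: Arc<Mutex<Counter>>) {
    reg.add_filter("upper", move |s| {
        // Lock only for the duration of the call.
        let mut guard = state.lock().unwrap();
        guard.calls += 1;
        s.to_uppercase()
    });
}
```

The caller keeps its own `Arc` clone and can inspect or update the state between renders. This only works if the shared state is itself `Send`; whether a given scripting runtime meets that requirement is a separate question.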


ha man I hate Jinja. Nothing against this or Rust, just from working in Flask I have borderline nightmares. That might say more about my team than the framework though.


Every templating engine has to find a balance between doing what you need it to do, and letting you write business logic in the template itself. (Hey there, PHP.) That said, it's possible to write pathological templates in just about anything.

I've used Jinja a lot and always thought that it struck just about the right balance.

The docs are also way better than they used to be.


I’ve been using Jinja2 for years and years, probably around a decade now. I don’t remember any other docs?


Yeah the docs never really changed all that much. They have looked more or less the same for 15 years or so.

https://web.archive.org/web/20110727181325/http://jinja.poco...


The answer is not to use a templating engine. TSX is the gold standard here IMO. Using it with https://fresh.deno.dev/ is soooo much nicer than old school templates.


Could you elaborate? I think it’s fine? Although I have always subscribed to the “Two Scoops” philosophy and keep my templates as stupid as possible. Use a variable, simple conditions, etc. Anything more complex should be handled at a higher level.

Go templating on the other hand, that drove me bananas.


I got used to React components, and after that templates feel a bit cumbersome and un-ergonomic in places.

I’ve been using Django templates since 2006. Jinja was designed after Django templates of that era, fixing their most glaring problems. Don’t remember the exact year when I started using Jinja a lot. Maybe 2013.

For example, the mere existence of an {% include %} tag. People use it, implicitly passing many arguments to some other template, which makes it very hard to understand and change later.

Macros are not that pleasant to use: there are fewer features in macros than in Python functions, and tool support is much worse too (static analysis, formatting, etc).

Filters… why do I need to write `value|string` instead of `str(value)`?

Etc.


The benefit of filters is that they don’t share the namespace with the template context and I would argue are more readable.


Not sharing the same namespace is a benefit, I agree. Not a huge one, as I don’t name my variables “map”, “string” and similar anyway. And would not look favorably if any of my employees did that.


I have the opposite memory :) I remember writing complex applications in Flask and it just worked right away, and as our performance / complexity grew it kind of kept up because it was very modular.


I worked on a project that implemented static type checking for dbt projects, which are made of SQL templated with jinja and some yaml configs. I'm not sure if I hate it, but it certainly was fun.



Wait until you have to work in Mustache/Handlebars, you'd love Jinja


IMHO Jinja was never worth salvaging. It's a bandaid for not being able to mix Python and HTML easily.

Super slow parsing. Unintuitive DSL (opinionated enough to be a PITA). Huge LoC vs the value it contributes to your project.

Adopt a different system.. see: ERB, PHP, EJS- all of these are extremely fast and intuitive (just the native language + an extra tag or two).

Edit:

Of course.. this is authored by the jinja guy.


Except no cultured adult programmer would use the original PHP syntax for templating, unless it's a script worth five minutes of development. It's too powerful and too complicated. People use Smarty or Twig, which are basically the same thing as Jinja.


I always liked TAL, it kind of made sense to me that the templating engine itself understands the syntax of the language you are templating. It ought to be impossible to write invalid HTML/XML. It's like the pain you get into when templating YAML with jinja, but if the YAML template also were itself a valid YAML file it would be much cleaner imho.


I don't disagree but we ended up with systems like Jinja because that's what people adopted. I also built GHRML which was a syntax aware template engine for generating XHTML documents and it never really took off.

Sometimes it's the simple and dumb solutions that win out over the cleaner stuff. Nowadays maybe less so, JSX clearly won and if you build UIs today that's probably what you should go for. However there were many solutions like JSX a long time before and somehow the value proposition was not right for the software written at the time.



