The weird of function-local types in Rust

noelwelsh · on Aug 21, 2024

This seems like an oversight in the design of Rust. I would think that each function call should create a distinct function-local type, so the trick they use to extract the type from the function shouldn't work. I think what's needed is path-dependent types [1] as found in Scala.

[1]: http://lampwww.epfl.ch/~amin/dot/fpdt.pdf

GrantMoyer · on Aug 21, 2024

Rust generally uses lexical scoping, and each function/closure has a unique (possibly anonymous) type per definition, not a type per call. I would therefore expect local types to be per definition too, so the behavior seems fine to me.

tialaramex · on Aug 21, 2024

Why "possibly anonymous" ? I don't think we can ever name any of these types. Rust's Existential types exist so that we can say we return such a thing, without being able to name it.

GrantMoyer · on Aug 21, 2024

You're right, and I was being imprecise. All closure and "function item" types are unnameable; only function pointer types can be named, for example `fn(int) -> int`[1].

[1]: https://doc.rust-lang.org/reference/types/function-item.html

noelwelsh · on Aug 21, 2024

I disagree. If I define two modules

mod one { struct Cat { name: String } }

and

mod two { struct Cat { name: String } }

I have two distinct types called Cat which are not equivalent.

one::Cat != two::Cat // doesn't compile but illustrates the point, I hope

Similarly, when I call a function I create a new environment (think, stack frame) for each call which contains values that are distinct from all other calls. I would expect the same to hold for types defined within a function.

masklinn · on Aug 21, 2024

> Similarly, when I call a function I create a new environment (think, stack frame) for each call which contains values that are distinct from all other calls. I would expect the same to hold for types defined within a function.

Rust types are not runtime objects.

Also just because a function call creates a new environment doesn't mean everything is part of that environment. `static` items are singletons, even if defined within a function (which is a common case when the function should be the only thing directly interacting with the static).

GrantMoyer · on Aug 21, 2024

I don't think the analogy to modules is quite right. I think that maps better to:

  fn foo() {struct Cat;}
  fn bar() {struct Cat;}

and foo::Cat != bar::Cat. Whereas the a single function with a local type maps better to:

  mod foo {struct Cat;}
  mod bar {pub use ::foo::Cat;}
  mod baz {pub use ::foo::Cat;}

and bar::Cat does equal baz::Cat.

But maybe I only think that construct maps better because I'm predisposed to the interpetation I described. I do see what your saying, and agree that Rust could work that way; I'm just not convinced it's a bug that it doesn't.

The behavior you describe would be more surprising to me than the existing behavior, but clearly that's not a universal sentiment, and I'm not sure which behavior would be less surprising to most people.

noelwelsh · on Aug 21, 2024

Seems reasonable to me. I was thinking more of closures, which capture their environment, but those are of course a distinct type in Rust.

ulbu · on Aug 21, 2024

a stack frame is a runtime object, while a type exists only to the compiler. the suggestion to create it in a call just makes no sense. a type is a definition, not an instance.

dathinab · on Aug 21, 2024

> So there is just no way to refer to the User struct outside of the function scope, right?...

no matter what tricks you come up with, treat it as that (in case of it being associated to a type treat it as a anonymous type accidentally expose)

also please _never_ place a module in a function, for various subtle reasons it's technically possible but you really really should not do it

I mean in general limit what items (types, impl blocks) you place in function to very limited cases. If you have a type complex enough so that you need a builder defined in a function you are definitely doing something wrong I think.

> Does this mean generating child modules for privacy in macros is generally a bad idea? It depends...

IMHO if we look at derive like macro usage, yes it's always a bad idea.

Derive like thinks should mainly generate impl blocks, if it really really is necessary types and only if there really is no other way modules.

Furthermore they should if possible not introduce any of this in the scope. E.g. it's a not uncommon pattern to place all generated code in `const _:() = {/here/};` which is basically a trick/hack to create a new scope similar to a function scope into which you can place items (functions, imports, types, impl blocks) without polluting the parent scope (and yes that doesn't work for modules they are always scoped by other modules).

So does that mean the builder derive does it all wrong?

I don't think so sometimes you need to do bad decisions because there are no good solutions.

jerf · on Aug 21, 2024

I have found, across several languages I've used, that types embedded into functions are generally a bad idea, and I think the general principle is that types generally end up needing to be exposed to any code that will also test that code. So, for instance, it's fine to confine types to some particular module, as long as those types are internal-only, but confining them within functions generally becomes a bad idea.

I know the complaints many of you are gearing up to type, but my statement is a bit more complicated than you may have realized on first read; the key is the word "becoming", that I'm looking at the lifetime of the code and not a snapshot. The problem with embedding types into those smaller scopes is that while it may work at first... of course it does, it compiles, right?... they become an impediment to a number of operations over time. First, as I mentioned, testing is very likely at some point over the evolution of the module to want to either provide input or examine output, intermediate or otherwise, that exists in those types. Second, as the code grows, you want to be able to refactor things freely, and types embedded in functions form a barrier to refactoring because to refactor you'll have to do something to expose that type now to multiple functions. You do not want barriers to refactoring. Barriers to refactoring are a bigger expense over the long term than any small local gain from putting a type here instead of there, especially when anyone should have "Jump to Definition" readily available in this post-LSP era.

Considered over time, over the evolution of the code base, I've just never had any super-local types like this "survive". Every time I think I've found an exception, I've either had my test code or the desire to refactor force me to lift it to the module level. So I just start there now.

To the extent there is an exception, testing-only code may be. Testing-only code has very different constraints than production code anyhow. Even then, though, I still find that refactoring problem arises, and test code needs to be refactorable too.

On the plus side, while I label them "a bad idea", they are not a "bad idea" that destroys your code base or anything. On the grand scale of "bad ideas" in code, this is down in the "inconvenience" part of the scale. It is almost self-evidently not some sort of disaster and I am not claiming it is. You can always lift it out and move on. But it is one of the many little hygiene habits that add up that helps keep code fluid and refactoring always available to me at a minimum activation-energy cost, because that is really important.

(This applies specifically to types that you explicitly define. You can in Haskell, for instance, bash a new type together anywhere simply by creating a tuple (x, y). But this doesn't trigger what I'm talking about because any other bit of the code can bash the exact same type together simply by creating another tuple of the same type, and they'll unify just fine without having to share a type definition in common. No impediment of any kind is created by a new tuple type in that language.)

Joker_vD · on Aug 21, 2024

One of the somewhat useful (but still mind-boggling) uses of local types I've encountered was in Go, writing a custom unmarshaller:

    func (s *MyLovelyStruct) UnmarshalJSON(b []byte) error {
        err := json.Unmarshal(b, s)
        if err != nil {
            return err
        }

        return validateAndMassageMyLovelyStruct(s)
    }

I want to re-use the default "dump JSON key/values into the struct's fields" logic, then add something on top of it. But as written, this method will blow up with stack overflow because json.Unmarshal(b, s) will call s.UnmarshalJSON(b), if it can. So what you can do is this:

    func (s *MyLovelyStruct) UnmarshalJSON(b []byte) error {
        type IncognitoStruct MyLovelyStruct
        tmp := IncognitoStruct{}

        err := json.Unmarshal(b, &tmp)
        if err != nil {
            return err
        }

        *s = MyLovelyStruct(tmp)
        return validateAndMassageMyLovelyStruct(s)
    }

The IncognitoStruct, even if it has exact same fields as MyLovelyStruct (and is castable to it), does not have any of its methods, so json.Unmarshal(b, &tmp) does not recursively call this UnmarshalJSON() method.

But even that uses type aliases, not the proper, data-holding, types themselves. I never found any motivation to use those; the package-local types are quite enough.

oefrha · on Aug 21, 2024

Sometimes you just want to JSON-encode a []struct{<ad hoc stuff>} or something like that, so it’s entirely reasonable to use a func-local named type rather than repeating the anonymous struct.

And to gp: if your func local type ends up observable and even testable, of course it shouldn’t be func local. Otherwise you’re describing testing implementation rather than behavior, indicating you’re writing bad tests.

jerf · on Aug 21, 2024

"Otherwise you’re describing testing implementation rather than behavior, indicating you’re writing bad tests."

Yeah, people have been threatening me for decades with the claim that if I write tests to test internals I'll have to refactor like crazy someday. I'm still waiting for someday to come. Meanwhile, it has caught a lot of bugs.

I'm open to the possibility that there's something different about the way I write code that causes me to not have this problem. Stay tuned to my blog over the next couple of months if that intrigues you. In the meantime, as reality fails to correspond to theory, I go with reality.

masklinn · on Aug 21, 2024

I don't know how common it is in the wild, but deserializing to a function-local type is routinely used by Serde's documentation for examples e.g. https://serde.rs/deserialize-struct.html

nkozyra · on Aug 21, 2024

Wouldn't generics in the validateAndMassageMyLovelyStruct() function avoid this kind of workaround?

jerf · on Aug 21, 2024

No, because the method infinitely recurses before then.

jerf · on Aug 21, 2024

Yeah, that's one in my bag-of-JSON-tricks. (Hopefully my bag-of-JSON-tricks gets at least a little less populated with json v2.) This is also useful if you want to selectively override a particular Unmarshal for any other reason, which is what I needed it for. But then I needed to customize the Unmarshal and back to a top-level type it went. :)

But this sort of thing is why I tried to emphasize at the end that I'm not trying to "ring the alarm bell" or anything. When it works, it works, and it's not like I would call what you have there Bad Code or anything. I personally would have pulled it to a top level type immediately and that's just a preference, not something I'd go to bat over in a code review or anything.

tazu · on Aug 21, 2024

> First, as I mentioned, testing is very likely at some point over the evolution of the module to want to either provide input or examine output, intermediate or otherwise, that exists in those types.

I have not found this to be true at all. I frequently have very long (300+ line) pure functions that do one thing, and the tests are designed to be oblivious to whatever intermediate representations are used. In fact, I think it's an anti-pattern to pull types out just for testing: tests should not be so granular that they affect how you design functions.

For example, a function that takes a JSON string containing multiple objects and returns an SQL string for a batched INSERT operation. I can easily achieve 100% coverage with a table-driven test just checking inputs and outputs.

I frequently use function-local types and function-local functions that are reused within the function. Testing has never been a problem.

> Second, as the code grows, you want to be able to refactor things freely, and types embedded in functions form a barrier to refactoring because to refactor you'll have to do something to expose that type now to multiple functions.

This hasn't been my experience either. When refactoring, I'm usually doing more encapsulation, not less. On a first pass, I write everything to have access to everything else. Only after I have a clearer idea of boundaries do I refactor.

chowells · on Aug 21, 2024

Sometimes I think I want local types in Haskell. I'm creating some new type and some instances for it solely for this function, why does it need global scope?

Then I get around to remembering module scope is the actual important thing in Haskell, make sure that type isn't exported from the primary API modules, and get on with life.

saghm · on Aug 21, 2024

Yeah, I think it's pretty rare to actually need a "local" type as opposed to just making it private to a module in Rust as well. The use case the article gives is one of the few where it I could see it being useful; if you're using a macro in the body of a function and want it to generate a type, there's not anywhere else you can define it (without an additional macro invocation outside the function, which often defeats the purpose of trying to wrap up all the boilerplate into one concise place).

3np · on Aug 22, 2024

One major distinction with this is between structural types (if the shape is right, it fits) and nominal types (only the same reference will do).

For nominal types, you also want to pay an eye to only ever refering back to the same one-true-definition and not duplicate them (example: TS enums and classes). Whereas with structural types (example: TS interfaces) you can get away with being a lot more ad-hoc without having to enforce a globally unified structure. This highlights the value of the age-old principle of always exposing interfaces rather than implementations in your typings (every exposed class should have at least one corresponding interface).

This is why the Haskell example works out fine since Haskell tuples are structural.

epage · on Aug 21, 2024

> To the extent there is an exception, testing-only code may be. Testing-only code has very different constraints than production code anyhow. Even then, though, I still find that refactoring problem arises, and test code needs to be refactorable too.

For me, I avoid defining anything within a function except when that thing being defined is what is being tested in a test, e.g. https://github.com/clap-rs/clap/blob/87647d268c8c27e3298b2c0...

edflsafoiewq · on Aug 21, 2024

I usually see them used for drop guards in Rust, ie. an alternative to try-finally.

lesuorac · on Aug 21, 2024

In general sure.

However, in a specific case, I find local types useful to do that for code that makes JSON requests to another service. You could care that the request has a certain intermediate structure but serialization is fairly deterministic so I don't see an advantage of that over just testing you send a HTTP request with a valid (String) body.

If you control that endpoint then it would make sense so share that type between them but if you don't then might as well make the type scoped to just that method.

monocasa · on Aug 21, 2024

I have used function local enums for the states of a function local state machine pretty successfully.

jerf · on Aug 21, 2024

Once you pass trivial that would definitely pass into the realm of thing I'd want to expose to my tests. "Starting from this state and given this set of inputs do I get to this state?" is a pretty basic test, and maybe there's a bunch of people way smarter than me who can naturally run complex state machines in their head, but I find myself frequently surprised by at least one thing they do and I do not generally just splat them down into the code correctly on the first try.

monocasa · on Aug 21, 2024

In the cases where I do this, it's when I try really hard not to actually expose the state machine to tests anyway because there's a better way to hit them in tests with their expected inputs and outputs rather than an implementation detail of the state machine itself.