Semantic Import Versioning (swtch.com)
80 points by SamWhited on Feb 21, 2018 | 37 comments


From the "Avoiding Singleton Problems" section:

> Another problem would be if there were two HTTP stacks in the program. Clearly only one HTTP stack can listen on port 80; we wouldn't want half the program registering handlers that will not be used. Go developers are already running into problems like this due to vendoring inside vendored packages.

This is only a problem if you allow nested vendor/ directories, which "dep" (you know, the "official experiment" that suddenly got discarded to the surprise of its developers) avoids because it recurses through the entire dependency tree and flattens it into a single vendor/, just like the tooling in many (most?) other languages does.

The whole post reads like the author thinks Go has a unique dependency management problem that no other language ever had, which somehow necessitates a completely unorthodox solution. Three blog posts into "vgo", I still don't see why...


Go doesn't have a unique dependency management problem. This problem is shared by many other languages that have all solved it poorly. I haven't written much Go and am not a big fan of the language, but I am watching this discussion eagerly because a successful solution in Go will be an example for other languages to follow.


Curious about what languages you think solved it poorly, and why.


I guess now that I think of it, all the solutions I've seen break down into two techniques.

First, loading different versions of the library under the same name to provide to different parts of your code. This has different risks depending on how and when symbols are looked up or linked. In some languages you can end up with the two versions accidentally calling into each other or using each other's symbols. In other languages you run into the "expected Foo but got Foo" type errors mentioned by munificent. That's what happens when you use classloader tricks as a half-assed way of isolating "components" in Java.

Second, loading different versions of the library under different names. This requires hacking the compiled code or the source code; convenience and reliability will depend on the quality of the tools you're working with. Sophistication ranges from using sed to munge source code to using tools like objcopy that can read and rewrite compiled artifacts. Java "shading" (not "shadowing" as I said earlier) relies on rewriting class files.


A solution has arguably been found, and it's Cargo. I still don't get why it wouldn't work with Go. My understanding is that the dep tool worked (works?) very similarly in principle.


Context is important here. Just one or two sections up, the article describes why having two different major versions of a package in a program can be a good thing (which is something dep doesn't provide, as far as I know).


I understand, but the implication is that all the other languages out there, with collectively decades of experience in dependency management, somehow all made the wrong tradeoff and Go is rightfully correcting their mistake. Basically, I think these blog posts should be 80% justification and 20% explanation for vgo, instead of vice versa.


In the first example, it shows how other package managers may handle it (multiverse, clone arguments). What rsc is suggesting is that we learn from our past.


Can you expand on "you know, the "official experiment" that suddenly got discarded to the surprise of its developers" and where that announcement was made?


From sdboyer, developer of dep: https://sdboyer.io/blog/vgo-and-dep/

"vgo, as currently conceived, is a near-complete departure from dep. It was created largely in isolation from the community’s work on dep, to the point where not only is there no shared code and at best moderate conceptual overlap, but a considerable amount of the insight and experience gleaned from dep as the “official experiment” is just discarded."

"Now, maybe the benefits of vgo’s model will be so profound that these losses, and the difficulties of more experimental churn will be justified. i can’t prove that won’t be the case. And, if it does turn out that way, it would be stupefyingly hypocritical of me to protest, having spent the last year and a half asking the authors of other Go package management tools to gracefully bow out in favor of dep. All i can say for sure is that i’m sorry: to expose the community to yet more churn, and to anyone who feels like i’ve misled them about the process, and especially to dep contributors, who may find this announcement to feel rather like a punch in the gut. For whatever it’s worth, none of that was ever my intent, and this is not the path i believed we would be walking."


This article actually made the issue finally click for me: Go is having trouble solving this problem because it has more going on in the global environment, like its ancestor languages, and unlike more modern languages. They need to come up with a complicated framework to rein in their spooky action at a distance, because they have lots of implicit global relationships where we might prefer more explicit and local ones.

Yikes.


> where we might prefer more explicit and local ones.

could you expand on this? who is "we" and what are those explicit and local relationships? are you talking about an opensource ecosystem or a private enterprise?


In this case I mean open-source communities and the companies that heavily rely on them.


I'm not sure I understand your point. It seems you're implying (please correct me if I'm wrong) that the ecosystems of modern open-source languages are more local, more tightly coupled, with people working in concert towards a common goal.

In my experience this is often what happens when a new community is spawned. E.g. when Node.js started, there was always one single implementation of whatever thing I needed, and it was reasonably up to date with the rest of the things around it (including the runtime environment, of which there were only one or perhaps two major versions). As time went on, more people started contributing, often with different levels of commitment.

Often "modern" gets conflated with "young". Almost by definition, young communities don't develop the same kind of problems of mature communities, yet.


I think the main example in the article is talking about incompatible types, not global variables, so the issues apply more or less equally in any language (Java, Python, C++, etc.), since there is generally no guarantee that one version of a library can work with objects from another version at run time (the private fields may differ between the two versions).

There's a brief note on singleton problems, so that might be an issue with code that uses them more, but the example given (listening on port 80) is also more or less language independent, since there is only one port 80 whether your code uses singletons or not.
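
To make that concrete, here's a minimal runnable Go sketch (the "two copies of the stack" framing is imagined; the point is that the OS, not the language, enforces the singleton):

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        // Pretend these two Listen calls happen inside two different
        // vendored copies of the same HTTP stack. Whichever runs first
        // owns the port, regardless of how either copy was versioned.
        l1, err := net.Listen("tcp", ":8080") // 8080 so it runs unprivileged
        if err != nil {
            fmt.Println("first copy failed to bind:", err)
            return
        }
        defer l1.Close()

        _, err = net.Listen("tcp", ":8080") // the "second copy" of the stack
        fmt.Println("second copy:", err)    // address already in use
    }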


I like this quite a lot; it fits very nicely into the Go stack.

Absent from this article is a discussion of alternatives to renaming. Reading the article would hint that a semver major bump would just leave you high and dry! This is certainly true in some ecosystems like Ruby or Java < 9. But other ecosystems have solved this problem at the language level. JavaScript, Rust, and others allow you to import multiple versions of a module so long as you don't expose types from that module in your public interfaces (not enforced by JS -- but by convention). You still reach the problem the article references once you have those types in your public interfaces or they use singletons. That means that your language's package manager needs to handle these dependencies differently (peer dependencies), giving you more flexibility at the cost of additional complexity in package management.
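
That's essentially what the article's scheme gives Go: the major version is baked into the import path, so two majors can coexist in one build as long as their types never have to meet. Roughly (module paths invented, loosely after the article's Moauth example):

    import (
        moauth   "github.com/moe/moauth"    // the v1 line of a hypothetical module
        moauthv2 "github.com/moe/moauth/v2" // the v2 line, imported side by side
    )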


Basically some motivation and how to use go + semver.

But is there a way to statically-compile dependencies? Is that even a thing?

(I'm not a go user so forgive me if I need a good RTFM session.)

It seems like a lot of these problems come from two dependencies wanting different versions of a third dependency.

Instead of just depending on a dependency as a semver string, I could (theoretically, I think) depend on a statically-compiled version of the dependency so it's free to call whatever libs it wants - effectively eliminating the concept of transitive dependencies.

For projects with large dependency graphs, you may end up with relatively large binaries since you end up with lots of duplicate object-code for common libraries, but I wonder how much of a problem this actually is (and if simple de-duping may solve a huge chunk of it).

We spend lots of engineering effort to resolve dependencies as source just to end up compiling them into our executable anyway.

I'm sure this isn't a new concept, but it struck me as odd that Go is fighting with it so much recently, considering that statically compiled, "library-free" executables are definitely in Go's wheelhouse - why not extend that to libraries?


Just to briefly jump in:

statically compiling dependencies: roughly equivalent to "I want to link against a Go lib"? AFAIK the standard answers are "use cgo" (which has problems[1]) and "maybe plugins?"[2] but I don't know how plugins are working out.

static compilation solving transitive dependencies: only if you have no singletons. In that case, yes! But the outside world is a singleton - anything on disk, through network, etc, anything with external state now has to deal with all past and future versions of the code simultaneously. Possible, but difficult and/or wasteful. For pure code though, yeah, this is a great solution, and it's part of why npm's "use all the versions" works out for micro-libs so well - they tend to have their state injected, which allows you-the-user to control how versions interact (if at all). Unfortunately you're left with node_modules[3] (as you mentioned).

[1]: https://dave.cheney.net/2016/01/18/cgo-is-not-go + you just inherited a multi-threaded lib, which may cause forking problems later.

[2]: https://golang.org/pkg/plugin/

[3]: https://pbs.twimg.com/media/C3SOI-_WAAAM4Js.jpg


This could still become a problem (if each dependency statically bundles its own dependencies) if any of the types are exposed in an API. For example, say there is a Twitch SDK and a Facebook SDK that both expose a SetUserProfile(Image) function, and Image is provided by a third library (used by both SDKs).

If both SDKs use different versions of the Image library, the internals of the Image struct may have changed (maybe they added support for black and white images). Even though the change may be backwards compatible at a source code level, an Image struct v1.15 wouldn't work as an Image struct v1.16 unless it had the same internal private fields (and they were used in the same way).

Now, your code that loads an Image from disk can't work. It will either use v1.15 and be compatible with Twitch, or use 1.16 and be compatible with Facebook, but not both. A shared dependency works around this issue as long as the changes are backwards compatible - in this case Go would see that the minimum required version is v1.16 and both SDKs (and your code) would use that (or a newer version if you list it as a requirement).
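
A runnable single-file sketch of that type-identity point (the two local structs stand in for the two bundled copies of the Image library; all names are invented):

    package main

    import "fmt"

    // Go's type identity is nominal: even structurally similar types from
    // two copies of a library are distinct to the compiler, and a direct
    // conversion stops being legal once the fields diverge.
    type imageV115 struct{ pixels []byte }

    type imageV116 struct {
        pixels    []byte
        grayscale bool // the "backwards compatible" addition
    }

    func setTwitchProfile(img imageV115)   { fmt.Println("twitch: ok") }
    func setFacebookProfile(img imageV116) { fmt.Println("facebook: ok") }

    func main() {
        img := imageV115{pixels: []byte{1, 2, 3}}
        setTwitchProfile(img)
        // setFacebookProfile(img)            // compile error: wrong type
        // setFacebookProfile(imageV116(img)) // still an error: fields differ
        setFacebookProfile(imageV116{pixels: img.pixels}) // manual bridging
    }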


The problem also happens with static linking; this is as old as the library concept itself.

For example, the public symbols of the libraries might collide, they might have side effects that misbehave because there are multiple versions, or they might rely on yet another library that can only be linked exactly once, ....


Isn't that just a 'bug' in the algorithm for code layout - that the symbol names weren't unique enough or something? This would require some breaking changes in the way libraries are laid out and linked against, but basically you "should" be able to statically compile a dependency and then effectively hide all the symbols of its dependencies so nothing else knows how to link against them directly.


Am I oversimplifying the article to say

"At some point you cannot transparently support multiple target platforms"

We are all used to different builds for Intel, ARM, 32/64-bit, etc., so why should we be surprised to see Azure and AWS as fundamentally incompatible?

I mean, I know I was horrified when I grepped my node dependencies to find 900+ packages, but I was pleased to find I had a clear version number on each of them. (Yes, I am hand-waving over what the dependency manager resolves, which is, I guess, the point of this post in some manner, but the post here seemed to be saying that when you have incompatible requirements you are stuffed. And yes, that's true. So don't have transitive errors - this is only a problem for package maintainers, not for developers, and so I suspect it's a package aggregation problem?)


> "Incompatible changes should not be introduced lightly to software that has a lot of dependent code. ..."

> I certainly agree that “incompatible changes should not be introduced lightly.”

This is agreeing with a sentence that the semver authors didn't write. The clause "that has a lot of dependent code" isn't in there arbitrarily.

What everyone in an ecosystem wants is high quality, easy-to-use, stable packages. In a perfect world populated by programming demigods, v1 of every package would be all three of those. In practice, human software engineers do not design usable APIs and write robust bug-free code without feedback from users. In order to act on that feedback, they need to change their code, which sacrifices stability.

The way this works in other healthy package ecosystems is that packages have a lifecycle. Early in the package's lifetime, it is undergoing rapid, breaking change while it finds its way. It can do that relatively easily because there are a small number of users harmed by the churn. If it gets popular, that implies it has found a good local optimum of design and quality. At that point, stability takes precedence and the package's evolution slows down.

The path to a great library is usually through several versions of a kinda-shitty one. A good package manager supports both maintainers and consumers working on packages at all stages of that lifecycle.

> Able to predict the effects on users more clearly, authors might well make different, better decisions about their changes. Alice might look for a way to introduce the new, cleaner API into the original OAuth2 package alongside the existing APIs, to avoid a package split. Moe might look more carefully at whether he can use interfaces to make Moauth support both OAuth2 and Pocoauth, avoiding a new Pocomoauth package. Amy might decide it’s worth updating to Pocoauth and Pocomoauth instead of exposing the fact that the Azure APIs use outdated OAuth2 and Moauth packages. Anna might have tried to make the AWS APIs allow either Moauth or Pocomoauth, to make it easier for Azure users to switch.

Those decisions are only "better" because they route around a difficulty the package manager arbitrarily put there in the first place.

There is already plenty of essential friction discouraging package maintainers from shipping breaking changes arbitrarily. Literally receiving furious email from users that have to migrate is pretty high on that list. I don't see value in explicitly adding more friction in the package manager because the package manager authors think they know better than the package maintainer how to serve their users.

> To be clear, this approach creates a bit more work for authors, but that work is justified by delivering significant benefits to users.

Users don't want all of the work pushed onto maintainers. Life needs to be easy for maintainers too, because happy maintainers are how users get lots of stuff to use in the first place. If you push all of the burden onto package maintainers, you end up with a beautiful, brilliantly-lit grocery store full of empty shelves. Shopping is a pleasure but there's nothing to buy because producing is a chore.

Good tools distribute the effort across both kinds of users. There's obviously some amortization involved because a package is consumed more than it's maintained, but I'm leery of any plan that deliberately makes life harder for a class of users, without very clear positive benefit to others. Here, it seems like it makes it harder to ship breaking changes, without making anything else noticeably easier in return.

> They can't just decide to issue v2, walk away from v1, and leave users like Ugo to deal with the fallout. But authors who do that are hurting their users.

Are they hurting users worse than not shipping v2 at all? My experience is that users will prefer an imperfect solution over no solution when given the choice. It may offend our purist sensibilities, but the reality is that lots of good applications add value to the world built on top of mediocre, half-maintained libraries. Even the most beautiful, well-designed, robust packages often went through a period in their life where they were hacky, buggy, or half-abandoned.

A good ecosystem enables packages to grow into high quality over time, instead of trying to gatekeep out anything that isn't up to snuff.

> In Go, if an old package and a new package have the same import path, the new package must be backwards compatible with the old package.

This doesn't define for whom it must be backwards compatible. Breaking changes are not all created equal. Semver is a pessimistic measure. You bump the major version if a change could break at least one user, in theory. In practice, most "breaking" changes do not break most users.

If you remove a function that turned out to not be useful, that's a "breaking" change. But any consumer who wasn't calling that function in the first place is not broken by it. If maintainer A ships a change that doesn't break user B, a good package manager lets user B accept that change as easily as possible.

As far as I can tell, the proposal here requires B to rewrite all of their imports and deal with the fact that their application may now have two versions of that package floating around if some other dependency still uses the old version. That's pretty rough.

What you'll probably see is that A just never removes the function even though it's dead weight both for the maintainer and consumer. This scheme encourages packages to calcify at whatever their current level of quality happens to be. That might be fine if the package already happens to be great, but if it has a lot of room for improvement, this just makes it harder to do that improvement.


Well, the article calls for a v0, which seems to be exactly for the use case you describe? There are no import path changes, undergoing "rapid, breaking change" is allowed, and if you ever find a good local optimum you can graduate to v1 without any import path change either. I don't see any requirement to ever move to v1, although users may understandably prefer libraries that do. I don't quite understand what additional support you are looking for from "a good package manager".

I'm also not sure this makes it harder to ship a v2. Sure, users will have to change their import paths, although I'm sure tooling like GoLand can easily automate this. But this also frees library maintainers to do extensive API redesigns, without worrying about breaking everything or hanging their existing users out to dry. In particular, the ability to make v1 depend on (and become a wrapper for) v2 is quite nice. Not only does this pattern not break existing code, but it even allows users who have not yet migrated to the new API to benefit from the active development on the latest branch. And of course there is the potential for some degree of automated migration, through inlining wrapper functions as mentioned in the article.
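
For what it's worth, the wrapper direction is mostly mechanical thanks to type aliases (Go 1.9+). A rough sketch, with invented module paths and an invented Sign function:

    // The v1 line of the module, kept as a thin shim over v2 so existing
    // import paths still compile and still pick up fixes from v2.
    package moauth // import "github.com/moe/moauth"

    import v2 "github.com/moe/moauth/v2"

    // Token aliases the v2 type, so values flow freely between code that
    // imports v1 and code that imports v2.
    type Token = v2.Token

    func Sign(t Token) ([]byte, error) { return v2.Sign(t) }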


> This doesn't define for whom it must be backwards compatible. Breaking changes are not all created equal. Semver is a pessimistic measure. You bump the major version if a change could break at least one user, in theory. In practice, most "breaking" changes do not break most users.

I think this means API breakage, which usually results in packages that won't even compile.


This is an excellent post, with many well-reasoned points. Thank you for writing it.


Has something like gcc symbol versioning been talked about? I can probably sense the sneers from some, but I'd imagine there could be an evolution/"go way" to implement it.


This is about software (source code) versioning. Shared library symbol versioning is a completely orthogonal concept.


I feel like the article made its case pretty well, but I really dislike the idea that I need to duplicate my library into a "v2/" directory (or a different top-level git repo) in practice. Maybe I'm misunderstanding something, but this seems to be exactly what branches are for. If I'm not able to specify a branch name in the package "path", then there's something really wrong.


It wasn't obvious to me either, but apparently vgo translates that into the appropriate git tag; it's not actually a separate directory.
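
Roughly, the repo keeps a single tree, and the go.mod on the v2 line declares the /v2 module path, something like (module path invented):

    module github.com/moe/moauth/v2

vgo then resolves imports of ".../v2" against the repo's v2.x.y tags; as I read it, the v2/ subdirectory is only an alternative layout for repos that want to keep every major version on one branch.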


Yeah, it's just the import path that's changed, but it's still ugly as sin and makes the mapping between import paths and filesystem paths non-trivial.


While I still think SemVer is crap (because of the edge cases), this seems to be a reasonable approach to library versioning.


Mind expanding on SemVer?


It says only major versions should break the API, but bugs do it all the time. So that rule is just wrong and gives a false sense of safety.


And the alternative is?


Simply treat every new version as a major release; everything else is a lie.

Sure, you can structure it by "intent", but don't pretend a bugfix can't break your API.


I am guessing my view of versioning as the fundamental abstraction for constructing software systems is not widely shared.

I did not see anything here that isn't an approximation of versioning with added semantics tailored to more focused use cases.



