I don't think I've learned anything from this article. It's interesting how quickly the keywords pick up on HN (Uber and Go).
More than 3 years ago we implemented a reverse geocoder web service that indexed the complete Census TIGER dataset. The service handled over 15K reqs/sec on a 2011 MacBook, doing exact point-in-polygon search (no approximations). We implemented an R-tree data structure optimized for point-in-polygon search for the lookups.
Assuming Uber's geofence lookups would rely on a much smaller dataset than TIGER, I think you could have come up with a much more efficient implementation requiring a fraction of the resources.
In all fairness there are many languages (generally statically typed) which perform better than Javascript. That go is on the list shouldn't be a surprise to anyone.
To store the R-tree entries, for one. Say you want to switch between 32-bit and 64-bit floating-point types in your implementation without rewriting the entire thing.
What do you actually need generics for that you can't do with interfaces? Not to mention that compile-time generics (read: generating code for each type) make testing with mock types much, much harder.
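For the float32/float64 case mentioned above, the interface-based workaround looks roughly like this (a hedged sketch; every name here is invented, pre-generics Go):

```go
package main

import "fmt"

// Coord abstracts the coordinate representation so tree code
// doesn't hard-code float32 vs float64. All names are hypothetical.
type Coord interface {
	Less(other Coord) bool
	Value() float64
}

type F64 float64

func (a F64) Less(b Coord) bool { return float64(a) < b.Value() }
func (a F64) Value() float64    { return float64(a) }

type F32 float32

func (a F32) Less(b Coord) bool { return float64(a) < b.Value() }
func (a F32) Value() float64    { return float64(a) }

// minCoord works for any Coord implementation, at the cost of
// interface dispatch (and boxing) on every comparison.
func minCoord(a, b Coord) Coord {
	if a.Less(b) {
		return a
	}
	return b
}

func main() {
	fmt.Println(minCoord(F64(1.5), F64(2.5)).Value()) // 1.5
	fmt.Println(minCoord(F32(3), F32(2)).Value())     // 2
}
```

The trade relative to compile-time generics is dynamic dispatch on every comparison and losing plain numeric literals, which is exactly the debate here.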
This article is a little bit hand-wavy for me. Essentially they could have used any language to implement this, and there's no evidence that channels could not have been used. A bit more detail would be needed to make this a good read, at least for me.
Glad I am not the only one who feels this way. Some benchmarks comparing the load-pointer/store-pointer approach and the read-write lock would be helpful. Simplified code blocks for each approach and more detail on the pros and cons of each could also have helped understanding.
Agreed. CSP is one of the selling points of Go, in my view. I expect some overhead to be the price of a simple and easy concurrency model. Some conservative use of locking etc is okay in some cases, and as a non-Gopher those are the cases I'm most interested in.
> Agreed. CSP is one of the selling points of Go... and as a non-Gopher those are the cases I'm most interested in.
This makes sense, and when I first started writing Go, I felt the same way, but almost four years later, concurrency is one of the least important reasons I still write Go.
I wouldn't say all Gophers feel this way, but I know it's a very common experience for Gophers - "you come for the concurrency, but stay for the interfaces"[0].
[0] I've alternatively heard things like "readability", "tooling", and "robustness" used in the place of "interfaces" in this quote.
Wait a minute, is this a troll post? Interfaces? I get the tooling, maybe, and CSP out of the box, I don't understand what's great about Go's interfaces (compared to plain old boring Java, for instance).
Have you had much experience with Go? Go interfaces and Java interfaces are different beasts... Go's interfaces are incredible...take a look at Reader and Writer from the io package as an example - they're two of the most powerful interfaces in the language, and extremely flexible.
> Go's interfaces are incredible...take a look at Reader and Writer from the io package as an example - they're two of the most powerful interfaces in the language, and extremely flexible.
Yup. I literally just gave a talk at two Go conferences in the last two weeks (GopherCon India and GopherCon Dubai) about the io.Reader/io.Writer interfaces specifically. They're deceptively simple on the surface, but they're insanely powerful once you 'get' them.
It would be interesting to know more about the power you see in `io.Reader` and `io.Writer`. They seem to say, effectively, that a type can consume a stream of bytes or produce one. Generic serialization is not so unusual, but maybe in the context of Go it can be so much more?
Interfaces in Go are error-prone. It's easy to inadvertently implement an interface you didn't mean to. It's also difficult, without the aid of some tooling, to determine which set of interfaces (if any) a given struct implements.
There was a blog post a while ago about how Go's interfaces caused issues in production because they're implicit.
Check out type classes for a superior way to solve this issue (e.g. what Scala or Rust do).
Static duck typing is a very interesting approach to interfaces that is very different from Java. I can't say how well it works, as I've only dabbled in Go, but it struck me as very unique.
It's interesting that they're doing a spatial database without using a 2D spatial data structure. MySQL and Postgres both have data structures for this. But maybe they don't have that many geofences per city.
They have an N readers, one writer problem. It's possible to do that without blocking the readers. There was a YC article on a lockless solution to that yesterday. This is an easier problem. Each city's geofences can be updated by making a copy, updating the copy, and atomically replacing the single reference to the copy. Any searches still running on the old copy continue to run on the old copy. Eventually, the GC deletes the old copy.
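A minimal sketch of that copy-and-swap scheme in Go, using sync/atomic's Value (hypothetical types; not Uber's actual code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// geoIndex stands in for a city's immutable geofence index.
type geoIndex struct{ fences []string }

var current atomic.Value // holds *geoIndex

// Readers load the current snapshot; they never block.
func lookup() *geoIndex { return current.Load().(*geoIndex) }

// The single writer builds a fresh copy and swaps it in atomically.
// In-flight readers keep using the old snapshot, which the GC
// collects once the last reader is done with it.
func refresh(fences []string) {
	current.Store(&geoIndex{fences: fences})
}

func main() {
	refresh([]string{"downtown"})
	fmt.Println(lookup().fences) // [downtown]
	refresh([]string{"downtown", "airport"})
	fmt.Println(lookup().fences) // [downtown airport]
}
```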
I believe that's exactly what they were doing with the StorePointer/LoadPointer primitives. However it is generally agreed in the Go community that the sync/atomic is to be avoided in favor of the sync/mutex for this type of problem. The RWMutex is designed for a 1 writer/N readers scenario. The code is more readable and you don't offload any work to the GC.
> However it is generally agreed in the Go community that the sync/atomic is to be avoided in favor of the sync/mutex for this type of problem.
Why? Concurrent data structures are extremely useful.
> The code is more readable and you don't offload any work to the GC.
You shouldn't write concurrent data structures yourself; you should use a library. And the amount of time spent in the GC for this is going to be negligible assuming writes are infrequent compared to reads. It's rare in a GC'd language that your write operation won't be creating garbage somewhere along the way, so it ends up being amortized.
"Instead of indexing the geofences using R-tree or the complicated S2, we chose a simpler route..."
"While the runtime complexity of the solution remains O(N), this simple technique reduced N from the order of 10,000s to the order of 100s."
It seems odd to me that they're posting about the performance of Go, yet they deliberately chose a less optimal algorithm because the better ones were "complicated". Or, if their simpler approach did in fact run faster than R-trees or S2, it would have been nice to see some benchmarks and a clearer explanation for why. Choosing the best algorithm seems more important than the language in a case like this one.
Sometimes a worse big-O algorithm is way better in real life, due to things such as memory alignment and page size. The more time you spend writing high-performance code, the more you learn that the kind of algorithm analysis you learn in school is crap in the real world. If my O(N^2) algo is 3 lines and smokes your 600-line O(n log n) algo in real benchmarks, guess which one I am using?
A fun bit about radix sort is that it scales really well if your dataset fits in memory. Last time I benchmarked my lame Java implementation, it was about 2-5x faster than the built-in Arrays.sort() on native values (float, int) and 20x faster when you started putting it up against Comparable. It trended that way well up to 65k+ entries.
It gets even better if you can drop down to a proper native language that lets you prefetch.
Absolutely. But the impression I get from the article isn't that they benchmarked their solution vs R-trees or S2 and it proved quicker, rather it seems to me they didn't bother trying the other approaches at all because of their perceived complexity. I might have the wrong end of the stick because the article doesn't go into enough detail to truly know either way. If they really didn't benchmark any other algorithms then talking about the performance of the language seems somewhat moot.
I'm not sure that your example shows that the analysis you learn in school is "crap in the real world".
Using big-O analysis is for seeing how an algorithm scales with respect to the input size. If anything, the fact that someone believes an O(n log n) algorithm is always faster than O(n^2) is a failure on that person's part to properly understand what big-O is used to measure.
Raycasting is a pretty expensive operation since you have to check every edge of the polygon. Honestly I'm guessing they implemented the brute force method and called it good enough. Otherwise they would have posted comparisons as to why they stuck with brute force.
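For reference, the standard even-odd ray-casting test is only a few lines; the cost is the loop over every edge. A minimal sketch:

```go
package main

import "fmt"

type point struct{ x, y float64 }

// pointInPolygon runs the classic even-odd ray-casting test:
// shoot a ray to the right from p and count edge crossings.
// An odd count means the point is inside.
func pointInPolygon(p point, poly []point) bool {
	inside := false
	j := len(poly) - 1
	for i := 0; i < len(poly); i++ {
		pi, pj := poly[i], poly[j]
		if (pi.y > p.y) != (pj.y > p.y) &&
			p.x < (pj.x-pi.x)*(p.y-pi.y)/(pj.y-pi.y)+pi.x {
			inside = !inside
		}
		j = i
	}
	return inside
}

func main() {
	square := []point{{0, 0}, {4, 0}, {4, 4}, {0, 4}}
	fmt.Println(pointInPolygon(point{2, 2}, square)) // true
	fmt.Println(pointInPolygon(point{5, 2}, square)) // false
}
```

It's O(edges) per query, which is why the pre-filtering step (by city, grid cell, or bounding box) matters so much.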
Well they had a clearly defined goal (which is rare!). 99% of requests in <= 100ms. So if brute force was a very simple algo and met the goal - hey why not?
If you can come up with a goal of what you need the response times to be, I think the easiest (in terms of readability, testability, length, etc) solution to meet the goal should win.
I've found that BSP trees are a quite elegant, albeit, more complicated alternative for handling point-in-polygon queries compared to raycasting. Essentially, it decomposes an arbitrary non-convex polygon into a binary tree of half-spaces. A point-in-poly test then just becomes a binary search.
That being said sometimes brute force is good enough.
Yep, I do not get that. Indexing by city first or another single dimensional key (quadkey/geohash) and having an rtree of your geometries inside of that is simple enough. Maybe if they're dealing w/ < 100 geometries per city it would be negligible but I'd assume NYC has more than that.
Also, I don't know why they chose ray casting over the winding-number test.
I've been working on geo stuff in golang and building the geometry libraries for spatial indexing was easy enough to do in my spare time, so I don't know why they couldn't at least try.
FYI, searching the R-tree also has runtime complexity of O(N) in the worst case. I believe there is some R-tree variant which actually provides asymptotically better performance than the naive case, but it is very complicated and tends to perform worse on average in practice.
Source: just spent two months implementing R-trees in Rust
I was surprised to not see PostGIS at least mentioned. I've been using PostGIS professionally for the last few years and only have good things to say, the toolchain is very mature.
We make pretty extensive use of R-Tree indexes in PostGIS, tables with 10,000,000+ rows and shapes that roughly match streets, parcels, blocks, zip codes. While I don't have 99th percentile data offhand average query time is <1ms. Perhaps there are slow responses at the 99th percentile however I would be very surprised by this.
I would be interested in comparing the performance of Uber's first pass "city-fencing" vs a pre-computed 1D array.
EG:
Subdivide a flat projection of earth into n^2 squares.
Create an array of length n^2.
Set the value of each element in the array to a list of candidate geofences (those which have area in that square).
Scale lat and long to between 0 and 1. Then you can index directly into it with PrecomputedArray[floor(lat*n)*n + floor(long*n)]
This is trading space for time, may as well choose space here.
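A rough sketch of that grid scheme in Go (hypothetical layout; assumes lat/long are already scaled into [0,1)):

```go
package main

import (
	"fmt"
	"math"
)

const n = 16 // grid resolution: n*n cells over the flat projection

// cells[i] holds the candidate geofence IDs overlapping cell i.
var cells [n * n][]int

// cellIndex maps a lat/long scaled into [0,1) to its cell index.
func cellIndex(lat, lon float64) int {
	row := int(math.Floor(lat * n))
	col := int(math.Floor(lon * n))
	return row*n + col
}

func main() {
	// Register hypothetical fence 42 for the cell containing (0.5, 0.5).
	i := cellIndex(0.5, 0.5)
	cells[i] = append(cells[i], 42)
	fmt.Println(i, cells[i]) // 136 [42]
}
```

Lookup is then O(1) to get the candidate list, plus exact point-in-polygon tests on only those candidates.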
> While historically Uber has been mostly a Node.js and Python shop, the Go language is becoming the language of choice for building many of Uber Engineering’s new services.
It feels like this type of narrative is becoming more common.
It's more of an application life-cycle thing rather than Node.js/Python being bad.
One creates the first version in Node.js/Python as it's quick to build and iterate an app. At this stage you are still hammering out exactly how things need to be done and assessing the actual needs of your users. Speed of iteration is the most important attribute for any choice made during this stage; generally you don't have enough scale for choices to matter much, and it prevents wasted effort on things that end up on the cutting room floor.
Once you have built the app, and it gets stable major feature wise, you then have plenty of data to drive decisions about what the pieces actually need to be and the tool best suited for them. Go happens to be well suited for swapping out various generic back end pieces with something custom to the problem at hand.
Another way to think about it is that the Node.js/Python version is the next step up from diagramming the app out on paper. It's more like the previsualization stage of a movie (roughly animated, with intern voiceovers). This is important as it can save you a bunch of time on things that are unneeded and can help identify problem areas you should focus on first.
"One creates the first version in Node.js/Python as its quick to build and iterate an app."
That's the narrative, but I don't find Python (I have minimal experience with Node) faster to write in any way but library availability, and very painful to refactor without the ability to reach for a type checker.
I was surprised that they felt Node's performance was slower for computation, given the effort V8 has put into optimization. Particularly he called out "interpreted" and "dynamic-typed". But V8 compiles and optimizes frequently used functions, and recompiles running functions when it detects that they are being called with the same runtime types repeatedly. This runtime optimization can't be quite as fast overall as Go, but I'd be surprised if well-designed JS couldn't run 60-80% as fast. Maybe that's enough difference for them to make the switch.
I work at Uber. We've discovered that in terms of average response time, our node services have remarkable performance. For example, some of our fastest node services have 1-5ms p95 response time. When you start diving into the p99 latency, garbage collection pauses can spike response time to tens or hundreds of ms, and eliminating these spikes is incredibly nontrivial.
Usually, financial companies will use one of the commercial real-time JVMs; it's not just some tuning. JavaScript being single-threaded would certainly simplify having a pauseless GC, but it's still a nontrivial problem, and I'm not sure how much that would benefit the browser use case.
I think the bigger issue here is that Node is blocking while a computationally intensive algorithm is running whereas Go can have multiple goroutines still serving requests while running the algorithm.
You just take the computationally expensive work and run it elsewhere, like a worker pool. This is a standard pattern for this kind of work and what you would do with go too, except the worker pool might just be a number of goroutines.
It's chickens*t to down vote my comment, when you could argue a fact, or explain something to me that I don't understand.
My question isn't silly. I just now wrote two short programs, one in Go and one Javascript, to run a vector multiply and add on two length 1000 vectors, a million times. The JS program is faster.
Result (in seconds):
$ node vmadd.js
0.984
$ go run vmadd.go
1.141044662
go run first compiles the code and builds an executable, so you are mostly benchmarking compilation time for both systems. Building an executable in Go takes most of a second. A meaningful benchmark would compare only the run times.
No, if you examine the code, you'll see the timer times only the accumulated execution time for the loop. The code has been compiled. See below to eliminate doubt -- I used go build and got similar numbers.
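For the curious, here is a rough reconstruction of the kind of program described (the actual source wasn't posted, so the details are guessed):

```go
// vmadd.go: a hypothetical reconstruction of the micro-benchmark
// discussed above: multiply-add two length-1000 vectors a million
// times, timing only the loop (not startup or compilation).
package main

import (
	"fmt"
	"time"
)

// vmadd performs one multiply-add pass: c[i] += a[i] * b[i].
func vmadd(a, b, c []float64) {
	for i := range a {
		c[i] += a[i] * b[i]
	}
}

func main() {
	const size, iters = 1000, 1000000
	a := make([]float64, size)
	b := make([]float64, size)
	c := make([]float64, size)
	for i := range a {
		a[i] = float64(i)
		b[i] = 2 * float64(i)
	}

	start := time.Now()
	for it := 0; it < iters; it++ {
		vmadd(a, b, c)
	}
	fmt.Println(time.Since(start).Seconds())
}
```

With the timer inside main like this, compile time is excluded for both languages, which matches the commenter's methodology.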
Go is lighter than the other two. It can run on lesser hardware (or resources) and is pretty much reliable. The code is easy to write and simple. I don't get to "enjoy" too many smart hacks from others in Go. It's a nice language for what it was meant to do: systems programming. Although Elixir is really shaping up to be a solid competitor. Elixir's syntax and idioms are easier to read, IMO.
It is indeed fast. I'm just waiting for it to mature a little bit more before using it in production. My goal is to start working on a production system after summer 2016. By then, more information about gotchas and issues should have arisen.
"Instead of indexing the geofences using R-tree or the complicated S2..."
"... we first find the desired city with a linear scan of all the city geofences"
Why not use a spatial index? It's not hard, and you wouldn't need to worry about rebuilding your index because your city geofences are not likely to change frequently.
The bottleneck here isn't rooted in I/O but a better algorithmic approach to the problem.
I'm not terribly familiar with geo algorithms but I'm curious why it's worthwhile taking the two layer approach.
Can't you precompute the X and Y minima and maxima for each geo fence and throw out 99% of candidates extremely quickly? In other words store the bounding rectangle for each geofence. Even with 100,000s of entries we're talking 4 primitive type comparisons per entry to exclude all non-possible candidates. I would have a hard time believing this is slower than ray casting to city geofences.
It would still be faster with a two layer approach.
Check 100,000 entries (4 comparisons each) -> filter out 1% to do expensive calculations on.
Vs.
Check 25 cities (4 comparisons each) -> filter down to 1/25.
Do another easy run on the filtered 4,000 and get the same areas to do expensive calculations on as above.
That's 25 + 4,000 easy calculations against 100,000.
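The cheap bounding-rectangle rejection test being discussed is just four comparisons per fence; a minimal sketch (hypothetical types):

```go
package main

import "fmt"

// bbox is a geofence's precomputed bounding rectangle.
type bbox struct{ minX, minY, maxX, maxY float64 }

// contains is the four-comparison test: a point outside the
// bounding rectangle cannot be inside the fence, so the expensive
// point-in-polygon check only runs on the survivors.
func (b bbox) contains(x, y float64) bool {
	return x >= b.minX && x <= b.maxX && y >= b.minY && y <= b.maxY
}

func main() {
	fences := []bbox{
		{0, 0, 1, 1},
		{10, 10, 11, 11},
		{0.5, 0.5, 2, 2},
	}
	var candidates []int
	for i, f := range fences {
		if f.contains(0.7, 0.7) {
			candidates = append(candidates, i)
		}
	}
	fmt.Println(candidates) // [0 2]
}
```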
But yeah, I'm not sure why they seem to go the hard route from the beginning.
Agreed. It seems silly to build your own, which likely performs worse than PostGIS and has way more bugs.
At least the article could have addressed why that would not work for their use case.
edit: also now they have to classify polygons into groups that wholly fit into some "city" polygon. That seems like it becomes a pain to maintain since you may want to expand a region in the future and it happens to go outside of your previously defined "city". Or maybe you want a region that isn't in a "city", like a "country" or "mid-atlantic" region.
Most things at Uber were first built in the natural "why don't you just" way. Python/Flask monolith, Postgres, haproxy, etc. They are being replaced with fancier distributed systems (microservices, Schemaless, Ringpop, etc) by necessity.
Has the transaction rate surpassed what you can get out of Postgres? If not, what is the necessity for the change? Just curious. Thanks for any insight.
Increased latency, probably. They said they wanted 99% of queries to be 100ms or less. When I use pgAdmin and run EXPLAIN ANALYZE SELECT 1; I get 0.025ms execution time, but it takes 62ms to send and receive that.
Or MongoDB geospatial indexing? I used it for a project that required a similar approach (point in polygon). It was quite easy to setup and use, at least at a small scale.
CPU intensive workload. Geofence lookups require CPU-intensive point-in-polygon algorithms. While Node.js works great for our other services that are I/O intensive, it’s not optimal in this use case due to Node’s interpreted and dynamic-typed nature.
That's misleading, math in V8 can be VERY fast (aside from the current absence of vectorization). See the benchmarks here: http://julialang.org
Can be. Based on those particular benchmarks, LuaJIT is even better than V8, and I've frequently run into situations where doing math in C++ is 5-10x faster than in LuaJIT.
I suspect the lack of an integer primitive is a big deal in both Lua and JavaScript.
> I suspect the lack of an integer primitive is a big deal in both Lua and JavaScript.
Not as much as you'd think. All JavaScript JITs speculate that numbers that have been observed to be integers remain integers and compile them to use integer operations. As long as your code is hot enough to enter the optimizing JIT, the overhead is solely in whatever overflow checks couldn't be optimized out.
How often does the fencing data change? If it's infrequent I wonder if you could load the data on start up and simply restart nodes to refresh the data. That would avoid a mutex lock entirely.
However, a mutex lock/unlock should take <100ns. It doesn't seem like that should be a bottleneck. It'd be interesting to hear more about the data, requirements, and response times (median, 90%, etc).
I've done something similar. I used the GPU (actually WebGL) and made several images of my geofences where each one was simply drawn as a bitmask (black and white). Then, I wrote a fragment shader to determine whether a point is within a geofence by looking up the pixel value. I rounded the lat/long pairs to a precision that matched the resolution of my image.
Can't remember the exact results but throughput was much higher than what I was able to achieve using highly optimized PostGIS queries.
Really interesting to see that Go channels weren't used due to performance concerns. Channels are the often-advocated mechanism for concurrency in this language, so having to fall back to mutexes seems like a downside. I would have loved to see a benchmark in the article.
This is very common. Sharing memory is extremely important whenever you're doing anything related to CPU parallelism/concurrency; often if you don't use it you end up worse than the sequential algorithm.
It's surprising to hear you say this. I thought a design goal of Rust was highly discouraging shared-state concurrency. Has this design philosophy evolved somewhat over time?
> Has this design philosophy evolved somewhat over time?
Yes, it has. Post-1.0 Rust's design philosophy is about taming shared state concurrency: that is, disallowing data races statically, making sure you take locks or use atomics where possible, and having a robust set of generic concurrent data structures ready for use so you don't have to implement your own.
This article, for example, rightly observes that atomics are difficult to get right, making it not scalable to have every program implement concurrent data structures from scratch. The benefit of having a rich set of generic concurrent data structures readily available is that it mitigates this problem.
Makes sense! It's definitely a compelling story to offer the same abstractions that exist in other languages, but with static guarantees of correctness. A mutex that you can't mess up, that's a great thing.
Not at all. Rust gives programmers the ability to use shared state or channels at their discretion. The traits Send and Sync were made specifically for these use cases. Rust just guarantees that you can't use these primitives in an unsafe way.
Rust has very good support for shared-state concurrency. In fact, the borrow checker is used to enforce invariants about locks and mutexes that make unsynchronized shared mutable state a compile-time error.
The sync package's RWMutex is perfect for this problem. It will perform faster than channels and is much easier to reason about. Since it can support multiple simultaneous reads and one write the performance impact of a lock is minimal.
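A minimal sketch of that RWMutex pattern (hypothetical types, not the article's code):

```go
package main

import (
	"fmt"
	"sync"
)

// fenceStore sketches the many-readers/one-writer pattern:
// reads take the shared lock, the occasional refresh takes
// the exclusive lock.
type fenceStore struct {
	mu     sync.RWMutex
	fences []string
}

func (s *fenceStore) Lookup() []string {
	s.mu.RLock() // readers don't block each other
	defer s.mu.RUnlock()
	return s.fences
}

func (s *fenceStore) Refresh(fences []string) {
	s.mu.Lock() // the writer excludes readers and other writers
	defer s.mu.Unlock()
	s.fences = fences
}

func main() {
	s := &fenceStore{}
	s.Refresh([]string{"downtown"})
	fmt.Println(s.Lookup()) // [downtown]
}
```

Since reads vastly outnumber writes in a geofence service, readers almost never contend, so the lock cost stays small.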
Channels are "just" mutexes as well. They are also not very fine grained because they have to deal with the general case. You can get much more precise writing your own locking code in Golang.
I built a route finding algorithm in Node.JS for one of my projects. I also do some geofence checks, but I use a geo-index (Mongodb).
Similar to other comments, I find the article a bit lacking in detail. Regardless of the language used, I would have liked to see a comparison of query times with and without a spatial index.
From my limited experience with PostGIS, some point-in-polygon queries perform poorly on large polygons, where MongoDB has done better, and vice versa with linestrings.
"...service handled a peak load of 170k QPS with 40 machines running at 35% CPU usage on NYE 2015. The response time was < 5 ms at 95th percentile, and < 50 ms at the 99th percentile."
It's ~4k per instance. I would like to know what the R/W ratio is. And how often the index is updated is the key, aka locking operations, which should be mentioned in the article.
Also, this article should give more comparisons with other implementations if possible, at least with Uber's monolithic implementation.
A reliable runtime is listed in the three main benefits of using Go. Are the runtimes for the alternatives really any less reliable?
I can buy that the language is easy to learn since the feature set is very limited, but is productivity in the first week/month really what to optimise for?
They also list performance as a benefit.
Go is a reasonable choice, but I doubt that the result would have been much different if they'd used any other modern statically typed language such as Rust, Scala, D, or even Nim.
>Because Node.js is single threaded, background refreshing can tie up the CPU for an extended period of time (e.g., for CPU-intensive JSON parsing work), causing a spike in query response times.
I'm not the biggest Node fan, but that is utterly false. How do you think asynchronous system calls get processed in libuv (the event loop that Node uses)? Threads. Of course you'll park the background thread while you're waiting in the I/O case, but if the work is CPU-intensive then you bet you can use all cores for background work. You just need to know how to write C++ and read the documentation on the bindings.
They already have Node, which is a mix of JavaScript, C++ (V8), and C (libuv). Go is a fine language, and it's OK to want to write infrastructure code in Go, but they shouldn't be giving false reasons.
Sure, node.js is implemented in multiple languages. But when people talk about writing code in node.js, they mean writing javascript. It's a javascript runtime, after all; it says so right on the front page of https://nodejs.org/en/ and even has js in the name. :-)
Saying "of course node.js supports multiple threads of concurrent computation, you just have to write in c++ instead of javascript" is missing the point of why people use it.
And again, use of Go; irrelevant.