Smashing the state machine: the true potential of web race conditions (portswigger.net)
112 points by chatmasta on Sept 20, 2023 | 32 comments


The core problem here is that web application developers don't understand or use database transactions. I've seen this myself at almost every web shop I've ever worked at - queries / updates would be made to the database in series - "fetch this, then update that, then update this other thing". Your code works fine locally, but when multiple sessions interact with that data all at the same time, who knows what will happen.

I hadn't thought of using these race conditions to trigger security vulnerabilities. That makes all sorts of sense, and I'm going to use this work to justify my persnicketiness in future projects.

It doesn't help that popular ORMs either don't expose transactions at all or treat transaction handling as an advanced feature. It looks like MongoDB only (finally) added multi-document transactions in 4.0, and distributed transactions in 4.2.

The general rule for how to avoid these problems isn't locking. It's using a database transaction per HTTP request, or per "operation" - however you define that. Transactions are perfect because if any part of the request fails you can roll back the entire transaction. And if you really need to, you can emulate something similar yourself by having a "last modified" nonce on your record (a random number will do). If you ever do a read-write cycle - say from a web frontend - then the read should include reading the nonce, and the write should fail on the server if the nonce has changed since whatever value you previously read. (And you can quietly recover by simply retrying the entire operation.)
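
A minimal sketch of that nonce approach, assuming Python with sqlite3 and a made-up users table (names are illustrative, not from the article):

  import sqlite3
  import uuid

  class ConflictError(Exception):
      pass

  def rename_user(conn: sqlite3.Connection, user_id: int, new_name: str, nonce_seen: str):
      # The write only succeeds if the nonce still matches what we read earlier;
      # otherwise another session modified the row and the caller should retry.
      cur = conn.execute(
          "UPDATE users SET name = ?, nonce = ? WHERE id = ? AND nonce = ?",
          (new_name, str(uuid.uuid4()), user_id, nonce_seen))
      if cur.rowcount == 0:
          conn.rollback()
          raise ConflictError("row changed since it was read - retry the whole operation")
      conn.commit()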

From a security point of view, this is a fantastic find. I anticipate a massive amount of code out there is at best buggy, and at worst vulnerable to this sort of attack.


> The core problem here is that web application developers don't understand or use database transactions. I've seen this myself at almost every web shop I've ever worked at - queries / updates would be made to the database in series - "fetch this, then update that, then update this other thing". Your code works fine locally, but when multiple sessions interact with that data all at the same time, who knows what will happen.

I mean, you're right, and it's a bugbear of mine, too. But let's not pretend that SELECT FOR UPDATE and out-of-DB manipulation of said data fix everything. (Nor that a direct UPDATE statement would.)

Ultimately, either your updates are linearisable or they are not. Changing someone's first name with or without this methodology does not alter the fact that "last one to change wins". Which change is the right one is, in this case, in the eye of the beholder and depends on the domain you work in.

But, yes, updates that build on existing values to derive new ones are a common source of data races, because people, indeed, do not understand databases, transactions or isolation levels. There are easy fixes for this type of problem.
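
For instance, the classic read-modify-write race can be closed by locking the row for the duration of the transaction. A rough sketch with psycopg2 against Postgres (the gift-card table is made up for illustration):

  import psycopg2

  def redeem_gift_card(conn, card_id, amount):
      with conn:                               # commit on success, rollback on exception
          with conn.cursor() as cur:
              # FOR UPDATE makes concurrent redemptions queue behind this one,
              # so the balance check and the deduction act on the same value.
              cur.execute("SELECT balance FROM gift_cards WHERE id = %s FOR UPDATE",
                          (card_id,))
              (balance,) = cur.fetchone()
              if balance < amount:
                  raise ValueError("insufficient balance")
              cur.execute("UPDATE gift_cards SET balance = balance - %s WHERE id = %s",
                          (amount, card_id))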

But it's never quite as simple as making everything SERIALIZABLE and FOR UPDATE and expecting that to resolve version conflicts because two different things have different expectations of what a row value should be.


> But it's never quite as simple as making everything SERIALIZABLE and FOR UPDATE and expecting that to resolve version conflicts because two different things have different expectations of what a row value should be.

Can you give some examples where making everything serializable is a poor choice?

I think most of the vulnerabilities and bugs happen because people don't think about isolation / atomicity at all. And to me, SERIALIZABLE is probably the right default for 99% of regular websites. The example that comes to mind for me is realtime collaborative editing, though you can build that on top of serializable database transactions just fine. In what other situations do you need a different strategy?
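
For what it's worth, "serializable by default, retry on conflict" can be a small wrapper. A sketch with psycopg2 (the helper name is made up):

  import psycopg2
  from psycopg2 import errors
  from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

  def run_serializable(conn, operation, retries=3):
      conn.set_isolation_level(ISOLATION_LEVEL_SERIALIZABLE)
      for _ in range(retries):
          try:
              with conn:                       # commit on success, rollback on exception
                  return operation(conn)
          except errors.SerializationFailure:
              continue                         # another transaction won the race; retry
      raise RuntimeError("gave up after repeated serialization failures")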


I absolutely agree that most apps should be using serializable transactions. It’s unlikely they are at a scale where it matters from a performance perspective.


SELECT FOR UPDATE is a much better default, though. Updates end up ordered by whichever transaction acquired the lock first, which tends to match intuition.

Of course the callers may be surprised at being out-raced, so you still need to deal with that. Often it's enough to return the updated value.
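
As a sketch of that: in Postgres the read-modify-write can often collapse into one statement, with RETURNING handing back the value the caller actually ended up with (table name made up, psycopg2-style connection assumed):

  def increment_counter(conn, counter_id):
      with conn, conn.cursor() as cur:
          cur.execute(
              "UPDATE counters SET value = value + 1 WHERE id = %s RETURNING value",
              (counter_id,))
          (new_value,) = cur.fetchone()
          return new_value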


Are there any relational databases, or add-ons or frontends for relational databases, that allow for transaction sharing?

What I mean by "transaction sharing" is for separate connections to be simultaneously using the same transaction.

Consider something whose logic is like this:

  begin_transaction()
    select_something()
    update_something()
    compute_something()
    update_some_more()
  commit_transaction()
where compute_something() includes calling a service running in another process or on another server to return some data, and that service uses the same database, and we want it to see the data that update_something() updated.

With transaction sharing it might instead work something like this:

  tid = begin_shared_transaction()
    select_something()
    update_something()
    compute_something(tid)
    update_some_more()
  commit_transaction()
and the service that compute_something(tid) calls could take tid as a parameter and do something like

  open_shared_transaction(tid)
    ...
  close_transaction()


I'm not sure if many databases offer this, but you can do it in FoundationDB like this:

   version = txn.get_read_version().wait()
   compute_something(version)
Then on the remote computer, you can create a new transaction and call:

   def compute_something(version):
      txn2 = db.create_transaction()
      txn2.set_read_version(version)
      # ... Then use the transaction as normal.
This will ensure both transactions are looking at the same snapshot of the data in the database.

Docs: https://apple.github.io/foundationdb/api-python.html#version...


I mean, it's possible to build something like that but why would you ever want to do that? It's so much simpler to just use a transaction per request and let the database's transaction logic handle which requests succeed and which ones fail. Clients should have retry logic to handle the failure scenarios. It's a lot simpler to reason about than trying to turn a distributed system into a single threaded one.


I presume the idea is to make microservices able to work with database transactions. Microservices: Proof that not all ideas can be winners.


Not really sure what the point is here. Microservices work great with database transactions.


No, they don’t. You can have a function in a library, to which you pass a transaction. You can’t pass an open transaction to another service, in the general case. The service boundary forces multiple separate transactions.


This is, in general, not a very tractable problem when you treat it as a distributed systems problem. Once you add an asynchronous network layer inside that transaction you lose a ton of guarantees. This is why I'm baffled when people choose to split up their data across multiple databases.


Hmm. I'd solve this by marshalling all the data in the parent process, and passing that, rather than the tid, to the compute process.

Maybe you could do it with a database proxy. The whole approach feels unclean, though.


Thanks for that! It seems trivial to enhance a DB wrapper to begin a transaction on the first query and commit when the program finishes without error. Honestly, unless I totally missed something, why is that not built into everything it could be built into? Or put differently, is there ever a good reason not to use transactions?


To be clear, you want to begin a transaction at the start of each HTTP request and commit it just before sending the response. (And error or retry if the transaction fails, depending on how your database works.)

But I generally agree - this pattern should be built into everything.

Whenever I'm setting up an application server I usually whip up some middleware which takes care of this for me. The code is database-specific - the semantics can differ slightly depending on what database you're using. And it only matters for requests which issue multiple commands to the database before returning. (But this happens all the time in many applications, especially if you're using an ORM.)
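
A bare-bones, framework-agnostic sketch of that kind of middleware, assuming a psycopg2 connection per request (names are made up):

  import psycopg2

  def transaction_per_request(handler, dsn):
      """Wrap a request handler so every query it issues shares one transaction."""
      def wrapped(request):
          conn = psycopg2.connect(dsn)          # in practice, borrow from a pool
          try:
              with conn:                        # commit if handler returns, rollback if it raises
                  return handler(request, conn)
          finally:
              conn.close()
      return wrapped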


I presume that holding a DB lock for the length of the request-processing time would cause contention as the number of concurrent session requests increases (with user growth).

The article mentions this example:

  Some data structures aggressively tackle concurrency issues by using locking to only allow a single worker to access them at a time. One example of this is PHP's native session handler - if you send PHP two requests in the same session at the same time, they get processed sequentially! This approach is secure against session-based race conditions but it's terrible for performance, and quite rare as a result.


This is a misconception about how transactions are implemented by the database. Unlike PHP's session handler, a lot of modern databases implement transactions without locking out other concurrent queries. In serializable databases, everything happens as if the queries ran one at a time, but using magic tricks like MVCC they don't need to lock the whole database to do it. It's cool stuff! So long as your transactions are short-lived it usually performs way better than you'd expect.

Modern databases are incredible feats of engineering.


Also optimistic locking


Hadn't thought about the possibility of using HTTP/2 in order to generate race conditions. Nice. Interesting trick.

Years ago, Stripe had a capture-the-flag challenge related to distributed systems (I believe it was the second one they ran). One of the final levels involved attacking a system that would make calls to three other systems in turn, each holding part of the key, before calling back to your webhook. You could tell how many systems it had spoken to by watching for incremented port numbers between the requests.

I got killed by the jitter, but I seem to recall someone saying there was a way to stream your requests at a lower level on the same socket. Caveat: this was 10 years ago, so the details are hazy.

Found the HN thread from the time, though: https://news.ycombinator.com/item?id=4453857


This is an excellent article.

>> Every pentester knows that multi-step sequences are a hotbed for vulnerabilities, but with race conditions, everything is multi-step.

This is something that we spent a lot of time thinking about and why we decided to upgrade SocketCluster (an open source WebSocket RPC + pub/sub system https://socketcluster.io/) to support async iterables (with for-await-of loops) as first-class citizens to consume messages/requests instead of callback listeners.

Listener callbacks are inherently concurrent and, therefore, prone to vulnerabilities as adroitly described in this article. It's very difficult to enforce that certain actions are processed in order using callbacks, and the resulting code is typically anything but succinct...
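
As a conceptual sketch (written in Python rather than SocketCluster's JavaScript API), the difference is roughly between a callback firing for every message as it arrives and a loop that pulls messages off an async iterable one at a time:

  import asyncio

  async def process(message):
      await asyncio.sleep(0.1)                 # stand-in for whatever the message triggers
      print("handled", message)

  async def handle_channel(queue: asyncio.Queue):
      # Pulling from the queue in a loop serialises handling: the next message
      # isn't touched until the previous one has fully finished.
      while True:
          message = await queue.get()
          await process(message)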

Some users have still not upgraded to the latest SC version because it's just so different from what they're used to but articles like this help to confirm our own observations and reinforce that it may be a good decision in the long term.

For all of its benefits, though, one of the gotchas of a queue-based system to be aware of is the potential for backpressure to build up. In our case, we had to expose an API for backpressure monitoring/management.


What I love about this is that HTTP/2, which is non-trivial compared to HTTP/1.1, is what allows this to happen. The same HTTP/2 that, in my opinion, takes away from the beauty of the web.


At least HTTP/2 pretty much kills request smuggling (assuming there's no downgrading behind the scenes)


Great article. I had no idea there was a name for this - 'time of check, time of use' (TOCTOU) is a great one, and beats having to describe the situation every time.

I'm building an app on top of Django where I have to worry about this. If you're using Django, check out its support for select_for_update; and if your database supports it, nowait=True can be a great option that will fail the read if a SELECT FOR UPDATE is already held:

https://docs.djangoproject.com/en/4.2/ref/models/querysets/#...
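
A minimal sketch of that (model and field names are hypothetical):

  from django.db import transaction, DatabaseError
  from myapp.models import GiftCard

  def redeem(card_id, amount):
      try:
          with transaction.atomic():
              # nowait=True raises immediately if another transaction holds the row lock
              card = GiftCard.objects.select_for_update(nowait=True).get(pk=card_id)
              if card.balance < amount:
                  raise ValueError("insufficient balance")
              card.balance -= amount
              card.save()
      except DatabaseError:
          # another request is mid-redemption; surface an error or retry
          raise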

Also worth mentioning optimistic locking if you're looking to solve the issue in a different way. There's more involved on the application side, but it has some advantages as well. I tend to prefer select_for_update with nowait=True since it's simpler on the application side, but I have used optimistic locking in the past with great success, and some systems support it out of the box. Here is a description from AWS for those curious:

https://docs.aws.amazon.com/amazondynamodb/latest/developerg...


Is there a good shortish course / book / site out there that would take a developer from "I get the ideas and happily use Burp" to "so that's how pentesting works"?

It seems like a valuable way to spend rapidly shrinking training budgets.


The PortSwigger courses, from the creators of Burp and the employers of this researcher, are the best. Free, too.


Makes me look a bit silly now ...

thank you


Post on ACIDRain as an example of the ecomm time-gap issue back in the day: https://news.ycombinator.com/item?id=20027532


HTTP/2 is nightmarish.


The problem isn't HTTP/2. HTTP/1.1 is also vulnerable to these attacks; they're just harder to find, reproduce and ultimately fix.


From article:

  Finally, use the single-packet attack (or last-byte sync if HTTP/2 isn't supported) to issue all the requests at once. You can do this in Turbo Intruder using the single-packet-attack template, or in Repeater using the 'Send group in parallel' option.


Does HTTP/2 use more resources per connection than HTTP/1.1?


> HTTP/2 is nightmarish.

Why do you think that?



