Hacker News

lifthrasiir · on Feb 21, 2019

You misread the rationale. He is arguing that, with all other conditions the same, the difference between binary formats and JSON would be in the noise. It is often the case that the object construction is more costly than the JSON parsing, and you can't fix that with binary formats.

As a minimal and extremely non-scientific benchmark, I've constructed a simple fixed data structure that encodes to JSON (using Python's `json` module) and to a simple binary format (an ideal case for Python's `struct` module). Decoding the same simple value 1,000,000 times in CPython 3.6.4 took...

    Format  Size (B)  Iters./s   Speed
    ------  --------  ---------  ----------
    JSON          28    205,000   5.75 MB/s
    Struct         6  2,400,000  14.4  MB/s

Of course YMMV, but even the `struct` module was only 2 to 12 times (depending on what you care about) faster than the `json` module in this particular case. And this is really the minimal case: you need (slow) interpreted code for more complex binary formats. Right, you can use PyPy for JIT compilation or binary modules to sidestep the interpreter overhead! The point is that it of course matters, but the improvement is not quite as drastic as you'd imagine.
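For anyone who wants to try something similar, a benchmark along these lines might be sketched as below. The record, its field names, and the `"<HHH"` format string are hypothetical (the comment does not show the actual data structure), so the sizes and rates will not match the table.

```python
import json
import struct
import timeit

# Hypothetical record: the original data structure is not shown in the thread.
record = {"id": 1234, "flag": True, "x": 56}

# Encode once up front; only decoding is timed.
json_bytes = json.dumps(record).encode("utf-8")
# "<HHH" = three little-endian unsigned 16-bit integers, 6 bytes total.
packed = struct.pack("<HHH", record["id"], int(record["flag"]), record["x"])


def decode_json(data=json_bytes):
    return json.loads(data)


def decode_struct(data=packed):
    return struct.unpack("<HHH", data)


def rate(func, n=100_000):
    """Return decode calls per second, measured over n calls."""
    seconds = timeit.timeit(func, number=n)
    return n / seconds


if __name__ == "__main__":
    for name, func, size in (
        ("JSON", decode_json, len(json_bytes)),
        ("Struct", decode_struct, len(packed)),
    ):
        per_sec = rate(func)
        print(f"{name:<6} {size:>4} {per_sec:>12,.0f} {per_sec * size / 1e6:>6.2f} MB/s")
```

Note that `json.loads` hands back a ready-made dict while `struct.unpack` only returns a tuple, so a fair comparison would also have to count whatever object construction happens after unpacking.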

md5person · on Feb 21, 2019

> "It is often the case that the object construction is more costly than the JSON parsing, and you can't fix that with binary formats."

What.

  typedef struct _some_struct_t
  {
      unsigned long some_long;
      unsigned long some_other_long;
  } some_struct_t;
  
  ...

  {
      some_struct_t foo = { 0 };

      foo.some_long = 1;
      foo.some_other_long = 2;
  }

Is somehow comparable to using JSON?

lifthrasiir · on Feb 21, 2019

C is one of the extreme cases; that's why Cap'n Proto works pretty well in C++ and its cousins, for example (it amortizes the decoding cost to accessors, and accessors are really cheap in those languages). There are many languages and implementations where decoding cost is not as significant.
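To illustrate what "amortizing the decoding cost to accessors" means, here is a rough sketch in Python. It is not Cap'n Proto's wire format or API; the `SomeStructView` class and the assumed layout (two little-endian unsigned 64-bit fields, mirroring the C struct upthread) are made up for illustration.

```python
import struct


class SomeStructView:
    """Lazy view over a 16-byte buffer holding two little-endian uint64 fields.

    Nothing is decoded in __init__; each property reads its field on access.
    The layout is an assumption for this sketch, not a real wire format.
    """

    _U64 = struct.Struct("<Q")

    def __init__(self, buf):
        # memoryview avoids copying the underlying bytes.
        self._buf = memoryview(buf)

    @property
    def some_long(self):
        return self._U64.unpack_from(self._buf, 0)[0]

    @property
    def some_other_long(self):
        return self._U64.unpack_from(self._buf, 8)[0]


buf = struct.pack("<QQ", 1, 2)
view = SomeStructView(buf)      # "decoding" is just wrapping the buffer
print(view.some_long, view.some_other_long)
```

In C++ such accessors compile down to a load at a fixed offset. In CPython each one is a property lookup plus a function call, which is why the same design pays off less there.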

md5person · on Feb 21, 2019

> "C is one of extreme cases"

I would say it's the other way around.

We've had the knowledge and tools to build performant, scalable and highly maintainable systems for a while now. The learning curve is there, but that's part of the trade. We've been too occupied with reducing the entry barrier though - the end result being people shoving JSON into places it should have never been in.

JSON can absolutely be a part of a text editor's architecture - with areas that don't necessarily require near real time performance (think configuration, metrics). Anything beyond that - C structs would be a great way to go, and I don't see why there's a debate here.

osmarks · on Feb 21, 2019

Because the idea of Xi is that it can support different frontends for different platforms, and that probably wouldn't work out too well if they all had to be in C.

md5person · on Feb 21, 2019

The Xi backend is already written in Rust, a relatively low-level language with a somewhat C-like FFI/ABI. The choice to use JSON in time-critical code, when more performant alternatives are available, seems to me like a mistake.

sanxiyn · on Feb 21, 2019

The whole point is that JSON is not in the time-critical path.

orbifold · on Feb 21, 2019

This is a super flawed argument. Clearly FlatBuffers and even Protocol Buffers are faster to serialize and deserialize than JSON, regardless of what you benchmark in Python.

rapsey · on Feb 21, 2019

And for the amount of messages that are being sent, the speed difference is irrelevant.

This is the same conclusion the SQLite developers came to. They tested turning JSON column types to binary and the speed difference was not large enough to warrant maintaining that code, so they kept the data in JSON.

IshKebab · on Feb 21, 2019

If the speed difference is irrelevant, why are they struggling with it?

rapsey · on Feb 21, 2019

Because most implementations are reasonably efficient. Swift's default one is apparently not.

paulfurtado · on Feb 21, 2019

Python might be the one language that isn't true for. In my Python experience, the Google protobuf library is frustratingly slower than the built-in json module for any data structures I've cared about, which is why things like pyrobuf exist to solve that performance problem: https://github.com/appnexus/pyrobuf

lifthrasiir · on Feb 21, 2019

So you claim that decoding FlatBuffers and protobuf is faster than decoding with `struct`? I'm well aware of various flaws and even stated some, but I can hardly buy that claim without a separate benchmark (which I'd really welcome, by the way).

At least I fully understand what the `struct` module actually does under the hood: it sorta compiles the format to a list of fields and "interprets" it with a dead simple VM in C. Oh, of course I've used the precompiled `struct.Struct` for that reason (but it was only 20% faster). Anyway, this arrangement is typical for most schema-based serialization formats in any language: a bunch of function calls for gluing the desired format, plus a set of well-optimized core functions (not necessarily written in C :-). Hence my justification that this is close to the "bare-bones" serialization format.
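For reference, the difference between the module-level function and a precompiled `struct.Struct` can be sketched like this. The `"<HHH"` format and the payload are hypothetical stand-ins, and the timings will vary by machine and Python version.

```python
import struct
import timeit

FORMAT = "<HHH"                      # hypothetical layout, 6 bytes
payload = struct.pack(FORMAT, 1, 2, 3)

# Precompiled: the format string is parsed once, here.
compiled = struct.Struct(FORMAT)


def via_module():
    # Module-level call: the format is looked up (in an internal cache) on each call.
    return struct.unpack(FORMAT, payload)


def via_compiled():
    return compiled.unpack(payload)


if __name__ == "__main__":
    for func in (via_module, via_compiled):
        seconds = timeit.timeit(func, number=200_000)
        print(f"{func.__name__:<13} {seconds:.3f} s")
```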

dkersten · on Feb 21, 2019

> with all other conditions the same, the difference between binary formats and JSON would be in the noise.

But, seemingly, in this case the conditions aren't the same.

IshKebab · on Feb 21, 2019

Are you using the slowness of Python's `struct` module to prove that binary formats in fast languages are slow?

I've benchmarked Capnp Vs JSON for Modern C++ in C++, and Capnp was something like 8 times faster.

If you're struggling with JSON performance how is moving to a binary format like Capnp (or Flatbuffers etc.) not a better solution?

e98cuenc · on Feb 21, 2019

It seems they're getting parsing times 1,000x slower than any other parser, 10,000x slower than simdjson. The complaint is understandable, but ironic :)

raphlinus · on Feb 21, 2019

These numbers are not quite right for a variety of reasons (performance measurement methodology is hard), but for a more apples-to-apples comparison, it's about 50x slower than serde in Rust. That's still a lot, obviously.

md5person · on Feb 21, 2019

But... how else will people who have never seen a byte array or had to flip endianness be able to write plugins for my text editor?