Benchmark results of Kubernetes network plugins over 10Gb/s network

tgraf · on Nov 30, 2018

[Disclaimer: I'm one of the Cilium authors]

We have been trying to reproduce the performance results ever since the article was published as this is not in line at all with what we measure daily in our CI. We can easily do a multiple of these numbers.

There are some obvious flaws in the benchmarking scripts [0] such as using the "used" column of `free` without taking into account cached file buffers.

However, it does not explain why HTTP and FTP are worse compared to the TCP benchmark which is doing wire speed at ease. None of the Cilium datapath is HTTP or FTP specific unless HTTP specific security policies are in place in which case HTTP traffic is actually parsed.

We have requested more information on the scripts used by the author and continue to investigate. We will publish results as soon as we can reproduce this.

As stated by other commenters as well, most of these benchmarks are measuring the same Linux kernel code except for Weave (OVS) and Cilium (BPF). However, at the specified MTU of 9000, the bottleneck for all plugins will not be the forwarding datapath but the actual client and server code copying the data in and out of the kernel as there are very few packets actually being created and forwarded.

[0] https://gist.github.com/AlexisDucastel/ebb884831aeec5827e4df...

nicolast · on Nov 30, 2018

> most of these benchmarks are measuring the same Linux kernel code

This, 1000x this. I'm afraid too many people treat their CNI plugin as 'magic' whilst many of them really aren't. 'Host' versus Calico is basically benchmarking the impact of a Linux bridge device, and maybe some more iptables rules than the host has (depending on whether the host benchmark has iptables enabled at all, whether there are K8s network security policies in place and enforced by Calico,...).

Also, configuration details are lacking. E.g. in the Calico benchmarks, was ipip enabled or not?

krakrnews · on Nov 30, 2018

Yes most solutions are using Linux kernel, so what's being measured is indeed the impact of the way the kernel is being configured to achieve container networking. But that doesn't make those design choices, and the tests, meaningful. Calico, for example, contrary to your assumption, uses neither a Linux bridge device nor iptables for packet forwarding. (It does use iptables for policy enforcement, but that's not being tested here.)

nicolast · on Dec 3, 2018

I'm aware it doesn't use iptables, except to implement network policies, hence the reference. Good call about the bridge usage, my bad, makes sense that's not being used given Calico is L3...

tyingq · on Nov 30, 2018

"However, it does not explain why HTTP and FTP are worse compared to the TCP benchmark which is doing wire speed at ease"

Do you have some more detailed info on the configuration and commands you used? Nginx, for example, doesn't have sendfile() turned on by default (just one example of a configuration that might change benchmarks).

fooz0 · on Nov 30, 2018

See https://gist.github.com/AlexisDucastel/1b287428db8d40f528f1b...

stevenacreman · on Dec 1, 2018

As somebody who has created a product in the past and also reviewed quite a few I've given up doing performance comparisons. This is quite sad as comparisons help people save time and money and cut through the marketing which technical people hate.

Every time I've done a performance comparison an expert pops up and says the result is invalid because of X. It takes 10 seconds to write the comment but perhaps a few hours to redo the tests and update the blog contents.

The blogger doesn't want an inaccurate blog and the software authors don't want bad benchmarks left up which constantly crop up in search results. As a blogger you feel a little duty bound to work on updating a blog you know probably won't be re-read by the majority of people who have already opened it anyway.

My conclusion is that fault should fall on the side of the software developer in most cases. Having created a startup I understand the time pressures and motivations driving the roadmap. There is a natural tendency to work on the differentiators and high value complex features. Blogs like this should act as a reminder that there is massive value in prioritising sane defaults, tests, documentation and building logic into the application that makes incorrect settings that effect performance unlikely.

From reading this blog I get the sense the author is quite technical. A positive public relations move would be to spend your time replicating the results and then when the problem is found make it difficult for the next person to have the same issue. Preferably with logic in the software, but worst case scenario with some bold text towards the top of the readme so it's not buried somewhere obscure.

marmaduke · on Nov 30, 2018

> this is not in line at all with what we measure daily in our CI

But you setup your CI. This guy’s numbers are a lot closer to what I or another CNI n00b would get trying to set something up.

OTOH if you’re already a CNI expert, you wouldn’t be reading this article.

As someone wondering what CNI to choose I found this article helpful

zaphar · on Nov 30, 2018

I wonder what the latency stats are for each. Bandwidth per second is useful but in my experience latency is the more important statistic for the applications I manage.

throwaway2048 · on Nov 30, 2018

Packets per second aswell, speed testers generally give you the absolute best case full MTU sized packets for 100% of the duration.

Real world traffic has lots of nasty small packets.

shaklee3 · on Nov 30, 2018

You might want to see the comment I just made above. The two dpdk based plugins will give you the best latency and throughput. Vpp is a full-blown virtual switch, so I'm not sure if you need something that extensive.

ravedave5 · on Nov 30, 2018

Exactly I don't need to transfer a ton of data in my rest calls, I need them to not have latency.

scurvy · on Nov 30, 2018

Any plans to run some packets per second benchmarks with latency? Everything today can run at line rate, especially with jumbo frames. These existing benchmarks don't show much. Networking people want to see how many PPS you can push, at what latency, and at what packet sizes can you achieve line rate?

nsteel · on Nov 30, 2018

Some IMIX benchmarks would be a good start. https://en.wikipedia.org/wiki/Internet_Mix

adamtulinius · on Nov 30, 2018

I find it slightly weird to test different protocols like that, especially the SCP test, where the author states:

"With SCP, we transfer the 10GB random file over scp using OpenSSH server and client. We can clearly see that even bare-metal performance are much lower than previously."

SCP is just notoriously slow, and it should be expected to not be able to max out the connection with it.

chrismeller · on Nov 30, 2018

No one is saying you should use SCP for anything or that it's an alternative to any of the others - none of the tested protocols are practical alternatives to each other. It was included to provide a comparison point for the encrypted version of WeaveNet.

Encrypted WeaveNet handled plain TCP traffic at 1,330 Mbit/s. Without the SCP comparison there's no way to tell whether that's really really good or really really bad. Seeing that the unencrypted version of WeaveNet handled SCP at 1,594 Mbit/s shows that it's not horrible, but could probably be better. It's good for WeaveNet folks to know they have some room for improvement and it's good for you because you might want to consider alternatives if you require encrypted traffic and are very bandwidth sensitive.

bboreham · on Nov 30, 2018

[Note I work on Weave Net]

When we ran a similar test we got 3.75Gbit/sec; the figure in this article is closer to what we got in our "slow mode" which is used as a fall-back when it can't get the in-kernel path to work.

https://www.weave.works/blog/weave-net-performance-fast-data...

stevenacreman · on Nov 30, 2018

For those interested in a feature comparison I did one here:

https://kubedex.com/kubernetes-network-plugins/

gtaylor · on Nov 30, 2018

This is pretty neat. Would be interested to see aws-cni-vpc in there, too!

stevenacreman · on Dec 1, 2018

Added as the final column in this sheet.

https://docs.google.com/spreadsheets/d/1qCOlor16Wp5mHd6MQxB5...

nicolast · on Nov 30, 2018

As mentioned here, the article doesn't specify whether the Calico setup uses ipip or not which could have a measurable impact.

In the Calico-without-ipip case, you're basically comparing host networking (let's assume with some iptables enabled) with host+iptables+bridge networking (yes, there may be more iptables rules involved). Let's assume the impact of iptables is the same between both, then it'd be interesting to measure the impact of the Linux bridge being used. As a colleague of mine mentioned, this may be barely noticeable on a 10Gb interface, but could be on faster networks. How about running these tests on a 20Gb or 40Gb network? These are quite common in datacenter networks, which is where you'll be using these CNIs (unlike GKE, EKS,... where you can integrate with the 'native' SDN).

Finally, I'd be interested to get some results when using the macvlan CNI plugin (though then one loses network policy support, sadly enough :( )

barbecue_sauce · on Nov 30, 2018

I don't work in the networking space, is ipip synonomous with IP-in-IP tunneling? (I can't find a definitive answer in my cursory attempt).

X-Istence · on Nov 30, 2018

Yes.

https://docs.projectcalico.org/v3.2/usage/configuration/ip-i...

This is used when there is source/dest filtering happening on the network (such as in certain cloud providers where you can't send packets from 10.0.0.1/32 to 192.168.1.2/32 because 10.0.0.1/32 is not valid, so you pack it into another IP packet that makes the source seem like 192.168.1.3/32).

dharmab · on Nov 30, 2018

https://docs.projectcalico.org/v3.2/usage/configuration/ip-i...

nicolast · on Nov 30, 2018

Exactly.

krakrnews · on Nov 30, 2018

see my prior comment - your technical understanding for Calico (host+iptables+bridge) is incorrect

gweeed · on Nov 30, 2018

Hm, running iperf3 stream is pretty synthetic benchmark, what about real-life workloads? iptables should be pretty damn slow the more rules are added as they are walked linearly, I doubt this has been properly evaluated in the blog post above.

shaklee3 · on Nov 30, 2018

For those looking to go above 10Gbps, Intel makes a dpdk plug-in, and Cisco makes the VPP virtual switch with dpdk as a back-end.

hacknat · on Nov 30, 2018

Flannel has encryption. It specifically has a documented IPSec mode. I’ve actually run it in production.

sascha_sl · on Nov 30, 2018

it also has a very very simple wireguard plugin that "just works" in my experience. You can read the entire thing in a minute.

https://github.com/coreos/flannel/blob/master/dist/extension...

throw2016 · on Nov 30, 2018

Things like CNI take away from the development and understanding of general networking technology to scripts and json/yaml wrappers that not only add another layer of complexity but also prevent users from understanding what they are actually using.

For instance in this post itself we learn about all these CNI wrappers but learn nothing about Vxlan, BGP and Linux networking tools that these are using underneath. And those are actually easy to use directly. This does a disservice to both developers and users.

For instance true game changers in this space is tech like Wireguard that can deliver encryption at near line speed and would kill these benchmarks. Everyone can benefit from Wireguard today. Initiatives like CNI encourage wrappers and those then wrapping tools like Wireguard end up getting the recognition and success while the underlying tools like Wireguard fade into the background. This does not seem like a healthy or sustainable development model.

wmf · on Nov 30, 2018

K8s is all about encapsulating devops practices into code, so it's kind of philosophically totally opposite to learning all the details of virtual networking and configuring it from scratch.

If Wireguard doesn't write the wrapper themselves they have no one to blame. People need to understand that the hard/cool part and the valuable part aren't the same and if you are too cool to create value then you can't complain about other people taking it.

throw2016 · on Dec 1, 2018

This is a nice idea about abstraction but without knowledge you cannot run anything professionally with any degree of reliability or confidence. What happens when things break? Google, Facebook and any company dependent on delivery have full fledged networking and ops teams.

All these wrappers still require time investment in knowledge, setup and configuration. The cost for users knowing the wrappers but not the tech is being locked in a platform, if they know the tech even if Kubernetes crashes and burns their knowledge and time investment remains valuable.

Before Wireguard it was extremely difficult to get near line speed with encryption, now accessible to all. If this is not 'value' what is? Given this post is about networking benchmarks this is a huge deal. If people stop creating these kind of networking tools and technologies then there is going to be nothing to wrap.

polyomino · on Nov 30, 2018

In many cases these containers are running on the same physical machine. Could the network stack be optimized so much so that writes from one program inside a container can write directly into pre-allocated memory of another container using our existing networking interfaces?

tgraf · on Nov 30, 2018

https://www.slideshare.net/ThomasGraf5/accelerating-envoy-an...

Slide 21-25

kklimonda · on Nov 30, 2018

Was calico ipip encapsulation enabled or disabled?

sofaofthedamned · on Nov 30, 2018

I'm impressed that all non-encrypted methods are so close to bare-metal. Wonder how much CPU was impacted at those rates though?

sascha_sl · on Nov 30, 2018

In the case of flannel I know that not a lot actually happens that'd influence that. It's signaling and orchestration. flannel allocates a subnet and throws up a VXLAN interface. That's it.

kitd · on Nov 30, 2018

CPU usage was the final test, near the end of the article.

sofaofthedamned · on Nov 30, 2018

Ah missed that, ta!

jamp897 · on Nov 30, 2018

I have to admin I’ve never quite understood the need for the overlay network, esp with such a large performance impact. Security has been moving higher up into the app layer anyway with Beyond Corp style approaches. What are the uses cases driving it?

dharmab · on Nov 30, 2018

In our case, service mesh and network isolation (including for potentially untrusted code). Being able to ensure that pods can't talk to each other by default and managing whitelists for service to service communication. Managing that without an overlay is a pain once you're past a handful of nodes.

wmf · on Nov 30, 2018

Most of the CNIs tested here are not overlays.

But in general most companies can't do automation or self-service in their physical network so overlays are the only way to get any network agility. And BeyondCorp is much harder than that.

scurvy · on Nov 30, 2018

Also, is Romana still actively developed? It seems like they had a lot of momentum, then suddenly things went radio silent on their Slack and git repo.

godelmachine · on Nov 30, 2018

Has not https://news.ycombinator.com/item?id=18564340 achieved the same result?

ravedave5 · on Nov 30, 2018

I wish there was a similar comparison of proxies.

londons_explore · on Nov 30, 2018

All these 'plugins' are simply scripts and tools which configure the linux kernel iptables and network forwarding rules right?

None of them actually handle packets, connections, routing etc directly I assume?

Given that, it seems disingenuous to benchmark them... You're simply benchmarking the Linux Kernel vs the Linux Kernel.

whalesalad · on Nov 30, 2018

At the end of the day you’re benchmarking a strategy. It doesn’t matter that Linux is the underlying thing executing that strategy. All of these tools have branding and marketing around them... It’s important to see which ones live up to claims.

detaro · on Nov 30, 2018

They do not all work the same way, and using them clearly produces in different results, so why is it disingenuous to compare what performance you get with each of them, even if all they did was "configure the kernel" (which afaik isn't the case)?

gauravphoenix · on Nov 30, 2018

No. This is not all true. Many tools use encapsulation of packets and do not modify netfilter or routing tables.

dharmab · on Nov 30, 2018

No. For example, Cilium uses eBPF for L4 routing/filtering and Envoy for some L7 filtering.