> Azure is something I’ve avoided since using it for a few months last year. I was working as a Microsoft partner so it was unavoidable back then. Parts of it are alright but the user experience coming from an Amazon background is worlds apart.
Wat? IMO, the Azure portal is amazing to work with, especially compared to the old-fangled, inconsistent UI that AWS provides.
> Show them how some things on Azure need to be done in a clunky web UI
Years ago, this was true, but it hasn't been for a long time - the Azure web UI of today is fast, consistent and looks great.
> some things on Azure need to be done in a clunky web UI, other things need Powershell and other random stuff uses the CLI
I've been using Azure for years, and I'm not aware of anything that can only be done in the UI. I also don't believe there is anything that only works with Powershell, or only works with the CLI.
> how that effects the design of DevOps pipelines and automation in general. Yes, you can make it work, but why make life hard for yourself
Eh? Azure DevOps pipelines are great to work with, and there's a huge library of tasks available.
After reading this, it doesn't sound like the author has actually worked with Azure recently, so I'm really not sure why he bothered including it in this article.
The AWS portal is bizarre - it's like each portion is written by a different company. In ECR, for example, clicking into a repo is fine, but then going back doesn't do anything. Why? Even simple favicons don't look like they're from the same company. When you've got 40 tabs open you want to be able to tell whether a particular tab is from AWS or not.
Azure is nicer, but it's not reliable (see my other post on this thread). It's almost like Azure has done too much lazy-loading, which persists through login sessions - you can't rely on a refresh to give you the truth. For all the problems with AWS, you can.
And yes I know this isn't CLI stuff, but it's important. Even on fully automated / scripted deployments it's sometimes nice to go into the UI and see what's going on. If it doesn't represent reality, there is a problem.
FWIW GC doesn't have any of these problems, but it is slow.
I've been working with it for months (still do), and have never found it to be good or reliable. The endless horizontal growth is maddening - horizontal scroll-within-scroll is very hard for users, and why they thought it'd be a good idea is beyond me. You can't easily select stuff (e.g. CosmosDB database names, which get truncated if they're long), and you can't right-click to "inspect element" either - you have to explicitly open developer tools. Occasionally Firefox will go mad when Azure is open and spin my fans up like crazy. Some errors you can get are completely cryptic (forgot to fill in the "subnet" in HDInsight? Well sorry, your cluster is stuck in "validating" forever). Storage options are confusing - in AWS you have S3; here you have ADLS Gen1, Gen2, and Blob storage, which is actually "block blob / page blob / append blob" but is presented as "tables / queues / blobs", and oh BTW ADLS Gen2 is based on it but Gen1 is not (you provision a Gen2 account from "storage accounts", but Gen1 has its own UI section).
I could go on and on. It's not like Azure is completely bad, but I far prefer the developer experience of AWS. It's not perfect, but IMO it's much better.
> if you hover over just about any text element, a little 'copy' icon will appear. I like this feature.
Think I didn't try that? Doesn't work on CosmosDB databases. Yes, for a lot of items it's true that "copy" is easy.... but when it's not easy, it's crazy hard.
I'm guessing that your company doesn't have hundreds of Azure subscriptions (and the proportional number of other resources to manage in them). When there are a lot of elements that you need to manage / browse through / look for, you start to appreciate the simplicity of the AWS UI, and the fact that it prioritizes "snappy" over "fancy".
Azure might be great for an SMB, but I think for large enterprises, AWS is still unchallenged.
I work for a megacorp (250k+ employees). We have a lot of subscriptions, and a lot of resources :)
Just about everywhere I'd want to filter stuff in the Azure portal, there's a search box and/or filters. Honestly, it's really snappy for me.
I have occasionally, but not in anger, and not for a while. It's a good point you make, but my recent experience with Azure DevOps is similar to, or worse than, what I previously dealt with on Azure.
The worst thing about it is that I was originally impressed with Azure DevOps - the onboarding was awesome; it almost combined the best bits of GitLab/GitHub with none of the crap of Jenkins. It was only when I went back a couple of days later that it let me down.
As a PM on Azure Pipelines, I'm really interested to learn how we let you down after a few days. If you're willing to share, I'm mattc at xbox.com. Or post here, whatever works.
Thanks! I'm going to play around with it a bit more, and if I still have problems I'll hit you up. The GUI stuff was all a precursor to automating it with Ansible anyway, so unless that's broken I think it'll be okay.
Azure DevOps is awesome though. Told my wife I'd found a modern Microsoft product I really liked; she nearly fainted.
The term you're looking for with the AWS Portal is "they're shipping the org chart". Different teams build plugins for the portal, with a bunch of latitude to define the custom experience.
I only tried Azure for a few weeks, but I think its underlying concepts and foundations are way too Windows/.NET-centric, and still half-baked in surprising areas when you come from a non-Microsoft background. I wouldn't be surprised if that was the case for the author.
I know a lot of things have changed at Microsoft, and perhaps the language I was using (Ruby) wasn't the best match either. But at a lot of steps:
- documentation was directed at VS ("you have this nice integration to do that in 30s. Other people, go this way - pour some jugs of coffee, you'll need it.")
- docs and tutorials are old and deprecated. Even articles written by staffers a few months ago were explaining stuff that disappeared from the portal interface, and no update was available.
There seems to have been a huge system transition, and it was painful to be caught in the middle, especially when after that it means the new system has a risk of being beta quality for a while.
- anything remotely complex means moving to Kudu, where there's yet another set of CLI tools to deal with.
Basically it felt like I needed to invest 6 months of my life to really get it, and only then would I be ready to use it professionally. AWS or even GKE don't have as much of a learning curve for people coming from the Linux world, in my opinion.
I'm surprised you think this, as I've been surprised in the other direction :)
I'm mostly from a Windows and .NET background, but I'm reasonably comfortable in Linux too, and I've been surprised at the breadth of support for Linux and traditionally non-Windows languages, such as Python and Node.js.
I haven't used the Azure portal for a few months now, but every time I have used it, it's driven me up the wall.
AWS's UI might be inconsistent, even old fashioned - but it's extremely quick to navigate around and find things.
With Azure I'm either having to spend 10 minutes hovering over an array of icons trying to figure out which one will take me to the service I want, or I'm fighting with their panels which hide information I actually wanted.
When it comes to other basic functionality like updating billing details - Azure only lets that be done by the root account, which I really don't want to hand over to our accounts department, and they don't want me to know the card details they use for it. So we're at this stalemate where I have to log in as root, navigate to the right area, then let them enter the card details.
I've been in the process of migrating from AWS to Azure to take advantage of start-up credits. Previous to this I was working at a large company on AWS-only infrastructure. I feel like I've got enough experience with both at all levels. In Azure we've been using Kubernetes for a number of months now.
I, too, think the Azure portal is amazing to work with. It feels much more coherent than the AWS portal. I like the Azure command line package as well, in preference to the AWS CLI. I've never touched the Azure PowerShell tools and have never felt like I had to.
Creating resources via the portal or command line is something I only do as a last resort anyways; we use Terraform for all of our cloud resource creation purposes.
We are trying to be all Terraform, but it's not moving fast enough. We still need the Azure/AWS console and CLI to do much of the work. I switched from an AWS shop to a "migrating from AWS to Azure" shop. The new job is all Kubernetes work, so I love it, but I cannot get comfortable with Azure. I was so much more productive and confident with AWS it's crazy. I miss S3, EBS, snapshots, Route53, IAM and even fucking AWS role policies. I understand MS realizes this, which is why they change their API every 4 months, but it's getting ridiculous - we can't even use AKS (instead of ACS) because for some reason they don't roll out services to the US North Central region. WTF?
Yeah, there's a definite hurdle to get over when moving to Azure. I've been finding analogs to most of the services I need, like Blob storage instead of S3, managed disks instead of EBS.
There are some things I definitely prefer in Azure to AWS too; I find the AD-based authentication to be much easier to understand and implement compared to IAM. The SMB shares (File storage) are great, integrate well with Kubernetes, and are a lot faster than EFS.
Non-premium VM IO performance is abysmal though, and I really wish I could store SSH public keys in Azure AD.
I've only used Azure for a little while, but it just seems way too slow compared to AWS. It takes forever to spin up and shut down a server, or reconfigure something, and the website in general just feels slow. The AWS portal is less pretty, but feels fairly quick.
> It takes forever to spin up and shut down a server, or reconfigure something, and the website in general just feels slow
I don't find the website to be slow - they seem to have updated it ~6 months ago, and it seems to be really responsive since then.
I'm with you on time to create some resources though - starting up a new VM takes anywhere between 5 minutes and (on occasion) an hour. I'm working a bit with Azure Batch these days too, and it generally takes 15-25 minutes to start a small pool, which is really irritating. I wish they'd introduce a containerised version of batch, with jobs starting in seconds.
I had some problems with Azure (and every other cloud provider), but by far the Azure portal is vastly less confusing than AWS or GCP. Using the GUI on those two can quickly become a dizzying mess.
I don't think I have logged into the Azure web UI in like... 3 years or so. However I have used AWS and GCP pretty extensively. I have always found the GCP web UI to be light years ahead of the AWS one. Curious for why you think the Azure one is ahead of GCP and AWS.
I've worked pretty decently with Azure, but VSTS (now Azure DevOps, oh boy) is horrible for hosted solutions... It's just so dang slow. Slow to start, slow to run anything that touches disk, etc.
Also, the storage with Azure is very, VERY slow. Spin up a VM and give it a go. Ultra SSDs might change things, depending on pricing.
One thing I couldn't do through the UI was changing the expiration of JWT tokens issued by Active Directory - I needed some PowerShell for that. Things in preview oftentimes end up PowerShell-only first, which I'm OK with.
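Whichever way you end up changing the lifetime policy, it's worth verifying what expiry the tokens actually come back with. A minimal sketch in Python (the `jwt_expiry` helper is my own, not part of any Azure SDK) that reads the `exp` claim out of a token without verifying it:

```python
import base64
import json

def jwt_expiry(token):
    """Return the 'exp' claim (Unix time) of a JWT without verifying it.

    Enough to eyeball what lifetime AAD actually issued; never rely on
    unverified decoding for anything security-sensitive.
    """
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["exp"]
```

Compare `jwt_expiry(token)` against the issue time and you can see whether the policy change actually took effect.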
Ditto. Hell Google just dropped out of the JEDI pitch mumbling something about their tech not being up to the requirements. Azure and AWS are so clearly in the lead. Google’s offering is, well, a bit better than Oracle’s, but that’s as far as I’ll go.
So it's a certification issue, not a technology one as the OP said? I wouldn't be surprised if it has something to do with e.g. security clearances of employees.
Interesting post. I'm a Linux guy, but I'm in a new job so I'm having to ramp up on Azure DevOps. It looked good at first - I actually praised Microsoft - but:
1. Creating a project timed out after 5 minutes or so. This was in the GUI, with no adblocker or anything similar. Refreshed page = no project.
2. Went to a different machine, logged in, no project still.
3. Went to original machine - refreshed - no project. Logged out and back in again - project was there.
Azure DevOps was awesome at first but I can not trust something like this when I work with it all day.
BTW never had a problem bringing up an instance quickly on GC. AWS sometimes just sat there for a while, depending on the instance type and region. Azure just stops because it feels like it. I'm sure GC isn't any better than the other two, it's just less utilised.
The UI is my biggest issue. Yes, we don't depend on it because we all script our stuff. But after spending an hour faffing at an SSH terminal it's nice to look at the UI and see stuff happen. Azure is the worst at this, with AWS a close second.
That reminded me of my favourite bug with the AWS UI / Container Registry - you can order the list of container images any way you like, but it'll only order those it's loaded. So I'm looking for a new image called 'zzzzz' but it'll never appear; I need to search for it. Did anybody involved ever use this?
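The workaround is to do what the console doesn't: fetch every page before sorting. A sketch with boto3 (assuming boto3 is installed and credentials are configured; the function names `fetch_pages` and `sorted_tags` are my own):

```python
def fetch_pages(repo_name):
    # Assumption: boto3 is available and AWS credentials are configured.
    import boto3
    client = boto3.client("ecr")
    paginator = client.get_paginator("describe_images")
    return [page["imageDetails"] for page in paginator.paginate(repositoryName=repo_name)]

def sorted_tags(pages):
    # Collect *every* page before sorting -- sorting only the pages that
    # happen to be loaded (as the console does) can silently miss images.
    tags = []
    for details in pages:
        for detail in details:
            tags.extend(detail.get("imageTags", []))  # untagged images have no tags
    return sorted(tags)
```

With all pages in hand, 'zzzzz' sorts to the end like you'd expect, instead of never appearing.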
> How does Hetzner compare here? In terms of both general experience and costs.
We use Hetzner for this purpose. It's hard to compare it with the other providers since with Hetzner you are renting unmanaged servers, not turning up cloud instances.
But it is absolutely the best value you can get. Our bill would be at least 10x higher if we were using Google/Amazon/Microsoft.
Considering we spend a lot on servers as it is, that makes it well worth it for us. It might not be if your product is very computationally light.
I don't know your location, but if you are in the US and running your own k8s on bare metal, I would reckon you should have at least 1 full time person on staff to handle all of it. Let's say that person costs only $50k/yr. Will your workload on Google cost more than $50k? Google pretty much really does all of the management for GKE. Just deploy your app and run.
> I would reckon you should have at least 1 full time person on staff to handle all of it.
What exactly would that person be handling? We probably spend no more than a couple hours per week handling everything involved. That's the point of something like Kubernetes in the first place.
> Will your workload on Google cost more than $50k?
Our server bill with GKE would far exceed $50k/yr (it would exceed $50k/mo!)
As a result, Hetzner is a no-brainer for us. As mentioned before, that might be different for you if you are doing something that is computationally light.
Well if you are running k8s you already have that person. The real question is, is it worth putting additional load on that person in exchange for savings?
That is not my experience with GKE. And 50k/y is super low with Google Cloud. It's really easy to get at least 5 times higher even with a small application.
Funnily enough, I spend more time administering GKE than I do applications on a VPS.
I can say that this has pretty much gone as I'd expected.
Microsoft have 80,000+ developers. Their partner ecosystem is absolutely massive. I've watched them hire hundreds of developer advocates that talk at events. It's therefore quite hard to write anything online that's critical without at least half of the comments coming from a biased source.
It is interesting to see the gap between views in the comments here.
Edit: I watch this comment go up and down a lot as those with an agenda and those without vote against each other.
Weirdly, I did criticise AWS a bit for their EKS offering in the blog but I've not had anywhere near the toxicity from people about that.
Just want to say, I just downvoted you but it had absolutely nothing to do with preference for or against Azure (I don't really have a strong opinion on that either way)
I downvoted due to your statement that implies that anyone disagreeing with your view has "an agenda" whereas those upvoting you don't.
People have different opinions and turning debates into an adversarial situation "if you're not agreeing with me you're against me" isn't helpful, in my opinion.
> It's therefore quite hard to write anything online that's critical without at least half of the comments coming from a biased source
Some stuff is perceptual, other stuff is factual. The issue I personally took with your post is that you stated things that are blatantly untrue, such as:
some things on Azure need to be done in a clunky web UI, other things need Powershell and other random stuff uses the CLI
And now I also don't like how you are basically saying that anyone who disagrees or downvotes you is some kind of shill for Microsoft!
The spreadsheet linked in the blog is almost entirely factual and I made quite a point of adding comments with links to various sources.
If you want to be pedantic about the sentence you took offense at, it is actually true. There are operations that can only be performed using PowerShell - people have given examples in this thread. But that isn't the point.
What I should have written for that part was that most people google for solutions. I've done this in the past on Azure and you get a random assortment of answers. Click here, Powershell this there, do whatever. It's disjointed and takes you out of mental context. I could update the blog with more clarity but I don't think anyone will change their mind.
I've used Azure, GKE, AWS. It was played down a little in the blog, but I have used the non AKS parts of Azure pretty extensively. Have you tried GKE? My suspicion is that you've been stuck in a Microsoft world for a while.
Not everyone is a paid shill but some people have a lot invested in the Microsoft ecosystem and I can only assume that's the reason for some of the comments here. I do 100% believe the Microsoft PR team has posted in here at least once though :)
If you honestly believe that there is such a thing as "unbiased" and that this article is an absolute objective demonstration of that, I don't know how much higher your horse could get. When someone gets paid or invests time in anything, whether it is to build, use, market, or sell a product, then that person has become biased. Anyone who can convince themselves that he/she is immune from this is delusional.
> Not everyone is a paid shill but some people have a lot invested in the Microsoft ecosystem and I can only assume that's the reason for some of the comments here. I do 100% believe the Microsoft PR team has posted in here at least once though :)
I've worked on both sides of these fences (AWS/Azure, Windows/Linux), and think that you are correct. The Microsoft partner programme is basically all about Azure these days, so if you are a Microsoft professional then Azure has been made part of your job, and nobody wants to think that they are doing something second-rate.
> Please don't impute astroturfing or shillage. That degrades discussion and is usually mistaken. If you're worried about it, email us and we'll look at the data.
It's extremely low to automatically attribute disagreement to other commenters being dishonest. (even if we ignore the fact that AWS likely has a larger network of people highly invested into it...)
The sidebar immediately took me back aesthetically. UX wise I felt it was terrible drilling down into resources I was looking for.
Of course UX is also based on my bias (AWS and GCP).
Nonetheless I was surprised at the presentational bar for the Azure UI. It felt very dated. In the end I got the hang of it, but again, it still felt like using an old PC.
CLI wise I had a lot of errors spinning up resources. My support experience getting it figured out was sucky. I had to abandon azure as a decision quickly with that experience. Too risky.
A little off topic because it's not a comparison: I recently had my first experience using GKE (and Kubernetes in general), and although I managed to get something working in the end, I would say it's still pretty rough...
Documentation is a mess: the general layout for Google Cloud docs is really a pain to read and navigate, but in addition to that you often have to jump between the Kubernetes docs and the Google docs, with some information being in both. Don't do that, please: either make it obvious that people need to get familiar with certain chapters of the Kubernetes docs, or provide all the info (my preference would actually be the first option...)
It's quite hard to guess what you can do in the GKE web interface and what you can't. You can feel it's meant for people who really know Kubernetes, not people who are discovering both GKE and Kubernetes at the same time.
And for the life of me I couldn't get the load balancer to manage HTTPS. I've read this was possible, but never saw the actual page explaining how. I ended up using CloudFront, but lost the ability to see end-user IPs in my logs in the process. (Also, logging is a real beast to tame on its own, with no obvious way to know what is available by default, what is a paid option, what should be configured on the Stackdriver website, and what should be coded.)
But then does that mean you'll need to manually log the incoming IP from the header? Or is there a way to see them in the default logs of either the cluster or the load balancer?
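If you do end up logging it yourself, the usual approach is to read the first entry of `X-Forwarded-For`. A sketch (the `client_ip` helper is my own; it assumes the only untrusted-to-trusted hop is the load balancer itself):

```python
def client_ip(headers, peer_addr):
    """Best-effort original client IP behind an HTTP(S) load balancer.

    The LB appends '<client-ip>, <lb-ip>' to X-Forwarded-For, so the
    first entry is the client -- *if* no untrusted hop could have
    prepended its own value, which you must guarantee at the edge.
    """
    xff = headers.get("X-Forwarded-For", "")
    if xff:
        return xff.split(",")[0].strip()
    return peer_addr  # direct connection, no proxy involved
```

Drop that into whatever request handler you use and log the result alongside the request line.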
"As it stands today I’ve personally used EKS, and AWS in general a lot. I’ve used GKE a bit but only with my own personal credits doing Kubernetes The Hard Way and spinning up a very quick GKE test cluster a while ago."
"I’m being serious when I say this: if the company I’m working for decided to migrate to Azure I’d find a new job."
"It needs to be fast and bug free so that I can build cool automation on top. Working on something like Azure, especially after having worked on AWS for years, would be extremely depressing."
If the article is supposed to be an _unbiased_ comparison between cloud hosted Kubernetes providers, I'd say it's a bit of a fail. For some it would be completely different experience because they have experience with Microsoft technologies. And those people might as well quit if their company moves to AWS or a non-Azure platform.
> If the article is supposed to be an _unbiased_ comparison between cloud hosted Kubernetes providers, I'd say it's a bit of a fail. For some it would be completely different experience because they have experience with Microsoft technologies. And those people might as well quit if their company moves to AWS or a non-Azure platform.
Azure services can be flaky, and slow, and the "blades" UI is a bold design choice that doesn't really work in practice. No amount of experience with Microsoft technologies can help you with those problems, unfortunately.
Well said - this article is the furthest thing from unbiased. The OP obviously spent a lot of years being paid to maintain infrastructure running crappy software and is still feeling the scars. I'm no fan of MS myself, but I do acknowledge that they've improved a lot from the duct-tape-and-glue NT days of old.
> Networking is the other reason. Google is miles ahead of everyone here. Similar story with HA and scaling.
Does anyone know what the author is referring to with this claim? I don't see anything in the sheet to back this up. At least from a high level, all three options support network policies via CNI, and GKE and EKS use the same one, Calico.
Hi, there's a comment on the cross region networking for GKE.
"Each cluster receiving an IP range for nodes and another for the containers inside, which are directly routable across your private network, other clusters, and regions."
AWS and Azure don't have a flat global network. You have to set up VPNs and complicated overlay networks.
Cross-region networking. To this day, I'm flabbergasted this isn't something that AWS has. All the other networking bits seem more sensible to me as well.
Another is 2Gbps per core, up to 8 cores, which is way more throughput than I've seen on any AWS or Azure instance.
On workloads where I care about the network IO:CPU ratio, I'm using 8-core nodes and seeing 16Gbps throughput between them consistently.
> Cross-region networking. To this day, I'm flabbergasted this isn't something that AWS has. All the other networking bits seem more sensible to me as well.
AWS tries to avoid letting customers create multi-region or global failure modes.
There have been recent changes in many AWS services (e.g. DynamoDB, S3, Aurora) to allow cross-region replication without introducing any multi-region dependency, precisely to make it easier to implement multi-region infrastructure that tolerates a single-region failure (however rare that is on AWS, compared to global outages on GCP).
> Another, is 2Gbps per core up to 8 cores which is way more throughput than I've seen on any AWS or Azure instance.
Which instance types did you use on AWS? Many recent instance types (r4, i3, c5, r5, z1d) support up to 10Gbps with one core (2 vCPU instances). However, you may need to use placement groups ( https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placemen... ) in large regions in order to get the full throughput on low TCP connection counts. The only reason I can think of that would explain why GCP doesn't have this problem is that they don't have any regions anywhere near as large as the largest AWS regions...
Your point about global failure modes is well-taken. Even still, one could reasonably expect inter-region networking not to be so difficult.
As for the instances I tested, they were m4 and r4. To get full 10GE out of either, I needed to use m4.10xl and r4.8xl, both roughly around $900 per month.
By comparison an n1-standard-8 is about $60/mo and the n1-highcpu-8 is $50/mo. I've tested the former and got something like 15.8 or 15.9gbps using iperf.
Until you have to reference security groups across the peering connection. You can't reference a different region's security group ID as a source, so you're forced to just whitelist the entire subnet/CIDR.
Also, while setting up peering is a 10-20 minute task, there are a lot of constraints, such as the VPCs having to have non-overlapping CIDR ranges. Oops - if you went with the AWS default, probably all your regions are using 172.31.0.0/16.
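That overlap constraint is easy to check up front with Python's stdlib `ipaddress` module before you bother attempting the peering connection (a sketch of the check, not AWS's own validation logic):

```python
import ipaddress

def can_peer(cidr_a, cidr_b):
    # VPC peering requires the two VPCs' CIDR blocks not to overlap.
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))

# Two regions both left on the default VPC range cannot be peered:
print(can_peer("172.31.0.0/16", "172.31.0.0/16"))  # False
print(can_peer("10.0.0.0/16", "10.1.0.0/16"))      # True
```

Running this across every pair of VPCs you plan to connect flags the clashes before the console does.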
Google got networking right with a flat global any region setup. If you need segmentation then just create a new GCP project. If it turns out projects need to communicate after all, GCP has VPC network peering.
You can use PrivateLink for this instead of peering. That way you just hit a local IP address/ENI in your VPC. CIDR Ranges don't matter with that. (not sure about the x-region story)
Two different design philosophies - there have been global GCP networking outages due to having the single flat network, a few this year alone. All providers have had their issues, but global blast radiuses concern me.
It supports addressable pods natively on the VPC. Others (I think) do not. This simplifies ingress options a great deal.
Also in general GCP networking is better than aws ime (ever tried to have multiple projects in aws share a network?). Don’t know anything about Azure but I suspect msft still sucks at networking things so it’s not great.
EKS pods use the VPC networking layer so I think they meet the criteria. They use secondary IPs attached to instances rather than virtual IPs. I’ve been able to route to them from non-EKS instances in the same VPC.
Ah, I see AWS has their own CNI that does that via ENIs. I never realized that it allowed for native pod addressing (and it was never brought up during their pitches). Might be worse performance for big clusters because it's still using calico (iptables).
> Might be worse performance for big clusters because it’s still using calico (iptables)
I'm curious why you think that Calico's use of iptables will have performance impacts for large clusters, and what type of performance impacts you expect (bandwidth / latency / cpu / something else?).
From my experience, Calico performs rather well in large clusters (e.g. 2k nodes, appx 100 pods / node, several hundred network policies).
I am not the author and haven't had time to fully review the spreadsheet they provided.
After an initial look I think he is referring not to the kubernetes specific networking support but the GCP networking.
From the spreadsheet alone GKE has support for cross region load balancers. The author also points out cross region networking.
I don't know the exact advantage the author sees with cross-region load balancers, since you can only deploy a GKE cluster to a single given region. The GCP LBs are really nice though, in that traffic enters Google's network at the closest location to the user and then travels within Google's network, which is wicked fast.
I have not played around with federated clusters in multiple zones so don't know how that would play out either.
They may also just be referring to just general networking from the providers. With the first two points they at least make mention in the spreadsheet of it though.
tl;dr: not sure what the author meant, but here is some food for thought on it.
My company is in the process of migrating off of a managed kubernetes provider. Sure, it's nice to have someone else manage the operations of the master. At the same time, a single customer is entirely insignificant to them.
We've experienced multiple outages from forced upgrades. We usually can find out the reason through github, but it may not be a priority for them to provide the fix. If they do, it could take days or weeks for it to become available. Much revenue has been lost because they can't do something at the speed which we could do it.
We are preparing to move to Google. Our app is CPU-intensive, and a killer feature was the ability to spec nodes with a lot of cores and little RAM. We save a lot of money by not paying for RAM we don't need. Google also has a location in LA, and we are in LA. 5ms to our cloud is nice.
After several weeks of swimming upstream on a greenfield EKS project I switched gears to GKE. The paradigm just felt way better. For large enterprises that need IAM integration on EKS I suppose that wouldn’t have been an option but GKE just feels way more paradigmatic, for obvious reasons.
Did I mention we are hiring devops engineers? If you are a kubernetes guru email me :-)
I use and like IBM Cloud (Softlayer) for k8s stuff... I find they are making improvements to it (including docs) on a regular basis and I can often find and talk directly to their devs in a slack channel. Some parts are a bit clunky and I feel like I can see their legacy stuff poking through the abstractions but overall I find it to be good.
I tried Azure and had some issues and didn't really like it. I haven't tried AWS or GKE for k8s yet.
His hatred for Azure ruins the article. There seems to be a massive correlation at the moment between Kubernetes and hyped up magpie egomaniac developers.
I thought 100% the same. While it is known that GKE yields the best managed Kubernetes experience, I thought the whole Azure bashing was not only pointless but also flawed, with many inaccurate and even false statements. So much for "honest".
What does the Prometheus node exporter collect from within a pod? I was under the impression that to collect node stats you absolutely need to be "outside" any container (i.e. using the prometheus-k8s integration so that it pulls node stats from the k8s API, not the nodes themselves).
I've worked with 200-300 K8s customers and 30-40 openshift customers.
On K8s I typically see somewhere in the low 20s as the number of pods per node.
OpenShift I'll see high 20s or low 30s as the average number of pods across most OpenShift customers. But we're seeing some crazy numbers for some larger enterprise customers: 100, 400, 2500 pods per node. This seems to be driven by the way OpenShift is licensed.
The latter is an absolute nightmare to support, and they seem to have trouble organizing internally as well.
A lot of large clusters I see in the enterprise are on big VMs or metal (48-core boxes) and so can range from 100-400 for a lot of common medium-sized workloads. But at that density you also have to have workloads with low IO requirements and think about network density. Also, very dense multi-tenant dev clusters can easily get above 100 for test environments, or when lots of people have created demo apps that mostly sit idle.
My org has hit the 100 pods/node limit and it bugs me immensely. We have a fair number of idle pods and we're using n1-highmem-16 nodes. I'd rather be using much larger nodes, but I'd have a hard time warming the cores and getting my money's worth.
It'd be nice to use larger nodes and have more overhead, but I suspect kubelet would have a hard time.
Try explaining to people that Kubernetes may not be the best option for large-scale hosting. I'm currently dealing with people who want to have a 'one pod per site' deployment model for 500,000 customers. Do the math on how many nodes that is with a 30 or 100 pod limit per node.
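Doing that math explicitly makes the point:

```python
import math

customers = 500_000  # one pod per site
for pods_per_node in (30, 100):
    nodes = math.ceil(customers / pods_per_node)
    print(f"{pods_per_node} pods/node -> {nodes} nodes")
# 30 pods/node -> 16667 nodes
# 100 pods/node -> 5000 nodes
```

Even at the generous 100-pods-per-node limit, that's a 5,000-node fleet for a single deployment model.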