Google hasn't used TCP in the datacenter for years. What they use instead I don't know, but they even run custom switches with custom chips.
My son did graduate-school work on a clean-slate network design for the datacenter. Maybe for Google, I don't remember.
One issue I remember they addressed was scheduling bandwidth for VM migration within the datacenter cloud. See, a customer reserves a 'machine' for their services, but really they get something like a VM slice of a ginormous machine (multiple TB of memory, 100 cores or whatnot). Each customer gets some of that and thinks it's a machine of their own.
That customer slice shares the larger machine with maybe 10-100 other customers. Then somebody's slice starts to use more resources and has to be moved to a machine with more 'room'. That move wants to be fast and seamless: it can be maybe 1 TB of stuff, and the slice doesn't want to be interrupted for long. So the source machine needs bandwidth that isn't already subscribed for the migration. So does the target machine. So does the cloud network. Then all the addresses have to be re-homed.
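A toy sketch of what that bandwidth scheduling might look like: the migration is only admitted if unreserved bandwidth exists on the source host's NIC, the destination host's NIC, and the fabric path, all-or-nothing. All names and numbers here are illustrative assumptions, not anyone's actual design.

```python
class Link:
    """One capacity-constrained hop (a host NIC or a fabric path)."""
    def __init__(self, capacity_gbps):
        self.capacity = capacity_gbps
        self.reserved = 0.0  # bandwidth already subscribed by tenants

    def free(self):
        return self.capacity - self.reserved

def reserve_migration(links, rate_gbps):
    """Reserve `rate_gbps` on every link of the path, all-or-nothing."""
    if any(l.free() < rate_gbps for l in links):
        return False
    for l in links:
        l.reserved += rate_gbps
    return True

def release_migration(links, rate_gbps):
    """Give the bandwidth back once the migration finishes."""
    for l in links:
        l.reserved -= rate_gbps

# Moving ~1 TB at 40 Gb/s is ~8000 Gbit / 40 Gb/s = 200 s of copy time,
# so the reservation has to hold for minutes, not milliseconds.
src, fabric, dst = Link(100), Link(400), Link(100)
src.reserved = 70  # tenant traffic already subscribed on the source NIC
assert reserve_migration([src, fabric, dst], 25)      # fits in the leftover 30
assert not reserve_migration([src, fabric, dst], 10)  # src now has only 5 free
```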
Another issue: those competing slices each need a virtual network adapter. Each thinks it owns one (each is running its own copy of Linux or whatnot), but it has to be a physically shared and rationed device. All while using the TCP abstraction on a network-adapter abstraction on a driver abstraction, but really on the new network hardware that's actually present in the ginormous machine. This includes all the TCP features plus the bandwidth reservations the cloud needs, etc.
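One common way to ration a shared NIC among guest vNICs is a per-tenant token bucket; here's a minimal sketch of the idea, with rates and names made up for illustration:

```python
class VNic:
    """Token-bucket rate limiter for one guest's virtual NIC.
    Tokens are measured in megabits; the burst budget is one second of rate."""
    def __init__(self, rate_mbps):
        self.rate = rate_mbps
        self.tokens = rate_mbps

    def refill(self, dt_seconds):
        # Replenish tokens as time passes, capped at the burst budget.
        self.tokens = min(self.rate, self.tokens + self.rate * dt_seconds)

    def try_send(self, mbits):
        # The hypervisor only forwards the frame if the tenant has budget;
        # otherwise the guest's TCP sees queueing/loss and backs off.
        if self.tokens >= mbits:
            self.tokens -= mbits
            return True
        return False

a, b = VNic(600), VNic(400)   # two tenants sharing a 1000 Mb/s physical NIC
assert a.try_send(500)
assert not a.try_send(200)    # tenant a has only 100 Mb of budget left
assert b.try_send(400)        # tenant b's budget is untouched by a's traffic
a.refill(0.5)                 # half a second passes: +300 Mb, capped at 600
assert a.try_send(400)
```

The point of the bucket is isolation: each guest's TCP stack behaves as if it owns a fixed-rate adapter, while the hypervisor enforces the cloud's reservations underneath.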
So yes it's abundantly obvious that the datacenter needs (has) a new network.
> Google hasn't used TCP in the datacenter for years.
That's absolutely false. I don't have any sources except for having worked at Google from 2013-2022, but it's not like you quoted any sources either, so...
There's a reason why Google is still releasing stuff like TCP BBR (2017).
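For what it's worth, BBR is selectable per-socket on stock Linux via the TCP_CONGESTION socket option, which is a quick way to see that ordinary TCP is still in the path. A Linux-only sketch (kernel >= 4.9 with the tcp_bbr module needed for the switch to succeed; this assumes nothing about Google's internal setup):

```python
import socket

def current_cc(sock):
    """Return the socket's congestion-control algorithm name (Linux only)."""
    if not hasattr(socket, "TCP_CONGESTION"):
        return None  # non-Linux platform
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    return raw.split(b"\x00", 1)[0].decode()

def try_set_bbr(sock):
    """Attempt to switch the socket to BBR; report failure if the
    tcp_bbr module isn't available instead of raising."""
    if not hasattr(socket, "TCP_CONGESTION"):
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
        return True
    except OSError:
        return False

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(current_cc(s))   # typically 'cubic' on a default Linux install
try_set_bbr(s)
print(current_cc(s))
s.close()
```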
Well, the first link when googling 'google datacenter hardware' is Google's article on how they don't use standard TCP hardware or software in their datacenters. But I guess that was too much to ask...
There's a big difference between "hasn't used TCP in the datacenter for years" and "don't use standard TCP hardware or software in their datacenters". Google uses TCP with non-standard configuration, but they still use TCP.
I can't seem to find the same search result with this query. In fact, searching for "google" "standard TCP" doesn't seem to find any such article (only Google's 2011 publication on TCP Fast-Open deployment, ironically), so it's going to be hard to find what you're talking about.
If you link to the article in question (and relevant quotes) I'm happy to try and clarify your misunderstanding.
Another good reason for VM migration is cooling. Apparently certain cloud providers save a lot of money on cooling when they migrate VMs across devices.