"We hypothesize that flow-consistent routing is responsible for virtually all of the congestion that occurs in the core of datacenter networks".
Flow-consistent routing is the constraint that packets for a given TCP 4-tuple get routed through the same network path, rather than balanced across all viable paths; locking a flow to a particular path makes it unlikely that segments will arrive out of order at the destination, which TCP handles poorly.
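For intuition, here is a toy sketch of how ECMP-style flow-consistent routing works: hash the 4-tuple, take it modulo the number of equal-cost paths. The function name and hash choice are illustrative, not any particular switch's implementation.

```python
import hashlib

def ecmp_path(src_ip, src_port, dst_ip, dst_port, n_paths):
    """Toy ECMP: hash the TCP 4-tuple to pick one of n_paths uplinks.
    Every packet of a flow hashes identically, so the flow is pinned
    to a single path for its lifetime."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_paths

# Same 4-tuple -> same path, every time.
a = ecmp_path("10.0.0.1", 40000, "10.0.1.2", 443, 8)
b = ecmp_path("10.0.0.1", 40000, "10.0.1.2", 443, 8)
assert a == b
```

The determinism is the whole point: no per-flow state is needed at the switch, but it also means two elephant flows that happen to hash alike stay stuck on the same link.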
Conversely, if you send the traffic over all routes, there is no way to keep one server from monopolizing the network, because each route is oblivious to the stress currently being experienced by its peers. It has to set a policy using local data, not global data.
The usual failure mode for clever people thinking about software is taking their third-person omniscient view of the system state and assuming they can write software that replicates what a human would do in that situation. We are still so very far from human-level intuition and reasoning.
Ultimately one server cannot inject more than one link worth of traffic (e.g. 100 Gbps) into the network which is a tiny fraction of total capacity. Researchers have gotten really good results with "spray and pray" for sub-RTT flows combined with latency and queue depth feedback for multi-RTT flows.
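As a sketch of that hybrid scheme (spray for sub-RTT flows, feedback for long flows): the function, threshold, and queue-depth feed below are all hypothetical, just to show the two regimes.

```python
import random

def pick_path(flow_bytes_sent, bdp_bytes, paths, queue_depth):
    """Hypothetical hybrid router policy:
    - sub-RTT flows (less than one bandwidth-delay product of data):
      spray packets across all paths at random ("spray and pray");
    - multi-RTT flows: steer to the least-loaded path using
      queue-depth feedback."""
    if flow_bytes_sent < bdp_bytes:
        return random.choice(paths)          # spray and pray
    return min(paths, key=lambda p: queue_depth[p])  # feedback-driven

paths = ["up0", "up1", "up2", "up3"]
depth = {"up0": 12, "up1": 3, "up2": 7, "up3": 9}
pick_path(2_000_000, 1_250_000, paths, depth)  # long flow -> "up1"
```

Short flows finish before any ordering or feedback matters, so random spraying is nearly free for them; only the long-lived flows need the smarter placement.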
Spray and pray sounds like a reasonable fit for UDP, no?
We’ve had these sorts of bottlenecks before, and they didn’t last. It’s always possible something fundamental changed, but it’s also possible that we are doing something wrong at the motherboard or OS level, and adopting new solutions puts us right back in that space where a couple of servers can easily saturate a network.
If a network card can move data as fast or faster than the main memory bus on a computer then what are we even doing? Should we be treating each subsystem as a special purpose computer and turn the bus into a network switch?
And we could totally construct systems that take some approximation of a global internet state into local routing decisions. But that might devalue some incumbent player's position in the market (or create a new privileged set of players) so even if we made a POC, it wouldn't get adopted.
This is true, and the congestion mentioned here was subtle and not called out - typically flows are handled in a stateless manner by load balancers that hash on some set of MAC/IP/port features of the packet. This is where congestion occurs, and the paper mentions it here:
"All that is needed for congestion is for two large flows to hash to the same intermediate link; this hot spot will persist for the life of the flows and cause delays for any other messages that also pass over the affected link."
It makes logical sense, but I'd love to see the evidence for this.
It all depends on the application and overall use of the network.
With sufficient flows and a mix of sizes it’ll still tend to even out. But if you have significant high-throughput, long-lived flows, this is definitely something you might hit.
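The "two large flows hash to the same link" scenario is just the birthday problem, and a quick Monte Carlo (parameters are illustrative) shows it is not a rare corner case:

```python
import random

def collision_prob(n_flows, n_links, trials=20_000):
    """Monte Carlo estimate of the chance that at least two flows
    hash onto the same intermediate link (the birthday problem)."""
    hits = 0
    for _ in range(trials):
        links = [random.randrange(n_links) for _ in range(n_flows)]
        if len(set(links)) < n_flows:
            hits += 1
    return hits / trials

# Even with 64 equal-cost links, 8 elephant flows collide
# surprisingly often - around 0.37 of the time.
collision_prob(8, 64)
```

So with only a handful of long-lived elephant flows, odds are decent that two of them share a link for their entire lifetime, which is exactly the hot spot the paper describes.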