It would be very helpful to deeply understand the truth behind this management failing. The actual players involved, and their thinking. Was it truly a blind spot? Or was it mistaken priorities? I mean, this situation has been so obvious and tragic, that I can't help feeling like there is some unknown story-behind-the-story. We'll probably never really know, but if we could, I wouldn't spend quite as much time wearing a tinfoil hat.
My guess is it’s just incompetence. Imagine you’re in charge of ROCm and your boss asks you how it’s going. Do you say good things about your team and progress? Do you highlight the successes and say how you can do all the major things CUDA can? I think many people would. Or do you say to your boss “the project I’m in charge of is a total disaster and we are a joke in the industry”? That’s a hard thing to say.
Intel was never famous for good GPUs, and they are basically the only ones still trying to make something out of OpenCL, with most of the tooling going beyond what Khronos offers.
one API is much more than a plain old SYCL distribution, and still.
I'd argue Intel fell is large part because of Intel's own complacency and incompetence. If Intel had taken AMD seriously, they'd probably still be a serious competitor today.
if you asked AMD execs they'd probably say they never had the money to build out a software team like NVIDIA's. that might only be part of the answer. the rest would be things like lack of vision, "can't turn a tanker on a dime", etc.
CUDA was built during the time AMD was focusing every resource on becoming competitive in the CPU market again. Today they dominate the CPU industry - but CUDA was first to market and therefore there's a ton of inertia behind it. Even if ROCm gets very good, it'll still struggle to overcome the vast amount of support (read "moat") CUDA enjoys.
True. After all Nvidia hasn't built tensorflow or PyTorch. That stuff was bound to be built on the first somewhat viable platform. Rocm is probably far ahead of where cuda was back then, but the goal moved.
Has to be lack of vision. I refuse to believe it's impossible to _do_, but it sounds like it's impossible to _specify_ within AMD. Like they're genuinely incapable of working out what the solution might look like.
Nobody is asking AMD to rebuild the entire NVidia ecosystem. Most people just want to run GPGPU code or ML code on AMD GPUs without the entire computer crashing on them.
according to public information NVIDIA started working on CUDA in 2004, that was before AMD made the ATI acquisition.
my suspicion is that back then ATI and NVIDIA had very different orientations. neither AMD nor ATI were ever really that serious about software. so in that sense i guess it was a match made in heaven.
so you have a cultural problem, which is bad enough, then you add in the lean years AMD spent in survival mode. forget growing software team, they had to cling on to fewer people just to get through.
now they're playing catch-up in a cutthroat market that's moving at light speed compared to 20 years ago.
we're talking about a major fumble here so it's easy to lose context and misunderstand things were a little more complex than they appeared.