A better question would be: when will serious ML researchers stop using proprietary frameworks? The open source projects should work towards something more open than CUDA, probably Vulkan. It's a real shame that Nvidia has a whole industry in its grip, and nobody seems to mind.
Why would serious ML researchers stop using proprietary frameworks? They're the best out there right now, and ahead of pretty much every open source framework in every way.
Researchers want to get their work done. They don't want to fight against their tools.
They shouldn't; it was poor wording on my part. What I meant was: when will there be non-proprietary frameworks available for researchers to use?
There's also the issue that by relying on proprietary frameworks that work now, you might be painting yourself into a corner if Nvidia changes something in the future and then you have to adapt to them because you have no choice.
When somebody pays somebody to write those frameworks.
When Nvidia sees a need, they can change CUDA overnight to address it, and they pay people to do that.
When you need to do the same in Vulkan, it's a multi-year process until your extension is "open". A researcher whose job is to get something done with ML has better things to do than go through that.
We do mind; the incentives in academia are just not set up for people to build a replacement. Google is working on its TPUs, which are even more closed. Facebook, MS, et al. do not seem to be working on porting popular ML frameworks to something else...
CUDA is basically the combination of the instruction set, the C/C++ language, the API, and the standard library, all bundled together. In the CPU's case, C/C++, the APIs, and the standard library were designed to be hardware-agnostic, the instruction set is separate and documented, and x86 has two players. CUDA became what it is because it was designed by the hardware company to serve its business needs. We really should be thankful that C/C++, open compilers, APIs, and libraries existed in the CPU world back then, and that hardware vendors like Intel published their raw instruction sets and allowed any software to target them. I can't imagine a world where people use a programming language and libraries designed by a CPU company, and all that software runs only on that company's hardware.
Some things were better back in the day. E-mail could not be invented today; it would be a mix of incompatible mail systems like Facebook mail, Google mail, Apple mail, Slack mail, some distributed open source mail nobody uses, etc.
Go read the SMTP and related RFCs and see if you still think that email back in the day wasn't already a mix of incompatible systems. The parts about address formats alone were quite an eye-opener for me.
Yes, I remember IBM had its own email system back in the early '80s when I worked there, and AT&T had its own in the late '80s. Even Prodigy/CompuServe/AOL had proprietary email systems (IIRC) with internet email bolted on.
And getting email to the internet at large was no mean feat: I remember having to do a slew of UUCP addressing to get an email from AT&T to the internet (something along the lines of "astevens@redhill3!ihnp4@mit.edu"). It was the wild west.
I applied for a sysadmin job at BBN in the late 90s, and X.400 knowledge was listed in the description. Thankfully that turned out to be a red herring.
Email wouldn't be invented by private institutions today because nobody in their right mind would willingly give up their moat... It'd take government regulation to get there, and you know how efficient that would be. Technical problems are easy in comparison.
Lots of people mind, but AMD screwed themselves by not jumping on deep learning years ago, and the major open source efforts are all corporate-sponsored and standardized around CUDA. If you think you can do better, please do.
Yes, it is to a large degree AMD's fault, but Nvidia also acted maliciously by neglecting OpenCL support, which meant that during the critical window 5-6 years ago there was no realistic chance for the open alternative to succeed.
While developing a small competitor to TensorFlow back then (Leaf), we were one of the few frameworks that also tried to support OpenCL, but the additional dev work made it infeasible.
SYCL looks nice in theory, but AMD seems determined to stay irrelevant in GPGPU. To use SYCL on AMD you use a backend that targets ROCm, which still doesn't work on Windows. So why would I use that instead of CUDA, and just not care about AMD's GPUs? A portable framework loses a lot of its meaning when it's not portable, and now SYCL just looks like an unnecessary layer over CUDA and OpenMP.
GPGPU has been a thing for 20 years now, and I still can't easily write code that works on Nvidia and AMD and ship it to consumers on Windows. From what I've seen OpenCL seems to be dying, AMD doesn't care about compute on Windows or on their Radeon cards, and CUDA continues to be the only real option year after year in this growing segment. Why would anyone buy anything other than Nvidia if they're using Photoshop, Blender, DaVinci Resolve, or other compute-heavy consumer software? Maybe it's unrealistic to hope that any library can fix this; maybe we should just rename GPGPU to "Nvidia compute" and be done with it.
I really don't get this. You can just use a SYCL backend that compiles to OpenCL or SPIR and target AMD GPUs. I did it yesterday. Use ComputeCpp and target AMD GPUs.
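To make this concrete, the portable path looks roughly like this: a minimal SYCL vector add, written as a sketch assuming a SYCL 2020 implementation is installed (older ComputeCpp toolchains use the `<CL/sycl.hpp>` header instead). Which device it lands on (AMD, Nvidia, or the CPU) depends on the backend the implementation selects:

```cpp
#include <sycl/sycl.hpp>  // SYCL 2020 header; ComputeCpp-era code used <CL/sycl.hpp>
#include <cassert>
#include <vector>

int main() {
    const size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    // Default device selection: could be an AMD GPU, an Nvidia GPU, or the host CPU,
    // depending on which backends the installed SYCL implementation provides.
    sycl::queue q;
    {
        sycl::buffer<float, 1> ba(a.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> bb(b.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> bc(c.data(), sycl::range<1>(n));

        q.submit([&](sycl::handler &h) {
            sycl::accessor xa(ba, h, sycl::read_only);
            sycl::accessor xb(bb, h, sycl::read_only);
            sycl::accessor xc(bc, h, sycl::write_only);
            // Single-source kernel: this lambda is compiled for the device.
            h.parallel_for(sycl::range<1>(n),
                           [=](sycl::id<1> i) { xc[i] = xa[i] + xb[i]; });
        });
    }  // buffer destructors synchronize and copy results back to the host vectors

    assert(c[0] == 3.0f);
    return 0;
}
```

The same source compiles unchanged for every backend; only the toolchain invocation differs, which is the whole argument for SYCL over vendor-specific code.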
If you're using Blender it can absolutely make a ton of sense to use AMD hardware. For most of the time where Blender supported GPGPU AMD was the best choice, and I set up rendering servers with AMD hardware for that express purpose.
I feel like a big part of this attitude comes from not actually having tried it, because SYCL works fine on AMD. In fact, you have more backend options for AMD than for Nvidia.
Which AMD cards, though? It's one thing to be able to buy AMD hardware for your data center or workstations, but as far as I can see their latest consumer platform, RDNA2, doesn't support ROCm [0]. Radeon DNA doesn't support Radeon Open Compute. ROCm has never supported Windows, and it doesn't look like there are any future plans for it either. And when looking at which AMD cards support SPIR or SPIR-V, I can't find any good list, but I do find issues where AMD removed support in the drivers and told people to use old drivers if they needed it [1]. Compare that to Nvidia, where you can use any GeForce card you can find: if you have an Nvidia GPU, you know CUDA will work.
If you control your own hardware and software stack, maybe an AMD CDNA card is fine, but if you want to ship software to end users, it seems difficult to even know what will work. So you either use cross-platform code, for a worse experience on Nvidia and spotty support on AMD, or CUDA only, and accept that it's Nvidia-only but will give you a better experience.
I haven't done a lot of GPGPU programming, but I've tried to look at it from time to time, and I've been disheartened by it every time. Nvidia's handling of OpenCL, AMD's disregard for SPIR. This is what an AMD representative had to say in 2019 [2]:
"For intermediate language, we are currently focusing on direct-to-ISA compilation w/o an intervening IR - it's just LLVMIR to GCN ISA. [...] Future work could include SPIRV support if we address other markets but not currently in the plans."
Only if you want to write C++, which a diminishing number of ML researchers or practitioners do. SYCL makes more sense for traditional HPC, but since it works at the source level it doesn't make as much sense for ML framework codegen. Something like MLIR will hopefully help though (eventually).
If you're going to be writing CUDA, then I don't think C++ will add a lot of overhead. In fact, I think that SYCL/C++ being single source makes it more ergonomic for GPU programming than higher level languages.
Precisely. And one of the worst companies in its attitude toward the FOSS world. I hope there will be more development from the AMD side here. Otherwise we are doomed.
There is a lot that has been done. AMD's HIP and the open SYCL infrastructure can already run TensorFlow and a large amount of CUDA code. The reason the transition isn't happening is momentum and tradition, really.
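To show how mechanical the CUDA-to-HIP port is, here is a sketch of a SAXPY kernel in HIP (assuming the ROCm/hipcc toolchain; this is essentially what a hipify tool produces from the CUDA original, with only the runtime API prefix changed):

```cpp
#include <hip/hip_runtime.h>

// The kernel body is identical to what you would write in CUDA:
// __global__, blockIdx/blockDim/threadIdx all carry over unchanged.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Only the API prefix differs from CUDA: cudaMalloc -> hipMalloc,
    // cudaMemcpy -> hipMemcpy, cudaFree -> hipFree, and so on.
    hipMalloc((void **)&x, n * sizeof(float));
    hipMalloc((void **)&y, n * sizeof(float));
    // ... copy input data to x and y with hipMemcpy, then launch
    // with the same triple-chevron syntax CUDA uses:
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    hipDeviceSynchronize();
    hipFree(x);
    hipFree(y);
    return 0;
}
```

Since the deltas are this mechanical, the barrier really is momentum rather than any deep technical obstacle.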