OpenCL 1.0: The Road to Pervasive GPU Computing
by Derek Wilson on December 31, 2008 6:40 PM EST
Posted in: GPUs
OpenCL Extending OpenGL
OpenGL 3.0 was a disappointment to game developers who hoped the API would add some key features that ended up being left behind. With the latest release, Khronos relegated OpenGL to professional and workstation applications like CAD/CAM and 3D content creation software, forgoing the wants and desires of game programmers. While not ideal from our perspective (competition is always good), the move is understandable, as OpenGL hasn't been consistently used by any major game engine developer other than id Software for quite some time. DirectX is seen as the graphics API of choice for game programming, and it looks like it will remain that way for the foreseeable future.
But OpenCL does bring an interesting element to the table. One of the major advancements of DirectX 11 will be the addition of a compute shader to the pipeline. This compute shader will be general purpose and capable of operating on diverse data structures that pixel shaders are not geared towards. It will be capable of the same kinds of things OpenCL is, though it will be tuned and geared toward doing so in the context of graphics. It is, after all, still DirectX. In DX11, the pixel shader and compute shader will share data via data structures rather than any sort of formal input/output mechanism. Because of the high level of integration, game developers (and other graphics engine developers) will be capable of tightly combining current techniques with more general purpose code that can handle a broader array of algorithms.
OpenGL doesn't have anything like this in the works, but OpenCL fixes that. OpenCL is capable of sharing data with OpenGL. And we aren't talking about simply copying data back and forth; we are talking about physically sharing data structures and memory locations. This essentially adds a compute shader to OpenGL for those who want it. Why is that the case? Well, offering OpenCL users a means of using OpenGL images and buffers as OpenCL images and buffers means that OpenGL and OpenCL can share data with no copy or conversion overhead. This means that not only are OpenGL and OpenCL able to work on the same data, but that the method by which they communicate is very similar to what DX11 does to allow the passing of data between pixel and geometry shaders.
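To make the "share, don't copy" idea concrete, here is a minimal CPU-only sketch in Python. It is not OpenCL or OpenGL code, and the names in it (SharedBuffer, compute_scale, render) are hypothetical, made up for illustration. It assumes only the general shape of the interop model: a buffer is created once on the graphics side, the compute side acquires it before touching it and releases it afterwards, and nothing is copied in between. In the real API the corresponding calls are clCreateFromGLBuffer, clEnqueueAcquireGLObjects and clEnqueueReleaseGLObjects.

```python
import array


class SharedBuffer:
    """Toy stand-in for one buffer visible to both a graphics and a compute API."""

    def __init__(self, values):
        self._data = array.array("f", values)  # one allocation, never duplicated
        self.owner = "gl"                      # the graphics side owns it by default

    def acquire(self, api):
        # Hand the buffer to one API; the other must not touch it meanwhile.
        self.owner = api

    def view(self, api):
        if self.owner != api:
            raise RuntimeError(f"{api} must acquire the buffer first")
        return memoryview(self._data)          # a view onto the same memory, not a copy


def compute_scale(buf, factor):
    """'Compute' pass: modifies the vertex data in place."""
    buf.acquire("cl")
    with buf.view("cl") as v:
        for i in range(len(v)):
            v[i] = v[i] * factor
    buf.acquire("gl")                          # give the buffer back to the renderer


def render(buf):
    """'Graphics' pass: reads whatever the compute pass left in the buffer."""
    with buf.view("gl") as v:
        return v.tolist()


positions = SharedBuffer([1.0, 2.0, 3.0])
compute_scale(positions, 2.0)
print(render(positions))
```

The point of the sketch is that render sees the scaled values even though no data was ever handed over as a copy; only ownership moved back and forth. A real driver also has to synchronize the two command streams, which this toy model leaves out entirely.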
While game developers may be intrigued, professional app developers may have more of a reason to get excited. Sure, this will allow OpenGL game developers to use a compute-shader-like option, but it gives professional application developers the ability to actually combine the real work of simulation or data manipulation with visualization. With support for double precision in hardware that supports it, this could be useful for applications where a lot of real work needs to be done both on the thing being visualized and the visualization itself. This could speed things up quite a bit and allow fluid realtime visualization and manipulation of much more complicated data sets.
Additionally, this compute shader will work on hardware not specifically designed as DX11 class hardware. DX11, as a strict superset of DX10, will extend some functionality to DX10 hardware, but we aren't yet certain about the specifics of this and it may include CS functionality. On top of this, OpenCL should get drivers in the first quarter of next year. This puts the combination of OpenGL 3.0 plus OpenCL 1.0, for the first time in a long time, ahead of DirectX in terms of technology and capability. This is by no means a result of the sluggish and non-innovative OpenGL ARB. But maybe this will inspire more use of OpenGL, which maybe will inspire more innovation from the ARB. But I'm not going to hold my breath on that one.
In any case, the fact that OpenGL and OpenCL can share data without requiring a copy or conversion is a key feature. Not only will OpenCL allow developers to use the GPU for general purpose computing, but using OpenCL with OpenGL will help build a bridge between data parallel computing and visualization. Existing solutions like CUDA and Brook+ haven't done very well in this area, and using OpenGL or DirectX for data parallel processing makes it difficult to get work done efficiently. OpenCL + OpenGL solves these problems.
And maybe we'll even see things go the other way as well. Maybe developers doing massive amounts of parallel data processing using OpenCL not formerly interested in "seeing" what's happening will find it easy and beneficial to enable advanced visualization of their data or the processing thereof through integration with OpenGL. However they are used together, OpenCL and OpenGL will definitely both benefit from their symbiotic relationship.
37 Comments
v12v12 - Wednesday, January 7, 2009 - link
Testing123, ignore plz
corporategoon - Tuesday, January 6, 2009 - link
Did this article go through an editor?
chizow - Friday, January 2, 2009 - link
Kind of surprising you didn't directly address this given the amount of FUD being thrown around with regards to PhysX, particularly from AMD and its supporters. You indirectly answered what I had already suspected however, that given Nvidia has stated they plan CUDA to be fully portable to both OpenCL and DX11 there should also be no portability issues for AMD and Brook+:
I'm guessing the unfinished thought from the first sentence should read something like "or write a CUDA to Brook+ wrapper" as that's essentially what the last part suggests. Since both vendors will write wrappers for their code to OpenCL, perhaps this wrapper could pull double duty, although it would double the amount of transcoding needed. Less than efficient for sure, but certainly better than a complete impasse due to incompatibility.
ltcommanderdata - Friday, January 2, 2009 - link
Are you suggesting that hardware PhysX acceleration will come to AMD GPUs as soon as nVidia and AMD enable hardware OpenCL support? Because I don't think it's that simple.
nVidia seems to have rebranded the meaning of CUDA. Maybe it's all just marketing speak, but CUDA before seemed to mean using nVidia GPUs for GPGPU operation in general. But now, since OpenCL, CUDA seems to relate more specifically to the GPGPU interface to nVidia GPUs, with languages being separate on top, namely OpenCL, DX11 and C for CUDA. If PhysX is written in C for CUDA, which it no doubt is seeing there wasn't anything else available up to now, then adding support for the OpenCL language in the CUDA interface layer won't help get PhysX supported on AMD GPUs. PhysX will still be written in nVidia's proprietary language, which AMD GPUs can't understand. To support AMD GPUs, either nVidia will have to rewrite PhysX from C for CUDA to OpenCL, which would be awfully generous of them, or AMD will have to make a C for CUDA to CAL translator and hope PhysX doesn't have any nVidia hardware specific optimizations, which it no doubt has, to mess things up.
apanloco - Friday, January 2, 2009 - link
Does anyone know if multiple applications can take advantage of OpenCL at the same time? I think OpenGL is exclusive to one application, but if OpenCL is used by regular applications this could be a problem?
yyrkoon - Thursday, January 1, 2009 - link
"With R580 AMD (then ATI) actually published part of their ISA and called the initiative CTM (for Close to Metal). Before we had a beta version of CUDA, we had folding@home GPU accelerated on R520 and R580"
I also read an interview through gamedev.net where ATI was emulating Direct 3D 10 calls in hardware on one of their x1900xtx's (Direct 3D 9 hardware) long before I heard about folding@home on the GPU. I remember being so impressed with the technology that I could not wait until Vista + Directx 10 titles became available. Too bad that there are so few (if any) titles that currently take advantage of this technology in the ways I had hoped. Hopefully that will change soon.
ltcommanderdata - Thursday, January 1, 2009 - link
http://www.tgdaily.com/content/view/38764/140/
It's interesting that you mentioned that AMD and nVidia look to be continuing to push their proprietary GPGPU solutions, but AMD has actually made statements they are abandoning their proprietary CTM GPGPU implementation and are moving fully to OpenCL. Admittedly, it's probably just a realization that CTM isn't taking off as fast as CUDA and it's in their best interest to push OpenCL. In comparison, nVidia will continue to develop their own CUDA implementation alongside OpenCL.
I wonder if you can get a statement from nVidia whether they will move PhysX to OpenCL? Right now I believe PhysX is written in C for CUDA and of course requires nVidia GPUs for hardware acceleration. If they moved to OpenCL, then AMD GPUs would support it as well. Although perhaps nVidia prefers to keep PhysX to themselves as a product differentiator.
It'd also be interesting if you could ask AMD whether older GPUs like the X1600, X1800, and X1900 will be supported in OpenCL? You already pointed out in your article that the RV530, R520, and R580 had GPGPU folding@home clients so they are certainly capable of GPGPU operation. It'd probably be in ATI's own interest to have as large an OpenCL base as possible and ATI's original FireStream dedicated GPGPU card was R580 based as well. Apple could probably help them as well seeing the number of X1600 and X1900 used in various iMac, MacBook Pro, and Mac Pro generations that could use support for OpenCL in Snow Leopard.
And I agree with melgross that it's strange Apple got no mention in the article seeing that they pretty much developed OpenCL, then submitted it to Khronos, and was no doubt a major driving force behind the quick ratification in order to get it ready for Snow Leopard. And I believe Apple's Aaftab Munshi was the chair of the OpenCL working group.
danger22 - Thursday, January 1, 2009 - link
i am looking forward to the day when I can run my finite element simulations on my GPU. come on Ansys its time for a GPGPU Multiphysics!
Amiga500 - Thursday, January 1, 2009 - link
Same boat, same boat... with both CFD and FEA.
Have you heard of FEAST-GPU (from Dortmund university)?
It's a GPU accelerated FE package - unfortunately it isn't out in the public domain yet.
Anyhow - from my own digging, I'm not sure if the CPU is a major bottleneck for FE simulations - a lot of what I see tends to point towards the hard-drive and I/O performance.
Sheep100 - Sunday, January 4, 2009 - link
If you provide enough RAM to the analysis you definitely end up CPU limited for single core runs. We have 24 - 32 GB per node for Abaqus and Nastran analyses. The nodes get RAM-bandwidth limited when stepping up the number of cores used or the number of concurrent runs on a node. We are looking forward to the Core i7/Nehalem Xeon systems coming soon that will provide a big improvement here. (These codes run slower on Opteron cores.)
GPGPU versions of Abaqus, Nastran & Ansys would be very interesting given the large memory bandwidth available on the high end cards. I suspect that re-writing & validating the various solver algorithms to target OpenCL would be a long process. I'm also unsure how possible it is to get data parallelism out of them since the scaling rate of Abaqus, for example, on multi-core systems, even with good bandwidth, is not anywhere near linear. Although this might just highlight the deficiency of the current method of extracting parallelism.