20 Comments
DigitlDrug - Tuesday, October 6, 2015 - link
Any word on the software side? Until it's integrated into the major frameworks and APIs (.NET, Java, DX, Android, etc.), I don't see much uptick.
It sounds like more of a situation where "if we build it they will come" several years after availability...
SleepyFE - Tuesday, October 6, 2015 - link
It does not sound like that. If there is no hardware to run it on, you can't make the software. Then the hardware needs to become commonplace, so it needs to displace enough older products to even get software vendors interested. You'll see software making use of it as soon as the tech is used on $100 phones.
Nintendo Maniac 64 - Tuesday, October 6, 2015 - link
I think a large reason for that is that HSA is currently only available on AMD products that are uncompetitive CPU-wise. Developers, especially ones relying on CPU capabilities*, will only use what they already have, and they're buying Intel CPUs because AMD has been especially uncompetitive in traditional single-threaded CPU performance.
Basically I don't think we'll really see HSA software on PC until Zen APUs are a thing... and that's assuming Zen isn't Bulldozer MKII.
*compare this to desktop GPUs where not only can you pretty easily swap one for another but you can even have graphics drivers from multiple vendors installed at the same time.
Azix - Tuesday, October 6, 2015 - link
This comment seems weird. Just to bash AMD. If this is the case then ARM and all the rest are screwed since their CPUs are even slower.
The main issue will be prevalence and compatibility with normal systems (without HSA capabilities). It seems this is especially beneficial for lower end systems with slower CPUs and GPUs (go figure), to get more out of those systems. Laptops with low TDP APUs, for example, would see really good performance if I understand this correctly.
III-V - Tuesday, October 6, 2015 - link
"This comment seems weird. Just to bash AMD. If this is the case then ARM and all the rest are screwed since their CPUs are even slower."
Er, AMD's on life support -- ARM et al. are not.
Nintendo Maniac 64 - Tuesday, October 6, 2015 - link
"Just to bash AMD"
Uhhh, I would be all over a Zen APU. I wanted a Kaveri APU, but it's single-threaded performance is too poor for emulation, so I had to make do with an Intel iGPU instead.
Nintendo Maniac 64 - Tuesday, October 6, 2015 - link
*its
CiccioB - Wednesday, October 7, 2015 - link
It's not a matter of absolute CPU speed. It's a matter of how a SW development choice affects the market it targets.
In the desktop market, that is under Windows, developing HSA software means targeting less than 10% of the market, probably to achieve a speed only slightly faster than that of normal optimized serial code for Intel processors, which are much more powerful in INT and FP calculations (especially with the new AVX).
In the mobile market, if most of the SoC builders are going to create HSA HW, then SW written for this market is going to hit most of the market with improved speed and, most importantly, with reduced power usage.
So AMD's hope of seeing the HSA idea spread widely in SW development is that the mobile market is going to use it; otherwise they will never see a single piece of SW written for the desktop market. It would just be uneconomical for anyone.
Always seeing a comment against AMD as bashing or hating is a trend fanboys never miss, and it identifies them as fanboys really easily.
michael2k - Wednesday, October 7, 2015 - link
Um, ARM hasn't been slower than AMD for a while. I mean, specific implementations of ARM can now match slower Intel Core M and Core i3 CPUs at Geekbench and other benchmarks, which beat mobile AMD processors.
patrickjp93 - Monday, October 12, 2015 - link
The only benchmarks ARM wins in are benchmarks that have been hand-optimized for ARM. ARM is nowhere close to Intel's Core i performance. Throw something objective like Linpack at it and ARM falls apart.
SleepyFE - Wednesday, October 7, 2015 - link
If I'm not mistaken Zen will be a CPU without the GPU. Back to basics I guess. You will have to wait for the next generation to see the Zen core used in an APU.
Gigaplex - Wednesday, October 7, 2015 - link
C++ AMP from Microsoft with help from AMD came out quite some time ago in preparation for HSA. Hopefully they invest in it more, with equivalent functionality coming to other languages.
name99 - Tuesday, October 6, 2015 - link
"although ARM isn’t at full HSA compliance"
What exactly does this mean? The Mediatek slide suggests that today ARM CPUs+GPUs do not share a single coherent address space. I was under the impression that, since forever,
- SoCs provide a single address space for CPU and GPU
- that address space is coherent (at least on the CPU side) in the sense that APIs like Metal don't require any sort of "flush the data structures the CPU has created" type calls.
So what's missing?
Is it coherence in the other direction (i.e. the GPU can perform a computation, and the CPU can trivially read it without the GPU having to perform an explicit flush)?
And/or is it the ability to interrupt the GPU (thus allowing for time-sharing, and for GPU computation to take arbitrarily long, rather than having to be broken up into chunks of certain maximum duration)?
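To make the question above concrete, here is a toy Python model of the CPU-to-GPU direction: a write-back cache holds the CPU's stores until they are flushed, so a reader that goes straight to RAM sees stale data unless something snoops the cache. Every name here (`Memory`, `WriteBackCache`, `gpu_read`, `snoop`) is hypothetical and invented for illustration; it is a sketch of the concept only, not how any real SoC, HSA runtime, or Metal works internally.

```python
class Memory:
    """Backing RAM visible to both agents, modelled as address -> value."""

    def __init__(self):
        self.cells = {}


class WriteBackCache:
    """A private write-back cache: stores stay local until flush() is called."""

    def __init__(self, memory):
        self.memory = memory
        self.dirty = {}

    def write(self, addr, value):
        # The store lands in the cache only; RAM is not updated yet.
        self.dirty[addr] = value

    def flush(self):
        # Explicit flush: push every dirty line out to RAM.
        self.memory.cells.update(self.dirty)
        self.dirty.clear()


def gpu_read(memory, addr, snoop=None):
    """Read as the 'GPU' would.

    Without snoop, the read goes straight to RAM (non-coherent).
    With snoop, the CPU cache is checked first, which stands in
    for hardware coherence.
    """
    if snoop is not None and addr in snoop.dirty:
        return snoop.dirty[addr]
    return memory.cells.get(addr)


ram = Memory()
cpu = WriteBackCache(ram)
cpu.write(0x10, 42)

stale = gpu_read(ram, 0x10)               # non-coherent: store is still in the CPU cache
snooped = gpu_read(ram, 0x10, snoop=cpu)  # coherent: sees the CPU's latest store
cpu.flush()
after_flush = gpu_read(ram, 0x10)         # non-coherent read, but after an explicit flush
print(stale, snooped, after_flush)
```

The "other direction" asked about above would be the mirror image: the GPU holding dirty data that the CPU can only see after a flush or via snooping.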
SleepyFE - Tuesday, October 6, 2015 - link
In SoCs, as with integrated GPUs, you assign a part of RAM to the GPU. From then on it belongs to the GPU and acts like the RAM on a dGPU. The CPU can't play with it and any data needed by it has to be copied into the CPU RAM. HSA aims to eliminate that, but aside from AMD no one has done it yet. As the article says, they need the right interconnect for that.
Ryan Smith - Tuesday, October 6, 2015 - link
To add to that, you need the following.
- Shared virtual memory
- Cache coherency
- The ability for a processor to schedule work on another processor
- A software layer to compile HSAIL down to your native instruction set
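The difference between a carved-out GPU memory region and the shared virtual memory item on this list can be sketched in a few lines of Python. This is a toy model under loud assumptions: `CopyModel`, `SharedModel`, and the byte-doubling "kernel" are all made up for illustration, and nothing here uses a real HSA, OpenCL, or CUDA API. It only counts how many bytes have to cross between the two sides in each model.

```python
from dataclasses import dataclass


def kernel(buf):
    """Stand-in for GPU work: double every byte in place (mod 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256


@dataclass
class CopyModel:
    """Carved-out model: the 'GPU' owns a separate buffer, so data crosses by copy."""

    bytes_copied: int = 0

    def run(self, host_data: bytearray) -> bytearray:
        device_buf = bytearray(host_data)   # copy host -> device
        self.bytes_copied += len(device_buf)
        kernel(device_buf)
        result = bytearray(device_buf)      # copy device -> host
        self.bytes_copied += len(result)
        return result


@dataclass
class SharedModel:
    """Shared-address-space model: the 'GPU' works on the caller's memory directly."""

    bytes_copied: int = 0

    def run(self, host_data: bytearray) -> bytearray:
        view = memoryview(host_data)        # same memory, no copy
        kernel(view)
        return host_data


data = bytearray([1, 2, 3, 4])

copy_model = CopyModel()
copied = copy_model.run(data)       # data is untouched; result is a separate buffer

shared_model = SharedModel()
shared = shared_model.run(data)     # data itself is modified in place

print(copy_model.bytes_copied, shared_model.bytes_copied)
```

The sketch covers only the first list item; cache coherency, cross-processor scheduling, and the HSAIL compile step are not modelled.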
CiccioB - Wednesday, October 7, 2015 - link
Given that on SoCs the CPU and GPU share the same memory controller (same bus, same memory pool), all the limitations seem to lie in the MMU and OS rather than requiring added HW resources like new buses.
Cache coherency is needed only for the L1 caches on CPU and GPU, but again that can be solved through the use of the same MC.
Or am I missing something?
name99 - Tuesday, October 6, 2015 - link
I assumed other ARM SoCs were like Apple's SoCs. Apparently not.
As far as I can tell, Apple SoCs ARE like I described. To quote the Metal documents:
"Resources allocated with the shared storage mode are stored in memory that is accessible to both the CPU and the GPU."
(The shared storage mode is default for iOS.)
Again, as far as I can tell, the reason Metal took an additional year to move to OSX is precisely the split memory model for dGPUs on OSX, and the changes that were necessary to the Metal driver (along with API additions like describing memory as shared [old style Metal, default for iOS] vs managed vs private [alternative memory models, both for OSX; private can be used for some purposes on iOS, but is not necessary there]).
The primary point, as far as I and other knowledgeable observers can tell, of the large (but high latency) L3 on the Apple SoCs is not so much to serve as a larger, further-out cache, but to serve as the coherence point between the CPU and the GPU.
So, to reiterate, and to build on what Ryan said below. As far as I can tell Apple HAVE
- shared VM
- coherency (at least one way, CPU to GPU; possibly not the other GPU to CPU)
- The extent to which the GPUs are controllable by the OS as traditional virtualizable processors (the point I covered by talking about interrupts) I remain unclear on. I don't remember ever seeing anything about limits to how long a Metal computation kernel can run for, which suggests (but does not prove) that the OS can interrupt a kernel.
- HSAIL compliance is by far the least interesting aspect of this. What matters is the concepts of the technology, not a particular implementation. It's like being interested in 64-bit computing taking off, rather than insisting that the only 64-bit ISA that matters is x64.
extide - Wednesday, October 7, 2015 - link
It is truly exciting to see that other companies are totally on-board with this. I mean reading this article was like reading an example of an engineer's dream solution: one single standard, multiple companies on-board, and the fact that you can even mix and match IP from different vendors and still have it be HSA compliant is just icing on the cake! Hats off to AMD for doing this the right way.
On another thought, any word from Intel on this? I don't see their logo in the chart above, but they would seem like a likely candidate. I mean they do build CPUs/GPUs, and from what I understand they do actually support SOME HSA features, but they haven't really made a big deal about them publicly.
Ktracho - Friday, October 9, 2015 - link
Also, how is HSA different (other than being supported by multiple companies) from what is already available with CUDA 7.5 (e.g., Intel CPU + NVIDIA Maxwell GPU)?
jowsjows22 - Wednesday, June 22, 2016 - link
My children required a form last year and located a business that has a ton of fillable forms. If others require it too, here's http://goo.gl/PJtmFv