The GPU Advances: ATI's Stream Processing & Folding@Home
by Ryan Smith on September 30, 2006 8:00 PM EST
Posted in: GPUs
Enter the GPU
Modern GPUs such as the R580 core powering ATI's X19xx series have as many as 48 pixel shading units, designed to do exactly what the Folding@Home team requires. With help from ATI, the Folding@Home team has created a version of its client that can utilize ATI's X19xx GPUs, with very impressive results. We do not have the client in our hands quite yet, as it will not be released until Monday, but the Folding@Home team says that the GPU-accelerated client is 20 to 40 times faster than its CPU-only clients. Once we have the client, we'll put this claim to the test, but even a fraction of this number would represent a massive speedup.
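To put the claimed range in perspective, here is a minimal sketch of the arithmetic. The 48-hour CPU baseline and the 5x case are our own illustrative assumptions, not figures from the Folding@Home team; only the 20x and 40x factors come from their claim.

```python
def gpu_hours(cpu_hours: float, speedup: float) -> float:
    """Wall-clock hours needed for the same work at a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return cpu_hours / speedup


# Hypothetical baseline: a work unit that takes 48 hours on the CPU client.
CPU_HOURS = 48.0

# 20x and 40x are the claimed range; 5x stands in for
# "a fraction of this number" and is purely illustrative.
for factor in (5, 20, 40):
    print(f"{factor:>2}x speedup: {gpu_hours(CPU_HOURS, factor):.1f} hours")
```

Under these assumptions, even the pessimistic 5x case turns a two-day work unit into an overnight job, which is why a partial result would still matter.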
With this kind of speedup, the Folding@Home research group is looking to finally be able to run simulations involving longer folding periods and more complex proteins that they couldn't run before, allowing them to research new proteins that were previously inaccessible. This implementation also finally allows them to do some research on their own, without requiring the entire world's help, by building a cluster of (relatively) cheap video cards, something they've never been able to do before.
Unfortunately for home users, for the time being, the number of those who can help out by donating their GPU resources is rather limited. The first beta client to be released on Monday only works on ATI GPUs, and even then only works on single X19xx cards. The research group has indicated that they are hoping to expand this to CrossFire-enabled platforms soon, along with less-powerful ATI cards.
The situation for NVIDIA users, however, isn't as rosy. While the research group would like to expand this to the latest GeForce cards, its current attempts at implementing GPU-accelerated processing on those cards have shown that NVIDIA's cards are too slow compared to ATI's to be used. Whether this is due to a subtle architectural difference between the two, or a result of ATI's greater emphasis on pixel shading with this generation of cards, we're not sure, but Folding@Home won't be coming to NVIDIA cards as long as the research group can't solve the performance problem.
Conclusion
The Folding@Home project is the first of what ATI is hoping will be many projects and applications, both academic and commercial, that will be able to tap the power of GPUs. Given the results showcased by the Folding@Home project, the impact on applications that work well on a GPU could be huge. In the future we hope to be testing technologies such as GPU-accelerated physics processing, for which both ATI and NVIDIA have promised support, and other yet-to-be-announced applications that utilize stream processing techniques.
It's been a longer wait than we were hoping for, but we're finally seeing the power of the GPU unleashed as was promised so long ago, starting with Folding@Home. As GPUs continue to grow in abilities and power, it should come as no surprise that ATI, NVIDIA, and their CPU-producing counterparts are looking at how to better connect GPUs and other such coprocessors to the CPU in order to further enable this kind of processing and boost its performance. As we see AMD's Torrenza technology and Intel's competing Geneseo technology implemented in computer designs, we'll no doubt see more applications make use of the GPU, in what could be one of the biggest single performance improvements in years. The GPU is not just for graphics any more.
As for our readers interested in trying out the Folding@Home research group's efforts in GPU acceleration and contributing towards understanding and finding a cure for Alzheimer's, the first GPU beta client is scheduled to be released on Monday. For more information on Folding@Home or how to use the client once it does come out, our Team AnandTech members over in our Distributed Computing forum will be more than happy to give a helping hand.
43 Comments
ProviaFan - Sunday, October 1, 2006 - link
The problem with F@H scaling over multiple cores is not what one might first think. I've run F@H on my Athlon X2 system since I bought it when the X2s became available in mid-2005. Since each F@H client is single-threaded, you simply install a separate command line client for each core (the GUI client can't run more than one of itself at once), and once they are installed as Windows services, they distribute nicely over all of the available CPUs. The problem with this is that each client has its own work unit with the requisite memory requirements, which with the larger units can become significant if you must run four copies of the client to keep your quad-core system happy. The scalability issues mentioned actually involve the difficulty in making a single client with a single unit of work multi-threaded. I'm hoping that the F@H group doesn't give up trying to make this possible, because the memory requirements will become a serious issue with large work units when quad and octo core systems become more readily available.
highlandsun - Sunday, October 1, 2006 - link
Two thoughts - first, buy more RAM, duh. But that raises a second point - I've got 4GB of RAM in my X2 system. If resource consumption is really such a problem, how is a GPU with a measly 256MB of RAM going to have a chance? How much of the performance is coming from having superfast GDDR available, and what kind of slowdown do they see from having to swap data in and out of system memory?
As for Crossfire (or SLI, for that matter), why does that matter at all? These things aren't rendering a video display any more, so they don't need to talk to each other at all. You should be able to plug in as many video cards as you have slots for and run them all independently.
It sounds to me like these programs are compute-bound and not memory bandwidth-bound, so you could build a machine with 32 PCIEx1 slots and toss 32 GPUs into it and have a ball.
icarus4586 - Tuesday, October 3, 2006 - link
It depends on the type of core, and the data it's working on. There's an option when you set up the client for whether or not to do "big" WUs. I've found that a "big" WU generally uses somewhere around 500MB of system memory, while "small" ones use 100MB or less. I would assume that they'd target it to graphics card memory sizes. Given that the high-end cards they're targeting have 256MB or 512MB of RAM, this should be doable.
gersson - Saturday, September 30, 2006 - link
I'm sure my PC can do some good... 3.5GHz C2D and X1900 Crossfire.
I've done some Folding@Home before but could never get into it. I'll give it a spin when the new client comes out.
Pastuch - Saturday, September 30, 2006 - link
I just wanted to say thanks to Anandtech for writing this article. I have been an avid reader for years and an overclocker. People always talk about folding in the OC scene but I never took the time to learn just what folding@home is. I had no idea it was research into Alzheimer's. I'm downloading the client right now.
Griswold - Sunday, October 1, 2006 - link
Unfortunately, many overclocked machines that are stable by the owner's standards don't meet the standards of such projects. The box may run rock stable, but can you vouch for the results being correct when the system is running outside of its specs?
If you read the forums of these projects, you will soon see that the people running them aren't too fond of overclocking. I've never seen any figures, but I bet there are many, many work units being discarded (yet you still get credit) because they're useless. However, the benefit still seems to outweigh the damage. There are just so many people contributing to the project because they want to see their name on a ranking list - without caring about the actual background. I guess this can be called a win-win situation.
JarredWalton - Sunday, October 1, 2006 - link
If a WU completes - whether on an OC'ed PC or not - it is almost always a valid result. If a WU is returned as an EUE (or generates another error that apparently stems from OC'ing), then it is reassigned to other users to verify the result. Even non-OC'ed PCs will sometimes have issues on some WUs, and Stanford does try to be thorough - they might even send out all WUs several times just to be safe? Anyway, if you run OC'ed and the PC gets a lot of EUEs (Early Unit Ends), it's a good indication that your OC is not stable. Memory OCs also play a significant role.
nomagic - Saturday, September 30, 2006 - link
I'd like to put some of my GPU power into some use too.
Griswold - Sunday, October 1, 2006 - link
Read the article.
Furen - Saturday, September 30, 2006 - link
Has ATI updated it at all? I don't have an ATI video card around here so I can't go check it out, but from what I've seen it was an extremely barebones application.