ATI Radeon HD 2900 XT: Calling a Spade a Spade
by Derek Wilson on May 14, 2007 12:04 PM EST
Posted in: GPUs
Texturing, Caches, and Memory
Texturing
R600 features less texture hardware than we would expect to see, though AMD stands by the argument that compute power will come out on top when it matters. At the same time, we can't compute anything if we don't have any data to work with. So let's take a look at what AMD has done with their texture units.
There are four texture units in R600, one for each SIMD unit. These units don't share resources with the hardware in the SIMD units and are independently scheduled by AMD's dispatch processor. The dispatch processor is able to determine what data will be needed for threads about to execute and can handle setting up the texture units without waiting for the SIMD unit to request data and come up empty.
Texture units on the R600 are able to make both filtered and unfiltered texture requests no matter what shader is running. Unfiltered textures are useful with non-image-based texture data like vertex textures, normal maps, and generic blocks of data. Filtered requests will generally be for image data to be used in determining the color of a pixel. R600 can address one unfiltered texture per clock per texture unit and one filtered texture per clock per texture unit. Filtered units can be used to request unfiltered textures if necessary, providing an extra four unfiltered textures in place of one filtered texture.
The unfiltered texture requests will come back through four fp32 texture samplers (one per component), while the filtered requests will return 16 data points, which will be run through the texture filtering hardware, resulting in four filtered texture samples. The hardware can at best produce 32 single-component fp16 unfiltered results per texture unit per clock. More practically, each texture unit can produce four bilinear-filtered, four-component fp16 samples per clock alongside four unfiltered results. For textures with fp32 components, two clocks would be required to complete a bilinear filter process, as only half the data is loaded at a time to conserve bandwidth.
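To make the 16-in, four-out arithmetic concrete: a bilinear sample blends a 2x2 block of texels, so four filtered samples need 16 texel fetches. The Python sketch below is a generic software illustration of bilinear filtering, not a description of AMD's hardware. The function name `bilinear_sample`, the list-of-rows texture layout, and the clamping at the edges are all assumptions made for the example.

```python
import math

def bilinear_sample(texture, u, v):
    """Blend the 2x2 texel block around (u, v); coordinates are in texel units."""
    height, width = len(texture), len(texture[0])
    # Clamp so the 2x2 footprint stays inside the texture (assumed edge mode).
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0
    # Interpolate along x on the two rows, then along y between the rows.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

# Four texels feed each bilinear sample, and a texture unit is described
# as producing four such samples per clock.
TEXELS_PER_BILINEAR_SAMPLE = 4
SAMPLES_PER_UNIT_PER_CLOCK = 4
texels_fetched = TEXELS_PER_BILINEAR_SAMPLE * SAMPLES_PER_UNIT_PER_CLOCK
```

Sampling the center of a 2x2 texture returns the average of its four texels, which is the case the hardware filter handles four times per clock per texture unit.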
This is definitely a step up for R600, as R5xx hardware doesn't have texture filtering hardware for floating point textures. All told, with each of its four texture units working, R600 can consume up to 32 unfiltered textures or 16 unfiltered textures plus 16 filtered textures (as long as they're fp16 or fewer bits and we're only using bilinear filtering).
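The totals above are simple per-clock multiplication, and a short Python sketch makes the cases easy to check. The per-unit figures of four filtered and four unfiltered fetches come from the description above; the function name `r600_peak_per_clock` is hypothetical, and halving the filtered rate for fp32 is our reading of the two-clock bilinear cost rather than a figure AMD supplied.

```python
TEXTURE_UNITS = 4          # one per SIMD unit
FILTERED_PER_UNIT = 4      # bilinear, fp16 or narrower
UNFILTERED_PER_UNIT = 4

def r600_peak_per_clock(fp32_filtering=False, all_unfiltered=False):
    """Return (filtered, unfiltered) peak fetches per clock for the whole chip."""
    filtered = TEXTURE_UNITS * FILTERED_PER_UNIT
    unfiltered = TEXTURE_UNITS * UNFILTERED_PER_UNIT
    if all_unfiltered:
        # Filtered address hardware is used for unfiltered fetches instead.
        return 0, unfiltered + filtered
    if fp32_filtering:
        # Assumption: two clocks per fp32 bilinear sample halves the rate.
        filtered //= 2
    return filtered, unfiltered

print(r600_peak_per_clock())                     # mixed fp16 case
print(r600_peak_per_clock(all_unfiltered=True))  # unfiltered only
print(r600_peak_per_clock(fp32_filtering=True))  # fp32 bilinear
```

These are peak rates only; they say nothing about how often a real shader mix can keep both the filtered and unfiltered paths busy at once.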
G80 is built with four texture address units and eight texture filters per block of 16 SPs. In total, this means NVIDIA's hardware can produce 32 filtered texture samples per clock (again these are fp16 and bilinear filtered). Of course, NVIDIA is operating on twice as many threads per clock, so it is conceivable that they would benefit more from having the extra filtered data.
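The G80 total can be sketched the same way. The block count of eight is an assumption on our part, based on the 128 SPs of the full 8800 GTX configuration divided into blocks of 16; the variable names are hypothetical. The sketch treats the address units as the limit on bilinear samples per clock, which is what yields the 32 figure.

```python
G80_BLOCKS = 8               # assumption: 128 SPs / 16 SPs per block
ADDRESS_UNITS_PER_BLOCK = 4
FILTER_UNITS_PER_BLOCK = 8

g80_address_units = G80_BLOCKS * ADDRESS_UNITS_PER_BLOCK
g80_filter_units = G80_BLOCKS * FILTER_UNITS_PER_BLOCK
# A sample needs both an address and a filter, so the smaller count limits
# the bilinear rate.
g80_filtered_per_clock = min(g80_address_units, g80_filter_units)

r600_filtered_per_clock = 4 * 4  # four texture units, four samples each
filtered_ratio = g80_filtered_per_clock / r600_filtered_per_clock

print(g80_filtered_per_clock, r600_filtered_per_clock, filtered_ratio)
```

This is a per-clock comparison only. The two chips run at different clock speeds, so the ratio is not a statement about delivered texture throughput.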
We will have to wait and see if AMD's approach of providing unfiltered and filtered texture access in parallel pays off. For the general case on pixel shaders, we would want to see more filtered textures per clock, but with vertex and geometry shaders coming into the mix this could be a good way to save hardware space while offering more texturing power. On a final texturing note, AMD implemented "percentage closer" filter hardware for depth/stencil textures. This will allow developers to implement fast soft shadows. The details of the implementation weren't disclosed, though.
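For readers unfamiliar with the technique, percentage closer filtering compares the fragment's depth against each nearby shadow-map texel first and then averages the pass/fail results, rather than averaging the depths and comparing once. The Python sketch below shows the generic software form of the idea. It is not AMD's implementation, which was not described to us, and the function name `pcf_shadow`, the square kernel, and the `bias` parameter are assumptions for illustration.

```python
def pcf_shadow(shadow_map, x, y, fragment_depth, radius=1, bias=0.0):
    """Return the fraction of nearby shadow-map texels that leave the fragment lit."""
    height, width = len(shadow_map), len(shadow_map[0])
    lit = 0
    taps = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Clamp taps to the edge of the shadow map.
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            # Depth comparison happens per texel, before any averaging.
            if fragment_depth - bias <= shadow_map[sy][sx]:
                lit += 1
            taps += 1
    return lit / taps

# An occluder at depth 0.2 covers the left column of a 3x3 shadow map.
shadow_map = [
    [0.2, 1.0, 1.0],
    [0.2, 1.0, 1.0],
    [0.2, 1.0, 1.0],
]
visibility = pcf_shadow(shadow_map, 1, 1, fragment_depth=0.5)
```

A fragment near the shadow edge gets a fractional visibility instead of a hard 0 or 1, which is what softens the edge. Doing the compare-and-average in fixed-function hardware saves the shader from issuing each tap itself.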
86 Comments
mostlyprudent - Monday, May 14, 2007 - link
Frankly, neither the NVIDIA nor the AMD part at this price point is all that impressive an upgrade from the prior generations. We keep hearing that we will have to wait for DX10 titles to know the real performance of these cards, but I suspect that by the time DX10 titles are on the shelves we will have at least product line refreshes by both companies. Does anyone else feel like the graphics card industry is jerking our chains?
johnsonx - Monday, May 14, 2007 - link
It seems pretty obvious that AMD needs a Radeon HD2900Pro to fill in the gap between the 2900XT and 2600XT. Use R600 silicon, give it 256MB RAM with a 256-bit memory bus. Lower the clocks 15% so that power consumption will be lower, and so that chips that don't bin at full XT speeds can be used. Price at $250-$300. It would own the upper-midrange segment over the 8600GTS, and eat into the 8800GTS 320's lunch as well.
GlassHouse69 - Monday, May 14, 2007 - link
If I know this, and YOU know this.... wouldnt anandtech? I see money under the table or utter stupidity at work at anand. I mean, I know that the .01+ version does a lot better in benches as well as the higher res with aa/af on sometimes get BETTER framerates than lower res, no aa/af settings. This is a driver thing. If I know this, you know this, anand must. I would rather admit to being corrupt rather than that stupid.
GlassHouse69 - Monday, May 14, 2007 - link
wrong section. dt is doing that today it seems to a few people
xfiver - Monday, May 14, 2007 - link
Hi, thank you for a really in depth review. While reading other 'earlier' reviews I remember a site using Catalyst 8.38 and reported performance improvements up to 14% from 8.37. Look forward to Anandtech's view on this.
xfiver - Monday, May 14, 2007 - link
My apologies it was VR zone and 8.36 to 8.37 (not 8.38)
GlassHouse69 - Monday, May 14, 2007 - link
If I know this, and YOU know this.... wouldnt anandtech? I see money under the table or utter stupidity at work at anand. I mean, I know that the .01+ version does a lot better in benches as well as the higher res with aa/af on sometimes get BETTER framerates than lower res, no aa/af settings. This is a driver thing. If I know this, you know this, anand must. I would rather admit to being corrupt rather than that stupid.
Gary Key - Tuesday, May 15, 2007 - link
I have worked extensively with four 8.37 releases and now the 8.38 release for the upcoming P35 release article. The 8.37.4.2 alpha driver had the top performance in SM3.0 heavy apps but was not very stable with numerous games, especially under Vista. The released 8.37.4.3 driver on AMD's website is the most stable driver to date and has decent performance but nothing near the alpha 8.37 or beta 8.38. The 8.38s offer great benchmark performance in the 3DMarks, several games, and a couple of DX10 benchmarks from AMD.
However, the 8.38s more or less broke CrossFire, OpenGL, and video acceleration in Vista depending upon the app and IQ is not always perfect. While there is a great deal of promise in their performance and we see the potential, they are still Beta drivers that have a long ways to go in certain areas before their final release date of 5/23 (internal target).
That said, would you rather see impressive results in 3DMarks or have someone tell you the truth about the development progress, or lack of it, with the drivers? As much as I would like to see this card's performance improve immediately, it is what it is at this time with the released drivers. AMD/ATI will improve the performance of the card with better drivers but until they are released our only choice is to go with what they sent. We said the same thing about NVIDIA's early driver issues with the G80 so there are not any fanboys or people taking money under the table around here. You can put all the lipstick on a pig you want, but in the end, you still have a pig. ;-)
Anand Lal Shimpi - Monday, May 14, 2007 - link
There's nothing sinister going on, ATI gave us 8.37 to test with and told us to use it. We got 8.38 today and are currently testing it for a follow-up.
Take care,
Anand
GlassHouse69 - Monday, May 14, 2007 - link
wow dood. you replied!
Yes, I have been wondering about the ethics of your group here for about a year now. I felt this sorta slick leaning towards and masking thing going on. Nice to see there is not.
Thanks for the 1000's of articles and tests!
-Mr. Glass