It's only available in solder-on (BGA) packages, meaning OEM only. It's meant to replace the lower-end dGPUs in mid-range ultrabooks, not to be sold to consumers directly. So the end cost is about the same once you subtract the cost of the dGPU.
That aspect of it pisses me off. I've been looking forward to building a new desktop to upgrade from my current Bloomfield when Haswell comes out, and I really wanted the part I build around to be the integrated-DRAM version.
The L4 cache would be a huge win for servers as well if they didn't make the stupid thing solder-only. While you could argue that no one ever upgrades a server CPU (they just replace the server), we have hundreds of dual-socket HP servers here that were ordered with only one socket populated because of per-core limits on software licensing.
Well, if the rumors that Intel doesn't intend to offer socketed, user-installable versions of Broadwell are correct, perhaps this is the reason: so they can offer more eDRAM versions of the CPU.
I'm aware that it's not for consumers. What I was trying to express is that where graphics performance is important, given the choice between a machine using an i7 (with its accompanying price premium) plus this on top, versus something like an i3/i5 with a discrete GPU, I'm still going to be looking at the latter.
Yeah, I'm not quite sure what the purpose is. This kind of addition would be fantastic on a higher-end i3 or i5, which would give me a decent system with decent graphics in something more portable. Stick it in with an i7 and most people will still want a dGPU to power their games, because even with eDRAM, Haswell won't be powerful enough.
Well, the 15" model has a GT 650M, so even better performance than IVB would have with eDRAM. And I've read that it is more a software issue (and single thread CPU performance issue) than a GPU performance issue. Many people are fine running 1440p/1600p displays off their Ultrabooks in Windows without performance drawbacks in the general UI. :)
The GT3e in the new Haswell chips has to perform roughly 30% better overall than the GT 650M for Apple to ditch the dGPU. There is still a competitor in Nvidia's new GT 750M, which might just about double the GT 650M and, if so, would still be useful for the high-end Retinas if Apple so decides. The improvements to the 13-inch Retinas will almost certainly come from the iGPU, while the highest-end model would probably keep a dGPU (maybe of the AMD persuasion?). Kepler will be king for a long while yet, as Intel still struggles to be "good enough" in the GPU arena.
It seems very much as if they were made with each other in mind. Even with all the latest updates, the Retina MacBook Pros drop too many frames during basic UI animations for my liking; Haswell with GT3 (possibly the "e" version) would have been great for it.
Although I wouldn't buy an Apple product until they change some of their policies regarding hardware/software separation, offering this GT3e alongside a high-end FireGL/Quadro would be nice. Now, I don't know if the infrastructure is there, but imagine having three modes of operation: 1. running the Quadro, 2. running the GT3e, 3. running the GT3 with the eDRAM switched off. That's not including the various hybrid modes. I don't know if this is possible yet, but it would make for some interesting possibilities considering how powerful Intel has made its GPUs of late.
As Anand said in the article, GT3e isn't expected to be available in low-power parts suitable for ultrabooks, which is too bad really since that market could benefit from increased IGP performance. Since GT3e is only available in higher power parts where discrete GPU alternatives with the same or better performance are available, its use case seems to be more for BOM/board space simplification rather than directly improving performance constrained situations.
Well, Anand also mentioned that the embedded DRAM added heat, so I'd imagine that's a part of why it's not expected to be available in low-power parts suitable for ultrabooks.
I'd actually be surprised if there isn't a ULV part suitable for ultrabooks down the line. The catch is that CPU performance would likely have to be reduced further to account for the eDRAM. The other option Intel has is to make the eDRAM strictly a GPU feature and scale the active amount based on workload. In other words, while you're using a word processor the eDRAM would gate down to 32 MB with no L4 cache functionality; while gaming it would activate all 128 MB.
Well, look at the Razer Edge: it includes a discrete GPU. I imagine a few companies will give it a whirl and slap this into something similar that should cost a lot less. In the meantime, I imagine AMD/nVidia will be forced to lower their prices to match the GT3e's new baseline for such systems.
Could be a decent price drop for better integrated/low-end GPUs.
I would also find a Surface Pro based on this chip appealing. Unfortunately, I doubt you'll see a processor with higher-than-ultrabook power draw in a Surface Pro. It sucks, though, since people with a lumberjack build like me wouldn't mind an extra kg of batteries and cooling.
Not to be that guy, but has Intel been talking about stepping up their graphics driver development as well? GT 650M-class performance would be enough to run quite a few games at reasonable settings, but without vigilant driver updates it all means nothing.
Exactly. I think the emphasis on the GPU aspect of it is short-sighted --- interesting if it somehow affects your Haswell buying decision, but no more than that.
The larger story here is Intel finally adopting eDRAM as the next step in increasing performance. As always, Intel is being cautious, this time giving us one specialty part with something of an add-on. But the interesting question is where we go with this when it's ready for the big time. In particular: do we remove the current L3, give each core a beefed-up L2 (maybe 1 MB or so), and move to an eDRAM L3 of 128 (or 256, or 384?) MB? I guess it all hinges on how close Intel can get that eDRAM to the CPU. Can they manufacture it on the same die? Or are we finally ready for the sort of chip-to-chip (as opposed to package-to-package) contact solutions people have been talking about for years?
I'll be interested to see how embedded RAM works in a Xeon CPU.
First, as an L4 cache. At my job we develop an in-memory database that is typically deployed on a machine with 1 or 2 TB of DRAM and multiple CPUs in a NUMA configuration connected by Intel's QPI. So we deal with near and far memory, and caching far memory in near memory could make a big difference.
But secondly, we have experimented with performing certain calculations (statistics, predictive analytics, etc.) on the GPU. The trouble was moving the data from CPU memory to GPU memory and then moving the results back. I'm wondering whether, on a Xeon with an embedded GPU (yeah, I know, that doesn't exist today), the embedded DRAM would be shared between the CPU and GPU so the data wouldn't have to be moved.
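For a sense of scale, here is a rough Python sketch of the copy overhead being described. The dataset size, kernel time, and effective PCIe throughput are illustrative assumptions, not measurements from any particular workload:

```python
# Back-of-envelope: how much of a GPGPU job is spent just moving data
# over PCIe versus a hypothetical shared CPU/GPU memory (zero copy).
# All figures below are illustrative assumptions, not measurements.

dataset_gb     = 4.0    # data shipped to the GPU (assumption)
results_gb     = 0.5    # results shipped back (assumption)
pcie3_x16_gbps = 15.75  # theoretical PCIe 3.0 x16; ~12 GB/s is more typical in practice
kernel_time_s  = 0.25   # time the GPU actually spends computing (assumption)

transfer_s = (dataset_gb + results_gb) / pcie3_x16_gbps
total_s    = kernel_time_s + transfer_s

print(f"transfer: {transfer_s*1000:.0f} ms, compute: {kernel_time_s*1000:.0f} ms")
print(f"copy overhead: {100*transfer_s/total_s:.0f}% of the job")
# With a unified address space or shared eDRAM, the transfer term largely
# disappears, which is exactly the win being described above.
```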
To address the first idea, IBM does use eDRAM in their high-end POWER line and their System z mainframes. The POWER7+ has 80 MB of eDRAM on-die. The mainframes are even beefier, with 48 MB of eDRAM L3 cache on-die and a massive 384 MB external L4. Those systems are not x86-based, but they are fast, and Intel going with eDRAM would likely perform similarly. It should be noted, though, that Intel has focused on latency with its caches, not capacity.
Having a common cache between the integrated CPU and GPU would be advantageous for the very reasons you cite. While it didn't get a lot of press, Intel already has a unified address space between the CPU cores and the HD Graphics in Sandy Bridge/Ivy Bridge, so Haswell GT3e will essentially have that functionality.
With regards to Xeon + discrete GPU, though, there will always be overhead by virtue of the PCIe connection, even if the GPU is using the same x86 address space. It'll be higher bandwidth and lower latency than today, but the results will be nothing like the on-die integration we're seeing on the consumer side; just having everything on-die helps a lot. Then again, both AMD and Intel could drop PCIe and ship discrete GPUs using their own proprietary processor interconnects (HyperTransport and QPI respectively). At that point the discrete GPU would logically be seen as just another socket node.
That last bit about connecting a GPU via QPI/HyperTransport is a very interesting proposition. However, what would the performance gains be? It's not even twice the speed of PCIe 3.0 x16, so I guess it's mostly direct memory access and latency, right?
For the most part, yes: lower latency and direct memory access, as if it were another socket/core. This idea isn't new either; one of Intel's early slide decks regarding Larrabee mentioned a Larrabee chip that would drop into a quad-socket motherboard.
I'm actually quite surprised that AMD hasn't gone this route or made any mention of it on their roadmaps. They do have new sockets coming out next year and HSA GPUs, so perhaps next year we'll hear something a bit more concrete.
The other thing about using a common socket for a CPU and a GPU is that each would have to support a common memory standard. AMD looks to be going with GDDR5 for mobile parts for bandwidth reasons, which is reasonable considering that laptops (and especially ultrabooks) are not designed for upgradability or 24/7 rock-solid stability. It also means that more desktop/server-centric sockets would imply support for ECC-protected DIMMs, which would bring both error correction and huge memory capacities to the GPU side.
One thing a move to QPI/HyperTransport for GPUs would result in is the eventual removal of nVidia from this space. PCIe would still hang around, but hardware using it would be at a disadvantage.
- Hmm, do SB & IB really have a unified memory address space between CPU & GPU? Can the CPU access GPU memory pointers and vice versa, like AMD's upcoming Kaveri? Any documentation/white paper from Intel on this?
- What I know is that their L2/L3 caches are definitely shared/unified. Intel's InstantAccess DX extension is only implemented in Haswell, though, so I doubt this is the case.
Ivy Bridge can't have the CPU and the GT share the L3 and use it at the same time; Haswell fixes this and unifies it. A lot of people are getting this wrong.
Thus the focus on the silicon photonics that Intel, IBM, and others have been working on. Interconnects such as QPI, HyperTransport, or RapidIO use too much power and/or require much more board space for their many parallel I/O traces. An optical interconnect eliminates many of the constraints imposed by QPI: the optical signal frequency can be orders of magnitude higher than what is possible for electrical signals without increasing power or thermal load. Long term, it is not possible to keep integrating components onto a single, ever-larger piece of silicon (i.e., a SoC).
Silicon photonics is a way to connect chip to chip at full chip speed, or in other words to connect multiple chips together into a single large virtual chip. Since the optical signal maintains this speed over far longer trace distances, it can also be used for chip-to-chip interconnects even when the chips are on different motherboards in separate cluster nodes. Think of a rack of servers that functions as a single, very large SoC.
We will first see it used for optical Thunderbolt (i.e., extending the PCIe bus off-chip), but probably for special-purpose chip-to-chip links soon after. For example, a CPU and a discrete GPU + eDRAM pair in a two-chip module connected via silicon waveguide.
Would the initial GT3e have that extra eDRAM available to the CPU as well, or is that more speculation on how such a feature might make its way onto a server part?
I don't personally have much interest in a faster embedded GPU, but a pile of L4 available to the CPU sounds like it could make for some more interesting use cases.
2nd to last paragraph: "Based on leaked documents, the embedded DRAM will act as a 4th level cache and should work to improve both CPU and GPU performance."
But from the leaks the TDP of the Haswell parts with GT3e is too high for the 13" Macbooks. I would love to see it in there, but I would guess it's just getting GT2.
I'd like to know if the CPU can dip into the eDRAM as a L4 cache of sorts if the GPU is underused or disabled. It would be a shame to waste that huge eDRAM die right beside the processor if the GPU goes unused.
This doesn't make a bit of sense. If the primary purpose is to be L4 cache for the CPU and boost performance that way, then why not make it available in desktop and server chips, which would offer far more plausible benefits than laptops?
And if the primary purpose is to be GPU memory bandwidth, then why 128 MB? I could see big benefits to having the heavily accessed depth buffer and frame buffer in cache, but at 1080p those are a tad under 8 MB each (see the quick arithmetic below). Maybe you want to put extra frame buffers there, for use in post-processing, or to have both the front and back frame buffers cached. But that's not going to get you anywhere near 128 MB, and if it's for graphics, you're going to end up using most of that space for lightly accessed textures where it doesn't matter.
Surely they're not planning on moving the really heavily used stuff that doesn't take much space and currently goes in GPU cache to slower eDRAM. That would be as dumb as making an Intel i740 without dedicated video memory because they want to use slower system memory instead.
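For reference, the frame-buffer arithmetic above works out roughly as follows (a sketch assuming 32-bit color and 32-bit depth/stencil at 1080p, no MSAA or compression):

```python
# Rough render-target sizes behind the "a tad under 8 MB each" figure.
# Assumptions: 1080p, 32-bit color, 32-bit depth/stencil, no MSAA.
width, height = 1920, 1080
bytes_per_pixel = 4

color_mb = width * height * bytes_per_pixel / 2**20   # ~7.9 MB
depth_mb = width * height * bytes_per_pixel / 2**20   # ~7.9 MB

buffers = {
    "front + back color":       2 * color_mb,
    "depth/stencil":            depth_mb,
    "two post-process targets": 2 * color_mb,
}
total = sum(buffers.values())
print(f"render targets total: {total:.1f} MB of the 128 MB eDRAM")  # ~40 MB
```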
To the suggestion that the frame buffers already live on the GPU itself: do tell how you propose to stick an 8 MB frame buffer in < 1 MB of L2 cache. For comparison, a Radeon HD 7970 has 768 KB of L2 cache, a Radeon HD 7870 has 512 KB, and a GeForce GTX 680 has 512 KB. Older or lower-end cards tend to have even less.
And the L1 and L2 caches are presumably needed for smaller but more frequently accessed data such as program binaries and uniforms that are needed at every single shader invocation throughout the graphics pipeline.
Clearly it does make a difference, as GPUs accessing the system DDR memory take a huge performance penalty. Otherwise why would GPU makers strap on so much memory?
Yes, accessing textures from video memory rather than having to pass it through a PCI Express bus does make a big difference. But if you want to do that, you have to have enough video memory to actually store all of your textures in video memory. That's why modern video cards nearly always come with at least 1 GB and often more. 128 MB would let you stick a small fraction of your textures and vertex data in it, but nowhere near all of it except at low texture resolutions or in older games where you don't need much video memory.
If textures are the goal, you'd likely see more benefit from adding a third channel of system memory, which lets you use a few GB if you need to. And while hardly cheap, that might well be cheaper than 128 MB of eDRAM.
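A rough texture-budget sketch illustrates the point; the texture size and the 4:1 block-compression ratio (e.g. DXT5 for RGBA8) are assumptions for illustration:

```python
# Why 128 MB only holds a slice of a modern game's textures.
# Assumptions: RGBA8 texels, full mipmap chains (~+33%), 4:1 block compression.
def texture_mb(size, bytes_per_texel=4, compression=1.0, mipmaps=True):
    mb = size * size * bytes_per_texel / 2**20 / compression
    return mb * 4 / 3 if mipmaps else mb

t2048_raw = texture_mb(2048)                   # ~21.3 MB uncompressed
t2048_dxt = texture_mb(2048, compression=4.0)  # ~5.3 MB with 4:1 compression

print(f"2048x2048 RGBA8 + mips: {t2048_raw:.1f} MB -> {128 // t2048_raw:.0f} fit in 128 MB")
print(f"same texture, block-compressed: {t2048_dxt:.1f} MB -> {128 // t2048_dxt:.0f} fit")
# Either way, a scene with hundreds of textures spills far beyond the eDRAM,
# so it behaves as a cache for the hottest data, not a texture pool.
```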
For modern graphical purposes I have to agree; I don't see the point of adding 128 MB of eDRAM. If it is for textures, almost any 3D game made in the last decade uses a few hundred MB, if not well over 1 GB in some cases, at any reasonable resolution.
I really only see this being useful as a cache for the CPU or for 2D applications.
Also, a unified fast memory between the GPU and CPU is exactly what is needed for good GPGPU performance; up until now the bottleneck has been transferring data between GPU memory and CPU memory.
While this could conceivably have some big benefits for certain GPGPU applications, I really doubt that's the primary intended purpose. If you want to do serious GPGPU, you get a desktop or workstation or server or some such so that you can dissipate serious amounts of heat. You definitely don't get a laptop with dinky little integrated graphics that sports a peak GFLOPS rating of not much, which is the only place that GT3e is going to be used.
IIRC, compositing window managers keep a draw buffer for every open window, even if it's (partially) hidden. With a couple of windows open, that's easily tens of megabytes. Is there any reason not to keep those in the eDRAM?
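For scale, a quick estimate of how much buffer memory a typical desktop's windows represent; the window sizes below are arbitrary assumptions:

```python
# Quick check on the compositing-WM idea: per-window backing stores at
# 32 bits per pixel (window sizes are assumptions).
windows = [(1920, 1080), (1920, 1080), (1280, 800), (1280, 720), (800, 600)]
mb = sum(w * h * 4 for w, h in windows) / 2**20
print(f"{len(windows)} windows -> {mb:.1f} MB of buffers")  # ~25 MB
# Even a fairly busy desktop fits comfortably inside 128 MB, so keeping
# composition surfaces resident in the eDRAM looks plausible.
```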
It's a way to make the research into eDRAM pay for itself while it goes on. It's essentially a research chip in this iteration. This means:
(a) if they can't deliver, no catastrophe;
(b) if the power utilization is higher than the actual benefits for general use cases, again no catastrophe.
I see the fact that they have gone this route rather than giving all the CPUs eDRAM as telling us that eDRAM is actually harder to get right than it looks, and they are being cautious. One might confirm this by noting that AMD has not jumped on eDRAM as a way to goose their sales, even though it would have been an obvious point of differentiation from Intel.
Given how often it's repeated that Apple pushed for the eDRAM and higher-performing integrated graphics in general, it seems like the TDP classes GT3e lands in miss the product it would benefit most, namely the 13" MacBook Pro and the Retina version in particular. It seems only quad-core processors in the roughly 50 W TDP class will get GT3e, and only ultrabook parts will get GT3 without the eDRAM, so 13" laptops seem to be stuck with GT2. A 15" laptop generally has enough room for a better dedicated graphics chip, so GT3e looks like it's missing what would be a perfect match for it.
Where does it say that? It seems to me like GT3e will be available in the non-ultrabook (read: >17 W TDP) chips, which means the 35 W-class stuff they use in the 13" MBPr will have GT3e available.
If the leaked slides are an exhaustive list, there aren't any dual-core standard-voltage parts at all. I think it's likely that when the dual-core chips are released, some of them might have GT3e.
I'm confused: it's an integrated GPU, so if it's too big for ultrabooks, then what's the point? Anything bigger (thicker) and I'm getting a dedicated GPU with proper driver updates. There will either be ultrabooks that use it, or it will only be Apple using it. On the other hand (and I find this nearly impossible), if they can actually hit a GT 650M in terms of across-the-board performance, I might actually care this time.
If it replaces a low-end dGPU they were going to pair with a larger laptop anyway, it still saves cost, power, and motherboard area. That could be used to reduce weight, increase battery size, etc.
It is not nearly impossible, from a hardware standpoint alone. The existing gap between the HD 4000 and a 650M is almost completely covered by the increase in execution units alone. Add some improvements in IPC and efficiency, some catching up by the driver development team, and the performance boost from the integrated memory, and it is actually quite likely that GT3e will reach 650M levels.
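A rough peak-throughput comparison supports this; the EU and shader counts are the commonly reported ones, but the clock speeds below are assumptions:

```python
# Very rough peak-FLOPS comparison behind the "EU count covers most of the
# gap" argument. Clock figures are assumed/typical values, not official specs.
def gflops(units, flops_per_unit_per_clock, ghz):
    return units * flops_per_unit_per_clock * ghz

hd4000 = gflops(16, 16, 1.15)   # Ivy Bridge GT2: 16 EUs, ~16 FLOPs/EU/clock
gt3    = gflops(40, 16, 1.2)    # Haswell GT3(e): 40 EUs, assumed ~1.2 GHz
gt650m = gflops(384, 2, 0.9)    # Kepler GT 650M: 384 cores, assumed ~0.9 GHz

print(f"HD 4000 ~{hd4000:.0f} GFLOPS, GT3e ~{gt3:.0f}, GT 650M ~{gt650m:.0f}")
# ~294 vs ~768 vs ~691: on paper the wider GT3 is in the 650M's ballpark;
# whether drivers and memory bandwidth let it get there is the open question.
```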
On the AMD approach, GDDR5 is obviously great for GPUs, but the memory timings are much slower than DDR3's, even accounting for the clock-speed differences. I know latency differences don't really affect modern CPUs a whole lot, but this gap would be bigger than the spread from the slowest to the fastest DDR3 latencies; it's many times higher. I wonder how that will turn out.
I'm also curious: IBM integrated eDRAM into its POWER7 processors a long time ago, so I wonder what the performance implications of the CPU being able to access the eDRAM in Haswell will be, and whether Intel will carry that over to the high-end server/workstation markets the way POWER7 does.
The comparison for using eDRAM on the POWER7 was similar latencies to having an external SRAM in the same package (like IBM did with the POWER6). Going SRAM for the L3 cache would have been faster in the POWER7 but IBM felt that capacity/density was more important considering the chip's market.
The basic desiderata for caches are:
- for L1, what matters most is latency
- for L2, what matters most is bandwidth
- for L3, what matters most is capacity
The power draw may be too much for a light mobile device but could be excellent for something like an HTPC. I'm very curious as to what will be offered in the mini-ITX format in a few months. Some good GPU power and a lower overall power envelope than current IB choices would be worth waiting for.
The models I've seen so far show GT3e in 47 W TDP parts. The Mac mini had a discrete GPU in it at some point, so without that I'm sure something of that size can theoretically handle 47 W.
To be honest, I was hoping for (if not expecting) quite a bit more than 64 GB/s; as beneficial as lower latency is, quad-channel DDR3 already gives us around 50 GB/s. Even for notebook graphics, that's far from groundbreaking. 650M-level performance seems like a stretch, especially considering this will be relegated to larger laptops, at which point having a dGPU sounds much more feasible.
True, but you don't get quad-channel DDR3 in a laptop, and especially not with the RAM soldered to the motherboard. This was a perf/watt decision with mobile as the priority.
You get a lot of temporal-locality goodness for games (a streaming texture buffer should fit in the 128 MB nicely) and for smaller datasets, and you free up the pipes to main memory so they can keep the L4 full. It's a win-win.
Servers will come when it can scale to really benefit them.
It's only 128 MB. At 64 GB/s you can read or overwrite the entire thing in about 2 ms, i.e. less than a quarter of even a 120 Hz frame. Since most graphics workloads shouldn't need to re-stream that data many times per frame, this seems fast enough. And why does 650M performance seem like a stretch to you based on this number, when the 650M itself has exactly the same bandwidth, or even less in the DDR3 version? Even the 675M only comes with 96 GB/s.
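Putting the bandwidth figures mentioned in this thread side by side (the DDR3 numbers assume DDR3-1600 at 12.8 GB/s per channel; the NVIDIA figures are the GDDR5 variants as cited above):

```python
# Bandwidth figures from the thread in one place, plus the ~2 ms streaming time.
per_channel = 12.8  # DDR3-1600, GB/s per 64-bit channel (assumption)
pools = {
    "dual-channel DDR3-1600":   2 * per_channel,  # 25.6 GB/s
    "quad-channel DDR3-1600":   4 * per_channel,  # 51.2 GB/s ("about 50")
    "Haswell eDRAM (reported)": 64.0,
    "GT 650M (GDDR5 variant)":  64.0,
    "GTX 675M (GDDR5)":         96.0,
}
for name, gbps in pools.items():
    print(f"{name:26s} {gbps:5.1f} GB/s")

edram_mb = 128
print(f"time to stream all {edram_mb} MB at 64 GB/s: {edram_mb/1024/64*1000:.1f} ms")  # ~2 ms
```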
So I guess this is what all that extra fab capacity is about: not actually fabbing lots of custom silicon for others, but boosting utilization with eDRAM.
Intel's biggest problem is still their drivers. They're ugly, they're slow, they're not tuned, and you can feel no love in them compared to Nvidia's.
In server environments, I can see embedded DRAM acting as a real boon to multi-core performance.
But 128 MB is not that much more than the 20 MB of L3 cache already available in a Xeon, while it is much less than the 32 GB (or more) of available RAM. Sounds to me like only a very specific class of software would be able to profit from it. And if you have software that really speeds up with more low-latency memory, does that not mean you're better off running it on a Xeon Phi anyway?
If the previous rumor of a 55 W TDP for a chip carrying GT3e, with graphics performance roughly equivalent to a GT 650M, holds true, then I imagine large-screen (15"+) "ultrabooks" carrying this chip: laptops like the MacBook Pro and the Razer Blade. Without the need for a discrete GPU, other manufacturers would not need premium materials or clever engineering to stay within the thermal limits of their thin designs.
An additional $50 is not much when you look at the price of a mobile i7 chip; it's another 2-4% bump in the overall price of the typical i7-carrying laptop. Even if the GT3e only performs similarly to a GT 640M (more realistic), it would easily be a worthwhile upgrade for people who only need/want mid-range GPU performance.
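The 2-4% figure checks out at typical i7 laptop price points (the prices below are assumptions):

```python
# Sanity check on the "2-4% price bump" claim; laptop prices are assumptions.
edram_premium = 50
for laptop_price in (1200, 1800, 2500):
    print(f"${laptop_price} laptop: +{100 * edram_premium / laptop_price:.1f}%")
# Roughly 2-4% at these price points.
```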
With a 2-channel DDR3 memory controller, Haswell will barely run games even at low settings, and if you move to medium or high detail it'll be an unplayable slide show at 2-10 fps; the integrated 128 MB of memory won't help much, as games nowadays require 1 GB or more of graphics memory.
So Haswell will only survive for 4-5 months, until AMD Kaveri appears in October-November. Kaveri will support 4-channel GDDR5 and will have a much better GPU than Haswell, so for gaming Haswell will be a joke compared with Kaveri.
The Sony PlayStation 4 will come with an 8 GB unified 256-bit GDDR5 memory subsystem (176 GB/s of bandwidth) shared by both CPU and GPU. The 64 GB/s of Haswell's eDRAM is still nearly 3x lower than what AMD did for Sony in the PS4, and AMD Kaveri will have an architecture similar to the PS4's.
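For reference, the 176 GB/s figure falls out of the widely reported PS4 memory configuration; the comparison at the end is about raw bandwidth only:

```python
# Where the PS4's 176 GB/s comes from, next to Haswell's eDRAM figure.
# GDDR5 data rate is the widely reported 5.5 Gbps per pin.
bus_bits, gbps_per_pin = 256, 5.5
ps4_bw   = bus_bits / 8 * gbps_per_pin   # 176 GB/s
edram_bw = 64.0
print(f"PS4 unified GDDR5: {ps4_bw:.0f} GB/s vs Haswell eDRAM: {edram_bw:.0f} GB/s "
      f"({ps4_bw / edram_bw:.1f}x)")     # ~2.8x
# Note the eDRAM is a 128 MB cache in front of DDR3, not the whole memory
# pool, so this is not quite an apples-to-apples comparison.
```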