21 Comments
dickrpm - Saturday, October 21, 2006 - link
I have a big pile of "Storage Servers" in my basement that function as an audio, video and data server. I have used PATA, SATA and SCSI 320 (in that order) to achieve the necessary reliability. Put another way, when I started using enterprise-class hardware, I quit having to worry (as much) about data loss.
ATWindsor - Friday, October 20, 2006 - link
What happens if you encounter an unrecoverable read error when you rebuild a RAID-5 array (after a disk has failed)? Is the whole array unusable, or do you only lose the file using the sector which can't be read?
AtW
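For a rough sense of the odds involved in that question, here is a back-of-the-envelope sketch (my own illustration, not from the article; it assumes the commonly quoted 1-error-per-10^14-bits unrecoverable-read-error spec for desktop SATA drives) of the chance of hitting at least one unreadable sector while re-reading the surviving disks during a RAID-5 rebuild:

```python
# Back-of-the-envelope estimate: probability of hitting at least one
# unrecoverable read error (URE) while rebuilding a degraded RAID-5 array.
# Assumes the commonly quoted spec of 1 URE per 1e14 bits read for
# desktop-class SATA drives; real-world rates are often much better.

def rebuild_ure_probability(disks_total, disk_size_tb, bits_per_ure=1e14):
    """Probability that re-reading every surviving disk hits >= 1 URE."""
    surviving_disks = disks_total - 1            # one disk has already failed
    bytes_read = surviving_disks * disk_size_tb * 1e12
    bits_read = bytes_read * 8
    p_per_bit = 1.0 / bits_per_ure               # chance any single bit is unreadable
    # P(at least one URE) = 1 - P(no URE over all the bits read)
    return 1.0 - (1.0 - p_per_bit) ** bits_read

if __name__ == "__main__":
    # Example: 8 x 500 GB SATA drives in RAID-5
    p = rebuild_ure_probability(disks_total=8, disk_size_tb=0.5)
    print(f"Chance of a URE during rebuild: {p:.1%}")
```

What actually happens when a rebuild does hit such an error seems to vary by controller and firmware rather than being dictated by RAID-5 itself: some controllers abort the rebuild entirely, while others skip the block and lose only the affected data.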
nah - Friday, October 20, 2006 - link
Actually, the cost of the original RAMAC was USD 35,000 per year to lease (IBM did not sell them outright in those days), and the size was roughly 4.9 MB.
yyrkoon - Friday, October 20, 2006 - link
It's nice to see that someone finally did an article with information about SATA port multipliers (these devices have been around for about two years, and no one seems to know about them), but since I have no direct hands-on experience, I feel the part of the article concerning them was a bit skimpy.
Also, while I see you're talking about iSCSI (I think some call it SCSI over IP?) in the comments section here, I'm a bit interested as to why I didn't see it mentioned in the article.
I plan on getting my own SATA port multiplier eventually, and I have a pretty good idea how well they would work under the right circumstances, with the right hardware, but since I do not work in a data center (or some such profession), the likelihood of me getting my hands on a SAS, iSCSI, FC, etc. rack/system is low. What I'm trying to say is that I think you guys could go a good bit deeper into detail with each technology, and let each reader decide if the cost of product X is worth it for whatever they want to do. In the last several months (close to two years) I've been doing a lot of research in this area, and I still find some of these technologies a bit confusing. Take iSCSI, for example: the only documentation I could find on the subject (around 6 months ago) was some sort of technical document written by Microsoft that I had a very hard time digesting. Since then, I've only seen (going from memory) white papers from companies like HP pushing their own specific products, and I don't care about their products in particular; I care about the technology, and COULD be interested in building my own 'system' some day.
What I am about to say next, I do not mean as an insult in ANY shape or form; however, I think when you guys write articles on such subjects, you NEED to go into more detail. Motherboards are one thing, hard drives, whatever, but when you get into technology that isn't very common (outside of enterprise solutions), such as SAS, iSCSI, etc., I think you're actually doing your readers a disservice by showing a flow chart or two and briefly describing the technology. NAS, SAN, etc. have all been done to death, but I think if you look around, you will find that a good article on AT LEAST iSCSI, how it works, and how to implement it, would be very hard to find (without buying a prebuilt solution from a company). Anyhow (again), I think I've beaten this horse to death; you get my drift by now, I'm sure ;)
photoguy99 - Thursday, October 19, 2006 - link
Great article, well worth it for AT to have this content.
Can't wait for part 2 -
ceefka - Thursday, October 19, 2006 - link
Can we also expect a breakdown and benchmarking of network storage solutions for the home and small office?
LoneWolf15 - Thursday, October 19, 2006 - link
Great article. It addressed points that I not only didn't think of, but that were far more useful to me than just baseline performance.
It seems to me that for a moderately-sized business (or an "enterprise-on-a-budget" role, such as K-12 education), enterprise-level SATA such as Caviar RE drives in RAID-5, plus solid server backups (which should be done anyway), makes more sense cost-wise than SAS. Sure, the risk of error is a bit higher, but that is why no systems/network administrator in their right mind would rely on RAID-5 alone to keep data secure.
I hope that Anandtech will do a similarly comprehensive article about backup for large storage someday, including multiple methods and software options. All this storage is great, but once you have it, data integrity (especially now that server backups can be hundreds of gigabytes or more) cannot be stressed enough.
P.S. It's one of the reasons I can't wait until we have enough storage that I can enable Shadow Copy on our Win2k3 boxes. Just one more method on top of the existing structure.
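As a rough illustration of the cost-wise argument above (my own sketch, not from the article; the drive size and price in the example are placeholders to be replaced with real quotes for SATA vs. SAS drives), here is how usable capacity and cost per usable terabyte compare across common RAID levels:

```python
# Rough cost-per-usable-TB comparison for a few common RAID levels.
# Drive capacity and price below are placeholder assumptions; plug in real
# quotes for SATA (e.g. Caviar RE) vs. SAS/SCSI drives to compare.

def usable_tb(raid_level, drives, drive_tb):
    """Usable capacity in TB for a given RAID level and drive count."""
    if raid_level == "raid0":
        return drives * drive_tb
    if raid_level in ("raid1", "raid10"):
        return drives * drive_tb / 2          # half the raw space is mirrored
    if raid_level == "raid5":
        return (drives - 1) * drive_tb        # one drive's worth of parity
    if raid_level == "raid6":
        return (drives - 2) * drive_tb        # two drives' worth of parity
    raise ValueError(f"unknown RAID level: {raid_level}")

def cost_per_usable_tb(raid_level, drives, drive_tb, drive_price):
    return (drives * drive_price) / usable_tb(raid_level, drives, drive_tb)

if __name__ == "__main__":
    # Hypothetical 8-drive array of 0.5 TB drives; the price is illustrative only.
    for level in ("raid5", "raid10"):
        print(level, round(cost_per_usable_tb(level, 8, 0.5, drive_price=250), 2),
              "USD per usable TB")
```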
Olaf van der Spek - Thursday, October 19, 2006 - link
Why does this simple (non-mechanical) operation take so long?
Fantec - Thursday, October 19, 2006 - link
Working for an ISP, we started to use PATA/SATA a few years ago. We still use SCSI, FC & PATA/SATA depending on our needs. SATA is the first choice when we have redundant data (and, in this case, disks are set up as JBOD (standalone) for performance reasons). At the other extreme, FC is only used for NFS filers (mostly used for mail storage, where the average file size is a few KB).
Between the two, we look at the needed storage size & IO load to make up our mind. Even for huge IO loads, SATA behaves quite well as long as the requested block size is big enough.
Nonetheless, something bugs me in your article about the Seagate test. I manage a cluster of servers whose total throughput is around 110 TB a day (using around 2400 SATA disks). With Seagate's figure (an Unrecoverable Error every 12.5 terabytes written or read), I would get 10 Unrecoverable Errors every day. That, as far as I know, is far from what I actually see (a very few per week/month).
Bill Todd - Saturday, October 28, 2006 - link
It's quite possible that the reason you are seeing far fewer unrecoverable errors than the specs would suggest is that you're reading all, or at least a large percentage, of your data far more frequently than the specs assume. Background 'scrubbing' of data - using a disk's idle periods to scan its surface and detect any sectors which are unreadable (in which case they can be restored from a mirror or parity-generated copy, if one exists) or becoming hard to read (in which case they can just be restored to better health from their own contents, possibly in a different location) - decreases the incidence of unreadable sectors by several orders of magnitude compared to the specified value, and the amount of reading that you're doing may be largely equivalent to such scrubbing (or, for that matter, perhaps you're actively scrubbing as well).
While Johan's article is one of the best that I've seen on storage technology in years, in this respect I think he may have been a bit overly influenced by Seagate marketing, which conveniently all but ignores (and certainly makes no attempt to quantify) the effect of scrubbing on potential data loss from using those allegedly-risky SATA upstarts. Seagate, after all, has a lucrative high-end drive franchise to protect. We see a similar bias in their emphasis on SATA's lack of variable sector sizes, with no mention of new approaches such as Sun's ZFS that attain more comprehensive end-to-end integrity checks without needing them. And while higher susceptibility to rotational vibration is a legitimate knock, it's worst in precisely those situations where conventional SATA drives are inappropriate for other reasons (intense, continuous-duty-cycle, small-random-access workloads). I'd really have liked to see more information on just how well WD's SATA Raptors match enterprise drives in that area, because if one can believe their specs they would seem to be considerably more cost-effective in most such instances.
- bill
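To make the scrubbing point concrete, here is a minimal sketch (my own illustration with made-up rates, not a drive spec) of why periodic scrubbing keeps latent unreadable sectors down: the expected number of undetected bad sectors scales with how long it has been since the surface was last fully read.

```python
# Illustrative model of how background scrubbing limits latent sector errors.
# The error-development rate below is a made-up placeholder, NOT a drive spec;
# the point is the proportionality to the scrub interval, not the absolute numbers.

HOURS_PER_YEAR = 24 * 365

def expected_latent_errors(error_rate_per_hour, hours_since_last_full_read):
    """Expected number of undetected unreadable sectors on one disk."""
    return error_rate_per_hour * hours_since_last_full_read

if __name__ == "__main__":
    rate = 1e-4  # hypothetical: one new unreadable sector every ~10,000 hours
    # Never scrubbed: errors can accumulate over, say, three years of service.
    print(f"no scrub    : {expected_latent_errors(rate, 3 * HOURS_PER_YEAR):.4f}")
    # Weekly scrub (or a workload that re-reads everything weekly): the window
    # in which an error can stay undetected shrinks to about one week.
    print(f"weekly scrub: {expected_latent_errors(rate, 7 * 24):.4f}")
```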
JohanAnandtech - Thursday, October 19, 2006 - link
"Nonetheless, something bugs me in your article on Seagate test. I manage a cluster of servers whose total throughoutput is around 110 TB a day (using around 2400 SATA disks). With Seagate figure (an Unrecoverable Error every 12.5 terabytes written or read), I would get 10 Unrecoverable Errors every day. Which, as far as I know, is far away from what I may see (a very few per week/month). "1. The EUR number is worst case, so the 10 Unrec errors you expect to see are really the worst situation that you would get.
2. Cached reads are not included, as you do not access the magnetic media. So if on average the servers cache rather well, probably only half of that throughput is actually hitting the disks.
And it also depends on how you measured that. Is that throughput on your network, or is it really measured like the bi/bo of vmstat or another tool?
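For reference, a quick back-of-the-envelope version of the arithmetic being discussed (my own sketch; 12.5 TB per unrecoverable error is the spec figure quoted above, and the caching adjustment mirrors Johan's second point):

```python
# Expected unrecoverable read errors per day, given the spec quoted above:
# one unrecoverable error per 12.5 TB read or written (i.e. 1 error per 1e14 bits).
# These are worst-case spec numbers, not measured field rates.

TB_PER_URE = 12.5  # spec figure quoted in the thread

def expected_ures_per_day(tb_per_day, cache_hit_fraction=0.0):
    """Worst-case expected UREs/day; cached reads never touch the platters."""
    tb_hitting_disks = tb_per_day * (1.0 - cache_hit_fraction)
    return tb_hitting_disks / TB_PER_URE

if __name__ == "__main__":
    print(expected_ures_per_day(110))                          # ~8.8 per day, worst case
    print(expected_ures_per_day(110, cache_hit_fraction=0.5))  # ~4.4 if half is cached
```

The "roughly 10 per day" quoted above is this same worst-case arithmetic (110 / 12.5 ≈ 8.8); the gap between that and what is actually observed is what the scrubbing discussion earlier in the thread is getting at.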
Fantec - Thursday, October 19, 2006 - link
There is no cache, for two reasons: first, the data is accessed quite randomly while there is only 4 GB of memory for 6 TB of data; second, the data is stored/accessed on block devices in raw mode. And, indeed, throughput is measured on the network, but the figures from the servers match (iostat).
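For anyone wanting to cross-check network-level numbers against what the disks themselves see, here is a minimal sketch (Linux-only, my own illustration) that samples /proc/diskstats, roughly the same counters iostat reports, and prints read/write throughput for one device; the device name "sda" is just an example:

```python
# Sample /proc/diskstats twice and report disk-level throughput, roughly what
# "iostat" shows. Linux-only; the device name ("sda") is an example.

import time

SECTOR_BYTES = 512  # /proc/diskstats reports sector counts in 512-byte units

def read_sectors(device):
    """Return (sectors_read, sectors_written) counters for one block device."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[5]), int(fields[9])
    raise ValueError(f"device {device!r} not found in /proc/diskstats")

def throughput_mb_s(device, interval=1.0):
    r0, w0 = read_sectors(device)
    time.sleep(interval)
    r1, w1 = read_sectors(device)
    read_mb = (r1 - r0) * SECTOR_BYTES / 1e6 / interval
    write_mb = (w1 - w0) * SECTOR_BYTES / 1e6 / interval
    return read_mb, write_mb

if __name__ == "__main__":
    read_mb, write_mb = throughput_mb_s("sda")
    print(f"sda: {read_mb:.1f} MB/s read, {write_mb:.1f} MB/s written")
```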
Sunrise089 - Thursday, October 19, 2006 - link
I liked this story, but I finished feeling informed but not satisfied. I love AT's focus on real-world performance, so I think an excellent addition would be more info on actually building a storage system, or at least some sort of buyers' guide to let us know how the tech theory translates to the marketplace. The best idea would be a tour of AT's own equipment and a discussion of why it was chosen.
JohanAnandtech - Thursday, October 19, 2006 - link
If you are feeling informed but not satisfied, we have reached our goal :-). The next article will go into the more complex stuff: when do I use NAS, when do I use DAS or a SAN, what about iSCSI, and so on. We are also working on getting different storage solutions into our lab.
stelleg151 - Wednesday, October 18, 2006 - link
In the table, the Cheetah decodes 1000 blocks of 4 KB faster than the Raptor decodes 100 blocks of 4 KB. Guessing this is a typo. Liked the article.
JarredWalton - Wednesday, October 18, 2006 - link
Yeah, I notified Johan of the error but figured it wasn't a big enough problem to hold back releasing the article. I guess I can Photoshop the image myself... I probably should have just done that, but I was thinking it would be more difficult than it is. The error is corrected now.
slashbinslashbash - Wednesday, October 18, 2006 - link
I appreciate the theory, the mention of some specific products, and the general recommendations in this article, but you started off saying that you were building a system for AT's own use (at the lowest reasonable cost) without fully going into exactly what you ended up using or how much it cost.
So now I know something about SAS, SATA, and other technologies, but I have no idea what it will actually cost me to get (say) 1 TB of highly reliable storage suitable for use in a demanding database environment. I would love to see a line-item breakdown of the system that you ended up buying, along with prices and links to stores where I can buy everything. I'm talking about the cables, cards, drives, enclosures, backplanes, port multipliers, everything.
Of course my needs aren't the same as AnandTech's needs, but I just need to get an idea of what a typical "total solution" costs and then scale it to my needs. Also it'd be cool to have a price/performance comparison with vendor solutions like Apple, Sun, HP, Dell, etc.
BikeDude - Friday, October 20, 2006 - link
What if you face a bunch of servers with modest disk I/O that require high availability? We typically use SATA drives in RAID-1 configurations, but I've seen some disturbing issues with the onboard SATA RAID controller on a SuperMicro server, which leads me to believe that SCSI is the right way to go for us. (The issue was that the original Adaptec driver caused Windows to eventually freeze given a certain workload pattern -- I've also seen mirrors that refuse to rebuild after replacing a drive; we've now stopped buying Maxtor SATA drives completely.)
More to the point: Seagate has shown that massive amounts of IO require enterprise-class drives, but do they say anything about how enterprise-class drives behave with a modest desktop-type load? (I wish the article linked directly to the document on Seagate's site; instead it links to a PowerPoint presentation hosted by Microsoft?)
JohanAnandtech - Thursday, October 19, 2006 - link
Definitely... When I started writing this series, I started thinking about what I was asking myself years ago. For starters, what's with the weird I/Os-per-second benchmarking? If you are coming from the workstation world, you expect all storage benchmarks to be in MB/s and ms.
Secondly, one has to know the interfaces available. The features of SAS, for example, could make you decide to go for a simple DAS instead of an expensive SAN. Not always, but in some cases. So I had to make sure that before I start talking about iSCSI, FC SANs, DAS that can be turned into a SAN, etc., all my readers know what SAS is all about.
So I hope to address the things you brought up in the second storage article.
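Since the IOPS-versus-MB/s point comes up often, here is a small sketch (my own, not from the article; the latencies and block sizes are illustrative) of how the two views relate: throughput is just I/Os per second times the block size, which is why a drive that looks slow in MB/s on small random blocks can still be doing a lot of work.

```python
# Relating the two ways of quoting storage performance:
#   throughput (MB/s) = I/Os per second * block size
# and, with a single outstanding request, IOPS ~= 1000 / average service time (ms).

def mb_per_s(iops, block_size_kb):
    return iops * block_size_kb / 1024.0

def iops_from_latency(avg_service_ms, queue_depth=1):
    """Rough IOPS when each request takes avg_service_ms on average."""
    return queue_depth * 1000.0 / avg_service_ms

if __name__ == "__main__":
    # Illustrative: a drive averaging ~8 ms per random 4 KB access, versus the
    # same drive streaming 64 KB blocks sequentially at ~1000 I/Os per second.
    random_iops = iops_from_latency(8)                     # ~125 IOPS
    print("random 4 KB    :", round(random_iops), "IOPS =",
          round(mb_per_s(random_iops, 4), 2), "MB/s")
    print("sequential 64 KB:", 1000, "IOPS =",
          round(mb_per_s(1000, 64), 1), "MB/s")
```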
slashbinslashbash - Thursday, October 19, 2006 - link
Sounds great, thanks. If possible it'd be great to see full schematics of the setup, pics of everything, etc. This is obviously outside the realm of your "everyday PC" stuff where we all know what's going on. I administer 6 servers at a colo facility, and our servers (like 90% of the other servers that I see) are basically PC hardware stuck in a rackmount box (and a lot of the small-shop webhosting companies at the colo facility use plain towers! In the rack across from ours, there are 4 Shuttle XPCs! Unbelievable!).
We use workstation motherboards with ECC RAM, Raptor drives, etc., but still it's basically just a PC. These external enclosures, SAS, etc. are a whole new realm. I know it'd be better than the ad-hoc storage situation we have now, but I'm kind of scared because I don't know how it works and I don't know how much it would cost. So now I know more about how it works, but the cost is still scary. ;)
I guess the last thing I'd want to know is the OS support situation. Linux support is obviously crucial.