Some statements on storage

Inspired by a discussion on Twitter (a short one, of course 😉), I’d like to post my opinion on storage in the data center.

I would no longer recommend that any customer build up or significantly extend an FC-based SAN for virtualized environments, except in a few very rare cases.

The reason is that I consider the complexity, the capacity-versus-performance constraints and the additional cost of the fabric (directors, FC switches, HBAs etc.) outdated and unnecessary. To me, FC-SAN is legacy.

Given the compute density possible in a single server today, we should go with a 10 Gbit Ethernet infrastructure anyway. Add redundancy and you end up with 20 to 40 Gbit of bandwidth per server. Plenty – so why add a few more Gbit of FC on top? I think a converged network is the way to go.
So we’re talking NAS, which means iSCSI (yes, I know, technically it’s a block protocol – but I file this under NAS, like it or not) or NFS (errmm… no, FCoAnything is not an option to me, given the additional complexity). iSCSI offers significantly lower latency and an easier multipathing setup, but its management is closer to (the complexity of) traditional SAN storage. NFS may not provide the highest performance, but it’s usually a lot easier to manage (no LUNs etc.). Looks like freedom of choice to me.

Now about the storage itself. Capacity has increased over the years, yet the IOPS per GB of standard hard disks keep decreasing. Which basically means you end up with a lot of capacity, but could still face the problem that I/O-demanding workloads suffer from poor storage performance. Of course there are ways around that: extensive, multi-level caching and the like. Or the traditional way of adding spindles to increase the IOPS, which gives you even more capacity (rough numbers below)…
But hey, there are SSDs now! Blazingly fast, but for enterprise-grade usage either still quite expensive (SLC) or not durable enough (MLC, a.k.a. consumer grade). Anyway, if you’re going to utilize them you’ll dramatically increase the IOPS. But wait, how do you utilize them? Additional SSD-only shelves? Some LUNs built from a few additional SSDs? And how do you place I/O-demanding workloads on your SSDs? Specific LUNs, virtual disks, whole VMs?
And: do you really want to do this manually?
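To make the capacity-versus-IOPS problem tangible, here is a back-of-the-envelope calculation in Python. The drive figures (80 IOPS for a 4 TB 7.2k rpm disk, 20,000 IOPS for a 400 GB SSD) are ballpark assumptions of mine, not vendor specifications:

```python
# Back-of-the-envelope: capacity vs. IOPS for spinning disks and SSDs.
# All figures are rough, assumed ballpark values for illustration only.

def spindles_needed(target_iops, iops_per_device):
    """How many devices does it take to reach a target IOPS figure?"""
    return -(-target_iops // iops_per_device)   # ceiling division

# Assumed values, not vendor specs:
hdd_capacity_gb = 4000     # 4 TB nearline disk
hdd_iops        = 80       # random IOPS of a 7.2k rpm drive
ssd_capacity_gb = 400
ssd_iops        = 20000    # random IOPS, order of magnitude

target_iops = 10000        # an I/O-demanding workload

hdds = spindles_needed(target_iops, hdd_iops)
ssds = spindles_needed(target_iops, ssd_iops)

print(f"HDD: {hdd_iops / hdd_capacity_gb:.3f} IOPS per GB")
print(f"SSD: {ssd_iops / ssd_capacity_gb:.1f} IOPS per GB")
print(f"{target_iops} IOPS takes {hdds} HDDs "
      f"(= {hdds * hdd_capacity_gb / 1000:.0f} TB of capacity) "
      f"or {ssds} SSD(s).")
```

With these assumed numbers, hitting 10,000 IOPS on spinning disks alone means buying 125 drives and half a petabyte of capacity you may not even need, while a single SSD covers the IOPS (but not the capacity).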

I guess not. To me the most promising technology is hybrid storage: cheap, traditional off-the-shelf spinning disks for capacity, and a few SSDs for performance. Let some intelligence in the storage or storage-abstraction layer take care of the I/O load, providing high IOPS for the demanding workloads and cheap capacity for the rest. And don’t bother the admins until either performance or capacity is degraded or depleted.
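To sketch what that “intelligence” could look like, here is a deliberately simplified tiering heuristic in Python: count accesses per block and periodically keep the hottest blocks on the SSD tier. Real hybrid arrays do much more (read/write caching, sequential-I/O detection, scheduled rebalancing), so take this purely as an illustration; the class and its parameters are made up for the example:

```python
from collections import Counter

class HybridTier:
    """Toy model of a hybrid pool: hot blocks live on SSD, the rest on HDD.
    Purely illustrative -- no real array works exactly like this."""

    def __init__(self, ssd_blocks):
        self.ssd_blocks = ssd_blocks      # how many blocks fit on the SSD tier
        self.access_count = Counter()     # per-block access statistics
        self.on_ssd = set()               # blocks currently promoted to SSD

    def record_io(self, block_id):
        """Called for every read/write; returns which tier served the I/O."""
        self.access_count[block_id] += 1
        return "SSD" if block_id in self.on_ssd else "HDD"

    def rebalance(self):
        """Periodically promote the hottest blocks and demote the rest."""
        hottest = {blk for blk, _ in self.access_count.most_common(self.ssd_blocks)}
        promoted = hottest - self.on_ssd
        demoted = self.on_ssd - hottest
        self.on_ssd = hottest
        return promoted, demoted

# Usage: a few hot blocks dominate the I/O stream.
pool = HybridTier(ssd_blocks=2)
for blk in [1, 1, 1, 2, 2, 3, 4, 1, 2, 5]:
    pool.record_io(blk)
pool.rebalance()
print(pool.on_ssd)   # {1, 2} -- the busiest blocks end up on flash
```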

Depending on your planned compute density you may or may not intend to deploy larger servers with several internal hard disks and SSDs. If you do, these resources could be combined, abstracted and redistributed by software, as with VMware’s upcoming VSAN or products like Maxta’s MxSP. If proprietary hardware is fine with you, you might like the Nutanix approach. Either way there is no dedicated storage system, with its specific pros and cons.
If you prefer purpose-built storage, or you intend to deploy small servers, you may want to have a look at Nimble Storage or Tintri; both concepts seem well thought-out and come with (a lot of) specific advantages and (a few) downsides.

Finally, let’s get back to the SSD question. My personal opinion: we always talk about wearing these things out by using them, some quicker (MLC) than others (SLC), some in between. But anyway, I would not want to deploy SSDs without precautions against [silent] data corruption, and without monitoring the remaining device lifetime. So to me decent checksumming is a must: if you find data was corrupted, you can read it again from another replica, or from the disks, depending on your system architecture. If these mechanisms are in place, I have no worries about using consumer-grade SSDs. And by the way, the current technologies in high-capacity hard disks are prone to data corruption as well, with data written in tracks narrower than the read/write heads.
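As a minimal sketch of what I mean by decent checksumming plus redundant placement, the following Python snippet stores a SHA-256 checksum with every block, verifies it on read, and falls back to (and repairs from) another replica on a mismatch. This is my own simplified illustration, not the design of any particular product:

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def write_block(replicas, block_id, data):
    """Store the block plus its checksum on every replica."""
    for replica in replicas:
        replica[block_id] = (data, checksum(data))

def read_block(replicas, block_id):
    """Return the first copy whose checksum still matches; repair the bad ones."""
    good = None
    for replica in replicas:
        data, stored_sum = replica[block_id]
        if checksum(data) == stored_sum:
            good = data
            break
    if good is None:
        raise IOError(f"block {block_id}: all replicas corrupted")
    # Self-healing: rewrite any replica that failed verification.
    for replica in replicas:
        data, stored_sum = replica[block_id]
        if checksum(data) != stored_sum:
            replica[block_id] = (good, checksum(good))
    return good

# Usage: corrupt one copy, the read still succeeds and repairs the bad replica.
replicas = [{}, {}]
write_block(replicas, 42, b"important data")
replicas[0][42] = (b"imp0rtant data", replicas[0][42][1])   # simulate bit rot
assert read_block(replicas, 42) == b"important data"
```

The point is not the hashing itself but the combination: detection via checksum, recovery via a second copy of the data.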

Bottom line: I would not want to use either SSDs or the newest high-capacity drives without checksums and redundant placement of data. These are the key features I always look at when evaluating new storage products.
