Monday, 9 April 2007

Improving efficiency

Hu posted an interesting view of storage utilisation here. His view is that virtualisation on the USP would improve on the average 30% utilisation figure. I have to say that I disagree. There are lots of reasons why storage remains unused in the enterprise:

  1. inefficient administration by Storage Admins, SAs and DBAs (lost/orphaned disks, etc.)
  2. deliberate overallocation by SAs and DBAs (in an attempt to manage change control versus growth)
  3. in-progress tasks (frame/host migrations, delayed decommissions)
  4. data distribution for performance management
  5. Storage Admin "buffers" to manage growth

Many of these issues are process-driven rather than due to inadequacies in the technology. Now, "true" virtualisation may address some of the problems listed above, especially those technologies which provide thin provisioning or other overallocation methods. There are plenty of products on the market already offering this style of allocation; however, there are obvious shortcomings that could hold it back, most notably performance and reporting.
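To make the thin provisioning idea concrete, here is a minimal sketch of allocate-on-first-write: the volume advertises more capacity than the pool holds, and physical extents are only bound to virtual addresses when data actually arrives. The extent size, names and structures are my own assumptions, not any particular vendor's implementation.

    # Minimal thin provisioning sketch: physical extents are allocated
    # lazily, on the first write to each virtual extent.
    EXTENT = 1024 * 1024  # 1 MiB extent granularity (assumed)

    class ThinVolume:
        def __init__(self, advertised_bytes, pool):
            self.advertised = advertised_bytes
            self.pool = pool        # shared list of free physical extents
            self.map = {}           # virtual extent -> physical extent

        def write(self, offset, data):
            vext = offset // EXTENT
            if vext not in self.map:                  # first touch: allocate
                if not self.pool:
                    raise IOError("pool exhausted")   # the failure everyone fears
                self.map[vext] = self.pool.pop()
            # ... data would be written to physical extent self.map[vext] ...

        def allocated(self):
            return len(self.map) * EXTENT   # real consumption, not advertised

    pool = list(range(100))                 # 100 MiB of real disk behind it
    vol = ThinVolume(advertised_bytes=1024 * EXTENT, pool=pool)  # claims 1 GiB
    vol.write(0, b"x")
    print(vol.allocated(), "bytes physically allocated")         # 1048576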

Performance is seen as a problem due to the overhead of having to work out where blocks have been virtualised to. In addition, I/O must be written to the virtualisation device and then rewritten to the physical storage, creating a "dual" write scenario. Data distributed across multiple devices may not provide a consistent performance profile, leading to uneven response times. In fact, this scenario already exists in today's enterprise storage: data is written to cache and destaged at a later time; arrays may contain more than one disk size and type; data is mapped across physical disks in a RAID configuration.
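A quick back-of-envelope model makes the point. The latency figures below are illustrative assumptions, not measurements, but they show why the "dual write" overhead largely disappears once the virtualisation device acknowledges writes from cache, just as arrays already do.

    # Illustrative write latencies (assumed values, in milliseconds)
    def direct_write_ms(disk_ms=5.0):
        # host writes straight to the array
        return disk_ms

    def virtualised_write_ms(lookup_ms=0.1, hop_ms=0.5, disk_ms=5.0,
                             write_back=True):
        # host writes to the virtualisation device, which rewrites the
        # data to physical storage; with write-back cache the device
        # acknowledges once the data is in cache and destages later
        if write_back:
            return lookup_ms + hop_ms
        return lookup_ms + hop_ms + disk_ms

    print(direct_write_ms())                       # 5.0 - straight to disk
    print(virtualised_write_ms(write_back=False))  # 5.6 - the dual-write penalty
    print(virtualised_write_ms())                  # 0.6 - cache masks the cost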

Reporting is an issue as most people like to know where their data resides. There are good reasons for this: if hardware fails or a disaster scenario is invoked, it is important to know what data is pinned where; is it still sitting in cache? If a RAID group fails, what data did it contain and from which volumes? In addition, overallocation creates its own problems. Predicting how virtual volumes will grow their storage usage is tricky, and you most certainly don't want to get caught out refusing I/O write requests for production disks. This issue was very obvious with the early implementation of thin provisioning on Iceberg (a StorageTek storage subsystem), where reporting failed to cope with volumes that had been copied via snapshot.
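This is the kind of reporting an over-allocated environment needs: compare what the volumes advertise against what the pool physically holds, and raise the alarm long before writes get refused. A rough sketch, with hypothetical volumes and thresholds:

    # Hypothetical thin volumes: GB (advertised to host, physically allocated)
    volumes = {"prod_db": (500, 320), "logs": (200, 60)}
    pool_gb = 400                                   # real capacity behind them

    advertised = sum(a for a, _ in volumes.values())
    allocated = sum(u for _, u in volumes.values())

    print(f"oversubscription: {advertised / pool_gb:.1f}x")   # 1.8x
    print(f"pool used: {allocated / pool_gb:.0%}")            # 95%
    if allocated / pool_gb > 0.8:                   # alert threshold (assumed)
        print("WARNING: pool nearing exhaustion - add capacity now")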

Now, if HDS were to add thin provisioning to the USP, how good would that be...

1 comment:

NJ-ESS said...

I would like to suggest another reason for low utilization: too many containers. For example, in UNIX a typical server has many Volume Groups, and each of those Volume Groups has many, many file systems.

Typically each file system will have room for 30-50% growth. Even though each file system could be very small, they all add up to a big number. I think what we need is some good white papers to share with our DBA folks on how best to design these types of containers for maximum efficiency.
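As a rough illustration (the sizes below are made up), summing that headroom across one server's file systems shows how quickly it adds up:

    # Hypothetical file system sizes on one server, in GB
    filesystems_gb = [10, 20, 8, 15, 40, 12, 25, 30]
    headroom = 0.4                       # ~40% kept free in each

    free_gb = sum(fs * headroom for fs in filesystems_gb)
    print(f"{free_gb:.0f} GB sitting idle across {len(filesystems_gb)} file systems")
    # 64 GB idle - and that's just one server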

If anyone can post a link to papers on this subject, or anything on this subject, I would be much obliged!