Monday 10 July 2006

Performance End to End

Performance Management is a recurring theme in the storage world. As fibre channel SANs grow and become more complex, the shared nature of the infrastructure makes it prone to performance bottlenecks. Worse still, without sensible design (for example, keeping development data apart from production), production performance can be unnecessarily compromised.

The problem is, there aren't really the tools to manage and monitor performance to the degree I'd like. Here's why. Back in the "old days" of the mainframe, we could do end to end performance management: issue an I/O and you could break the transaction down into its constituent parts. You could see the connect time (time the data was being transferred), the disconnect time (time waiting for the data to be ready so it can be returned to the host) and other things which aren't quite as relevant now, like seek time and rotational delay. This was all possible because the mainframe was a single infrastructure with a single time clock against which the I/O transaction could be measured, and the I/O protocol itself catered for collecting response time data.

SANs are somewhat different. For a start, the protocol doesn't cater for collecting in-flight performance statistics, so all performance measurements are based on observations from tracing the entire environment. The vendors will tell you they can do performance measurements, and it is true: they can collect whatever the host, storage and SAN components offer. The trouble is, those figures are likely to be averages, and relating them to specific LUNs on specific hosts is either difficult or impossible.
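To put some numbers on the averaging problem, here is a minimal Python sketch. The LUN names, I/O counts and response times are all made up for illustration; nothing here comes from a real array or vendor tool. It shows how a port-level average of 4 ms can sit on top of one LUN that is actually seeing 40 ms.

```python
# Hypothetical per-LUN figures for one sample interval:
# name -> (I/O count, mean response time in ms). Invented for illustration.
lun_stats = {
    "host1:lun0": (9000, 2.0),
    "host1:lun1": (9000, 2.0),
    "host2:lun0": (1000, 40.0),
}


def port_average_ms(stats):
    """I/O-weighted average response time across every LUN on the port."""
    total_ios = sum(count for count, _ in stats.values())
    total_time_ms = sum(count * rt for count, rt in stats.values())
    return total_time_ms / total_ios


def worst_lun(stats):
    """Name of the LUN with the highest mean response time."""
    return max(stats, key=lambda name: stats[name][1])


if __name__ == "__main__":
    print("port average (ms):", port_average_ms(lun_stats))
    print("worst LUN:", worst_lun(lun_stats))
```

If all the tool reports is the port average, host2 looks fine. You only find the slow LUN if the per-LUN, per-host breakdown was collected in the first place.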

For you storage and fabric vendors out there, here's what I'd like. I want to trace an entire I/O from host to storage and back, and I want to know what the response time was at each point in the I/O exchange. I still want to total and average those figures. And don't forget replication (TrueCopy/SRDF/PPRC); I want to see that part of the I/O too.
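As a rough sketch of the kind of data I have in mind, the Python below models a per-hop trace record and the breakdowns I'd want from it. Everything in it is hypothetical: the `HopSample` record, the hop names and the timings are my own invention, not something any fabric or array exposes today, and it assumes every component stamps its samples against a common clock, which is exactly the piece SANs are missing.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HopSample:
    """One hypothetical timing record for one stage of one I/O."""
    io_id: int
    lun: str
    hop: str        # e.g. "host", "fabric", "array", "replication"
    start_us: int   # assumed to come from a clock shared by all components
    end_us: int

    @property
    def elapsed_us(self) -> int:
        return self.end_us - self.start_us


def per_io_breakdown(samples):
    """Return {io_id: {hop: elapsed_us}} so a single I/O can be traced."""
    breakdown = defaultdict(dict)
    for s in samples:
        breakdown[s.io_id][s.hop] = s.elapsed_us
    return dict(breakdown)


def total_per_io(samples):
    """Return {io_id: total elapsed_us across all hops}."""
    totals = defaultdict(int)
    for s in samples:
        totals[s.io_id] += s.elapsed_us
    return dict(totals)


def average_by_lun_and_hop(samples):
    """Return {(lun, hop): mean elapsed_us}, the averages tied to a LUN."""
    grouped = defaultdict(list)
    for s in samples:
        grouped[(s.lun, s.hop)].append(s.elapsed_us)
    return {key: mean(values) for key, values in grouped.items()}


# Two invented writes to the same LUN, including the replication leg.
samples = [
    HopSample(1, "lun_a", "host", 0, 50),
    HopSample(1, "lun_a", "fabric", 50, 80),
    HopSample(1, "lun_a", "array", 80, 480),
    HopSample(1, "lun_a", "replication", 480, 1280),
    HopSample(2, "lun_a", "host", 2000, 2050),
    HopSample(2, "lun_a", "fabric", 2050, 2090),
    HopSample(2, "lun_a", "array", 2090, 2690),
    HopSample(2, "lun_a", "replication", 2690, 3890),
]

if __name__ == "__main__":
    print(per_io_breakdown(samples)[1])
    print(total_per_io(samples))
    print(average_by_lun_and_hop(samples))
```

The point of keeping the raw per-hop samples is that the totals and averages fall out of them for free, whereas you can never go the other way and recover a single I/O from an average.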

One thought: I have a feeling fabric virtualisation products might be able to produce some of this information. After all, if they are receiving an I/O request for a virtual device and returning the response to the host, the environment is there to map the I/O response to the LUN. Perhaps that exists today?
