Friday 21 September 2007

Problems Problems

This week I've been working on two interesting(ish) problems. Well, one is more interesting than the other; the other is really a case of the vendor needing to think harder about requirements.

Firstly, Tuning Manager (my old software nemesis) strikes again. Within Tuning Manager it is possible to track performance for all LUNs in an array. The gotcha I found this week is that the list of monitored LUNs represents only those allocated to hosts and is a static list which must be refreshed each time an allocation is performed!

This is just a lack of thought on the part of the developers in not providing a "track everything" option, so it isn't necessary to keep going into the product, selecting the agent, refreshing the LUN list and tagging them all over again. No wonder allocations can take so long and be fraught with mistakes when Storage Admins have to include in their process the requirement to manually update the tuning product. I'm still waiting for confirmation that there isn't a way to automatically report on all LUNs. If there isn't, then a product enhancement will be required to do what I want. In the meantime, I'll have to ensure things are updated manually. So if you configured Tuning Manager and the LUN list when you first installed an array, have a quick look to see whether you're monitoring everything or not.

I'm sure some of you out there will point out, with good reason, why HTnM doesn't automatically scan all LUNs, but from my perspective, I'm never asked by senior management to monitor a performance issue *before* it has occurred, so I always prefer to have monitoring enabled for all devices and all subsystems if it doesn't have an adverse effect on performance.

Second was an issue with the way NTFS works. A number of filesystems on our SQL Server machines show high levels of fragmentation, despite there being plenty of free space on the volumes in question. The fragmentation seems to occur even when a volume is cleared and the files are reallocated from scratch.

A quick trawl around the web found me various assertions that NTFS deliberately leaves free clusters between files in order to provide a bit of room for initial expansion. I'm not sure this is true, as I can't find a trusted source to confirm it is standard behaviour. In addition, I wonder if it's down to the way some products allocate files; for instance, when a SQL backup starts to create a backup file, it has no real idea how big the file will eventually become. NTFS (I assume) will choose the largest block of free space available and allocate the file there. If another process allocates a file almost immediately afterwards, it will be placed just after the first file (which may only be a few clusters in size at this stage). Then the first file gets extended and "leapfrogs" the second file, and so on, producing fragmentation in both files.
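To convince myself the leapfrog theory at least adds up, I knocked together a toy model in Python. It is emphatically not how the real NTFS allocator works (the file names, chunk sizes and allocation policy are all made up), just a naive allocator handing out the next free clusters in order while two files grow in alternating chunks:

# Toy model of the "leapfrog" effect described above: two writers extend
# their files in alternating small chunks, and a naive allocator hands out
# the next free clusters in order. Purely illustrative.

def simulate(chunks_per_file=1000, clusters_per_chunk=16):
    next_free = 0                      # next free cluster on the toy volume
    extents = {"backup.bak": [], "other.dat": []}

    for _ in range(chunks_per_file):
        for name in extents:           # the two files grow turn and turn about
            start = next_free
            next_free += clusters_per_chunk
            runs = extents[name]
            # extend the previous run if the new clusters are contiguous,
            # otherwise start a new fragment
            if runs and runs[-1][1] == start:
                runs[-1] = (runs[-1][0], start + clusters_per_chunk)
            else:
                runs.append((start, start + clusters_per_chunk))

    for name, runs in extents.items():
        print(f"{name}: {len(runs)} fragments")

simulate()
# backup.bak: 1000 fragments
# other.dat: 1000 fragments

In the toy model every extension becomes its own fragment, so two files growing together end up with as many fragments as they had extensions, which is roughly the sort of pattern we're seeing.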

I'm not sure if this is what is happening, but if this is the way NTFS is working then it would explain the levels of fragmentation we see (some files have 200,000+ fragments in a 24GB file). In addition, I don't know for certain that the fragmentation is having a detrimental impact on performance (these are SAN connected LUNs). Everything is still speculation. I guess I need to do more investigation...
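One thing I can do as part of that investigation is track fragment counts over time rather than relying on the defrag GUI. The sketch below is one rough way of doing it from Python on Windows, asking NTFS for a file's extents via FSCTL_GET_RETRIEVAL_POINTERS; treat it as illustrative rather than a polished tool (error handling is minimal and the extent count is only an approximation of what a defragmenter would report):

# Rough sketch: count the extents NTFS reports for a file using
# FSCTL_GET_RETRIEVAL_POINTERS. Windows-only; constants from winioctl.h.
import ctypes
from ctypes import wintypes
import struct
import sys

FILE_READ_ATTRIBUTES = 0x0080
FILE_SHARE_READ = 0x0001
FILE_SHARE_WRITE = 0x0002
OPEN_EXISTING = 3
FSCTL_GET_RETRIEVAL_POINTERS = 0x00090073
ERROR_MORE_DATA = 234
ERROR_HANDLE_EOF = 38
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.CreateFileW.argtypes = [wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD,
                                 wintypes.LPVOID, wintypes.DWORD, wintypes.DWORD,
                                 wintypes.HANDLE]
kernel32.DeviceIoControl.argtypes = [wintypes.HANDLE, wintypes.DWORD,
                                     ctypes.c_char_p, wintypes.DWORD,
                                     ctypes.c_char_p, wintypes.DWORD,
                                     ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID]
kernel32.CloseHandle.argtypes = [wintypes.HANDLE]

def count_extents(path):
    # Open with read-attributes access only; we never read the file data.
    handle = kernel32.CreateFileW(path, FILE_READ_ATTRIBUTES,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  None, OPEN_EXISTING, 0, None)
    if handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())

    extents = 0
    next_vcn = 0                           # start from the first cluster of the file
    out_buf = ctypes.create_string_buffer(64 * 1024)
    returned = wintypes.DWORD(0)
    try:
        while True:
            in_buf = struct.pack("<q", next_vcn)   # STARTING_VCN_INPUT_BUFFER
            ok = kernel32.DeviceIoControl(handle, FSCTL_GET_RETRIEVAL_POINTERS,
                                          in_buf, len(in_buf),
                                          out_buf, len(out_buf),
                                          ctypes.byref(returned), None)
            err = ctypes.get_last_error()
            if not ok and err == ERROR_HANDLE_EOF:
                break                      # empty or MFT-resident file: no extents
            if not ok and err != ERROR_MORE_DATA:
                raise ctypes.WinError(err)

            # RETRIEVAL_POINTERS_BUFFER: DWORD ExtentCount, padding, LARGE_INTEGER
            # StartingVcn, then ExtentCount pairs of (NextVcn, Lcn), 16 bytes each,
            # starting at offset 16.
            count = struct.unpack_from("<I", out_buf, 0)[0]
            extents += count
            next_vcn = struct.unpack_from("<q", out_buf, 16 + (count - 1) * 16)[0]
            if ok:
                break                      # everything fitted in one call
    finally:
        kernel32.CloseHandle(handle)
    return extents

if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, count_extents(name))

Running something like that against the worst SQL backup files before and after a restore, and alongside I/O response times, should tell me whether the fragmentation actually matters on SAN-connected LUNs or whether it's just offending my sense of tidiness.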
