Friday, 31 October 2008

Who Ya Gonna Call?

Here's a quality piece of reporting from TechCrunch on the state of Facebook and its data problems. I wrote about their data growth just last week in this post. It's incredible that they're purchasing a new Netapp 3070 filer every week!

I'm surprised that Facebook would keep buying NAS filers to accommodate their content growth. There must be a rolling set of pictures, thumbnails and so on that are accessed frequently, but there must also be a significant amount that isn't, and that could be archived to super-dense nearline technology such as the Copan products.
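To illustrate the kind of split I mean, here's a minimal sketch that flags files as nearline candidates based on how long it's been since they were last accessed. The `FileRecord` type, the function names and the 90-day idle threshold are all hypothetical choices for the example; a real policy would depend on the access patterns of the data in question.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    # Hypothetical inventory entry for one stored object (e.g. a photo).
    path: str
    size_bytes: int
    last_access: datetime


def archive_candidates(files, now, idle_days=90):
    """Return files not accessed within idle_days (assumed threshold)."""
    cutoff = now - timedelta(days=idle_days)
    return [f for f in files if f.last_access < cutoff]


def archivable_fraction(files, now, idle_days=90):
    """Fraction of total capacity that could move to a nearline tier."""
    total = sum(f.size_bytes for f in files)
    if total == 0:
        return 0.0
    cold = sum(f.size_bytes for f in archive_candidates(files, now, idle_days))
    return cold / total
```

With one recently viewed 400-byte file and one 600-byte file untouched for 200 days, `archivable_fraction` reports 0.6, i.e. 60% of the capacity could sit on denser, cheaper storage.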

Unfortunately, when data growth is this intense, it isn't always easy to see the wood for the trees, and in my previous and current experience, using Netapp creates a risk of wasted resources.

In my experience of block-based arrays alone, I've consistently seen around 10-15% of resources orphaned or unused, and sometimes more. When host-based wastage is taken into account, the figure can be much worse, although reclaiming storage at the host level is a far more intensive process.
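As a rough illustration of where a figure like 10-15% comes from, here's a minimal sketch that treats any LUN with no host mappings as orphaned and reports its share of allocated capacity. The `Lun` type and its fields are hypothetical; in practice the inventory would be pulled from the array's own reporting, and mapped-but-unused capacity would need separate host-side analysis, which this sketch doesn't attempt.

```python
from dataclasses import dataclass, field


@dataclass
class Lun:
    # Hypothetical array inventory entry: allocated size and the hosts it's mapped to.
    lun_id: str
    size_gb: float
    mapped_hosts: list = field(default_factory=list)


def orphan_luns(luns):
    """LUNs that are allocated on the array but presented to no host."""
    return [lun for lun in luns if not lun.mapped_hosts]


def orphan_percentage(luns):
    """Orphaned capacity as a percentage of all allocated capacity."""
    total = sum(lun.size_gb for lun in luns)
    if total == 0:
        return 0.0
    orphaned = sum(lun.size_gb for lun in orphan_luns(luns))
    return 100.0 * orphaned / total
```

For example, two mapped LUNs of 500GB and 400GB plus one unmapped 100GB LUN gives an orphan figure of 10%.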

I'm willing to offer anyone out there with more than 50TB of storage on arrays a free analysis of their environment, in return for a 50:50 split of any savings that can be made. As budgets tighten, I think there will be more and more focus on this kind of work.


DCed said...

Hi Chris,

Have you worked with BlueArc yet? We should be starting a project/POC around this product at the beginning of next year. It looks really efficient thanks to the FPGA approach (I'm not from BlueArc).


Anil Gupta said...


If my information is correct, Facebook has kit from quite a few storage vendors, including NetApp, Isilon and Copan.


Chris M Evans said...

Anil, I'd hope/expect Facebook to be using products from other vendors such as Copan. I'd also expect them to have a unified file system that automatically shifts inactive data off to the denser storage, so that as content comes online and ages, the growth is all in archive-tier storage. Perhaps there's also general growth in active data that accounts for the Netapp purchases, and they're buying from other vendors as well.

Chris M Evans said...

DCed, yes, I've been briefed on BlueArc and seen it, but I haven't actually used it.

Anonymous said...

Hi Chris,

Could you explain, in technical terms, how you have found that using Netapp wastes resources? I understand this is your opinion; however, I would like to know where and how you experienced it.

This would be very interesting, as we are about to implement tumbleweed on our NAS boxes.

I must admit I can't see any problems with using Netapp, as Netapp dedupe works great, so I'm looking forward to your explanation.