About 6 months ago I bought a Firewire Drobo with wide eyes and big expectations. I’ve had a growing video and audio library that was getting tedious to keep all on one drive. The idea of having a machine that automatically expands by popping in a new drive was awfully sexy.
In the past I’ve messed around with various flavors of RAID and even went so far as to build up a Linux RAID file server. And while that wasn’t an awful experience, I wanted to spend less time administering a storage solution and instead just have one work.
Thus begins our tale of woe.
For the first few weeks, all was happy. I slapped in a 1TB drive along with a few 300GB or so drives and loaded it up with data. I was immediately concerned about how slow it was. I had read reports of people happily getting 30-40 MB/sec read speeds and I was seeing nothing of the sort. I was languishing in the 8 to 12 range.
Now I run a hackintosh rig, and there have been some complaints about FireWire speeds on them in certain configurations (which I’ve since gotten a bead on), so I just moved the Drobo over to the USB bus and let it live there, where it seemed to stay pretty solid at 12MB/sec reads and writes. That was fast enough to serve up my iTunes library and back up my home folder, and I just left it at that.
Until 2 weeks ago when the drobo started blinking its lights yellow and green. For some reason it was doing a re-layout. Okay. So that completed. Then it just started randomly dropping off of the USB bus when copying files. Awesome.
So I finally sent in an e-mail to drobo tech support. They had me send in a diagnostic file and some information and told me they would get back to me.
In the meantime, the drobo was in an unhappy place, randomly doing rebuilds. I decided to just pull everything off of it and reformat it. Pulling off the data took 3 days between slow speeds and random bus drops.
After a few days I called up Data Robotics to check on the ticket. Apparently it was with tier 3 support and they hadn’t got to it yet. The guy on the phone promised to send them an IM that I had enquired. Okay.
So I reformat the thing, and start putting data back on a little bit at a time. I notice excellent speeds from the firewire bus, i.e. 28MB/sec writes. Coolness! Except after about 4 hours or so, it was right back to lousy speeds. I listened to the unit and noticed that even with no access from the computer, the hard drives were doing something. But the drobo gave no indication that anything was happening. All green lights and happy times. I had noticed this before, but just wrote it off as wonkiness.
So I call Data Robotics again after 6 days of no status reports. Again I get the “tier 3 has it, and hasn’t looked at it and all I can do is send them an IM” song and dance. I ask to talk with a manager. Apparently all the managers have gone home (it was 6pm their time) so they promise me a call back on Friday morning.
Well, lo and behold, the very next day tier 3 has a chance to look at my ticket. Go figure. So I get a new unreleased firmware to prevent drops off of the USB bus, and a recommendation to remove a 320GB drive that is in the unit, as apparently it’s been disappearing to the drobo and coming back randomly.
From talking with the support rep, apparently the drobo won’t mark a drive as bad unless it meets some specific conditions, yet it will continuously re-layout the drobo when a drive disappears, reappears, disappears again, and so on. He said a failing drive controller can cause the issue.
I pulled the drive and wrote zeros across the whole thing with no issues, but maybe there is some weirdness between the drobo and the drive. For 320GB, I don’t care.
Without the drive, the drobo is no longer constantly clicking away, and speeds are pretty great with 3 1TB drives. 28MB/sec writes and 40MB/sec reads. It did hard freeze one time while moving all of my data back to it, but it hasn’t since so I’m keeping my fingers crossed.
Some thoughts and reflections on Drobo troubles:
The drobo diagnostic file is either encrypted or an unreadable binary, which means the end user has no way to troubleshoot problems with the drobo themselves without involving DR. This sucks.
The drobo can have a problem with a drive that massively affects its performance without reporting it in any way, either by lights or by the dashboard. DR could easily set a threshold for “edge events” that mark a drive as questionable to the user without calling it failed.
The drobo will do re-layouts (or some kind of drive maintenance magic) without blinking any lights or giving the user any indication of what it’s doing. This wouldn’t be a big deal if it didn’t hurt performance.
Drobo support is really slow. While I had no data on the drobo that I couldn’t lose, I could imagine being in big freakout mode if my data was at risk and my vendor didn’t even start working on my support ticket for a week. This might not have been a huge deal if I was given any indication up front of how long it would take, or the phone reps offered anything other than a shrug of the shoulders when I called.
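To make the “edge events” idea from the list above a little more concrete, here is a rough sketch of what I have in mind. To be clear, this is my own guess at a policy and not anything the drobo firmware actually does: the class name, the limit of 3 drop-outs and the 24 hour window are all made up for illustration.

```python
from collections import deque


class DriveHealthMonitor:
    """Hypothetical policy: flag a drive as questionable (not failed)
    once it has dropped off the bus too many times in a time window."""

    def __init__(self, max_events=3, window_seconds=24 * 3600):
        # Both thresholds are assumptions picked for illustration.
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = {}  # drive id -> deque of drop-out timestamps (seconds)

    def record_dropout(self, drive_id, timestamp):
        """Record one drop-out and forget any that fell out of the window."""
        events = self._events.setdefault(drive_id, deque())
        events.append(timestamp)
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()

    def status(self, drive_id):
        """Status as of the most recent recorded drop-out for this drive."""
        count = len(self._events.get(drive_id, ()))
        return "questionable" if count >= self.max_events else "ok"


monitor = DriveHealthMonitor()
for seconds in (0, 600, 1200):  # three drop-outs within 20 minutes
    monitor.record_dropout("bay-2", seconds)
print(monitor.status("bay-2"))
```

The point is only that a drive can be reported as suspicious long before it meets whatever strict conditions count as “failed”, which is all I wanted from the dashboard.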
Data Robotics has since RMA’d my drobo.
With the new unit, I plugged in my drives and after about half an hour, it decided to do a re-layout with blinking green and yellow lights.
Data Robotics closed my case and sent an e-mail. It noted that if I needed the case re-opened I should just respond to that e-mail. I responded that the machine did a green/yellow re-layout immediately, and included the diagnostic file. They have not responded in over a week.
Figuring that DR has washed their hands of me now that they’ve given me a replacement unit, I wanted to get a sense of when the drobo does its silent re-layouts.
I popped in 2 drives that I thought might be problematic, 2 samsung spinpoint 1TB drives. I formatted and loaded up my 700GB of actual space. Files transferred and everything was peachy. Did a bunch of reading and writing to the drive and no re-layouts. I left it like this for a few days just to make sure.
Once I was confident of these drives I popped in a 750GB drive as well. Now, when you add a drive, its space is instantly available in the pool, but of course, it’s not yet acting like a RAID drive because a whole lot of data from the other drives has to be transferred to it to build up the parity needed for recovery. As I sat and listened, this process didn’t start immediately. I transferred over some files, and once that stopped, bingo, silent re-layout.
So I think I’m starting to get to the heart of the problem.
Adding a 750GB drive to the drobo is just like replacing a 750GB drive (and takes the same amount of time), except it won’t show blinking lights while doing it. This process from what I can tell takes something like 13 hours.
But here’s the catch. If your computer goes to sleep, or shuts down or you eject it and unplug the usb or firewire cable, the drobo will sleep itself and the re-layout stops. Wake your machine up and the re-layout resumes.
This means if you only use the machine that the drobo is connected to for an hour or two each day, it may take the better part of a week for this re-layout to complete. And every time you use your machine you will constantly hear the drives being accessed, and speeds will be slow. And you will have no indication in the dashboard or on the drobo itself that this is happening.
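The arithmetic behind that estimate is simple enough to sketch. This assumes the re-layout only makes progress while the host machine is awake, that it picks up exactly where it left off, and that it runs at the same rate the whole time. Those are my assumptions from listening to the thing, not anything Data Robotics has confirmed, and the function name is just mine.

```python
import math


def days_to_finish_relayout(relayout_hours, awake_hours_per_day):
    """Calendar days needed if the re-layout only runs while the host is awake.

    Assumes no progress is lost when the drobo sleeps (an assumption).
    """
    if awake_hours_per_day <= 0:
        raise ValueError("the host has to be awake for some time each day")
    return math.ceil(relayout_hours / awake_hours_per_day)


# A 13 hour re-layout on a machine that is awake 2 hours a day.
print(days_to_finish_relayout(13, 2))   # 7
# The same re-layout on a machine that is only awake 1 hour a day.
print(days_to_finish_relayout(13, 1))   # 13
# Left running around the clock, it finishes within the first day.
print(days_to_finish_relayout(13, 24))  # 1
```

So the cheap fix is to leave the machine awake overnight once or twice after adding or swapping a drive.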
As a quick aside, while researching this stuff I’ve read a couple of articles on why there are RAID class SATA drives. Apparently standard desktop drives can become unresponsive for multiple seconds when they encounter and remap a bad sector on the drive. If the time that they are unresponsive exceeds the limits of the software or hardware responsible for the RAID, it will be dropped from the pool as failed. RAID class drives don’t become unresponsive while fixing bad sectors, and so don’t get dropped from the pool.
Now it’s not clear if this happens with the drobo or not, but it certainly seems like it could be a possibility given what they said about my 320GB dropping from the bus repeatedly.
You can avoid this on new drives by writing zeros to every sector of the drive when you first buy it. That will force the drive to remap its bad sectors up front, and barring a bad drive, you shouldn’t have any more for a while, so no dropping from the bus and no re-layouts.
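If you want to see what “writing zeros to every sector” boils down to, here is a minimal sketch. It is destructive: it overwrites every byte of whatever path you give it. Most people would use a disk utility or a tool like dd for this instead. The function name and chunk size are mine, and I am assuming a regular file or a Linux-style block device whose size can be found by seeking to the end, which may not hold on every OS.

```python
import os


def zero_fill(path, chunk_size=1024 * 1024):
    """DESTRUCTIVE: overwrite every byte of an existing file or block
    device with zeros. Returns the number of bytes written."""
    zeros = bytes(chunk_size)
    with open(path, "r+b") as target:
        # Seeking to the end reports the size (assumed to work for the target).
        total = target.seek(0, os.SEEK_END)
        target.seek(0)
        remaining = total
        while remaining > 0:
            n = min(chunk_size, remaining)
            target.write(zeros[:n])
            remaining -= n
        # Push the data out of Python's and the OS's buffers.
        target.flush()
        os.fsync(target.fileno())
    return total
```

Try it on a scratch file first, and triple check the path before pointing it at a real drive. Whether a full zero pass actually triggers remapping is up to the drive's own firmware, so treat this as a sketch of the idea rather than a guarantee.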
So let’s say you have a drive that you didn’t write zeros to, and 3 months into using it, it drops off the bus on the drobo. Everything is still fine, the sector has been remapped, but the drobo is going to begin a re-layout. If that’s a 1TB drive, it’s going to take the better part of 15 hours straight, or another week or two of re-layout if your machine is sleeping a lot.
After adding another 1TB drive (maxtor) and letting it re-layout for another 15 hours or so, the drobo is no longer crunching away, and speeds are back into the decent range.
I’ll update if there is more weirdness that pops up as I go along.
But I should note here, since I didn’t mention it anywhere else in the post: Noise. When the drobo is really working the fans can kick up and be pretty noticeable. But for the most part it just purrs quietly away.
What you will notice however is a loud oscillating hum that happens when a drobo is fully loaded with drives. Every second or so, the drobo will pulse with a loud hum and repeat until you put your hand on the case and shift it a little bit. This will usually stop it for a little while, but it will eventually return with use.
I’m guessing this is caused by the spinning of the drives getting into a resonant frequency and the drive slots themselves not having a lot of padding against drive vibration. The internals of the drobo could probably be padded a little bit to prevent this vibration from being transferred to the externals of the case, but I haven’t cracked mine open to see yet. Unfortunately you can’t just put the drobo on a carpeted floor and hope it dampens it out. It doesn’t. The whole case resonates with the hum.
Both drobos that I’ve had display this problem.
The sound is particularly annoying because it’s not a constant hum that you can tune out. Like a dripping faucet, you have to do something about it.