Michael Stumpf wrote:
> I probably can't help much at the moment, but...
> I didn't realize that NBD type things had advanced to even this level of
> stability. This is good news. I've been wanting to do something like
> what you're trying for some time to overcome the bounds of
> power/busses/heat/space that limit you to a single machine when building
> a large md or LVM. Looking at the GNBD project page, it still seems
> pretty raw, although a related project DDRAID seems to hold some promise.
> I'd be pretty timid about putting anything close to production on these
> drivers, though.
> What distro / kernel version / level of GNBD are you using?
Well, I don't know if they have yet - the main reason I'm fiddling with
this is to see if it's feasible :).
However, I have belted tens of gigabytes of data at the mirror-over-GNBD
and mirror-over-iSCSI using various benchmarking tools without any
kernel panics or (apparent) data corruption, so I'm gaining confidence
that it's a workable solution. I haven't started the same level of
testing with Windows and Linux clients sitting above the
initiator/bridge level yet, however, as I want to make sure the back end
is pretty stable before moving on (it will become, relatively speaking,
a single point of failure for most of the important machines on our
network, and hence for the entire company).
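For anyone curious, there's nothing exotic about the mirror itself -
roughly speaking it's just md RAID1 across the two imported devices,
something like the sketch below. The device names are only placeholders
for whatever gnbd_import or the iSCSI initiator actually presents on
your system:

    # plain md RAID1 over two network block devices; /dev/gnbd0 and
    # /dev/sdb are placeholders - substitute the devices your imports
    # actually give you
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/gnbd0 /dev/sdb
    mkfs.ext3 /dev/md0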
I'm just using a stock Fedora Core 4 and the GNBD it includes. A bit
bleeding edge, I know, but I figured since it had just been released
when I started on this project, why not ;).
With regards to the problem I was having with node failures, at least
with iSCSI the solution was setting a timeout so that a "disk failed"
error was actually returned - by default the iSCSI initiator assumes any
disconnection errors are network-related and transient, so it simply
stops any IO to the iSCSI target until it reappears. Now that I've
specified a timeout, node "failures" behave as expected and the mirror
goes into degraded mode.
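The exact knob depends on which initiator you're running, so treat this
as a sketch rather than a recipe; with open-iscsi, for example, the
relevant setting is the session replacement timeout, set either in
iscsid.conf or per node record:

    # open-iscsi example only - other initiators name this differently.
    # Fail outstanding IO after 30 seconds instead of queueing it until
    # the target reappears; the IQN below is just a placeholder.
    iscsiadm -m node -T iqn.2005-01.com.example:target0 \
        -o update -n node.session.timeo.replacement_timeout -v 30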
I assume I need to do something similar with GNBD so that it really does
"fail", rather than "hang", but I've been too busy over the last few
days to actually look into it.
CS