Replication not working on server hang

mark at mark.mielke.cc (Mark Mielke) · Fri, 28 Aug 2009 09:45:26 -0400

On 08/28/2009 09:28 AM, Stephan von Krawczynski wrote:
> On Fri, 28 Aug 2009 14:28:51 +0200
> David Saez Padros <david at ols.es> wrote:
>
>
>> Hi
>>
>> well, that never happened before when using nfs with the same
>> computers, same disk, etc ... for almost 2 years, so it's more
>> than possible that glusterfs is the one which is triggering this
>> supposed ext3 bug, but apart from this:
>>

FYI: NFS will have problems in such a situation as well. If NFS tries to
write to a local file system which hangs, NFS will hang too. With soft
mounts, the operations can time out, but that is not clearly a better
option either. Anybody who has worked with NFS for some time has seen this
before. Once upon a time at our company (1990?) all mounts were hard 
mounts, and when a problem occurred, the processes completely locked out 
the user. Control-C wouldn't even work!

NFS doesn't offer the options that GlusterFS does, so comparing 1:1 
doesn't really make sense.

As for "not experiencing it in 2 years" - I think people really need to
understand that GlusterFS is a *user space application*. Specifically,
this means that it *only* makes the standard system calls that any other
program such as /bin/cat, /bin/tar, or /bin/du would make. There is no
legitimate reason at all for the underlying file system to lock up as a
result of this. On some file systems there is a hang, not a lock-up,
until work on some critical section of the on-disk or in-memory
representation is finished. I understand that removing a large file on
ext3 is an example of something that might trigger this for a period.
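
To make the "standard system calls" point concrete, here is a minimal
sketch in Python (not GlusterFS code). It removes a file with the ordinary
unlink call that any program could issue, and measures how long the call
blocks. The helper names make_file and timed_unlink are hypothetical,
invented for this illustration only.

```python
import os
import tempfile
import time


def make_file(size_bytes):
    """Create a temporary file of the given size and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size_bytes)
    return path


def timed_unlink(path):
    """Remove path with the plain unlink system call.

    Returns the number of seconds the call blocked. Any stall seen here
    happens inside the kernel/file system, not in the calling program.
    """
    start = time.monotonic()
    os.unlink(path)
    return time.monotonic() - start


if __name__ == "__main__":
    path = make_file(1024 * 1024)  # small file; a real test would use a large one
    print("unlink blocked for %.6f seconds" % timed_unlink(path))
```

If a call like this stalls for a long time on a very large file, the stall
would show up for /bin/rm just the same.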

I think GlusterFS should try to be more resilient to this sort of thing 
as well - but comparisons to NFS are invalid, and treating this as a 
GlusterFS problem only (i.e. not tracking down the FS vendor and having 
them fix their FS) is also invalid.

It's not a simple problem to solve. But it should be solved. RAID disks,
for example, are often tuned to significantly reduce retry attempts and
timeouts, so that the system remains responsive even when a disk is
failing. GlusterFS should do the same. It's not a perfect solution -
having a long-running operation incorrectly time out too soon is a
risk - but it's a necessary solution for any large-scale cluster.
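
As a rough illustration of the trade-off, here is a minimal Python sketch
of bounding how long a caller waits on a blocking operation. This is not
how GlusterFS is implemented; call_with_timeout and slow_backend_write are
hypothetical names, and the sleep merely stands in for a hung file system.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_s, *args):
    """Run func(*args) in a worker thread and wait at most timeout_s.

    Returns (True, result) if the call finished in time, otherwise
    (False, None). The worker is NOT killed on timeout: a thread stuck in
    a blocking system call keeps waiting, the caller just stops waiting
    for it.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False, None
    finally:
        # Do not block here waiting for a possibly hung worker.
        executor.shutdown(wait=False)


def slow_backend_write(seconds):
    """Stand-in for a write to a file system that stalls (assumption)."""
    time.sleep(seconds)
    return "written"


if __name__ == "__main__":
    print(call_with_timeout(slow_backend_write, 1.0, 0.01))
    print(call_with_timeout(slow_backend_write, 0.05, 0.5))
```

The second call shows the risk mentioned above: the operation would have
succeeded given more time, but the caller has already treated it as failed.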

Cheers,
mark

-- 
Mark Mielke <mark at mielke.cc>