client reconnect

I mentioned in a previous email that client reconnection may not be 100% reliable. I ran into this again in the following scenario: one of my servers (in a multiserver unify/afr setup) was trying to format a bad drive, and this knocked out access to all of the 3ware disks that GlusterFS was exporting from that machine. While the server was in this condition, a couple of clients tried to ls directories on a filesystem that uses this server (and its mirror). I suspect they were able to contact the glusterfsd on the "bad" machine, but glusterfsd deadlocked trying to access the disk. I ended up rebooting the server, but the clients that were trying to ls never returned and had to be killed. The mountpoints then had to be unmounted and the filesystem remounted.

It seems to me (you will probably come up with something much better) that if the client successfully sends a request to a server but the server never completes it, the client needs to time out the I/O request it was waiting on and try again. In the case of afr, it should also check whether the mirror host can satisfy the request instead.
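
To make the idea concrete, here is a rough sketch in Python of the behaviour I have in mind. It is not GlusterFS code and does not use any GlusterFS API: the names call_with_timeout, request_with_failover and RequestTimeout are made up for illustration, and each "replica" is just a callable standing in for a request to one server. The point is only that a request which was delivered but never answered gets abandoned after a deadline, and the mirror is tried next.

```python
import concurrent.futures as cf


class RequestTimeout(Exception):
    """Raised when a request is not answered within the deadline (hypothetical)."""


def call_with_timeout(fn, timeout_s):
    """Run fn() in a worker thread and give up waiting after timeout_s seconds."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except cf.TimeoutError:
        raise RequestTimeout(f"no reply within {timeout_s}s") from None
    finally:
        # Do not block on a hung worker; the caller must be able to move on.
        pool.shutdown(wait=False)


def request_with_failover(replicas, timeout_s=0.2):
    """Try each (name, fn) replica in order; return the first (name, result).

    A replica that times out or raises OSError is skipped and the next
    one (for afr, the mirror) is asked instead.
    """
    errors = []
    for name, fn in replicas:
        try:
            return name, call_with_timeout(fn, timeout_s)
        except (RequestTimeout, OSError) as exc:
            errors.append((name, exc))
    raise RequestTimeout(f"all replicas failed: {errors}")
```

With a list such as [("primary", ask_primary), ("mirror", ask_mirror)], where ask_primary never returns in time, the call would come back with the mirror's answer instead of hanging the way my ls did. A real client would also have to cancel or clean up the abandoned request on the wire, which this sketch does not attempt.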

Thanks,

Brent



