Re: Complete freeze of a cephfs client (unavoidable hard reboot)

Hi,

On 27/05/2015 22:34, Gregory Farnum wrote:

> Sorry for the delay; I've been traveling.

No problem, same here, I'm not very quick to answer either. ;)

>> Ok, I see. According to the online documentation, the way to close
>> a cephfs client session is:
>>
>> ceph daemon mds.$id session ls             # to get the $session_id and the $address
>> ceph osd blacklist add $address
>> ceph osd dump                              # to get the $epoch
>> ceph daemon mds.$id osdmap barrier $epoch
>> ceph daemon mds.$id session evict $session_id
>>
>> Is it correct?
>>
>> With the commands above, could I reproduce the client freeze in my testing
>> cluster?
> 
> Yes, I believe so.

In fact, after some tests, the commands above do evict the client correctly
(`ceph daemon mds.1 session ls` returns an empty array), but on the client
side a new session is automatically established as soon as the cephfs
mountpoint is accessed again. So I haven't managed to reproduce the
freeze. ;) I also tried stopping the network on the client side (ifdown -a),
and after a few minutes (more than 60 seconds anyway) I saw "closing stale
session client" in the mds log. But after an `ifup -a`, I got back a cephfs
connection and a mountpoint in good health.
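For the record, the test looked more or less like this (the mds id, session
id, client address, epoch and mountpoint below are just example values, not
the real ones from my cluster):

# On an admin node: blacklist and evict the client.
ceph daemon mds.1 session ls                   # note the session "id" and the client address ("inst")
ceph osd blacklist add 10.0.0.21:0/123456789   # example address taken from "session ls"
ceph osd dump | head -1                        # the first line gives the current osdmap epoch
ceph daemon mds.1 osdmap barrier 847           # example epoch
ceph daemon mds.1 session evict 4215           # example session id

# On the client: simulate a short network outage.
ifdown -a
# wait a few minutes, until "closing stale session client" appears in the mds log
ifup -a
ls /mnt/cephfs                                 # example mountpoint; it comes back in good health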

>> And could it be conceivable one day (for instance with an option) to be
>> able to change the behaviour of cephfs so that it is *not* strictly
>> consistent, like NFS for instance? It seems to me it could improve the
>> performance of cephfs, and cephfs could be more tolerant of short network
>> failures (not really sure about this second point). Ok, it's just a remark
>> from a simple and unqualified ceph-user ;) but it seems to me that NFS
>> isn't strictly consistent and generally this is not a problem in many use
>> cases. Am I wrong?
> 
> Mmm, this is something we're pretty resistant to.

Ah ok, so I don't insist. ;)

> In particular NFS
> just doesn't make any efforts to be consistent when there are multiple
> writers, and CephFS works *really hard* to behave properly in that
> case. For many use cases it's not a big deal, but for others it is,
> and we target some of them.

Ok. Thanks Greg for your answer.

-- 
François Lafont
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
