Re: ceph-fuse clients taking too long to update dir sizes

Goncalo Borges <goncalo.borges@xxxxxxxxxxxxx> · Tue, 6 Dec 2016 01:59:49 +0000

Hi John...

>> We are running ceph/cephfs in 10.2.2. All infrastructure is in the same
>> version (rados cluster, mons, mds and cephfs clients). We mount cephfs using
>> ceph-fuse.
>>
>> Last week I asked some of my heavy users to delete data. In the
>> following example, the user in question decreased his usage from ~4.5TB to
>> ~600GB. However, some clients still have not updated the usage (although
>> several days have passed) while others are ok.
>>
>> From a point of view of the MDS, both types of client have healthy sessions.
>> See detailed info after this email.
>>
>> Trying to kick the session does not solve the issue. Probably only a remount
>> would, but users are heavily using the filesystem and I do not want to break
>> things for them now.
>>
>>
>> The only difference I can actually dig out between "good"/"bad" clients is
>> that the user continues with active bash sessions in the "bad" client (from
>> where he triggered the deletions)
>
>You're saying that the clients that actually did the deletions are the
>ones with the bad rstats, but the other clients are getting the
>updates?  Really weird.

The client which was in bad shape is the one where the user did
all the deletions. However, it is also the client most used by our users
and is normally under heavier I/O load than the others. I think the issue
might be more related to the latter.

>Recursive statistics are meant to be updated somewhat lazily, but
>obviously they are meant to *eventually* update, so if days are going
>by without them catching up then that's a bug.
>
>Could you try and come up with a simple reproducer, perhaps with just
>two clients involved?

The client finally updated the value after 3 days. I think it will be really
hard to reproduce this, since I think it depends on the actual load on the
client.

However, I was looking for some pointers on how to understand the delay
and collect more info to give you. The standard client and mds
queries did not show anything abnormal.
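
In the meantime, one thing I can do is log the recursive stats as each client
sees them, so the lag can at least be quantified. A minimal sketch, assuming
the ceph.dir.rbytes / ceph.dir.rfiles / ceph.dir.rctime virtual xattrs are
readable through the ceph-fuse mount on Linux (the helper names and the output
format are my own invention, nothing that ships with ceph):

```python
import os
import time

# CephFS virtual xattrs carrying the recursive statistics of a directory.
RSTAT_ATTRS = ("ceph.dir.rbytes", "ceph.dir.rfiles", "ceph.dir.rctime")


def read_rstats(path, getxattr=None):
    """Return the rstats of `path` as this client currently sees them.

    `getxattr` is injectable only so the function can be exercised
    without a CephFS mount; by default os.getxattr (Linux) is used.
    """
    if getxattr is None:
        getxattr = os.getxattr
    stats = {}
    for name in RSTAT_ATTRS:
        try:
            stats[name] = getxattr(path, name).decode().strip()
        except OSError:
            # Not a CephFS mount, or the attribute is not available.
            stats[name] = None
    return stats


def format_sample(host, path, stats, now=None):
    """Format one log line: timestamp, host, path, then name=value pairs."""
    now = time.time() if now is None else now
    fields = " ".join(f"{k}={v}" for k, v in stats.items())
    return f"{now:.0f} {host} {path} {fields}"


if __name__ == "__main__":
    import socket
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(format_sample(socket.gethostname(), target, read_rstats(target)))
```

Run from cron on a "good" and a "bad" client against the same directory, the
two logs should show when (and by how much) the values diverge.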

When this happens again, where and what should I be looking at?

Cheers
Goncalo

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com