Hi,
Thank you for your fast response!
Is there a way (that you know of) to list these locks?
I write to the file with echo "foo" >> /mnt/ceph/...something..., so if
there is any locking, shouldn't it be released once the append is done?
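For reference, the closest I have found so far is looking at capability
counts, though I am not sure this is the right tool (the MDS daemon name
below is a placeholder, and the debugfs path depends on the mount):

# On the MDS host: list client sessions, including how many caps each holds
ceph daemon mds.node1 session ls

# On the client (kernel client, with debugfs mounted): capability counters
cat /sys/kernel/debug/ceph/*/caps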
The strange thing is that this increased-traffic stage went on for hours
(I tried it many times), but after I stop the watch for ~5s (I have not
tried different intervals) and restart it, the traffic is gone, and all
that remains is what I think is some keepalive communication between the
MDS and the client: two packets (request, response) roughly every 5s.
It is as if the metadata cache were only repopulated on a timer (somewhere
between 1s and 5s) which is never reached because of the repeated watch ls
queries... just a blind shot in the dark...
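If it helps, the next things I plan to try are a longer interval and the
caps delay mount options of the kernel client (assuming I have read the
mount options right; the monitor address, interval and delay values below
are only examples, and the auth options are omitted):

# Same test, but with a 10s interval instead of 1s
watch -n 10 ls -lah /mnt/cephfs

# Remount with explicit caps delay settings (values are examples only)
mount -t ceph mon1:6789:/ /mnt/cephfs -o noatime,caps_wanted_delay_min=1,caps_wanted_delay_max=5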
Thanks:
Denes.
On 10/13/2017 01:32 PM, Burkhard Linke wrote:
Hi,
On 10/13/2017 12:36 PM, Denes Dolhay wrote:
Dear All,
First of all, this is my first post, so please be lenient :)
For the last few days I have been testing Ceph and CephFS, deploying
a PoC cluster.
I was testing the CephFS kernel client caching when I came across
something strange, and I cannot decide whether it is a bug or whether I
just messed something up.
Steps, given that client1 and client2 have both mounted the same CephFS
with the extra mount option noatime:
Client 1: watch -n 1 ls -lah /mnt/cephfs
- In tcpdump I can see that the directory is listed once and only once;
all the following ls requests are served from the client cache (see the
tcpdump sketch after these steps).
Client 2: make any modification, for example append to a file or delete
a file directly under /mnt/cephfs
- The operation completes, and client1 is informed about the change OK.
- Client1 does not seem to cache the new metadata information received
from the metadata server; it now communicates with the MDS every second.
Client 1: stop the watch ls... command, wait a few seconds and restart it
- The communication stops; client1 serves the ls data from its cache.
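For completeness, this is roughly how I captured the traffic in the steps
above (the interface name is from my test setup, and the port range is
only what I expect the daemons to use by default; adjust to your
configuration):

# Monitors listen on 6789 by default; MDS/OSD daemons bind in 6800-7300
tcpdump -i eth0 -nn 'tcp port 6789 or tcp portrange 6800-7300'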
Please help: if this is intentional, then why? If not, how can I debug it?
This is probably the intended behaviour. CephFS is a POSIX-compliant
filesystem, and it uses capabilities (similar to locks) to control
concurrent access to directories and files.
In your first step, a capability for directory access is granted to
client1. As soon as client2 wants to access the directory (probably
read-only first for listing, write access later), the MDS has to
reconcile the capability requests with client1. I'm not sure about the
details, but something similar to a "write lock" should be granted to
client2, and client1 is granted a read lock or an "I have this entry in
cache and need the MDS to know it" lock. That's also the reason why
client1 has to ask the MDS every second whether its cache content is
still valid. client2 probably still holds the necessary capabilities, so
you might also see some traffic between the MDS and client2.
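If you want to see client1 asking, the kernel client exposes some of this
state under debugfs; roughly something like the following, assuming
debugfs is mounted (the exact directory name under /sys/kernel/debug/ceph
depends on the fsid and client id):

# In-flight MDS requests from this client
# (should be empty while everything is served from the local cache)
cat /sys/kernel/debug/ceph/*/mdsc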
I'm not sure why client1 does not continue to ask the MDS in the last
step. Maybe the capability held by client2 has expired and it was granted
back to client1. Others with more insight into the details of
capabilities might be able to give you more details.
Short version: CephFS has strict POSIX locking semantics implemented by
capabilities, and you need to be aware of this fact (especially if you
are used to NFS...).
Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com