Inconsistent directory content in cephfs

Hi,


a user just stumbled across a problem with inconsistent directory content in CephFS (kernel client, Ceph 12.2.8, one active and one standby-replay MDS):


root@host1:~# ls /ceph/sge-tmp/db/work/06/ | wc -l
224
root@host1:~# uname -a
Linux host1 4.13.0-32-generic #35~16.04.1-Ubuntu SMP Thu Jan 25 10:13:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


root@host2:~# ls /ceph/sge-tmp/db/work/06/ | wc -l
224
root@host2:~# uname -a
Linux host2 4.15.0-32-generic #35~16.04.1-Ubuntu SMP Fri Aug 10 21:54:34 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


root@host3:~# ls /ceph/sge-tmp/db/work/6c | wc -l
225
root@host3:~# uname -a
Linux host3 4.13.0-19-generic #22~16.04.1-Ubuntu SMP Mon Dec 4 15:35:18 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


Three hosts, different kernel versions, and one extra directory entry on the third host. All hosts use the same mount configuration:

# mount | grep ceph
<monitors>:/volumes on /ceph type ceph (rw,relatime,name=volumes,secret=<hidden>,acl,readdir_max_entries=8192,readdir_max_bytes=4104304)
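
To narrow down which entry actually differs, a direct comparison of the sorted listings from two clients should show it; a minimal sketch, assuming passwordless ssh from host1 to host3 and that the same directory is listed on both sides:

root@host1:~# diff <(ls /ceph/sge-tmp/db/work/06/ | sort) <(ssh host3 'ls /ceph/sge-tmp/db/work/06/' | sort)

The line marked with '>' would then be the entry that host3 sees but host1 and host2 do not.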

The MDS log only contains lines like '2018-10-05 12:43:55.565598 7f2b7c578700  1 mds.ceph-storage-04 Updating MDS map to version 325550 from mon.0' every few minutes, with increasing version numbers. ceph -w also shows the following warnings:

2018-10-05 12:25:06.955085 mon.ceph-storage-03 [WRN] Health check failed: 2 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
2018-10-05 12:26:18.895358 mon.ceph-storage-03 [INF] MDS health message cleared (mds.0): Client host1:volumes failing to respond to cache pressure
2018-10-05 12:26:18.895401 mon.ceph-storage-03 [INF] MDS health message cleared (mds.0): Client cb-pc10:volumes failing to respond to cache pressure
2018-10-05 12:26:19.415890 mon.ceph-storage-03 [INF] Health check cleared: MDS_CLIENT_RECALL (was: 2 clients failing to respond to cache pressure)
2018-10-05 12:26:19.415919 mon.ceph-storage-03 [INF] Cluster is now healthy

The timestamps of the MDS log messages and of the cache-pressure messages match, so I assume the MDS map carries the list of failing clients and is updated whenever that list changes.


But this does not explain the difference in the directory content. All entries are subdirectories. I also tried to force a refresh of the cached information by dropping the kernel caches on the affected host, but to no avail so far. The number of caps on the MDS dropped from 3.2 million to 800k, so dropping the caches did have an effect.
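
For reference, the cache drop and the cap count check can be done roughly like this (a sketch; the mds daemon name is taken from the log line above, and 'session ls' has to run on the host with the active MDS admin socket):

root@host3:~# sync; echo 2 > /proc/sys/vm/drop_caches
root@ceph-storage-04:~# ceph daemon mds.ceph-storage-04 session ls | grep num_caps

Each session entry carries a num_caps field, so the per-client cap count before and after the drop is easy to compare.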


Any hints on the root cause of this problem? I've also tested various other clients; some show 224 entries, some 225.
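
A possible next step might be to compare the clients' view with what the MDS has stored on disk, by listing the omap keys of the directory object in the metadata pool. A sketch, run from a node with an admin keyring, assuming the directory is not fragmented and with the metadata pool name left as a placeholder:

root@host1:~# printf '%x.00000000\n' $(stat -c %i /ceph/sge-tmp/db/work/06/)
root@host1:~# rados -p <metadata pool> listomapkeys <inode in hex>.00000000 | wc -l

Each omap key should correspond to one dentry, so the count can be compared directly with the 224/225 results above.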


Regards,

Burkhard





