Try running a scrub on that directory; that might yield more information:

  ceph daemon mds.XXX scrub_path /path/in/cephfs recursive

Afterwards you can try to repair it if the scrub finds an error. It could
also be something completely different, such as a bug in the clients.

Paul

On Fri, Oct 5, 2018 at 12:57, Burkhard Linke
<Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
>
> a user just stumbled across a problem with directory content in cephfs
> (kernel client, ceph 12.2.8, one active, one standby-replay instance):
>
>
> root@host1:~# ls /ceph/sge-tmp/db/work/06/ | wc -l
> 224
> root@host1:~# uname -a
> Linux host1 4.13.0-32-generic #35~16.04.1-Ubuntu SMP Thu Jan 25 10:13:43
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
>
> root@host2:~# ls /ceph/sge-tmp/db/work/06/ | wc -l
> 224
> root@host2:~# uname -a
> Linux host2 4.15.0-32-generic #35~16.04.1-Ubuntu SMP Fri Aug 10 21:54:34
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
>
> root@host3:~# ls /ceph/sge-tmp/db/work/6c | wc -l
> 225
> root@host3:~# uname -a
> Linux host3 4.13.0-19-generic #22~16.04.1-Ubuntu SMP Mon Dec 4 15:35:18
> UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
>
> Three hosts, different kernel versions, and one extra directory entry on
> the third host. All hosts used the same mount configuration:
>
> # mount | grep ceph
> <monitors>:/volumes on /ceph type ceph
> (rw,relatime,name=volumes,secret=<hidden>,acl,readdir_max_entries=8192,readdir_max_bytes=4104304)
>
> The MDS log only contains entries like '2018-10-05 12:43:55.565598
> 7f2b7c578700 1 mds.ceph-storage-04 Updating MDS map to version 325550
> from mon.0' every few minutes, with increasing version numbers. ceph -w
> also shows the following warnings:
>
> 2018-10-05 12:25:06.955085 mon.ceph-storage-03 [WRN] Health check
> failed: 2 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> 2018-10-05 12:26:18.895358 mon.ceph-storage-03 [INF] MDS health message
> cleared (mds.0): Client host1:volumes failing to respond to cache pressure
> 2018-10-05 12:26:18.895401 mon.ceph-storage-03 [INF] MDS health message
> cleared (mds.0): Client cb-pc10:volumes failing to respond to cache pressure
> 2018-10-05 12:26:19.415890 mon.ceph-storage-03 [INF] Health check
> cleared: MDS_CLIENT_RECALL (was: 2 clients failing to respond to cache
> pressure)
> 2018-10-05 12:26:19.415919 mon.ceph-storage-03 [INF] Cluster is now healthy
>
> The timestamps of the MDS log messages and of the cache pressure messages
> coincide, so I assume the MDS map carries a list of failing clients and
> is updated for that reason.
>
>
> But this does not explain the difference in the directory content. All
> entries are subdirectories. I also tried to force a refresh of the cached
> information by dropping the kernel caches on the affected host, but to no
> avail so far. The caps held on the MDS dropped from 3.2 million to 800k,
> so the cache drop itself was effective.
>
>
> Any hints on the root cause of this problem? I've also tested various
> other clients... some show 224 entries, some 225.
>
>
> Regards,
>
> Burkhard

--
Paul Emmerich

Looking for help with your Ceph cluster?
Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
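
For reference, a rough sketch of the scrub sequence suggested above, run
against the admin socket of the active MDS. The daemon name (mds.XXX) is a
placeholder, and the in-filesystem path is only an assumption derived from
the /volumes-on-/ceph mount shown above; verify both against your layout,
and only add 'repair' after reviewing what the scrub reports:

  # recursive forward scrub of the suspect directory (path is an assumed
  # mapping of /ceph/sge-tmp/db/work/06 back into the /volumes tree)
  ceph daemon mds.XXX scrub_path /volumes/sge-tmp/db/work/06 recursive

  # list anything the scrub flagged as damaged
  ceph daemon mds.XXX damage ls

  # only if damage was found and understood, re-run the scrub with repair
  ceph daemon mds.XXX scrub_path /volumes/sge-tmp/db/work/06 recursive repair

On the client side, 'echo 2 > /proc/sys/vm/drop_caches' (run as root) drops
cached dentries and inodes before re-listing the directory, which corresponds
to the cache drop already tried above.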