On Fri, Oct 5, 2018 at 6:57 PM Burkhard Linke <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> a user just stumbled across a problem with directory content in cephfs
> (kernel client, ceph 12.2.8, one active and one standby-replay instance):
>
> root@host1:~# ls /ceph/sge-tmp/db/work/06/ | wc -l
> 224
> root@host1:~# uname -a
> Linux host1 4.13.0-32-generic #35~16.04.1-Ubuntu SMP Thu Jan 25 10:13:43
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> root@host2:~# ls /ceph/sge-tmp/db/work/06/ | wc -l
> 224
> root@host2:~# uname -a
> Linux host2 4.15.0-32-generic #35~16.04.1-Ubuntu SMP Fri Aug 10 21:54:34
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> root@host3:~# ls /ceph/sge-tmp/db/work/6c | wc -l
> 225
> root@host3:~# uname -a
> Linux host3 4.13.0-19-generic #22~16.04.1-Ubuntu SMP Mon Dec 4 15:35:18
> UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> Three hosts, different kernel versions, and one extra directory entry on
> the third host. All hosts use the same mount configuration:

Which kernel versions?

> # mount | grep ceph
> <monitors>:/volumes on /ceph type ceph
> (rw,relatime,name=volumes,secret=<hidden>,acl,readdir_max_entries=8192,readdir_max_bytes=4104304)
>
> The MDS logs only contain messages like '2018-10-05 12:43:55.565598 7f2b7c578700 1
> mds.ceph-storage-04 Updating MDS map to version 325550 from mon.0' every
> few minutes, with increasing version numbers. ceph -w also shows the
> following warnings:
>
> 2018-10-05 12:25:06.955085 mon.ceph-storage-03 [WRN] Health check
> failed: 2 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> 2018-10-05 12:26:18.895358 mon.ceph-storage-03 [INF] MDS health message
> cleared (mds.0): Client host1:volumes failing to respond to cache pressure
> 2018-10-05 12:26:18.895401 mon.ceph-storage-03 [INF] MDS health message
> cleared (mds.0): Client cb-pc10:volumes failing to respond to cache pressure
> 2018-10-05 12:26:19.415890 mon.ceph-storage-03 [INF] Health check
> cleared: MDS_CLIENT_RECALL (was: 2 clients failing to respond to cache
> pressure)
> 2018-10-05 12:26:19.415919 mon.ceph-storage-03 [INF] Cluster is now healthy
>
> The timestamps of the MDS log messages and of the cache pressure messages
> are identical, so I assume the MDS map carries a list of failing clients
> and therefore gets updated.
>
> But this does not explain the difference in the directory content. All
> entries are subdirectories. I also tried to force a refresh of cached
> information by dropping the kernel caches on the affected host, but to no
> avail yet. Caps on the MDS have dropped from 3.2 million to 800k, so
> dropping was effective.
>
> Any hints on the root cause of this problem? I've also tested various
> other clients... some show 224 entries, some 225.
>
> Regards,
>
> Burkhard
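
One way to narrow this down (just a sketch, assuming /ceph is the mount point everywhere, that all hosts are meant to list the same directory, and using placeholder file names under /tmp) would be to diff the sorted listings from a 224-entry host and the 225-entry host to see which entry only shows up on one side, and then stat that entry directly on both:

# on host1 (224 entries)
root@host1:~# ls /ceph/sge-tmp/db/work/06/ | sort > /tmp/list-host1.txt

# on host3 (225 entries)
root@host3:~# ls /ceph/sge-tmp/db/work/06/ | sort > /tmp/list-host3.txt

# copy one of the files over, then compare and check the differing entry
root@host1:~# diff /tmp/list-host1.txt /tmp/list-host3.txt
root@host1:~# stat /ceph/sge-tmp/db/work/06/<differing entry>

If the extra entry can be stat'ed successfully from host1 even though it never appears in the listing there, that would point at readdir on the client side rather than at a missing dentry on the MDS.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com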