On Wed, Apr 15, 2020 at 5:13 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>
> On Wed, Apr 15, 2020 at 2:33 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > Following some cephfs issues today we have a stable cluster, but
> > num_strays is incorrect.
> > After starting the mds the values are reasonable, but they very soon
> > underflow and start showing 18E (2^64 - a few):
> >
> > ---------------mds---------------- --mds_cache--- ------mds_log------ -mds_mem- ----mds_server----- mds_ ---objecter---
> >  req rlat fwd inos caps exi imi |stry recy recd|subm evts segs repl| ino   dn |hcr  hcs hsr cre |sess|actv  rd  wr |
> >  129    0   0 253k 1.9k   0   0 | 246    0    0|   5 4.0k    5    0|253k 254k| 129    0   0   0 | 119|   3   1   2
> > 8.2k    0   0 253k 1.9k   0   0 | 129    0    0| 395 4.4k    7    0|253k 254k|8.2k   11   0   0 | 119|   0  33 517
> > 9.7k    0   0 253k 1.8k   0   0 | 181    0    0| 302 4.7k    7    0|253k 254k|9.7k    5   0   0 | 119|   1  44 297
> >  10k    0   0 253k 1.8k   0   0 | 217    0    0| 382 5.1k    7    0|253k 254k| 10k   11   0   0 | 119|   0  54 405
> > 9.0k    0   0 253k 1.7k   0   0 | 205    0    0| 386 5.5k    8    0|253k 254k|9.0k    4   0   0 | 119|   1  46 431
> > 8.2k    0   0 253k 1.7k   0   0 | 161    0    0| 326 5.8k    8    0|253k 254k|8.2k    6   0   0 | 119|   1  37 397
> > 8.0k    0   0 253k 1.6k   0   0 | 135    0    0| 279 6.1k    8    0|253k 254k|8.0k    4   0   0 | 119|   1  31 317
> > 9.2k    0   0 253k 1.6k   0   0 | 18E    0    0| 153 6.2k    8    0|253k 254k|9.2k    6   0   0 | 119|   1   2 265
> > 8.2k    0   0 253k 1.7k   0   0 | 18E    0    0|  40 6.3k    8    0|253k 254k|8.2k    5   0   0 | 119|   3   3  17
> >
> > Is there a way to reset num_strays to the correct number of strays?
> >
>
> Try the command 'ceph daemon <mds of rank 0> scrub_path '~mdsdir' force
> recursive repair'

Thanks for the reply. Here's the ceph log from this repair:
https://termbin.com/o8tc

The active mds (single active only) still showed 18E, so I failed over
to a standby and it seems a bit better, but it still occasionally
drops below zero to 18E.

I ran scrub_path a few times and it finds errors each time...

-- Dan

> > # for i in `seq 600 609`; do rados --cluster=dwight listomapvals -p
> > cephfs_metadata ${i}.00000000 2>&1 | wc -l; done
> > 95
> > 220
> > 91
> > 173
> > 567
> > 1578
> > 1042
> > 754
> > 445
> > 261
> >
> > Cheers, Dan

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
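
A quick way to watch the raw counter between daemonperf samples (a
minimal sketch; it assumes the active MDS's admin socket is reachable
on the local host and that the daemon is addressed as mds.<name>):

    # num_strays (plus num_strays_delayed etc.) lives under the
    # mds_cache section of the perf dump JSON; grep keeps just those lines
    ceph daemon mds.<name> perf dump | grep num_strays

Once the counter underflows it reads as roughly 2^64 here too, which is
the same value daemonperf renders as 18E.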
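
And a rough cross-check of what num_strays should be, sketched under
the same assumptions as the loop quoted above (a single active rank,
whose ten stray directories are the objects 600.00000000 through
609.00000000 in cephfs_metadata, cluster name dwight). Counting omap
keys, rather than listomapvals output lines, maps one-to-one onto
stray dentries:

    # each omap key in a stray directory object is one stray dentry,
    # so the summed key count is the actual number of strays
    total=0
    for i in $(seq 600 609); do
        n=$(rados --cluster=dwight -p cephfs_metadata listomapkeys ${i}.00000000 | wc -l)
        echo "${i}.00000000: ${n}"
        total=$((total + n))
    done
    echo "total stray dentries: ${total}"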