On Wed, Apr 15, 2020 at 5:13 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>
> On Wed, Apr 15, 2020 at 2:33 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > Following some cephfs issues today we have a stable cluster, but
> > num_strays is incorrect.
> > After starting the mds the values are reasonable, but they very soon
> > underflow and start showing 18E (2^64 - a few):
> >
> > ---------------mds---------------- --mds_cache--- ------mds_log------ -mds_mem- ----mds_server----- mds_ ---objecter---
> >  req rlat fwd inos caps exi imi |stry recy recd|subm evts segs repl| ino   dn |hcr  hcs hsr cre |sess|actv  rd  wr |
> >  129    0   0 253k 1.9k   0   0 | 246    0    0|   5 4.0k    5    0|253k 254k| 129    0   0   0 | 119|   3   1   2
> > 8.2k    0   0 253k 1.9k   0   0 | 129    0    0| 395 4.4k    7    0|253k 254k|8.2k   11   0   0 | 119|   0  33 517
> > 9.7k    0   0 253k 1.8k   0   0 | 181    0    0| 302 4.7k    7    0|253k 254k|9.7k    5   0   0 | 119|   1  44 297
> >  10k    0   0 253k 1.8k   0   0 | 217    0    0| 382 5.1k    7    0|253k 254k| 10k   11   0   0 | 119|   0  54 405
> > 9.0k    0   0 253k 1.7k   0   0 | 205    0    0| 386 5.5k    8    0|253k 254k|9.0k    4   0   0 | 119|   1  46 431
> > 8.2k    0   0 253k 1.7k   0   0 | 161    0    0| 326 5.8k    8    0|253k 254k|8.2k    6   0   0 | 119|   1  37 397
> > 8.0k    0   0 253k 1.6k   0   0 | 135    0    0| 279 6.1k    8    0|253k 254k|8.0k    4   0   0 | 119|   1  31 317
> > 9.2k    0   0 253k 1.6k   0   0 | 18E    0    0| 153 6.2k    8    0|253k 254k|9.2k    6   0   0 | 119|   1   2 265
> > 8.2k    0   0 253k 1.7k   0   0 | 18E    0    0|  40 6.3k    8    0|253k 254k|8.2k    5   0   0 | 119|   3   3  17
> >
> > Is there a way to reset num_strays to the correct number of strays?
> >
>
> Try the command 'ceph daemon <mds of rank 0> scrub_path '~mdsdir' force
> recursive repair'

Thanks for the reply. Here's the ceph log from this repair:
https://termbin.com/o8tc

The active mds (single active only) still showed 18E, so I failed over
to a standby and it seems a bit better, but it still occasionally
drops below zero to 18E.

I ran scrub_path a few times and it finds errors each time...

-- Dan

> > # for i in `seq 600 609`; do rados --cluster=dwight listomapvals -p
> > cephfs_metadata ${i}.00000000 2>&1 | wc -l; done
> > 95
> > 220
> > 91
> > 173
> > 567
> > 1578
> > 1042
> > 754
> > 445
> > 261
> >
> > Cheers, Dan

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
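
A quick way to watch the raw counter between daemonperf samples (a
minimal sketch; it assumes the active MDS's admin socket is reachable
on the local host and that the daemon is addressed as mds.<name>):

    # num_strays (plus num_strays_delayed etc.) lives under the
    # mds_cache section of the perf dump JSON; grep keeps just those lines
    ceph daemon mds.<name> perf dump | grep num_strays

Once the counter underflows it reads as roughly 2^64 here too, which is
the same value daemonperf renders as 18E.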
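
And a rough cross-check of what num_strays should be, sketched under
the same assumptions as the loop quoted above (a single active rank,
whose ten stray directories are the objects 600.00000000 through
609.00000000 in cephfs_metadata, cluster name dwight). Counting omap
keys, rather than listomapvals output lines, maps one-to-one onto
stray dentries:

    # each omap key in a stray directory object is one stray dentry,
    # so the summed key count is the actual number of strays
    total=0
    for i in $(seq 600 609); do
        n=$(rados --cluster=dwight -p cephfs_metadata listomapkeys ${i}.00000000 | wc -l)
        echo "${i}.00000000: ${n}"
        total=$((total + n))
    done
    echo "total stray dentries: ${total}"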