Re: how to fix num_strays?

"Yan, Zheng" <ukernel@xxxxxxxxx> · Thu, 16 Apr 2020 09:53:05 +0800

On Thu, Apr 16, 2020 at 12:15 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> On Wed, Apr 15, 2020 at 5:13 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> >
> > On Wed, Apr 15, 2020 at 2:33 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hi all,
> > >
> > > Following some CephFS issues today, we have a stable cluster, but
> > > num_strays is incorrect.
> > > After starting the MDS, the values are reasonable, but they very soon
> > > underflow and start showing 18E (2^64 minus a few).
> > >
> > > ---------------mds---------------- --mds_cache--- ------mds_log------ -mds_mem- ----mds_server----- mds_ ---objecter---
> > > req  rlat fwd  inos caps exi  imi |stry recy recd|subm evts segs repl|ino  dn  |hcr  hcs  hsr  cre |sess|actv rd   wr  |
> > > 129    0    0  253k 1.9k   0    0 |246    0    0 |  5  4.0k   5    0 |253k 254k|129    0    0    0 |119 |  3    1    2
> > > 8.2k   0    0  253k 1.9k   0    0 |129    0    0 |395  4.4k   7    0 |253k 254k|8.2k  11    0    0 |119 |  0   33  517
> > > 9.7k   0    0  253k 1.8k   0    0 |181    0    0 |302  4.7k   7    0 |253k 254k|9.7k   5    0    0 |119 |  1   44  297
> > >  10k   0    0  253k 1.8k   0    0 |217    0    0 |382  5.1k   7    0 |253k 254k| 10k  11    0    0 |119 |  0   54  405
> > > 9.0k   0    0  253k 1.7k   0    0 |205    0    0 |386  5.5k   8    0 |253k 254k|9.0k   4    0    0 |119 |  1   46  431
> > > 8.2k   0    0  253k 1.7k   0    0 |161    0    0 |326  5.8k   8    0 |253k 254k|8.2k   6    0    0 |119 |  1   37  397
> > > 8.0k   0    0  253k 1.6k   0    0 |135    0    0 |279  6.1k   8    0 |253k 254k|8.0k   4    0    0 |119 |  1   31  317
> > > 9.2k   0    0  253k 1.6k   0    0 | 18E   0    0 |153  6.2k   8    0 |253k 254k|9.2k   6    0    0 |119 |  1    2  265
> > > 8.2k   0    0  253k 1.7k   0    0 | 18E   0    0 | 40  6.3k   8    0 |253k 254k|8.2k   5    0    0 |119 |  3    3   17
> > >
> > > Is there a way to reset num_strays to the correct number of strays?
> > >
> >
> > Try this command:
> >
> >   ceph daemon <mds of rank 0> scrub_path '~mdsdir' force recursive repair
>
> Thanks for the reply. Here's the ceph log from this repair:
>
> https://termbin.com/o8tc
>
> The active MDS (single active only) still showed 18E, so I failed over
> to a standby. It seems a bit better, but the counter still occasionally
> drops below zero to 18E.
> I ran scrub_path a few times and it finds errors each time...
>

Do you mean the scrub fixed the error, but the stat error keeps coming
back? Which MDS version are you running?
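
To tell whether the counter wraps again after a scrub, one option is to
poll it and flag values that can only come from unsigned underflow. The
following is a minimal sketch, not an official tool: looks_underflowed is
a hypothetical helper, the sample value 2^64 - 3 is only an illustration
of "2^64 minus a few", and the MDS name in the comments is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph: succeed when a counter value can
# only be a wrapped negative. 2^64 minus a small n always has 20 decimal
# digits, which no real stray count reaches.
looks_underflowed() {
    v=$1
    case $v in
        ''|*[!0-9]*) return 2 ;;   # not a plain non-negative integer
    esac
    [ "${#v}" -ge 20 ]
}

# Illustrative inputs: 2^64 - 3 (displayed as 18E) and a sane reading.
for sample in 18446744073709551613 246; do
    if looks_underflowed "$sample"; then
        echo "$sample: wrapped"
    else
        echo "$sample: ok"
    fi
done

# On a live cluster the inputs would come from commands such as:
#   ceph versions
#   ceph daemon mds.<name> perf dump mds_cache
```

The digit-count check is deliberately crude: it avoids shell integer
overflow, at the cost of not catching values between 2^63 and 10^19.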

> -- Dan
>
> >
> > > # for i in `seq 600 609`; do rados --cluster=dwight listomapvals -p cephfs_metadata ${i}.00000000 2>&1 | wc -l; done
> > > 95
> > > 220
> > > 91
> > > 173
> > > 567
> > > 1578
> > > 1042
> > > 754
> > > 445
> > > 261
> > >
> > > Cheers, Dan
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx