Re: MDS stuck in 'rejoin' after network fragmentation caused OSD flapping

On Thu, Aug 16, 2018 at 10:55 AM Jonathan Woytek <woytek@xxxxxxxxxxx> wrote:
>
> ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
>
>

Try deleting mds0_openfiles.0 (and mds1_openfiles.0 and so on, if you
have multiple active MDS daemons) from the metadata pool of your
filesystem. The records in these objects are only hints about open
files, so it is safe to delete them.
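
As a rough sketch of the step above, the following Python helper only
builds and prints the `rados` commands that would remove the objects; it
does not run them. The pool name `cephfs_metadata` and the rank count are
assumptions for illustration, so substitute the actual metadata pool of
your filesystem. Only the object names given above (mdsN_openfiles.0) are
covered, and the helper names are hypothetical.

```python
import shlex


def openfiles_objects(num_ranks):
    """Return the open-file hint object names for MDS ranks 0..num_ranks-1."""
    return [f"mds{rank}_openfiles.0" for rank in range(num_ranks)]


def rados_rm_commands(metadata_pool, num_ranks):
    """Build one 'rados -p <pool> rm <object>' command per MDS rank."""
    return [
        ["rados", "-p", metadata_pool, "rm", name]
        for name in openfiles_objects(num_ranks)
    ]


if __name__ == "__main__":
    # Assumed values: a metadata pool called 'cephfs_metadata' and two
    # active MDS ranks. Adjust both before using the printed commands.
    for cmd in rados_rm_commands("cephfs_metadata", 2):
        print(shlex.join(cmd))
```

Listing the pool first (for example with `rados -p <pool> ls`) is one way
to confirm that the objects exist before removing anything.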

> On Wed, Aug 15, 2018 at 10:51 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> > On Thu, Aug 16, 2018 at 10:50 AM Jonathan Woytek <woytek@xxxxxxxxxxx> wrote:
> >>
> >> Actually, I missed it--I do see the wipe start, wipe done in the log.
> >> However, it is still doing verify_diri_backtrace, as described
> >> previously.
> >>
> >
> > Which version of the mds are you using?
> >
> >> jonathan
> >>
> >> On Wed, Aug 15, 2018 at 10:42 PM, Jonathan Woytek <woytek@xxxxxxxxxxx> wrote:
> >> > On Wed, Aug 15, 2018 at 9:40 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> >> >> How many clients reconnected when the mds restarted?  The issue is
> >> >> likely that the reconnected clients held too many inodes, and the
> >> >> mds was opening these inodes in the rejoin state.  Try starting the
> >> >> mds with the option mds_wipe_sessions = true. The option makes the
> >> >> mds ignore old clients during recovery.  You need to unset the
> >> >> option and remount the clients after the mds becomes 'active'.
> >> >
> >> >
> >> > Thank you for the suggestion! I set that in the global section of
> >> > ceph.conf on the node where I am starting ceph-mds. After setting it
> >> > and starting ceph-mds, I'm not seeing markedly different behavior.
> >> > After flying through replay and then flying through a bunch of the
> >> > messages posted earlier, it begins to eat up memory again and slows
> >> > down, still outputting the log messages as in the original post.
> >> > Looking in the ceph-mds...log, I'm not seeing any reference to 'wipe',
> >> > so I'm not sure if it is being honored. Am I putting that in the right
> >> > place?
> >> >
> >> > jonathan
> >> > --
> >> > Jonathan Woytek
> >> > http://www.dryrose.com
> >> > KB3HOZ
> >> > PGP:  462C 5F50 144D 6B09 3B65  FCE8 C1DC DEC4 E8B6 AABC
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


