Hi Janek,

The PR Venky mentioned uses the OSDs' laggy parameters (laggy_interval and laggy_probability) to determine whether any OSD is laggy. These laggy parameters are reset to 0 only if the interval between the last modification of the OSDMap and the timestamp at which the OSD was marked down exceeds a grace threshold of `mon_osd_laggy_halflife * 48`. mon_osd_laggy_halflife is configurable and defaults to 3600 seconds, so with the default the interval has to exceed 172800 seconds (48 hours) before the laggy parameters are reset to 0. I'd recommend checking your configured value (using the command `ceph config get osd mon_osd_laggy_halflife`).

There is also a "hack" to reset the parameters manually (*not recommended, just for info*): set mon_osd_laggy_weight to 1 using `ceph config set osd mon_osd_laggy_weight 1` and restart the OSD(s) reported as laggy. The lagginess should then go away.

*Dhairya Parmar*
Associate Software Engineer, CephFS
Red Hat Inc. <https://www.redhat.com/>
dparmar@xxxxxxxxxx <https://www.redhat.com/>


On Wed, Sep 20, 2023 at 3:25 PM Venky Shankar <vshankar@xxxxxxxxxx> wrote:

> Hey Janek,
>
> I took a closer look at the various places where the MDS considers a client laggy. It seems a wide variety of reasons are taken into account, and not all of them should be a reason to defer client eviction, so the warning is a bit misleading. I'll post a PR for this. In the meantime, could you share the debug logs requested in my previous email?
>
> On Wed, Sep 20, 2023 at 3:07 PM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
>
> > Hi Janek,
> >
> > On Tue, Sep 19, 2023 at 4:44 PM Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx> wrote:
> >
> >> Hi Venky,
> >>
> >> As I said: There are no laggy OSDs. The maximum ping I have for any OSD in ceph osd perf is around 60ms (just a handful, probably aging disks). The vast majority of OSDs have ping times of less than 1ms.
Same for the > host > >> machines, yet I'm still seeing this message. It seems that the affected > >> hosts are usually the same, but I have absolutely no clue why. > >> > > > > It's possible that you are running into a bug which does not clear the > > laggy clients list which the MDS sends to monitors via beacons. Could you > > help us out with debug mds logs (by setting debug_mds=20) for the active > > mds for around 15-20 seconds and share the logs please? Also reset the > log > > level once done since it can hurt performance. > > > > # ceph config set mds.<> debug_mds 20 > > > > and reset via > > > > # ceph config rm mds.<> debug_mds > > > > > >> Janek > >> > >> > >> On 19/09/2023 12:36, Venky Shankar wrote: > >> > >> Hi Janek, > >> > >> On Mon, Sep 18, 2023 at 9:52 PM Janek Bevendorff < > >> janek.bevendorff@xxxxxxxxxxxxx> wrote: > >> > >>> Thanks! However, I still don't really understand why I am seeing this. > >>> > >> > >> This is due to a changes that was merged recently in pacific > >> > >> https://github.com/ceph/ceph/pull/52270 > >> > >> The MDS would not evict laggy clients if the OSDs report as laggy. Laggy > >> OSDs can cause cephfs clients to not flush dirty data (during cap > revokes > >> by the MDS) and thereby showing up as laggy and getting evicted by the > MDS. > >> This behaviour was changed and therefore you get warnings that some > client > >> are laggy but they are not evicted since the OSDs are laggy. > >> > >> > >>> The first time I had this, one of the clients was a remote user > dialling > >>> in via VPN, which could indeed be laggy. But I am also seeing it from > >>> neighbouring hosts that are on the same physical network with reliable > ping > >>> times way below 1ms. How is that considered laggy? > >>> > >> Are some of your OSDs reporting laggy? 
> >> This can be checked via `perf dump`:
> >>
> >>     ceph tell mds.<> perf dump
> >>
> >> (search for op_laggy/osd_laggy)
> >>
> >>
> >>> On 18/09/2023 18:07, Laura Flores wrote:
> >>>
> >>> Hi Janek,
> >>>
> >>> There was some documentation added about it here:
> >>> https://docs.ceph.com/en/pacific/cephfs/health-messages/
> >>>
> >>> There is a description of what it means, and it's tied to an MDS configurable.
> >>>
> >>> On Mon, Sep 18, 2023 at 10:51 AM Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx> wrote:
> >>>
> >>>> Hey all,
> >>>>
> >>>> Since the upgrade to Ceph 16.2.14, I keep seeing the following warning:
> >>>>
> >>>> 10 client(s) laggy due to laggy OSDs
> >>>>
> >>>> ceph health detail shows it as:
> >>>>
> >>>> [WRN] MDS_CLIENTS_LAGGY: 10 client(s) laggy due to laggy OSDs
> >>>>     mds.***(mds.3): Client *** is laggy; not evicted because some OSD(s) is/are laggy
> >>>>     more of this...
> >>>>
> >>>> When I restart the client(s) or the affected MDS daemons, the message goes away and then comes back after a while. ceph osd perf does not list any laggy OSDs (a few with 10-60ms ping, but overwhelmingly < 1ms), so I'm at a total loss as to what this even means.
> >>>>
> >>>> I have never seen this message before, nor was I able to find anything about it. Do you have any idea what this message actually means and how I can get rid of it?
> >>>>
> >>>> Thanks
> >>>> Janek
> >>>>
> >>>> _______________________________________________
> >>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>>
> >>>
> >>> --
> >>> Laura Flores
> >>> She/Her/Hers
> >>> Software Engineer, Ceph Storage <https://ceph.io>
> >>> Chicago, IL
> >>> lflores@xxxxxxx | lflores@xxxxxxxxxx <lflores@xxxxxxxxxx>
> >>> M: +17087388804
> >>>
> >>>
> >>> --
> >>> Bauhaus-Universität Weimar
> >>> Bauhausstr. 9a, R308
> >>> 99423 Weimar, Germany
> >>>
> >>> Phone: +49 3643 58 3577
> >>> www.webis.de
> >>>
> >>
> >>
> >> --
> >> Cheers,
> >> Venky
> >>
> >
> >
> > --
> > Cheers,
> > Venky
> >
>
>
> --
> Cheers,
> Venky