Hey Janek,

I took a closer look at the various places where the MDS would consider a client as laggy, and it seems that a wide variety of reasons are taken into consideration, not all of which warrant deferring client eviction, so the warning is a bit misleading. I'll post a PR for this.

In the meantime, could you share the debug logs mentioned in my previous email?
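For convenience, those steps boil down to roughly the following sketch (substitute the id of your active MDS for <name>; the active MDS shows up in `ceph fs status`, and the log file path below is the usual default for package installs, so it may differ on your deployment):

# ceph config set mds.<name> debug_mds 20
  ... let it run for around 15-20 seconds while the warning is present ...
# ceph config rm mds.<name> debug_mds

Then grab /var/log/ceph/ceph-mds.<name>.log from the host running the active MDS and share it. Please don't skip the reset step, since debug_mds=20 hurts performance.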
On Wed, Sep 20, 2023 at 3:07 PM Venky Shankar <vshankar@xxxxxxxxxx> wrote:

> Hi Janek,
>
> On Tue, Sep 19, 2023 at 4:44 PM Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
>> Hi Venky,
>>
>> As I said: there are no laggy OSDs. The maximum ping I have for any OSD in ceph osd perf is around 60ms (just a handful, probably aging disks). The vast majority of OSDs have ping times of less than 1ms. Same for the host machines, yet I'm still seeing this message. It seems that the affected hosts are usually the same, but I have absolutely no clue why.
>
> It's possible that you are running into a bug where the laggy-clients list, which the MDS sends to the monitors via beacons, is not cleared. Could you help us out with debug MDS logs (by setting debug_mds=20) on the active MDS for around 15-20 seconds and share the logs, please? Also reset the log level once done, since it can hurt performance.
>
> # ceph config set mds.<> debug_mds 20
>
> and reset via
>
> # ceph config rm mds.<> debug_mds
>
>> Janek
>>
>> On 19/09/2023 12:36, Venky Shankar wrote:
>>
>> Hi Janek,
>>
>> On Mon, Sep 18, 2023 at 9:52 PM Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx> wrote:
>>
>>> Thanks! However, I still don't really understand why I am seeing this.
>>
>> This is due to a change that was merged recently in pacific:
>>
>> https://github.com/ceph/ceph/pull/52270
>>
>> With that change, the MDS no longer evicts laggy clients if the OSDs report as laggy. Laggy OSDs can cause cephfs clients to not flush dirty data (during cap revokes by the MDS), thereby showing up as laggy and getting evicted by the MDS. This behaviour was changed, and therefore you now get warnings that some clients are laggy, but they are not evicted since the OSDs are laggy.
>>
>>> The first time I had this, one of the clients was a remote user dialling in via VPN, which could indeed be laggy. But I am also seeing it from neighbouring hosts that are on the same physical network with reliable ping times way below 1ms. How is that considered laggy?
>>
>> Are some of your OSDs reporting as laggy? This can be checked via `perf dump`:
>>
>> > ceph tell mds.<> perf dump
>> (search for op_laggy/osd_laggy)
>>
>>> On 18/09/2023 18:07, Laura Flores wrote:
>>>
>>> Hi Janek,
>>>
>>> There was some documentation added about it here:
>>> https://docs.ceph.com/en/pacific/cephfs/health-messages/
>>>
>>> There is a description of what it means, and it's tied to an MDS configurable.
>>>
>>> On Mon, Sep 18, 2023 at 10:51 AM Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx> wrote:
>>>
>>>> Hey all,
>>>>
>>>> Since the upgrade to Ceph 16.2.14, I keep seeing the following warning:
>>>>
>>>>     10 client(s) laggy due to laggy OSDs
>>>>
>>>> ceph health detail shows it as:
>>>>
>>>>     [WRN] MDS_CLIENTS_LAGGY: 10 client(s) laggy due to laggy OSDs
>>>>         mds.***(mds.3): Client *** is laggy; not evicted because some OSD(s) is/are laggy
>>>>         more of this...
>>>>
>>>> When I restart the client(s) or the affected MDS daemons, the message goes away and then comes back after a while. ceph osd perf does not list any laggy OSDs (a few with 10-60ms ping, but overwhelmingly < 1ms), so I'm at a total loss as to what this even means.
>>>>
>>>> I have never seen this message before, nor was I able to find anything about it. Do you have any idea what this message actually means and how I can get rid of it?
>>>>
>>>> Thanks
>>>> Janek
>>>
>>> --
>>> Laura Flores
>>> She/Her/Hers
>>> Software Engineer, Ceph Storage <https://ceph.io>
>>> Chicago, IL
>>> lflores@xxxxxxx | lflores@xxxxxxxxxx <lflores@xxxxxxxxxx>
>>> M: +17087388804
>>>
>>> --
>>> Bauhaus-Universität Weimar
>>> Bauhausstr. 9a, R308
>>> 99423 Weimar, Germany
>>> Phone: +49 3643 58 3577
>>> www.webis.de
>>
>> --
>> Cheers,
>> Venky
>>
>> --
>> Bauhaus-Universität Weimar
>> Bauhausstr. 9a, R308
>> 99423 Weimar, Germany
>> Phone: +49 3643 58 3577
>> www.webis.de
>
> --
> Cheers,
> Venky

--
Cheers,
Venky

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx