Hi Venky,
As I said: There are no laggy OSDs. The maximum ping I have for
any OSD in ceph osd perf is around 60ms (just a handful, probably
aging disks). The vast majority of OSDs have ping times of less
than 1ms. Same for the host machines, yet I'm still seeing this
message. It seems that the affected hosts are usually the same,
but I have absolutely no clue why.
Janek
Hi Janek,
On Mon, Sep 18, 2023 at 9:52 PM Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx> wrote:
> Thanks! However, I still don't really understand why I am seeing this.

This is due to a change that was recently merged into Pacific. The MDS no longer evicts laggy clients if OSDs are reported as laggy. Laggy OSDs can prevent CephFS clients from flushing dirty data (during cap revokes by the MDS), which makes those clients show up as laggy and get evicted by the MDS. This behaviour was changed, so you now get warnings that some clients are laggy, but they are not evicted because the OSDs are laggy.

> The first time I had this, one of the clients was a remote user dialling in via VPN, which could indeed be laggy. But I am also seeing it from neighbouring hosts that are on the same physical network with reliable ping times way below 1ms. How is that considered laggy?

Are some of your OSDs reporting as laggy? This can be checked via `perf dump`:

    ceph tell mds.<> perf dump   (search for op_laggy/osd_laggy)
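For anyone who would rather not search the output by eye, here is a minimal Python sketch of the check suggested above. It assumes that `ceph tell mds.<name> perf dump` prints JSON and that the counters are literally named `op_laggy` and `osd_laggy`, as in the hint above. The helper names `find_laggy_counters` and `mds_laggy_counters` are made up for illustration and are not part of Ceph.

```python
import json
import subprocess


def find_laggy_counters(node, path=()):
    """Walk a nested perf-dump dict and collect (dotted_path, value)
    for every key named op_laggy or osd_laggy."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = path + (key,)
            if key in ("op_laggy", "osd_laggy"):
                hits.append((".".join(here), value))
            else:
                # Descend into nested sections of the dump.
                hits.extend(find_laggy_counters(value, here))
    return hits


def mds_laggy_counters(mds_name):
    """Run `ceph tell mds.<mds_name> perf dump` and return the laggy counters.

    Assumption: the command prints a JSON document on stdout.
    """
    out = subprocess.run(
        ["ceph", "tell", f"mds.{mds_name}", "perf", "dump"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return find_laggy_counters(json.loads(out))
```

Calling `mds_laggy_counters("<your-mds-name>")` on a node with admin access would return a list of pairs. A non-zero `osd_laggy` value would suggest that the MDS itself considers some OSD sessions laggy, independent of what `ceph osd perf` shows.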
_______________________________________________
On 18/09/2023 18:07, Laura Flores wrote:
Hi Janek,
There was some documentation added about it here: https://docs.ceph.com/en/pacific/cephfs/health-messages/
There is a description of what it means, and it's tied to an mds configurable.
On Mon, Sep 18, 2023 at 10:51 AM Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx> wrote:
Hey all,
Since the upgrade to Ceph 16.2.14, I keep seeing the following warning:
10 client(s) laggy due to laggy OSDs
ceph health detail shows it as:
[WRN] MDS_CLIENTS_LAGGY: 10 client(s) laggy due to laggy OSDs
mds.***(mds.3): Client *** is laggy; not evicted because some
OSD(s) is/are laggy
more of this...
When I restart the client(s) or the affected MDS daemons, the message
goes away and then comes back after a while. ceph osd perf does not list
any laggy OSDs (a few with 10-60ms ping, but overwhelmingly < 1ms), so
I'm at a total loss as to what this even means.
I have never seen this message before nor was I able to find anything
about it. Do you have any idea what this message actually means and how
I can get rid of it?
Thanks
Janek
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage
Chicago, IL
lflores@xxxxxxx | lflores@xxxxxxxxxx
M: +17087388804
-- Bauhaus-Universität Weimar Bauhausstr. 9a, R308 99423 Weimar, Germany Phone: +49 3643 58 3577 www.webis.de
--
Cheers,
Venky