Re: CephFS warning: clients laggy due to laggy OSDs

Hi Janek,

On Mon, Sep 18, 2023 at 9:52 PM Janek Bevendorff <
janek.bevendorff@xxxxxxxxxxxxx> wrote:

> Thanks! However, I still don't really understand why I am seeing this.
>

This is due to a change that was merged recently in Pacific:

        https://github.com/ceph/ceph/pull/52270

With this change, the MDS no longer evicts laggy clients if some OSDs are
reported as laggy. Laggy OSDs can prevent CephFS clients from flushing dirty
data (during cap revokes by the MDS), which makes those clients show up as
laggy and, before this change, get evicted by the MDS. Because of the new
behaviour, you now get warnings that some clients are laggy, but they are
not evicted since the OSDs are laggy.
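
To make the new rule concrete, here is a simplified model of the eviction
decision in Python. It is only a sketch of the behaviour described above, not
the MDS implementation (which is C++ and also depends on session timeouts and
cap revocation state). The function and parameter names, including the
`defer_on_laggy_osds` switch standing in for the MDS config option, are
hypothetical, and its default of True is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    evict: bool       # would the MDS evict the client?
    warn_laggy: bool  # would it raise MDS_CLIENTS_LAGGY instead?


def eviction_decision(client_laggy: bool,
                      any_osd_laggy: bool,
                      defer_on_laggy_osds: bool = True) -> Decision:
    """Simplified, hypothetical model of the post-change MDS behaviour."""
    if not client_laggy:
        # Responsive clients are left alone.
        return Decision(evict=False, warn_laggy=False)
    if any_osd_laggy and defer_on_laggy_osds:
        # The client's lag may be caused by laggy OSDs (e.g. it cannot flush
        # dirty data during a cap revoke), so skip eviction and only warn.
        return Decision(evict=False, warn_laggy=True)
    # No laggy OSDs (or deferral disabled): evict the laggy client as before.
    return Decision(evict=True, warn_laggy=False)
```

In this model, a laggy client with laggy OSDs and deferral enabled only
produces the warning; with no laggy OSDs it is evicted as it was before the
change.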


> The first time I had this, one of the clients was a remote user dialling
> in via VPN, which could indeed be laggy. But I am also seeing it from
> neighbouring hosts that are on the same physical network with reliable ping
> times way below 1ms. How is that considered laggy?
>
Are some of your OSDs reported as laggy? This can be checked via `perf dump`:

        ceph tell mds.<> perf dump

and search the output for op_laggy/osd_laggy.
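
If you want to script that check, the following Python sketch walks the JSON
printed by `perf dump` and reports every op_laggy/osd_laggy counter it finds.
It searches the whole tree, so it does not assume which section of the dump
holds those counters. The script name (find_laggy.py) and its file-argument
interface are just for illustration.

```python
import json
import sys
from typing import Any, Iterator, List, Tuple

# Counter names mentioned in this thread; their location in the dump is not assumed.
LAGGY_KEYS = ("op_laggy", "osd_laggy")


def find_laggy_counters(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Recursively yield (dotted_path, value) for every op_laggy/osd_laggy key."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key in LAGGY_KEYS:
                yield child, value
            # Keep descending in case the counters appear in several sections.
            yield from find_laggy_counters(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_laggy_counters(item, f"{path}[{i}]")


def report(dump: Any) -> List[str]:
    """Format the matches as 'path = value' lines."""
    return [f"{path} = {value}" for path, value in find_laggy_counters(dump)]


def main(argv: List[str]) -> int:
    # argv[1] is a saved perf dump file; "-" means read the dump from stdin.
    if argv[1] == "-":
        dump = json.load(sys.stdin)
    else:
        with open(argv[1]) as f:
            dump = json.load(f)
    lines = report(dump)
    if not lines:
        print("no op_laggy/osd_laggy counters found")
        return 1
    print("\n".join(lines))
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Usage, once per MDS:

        ceph tell mds.<> perf dump > perf.json
        python3 find_laggy.py perf.json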


> On 18/09/2023 18:07, Laura Flores wrote:
>
> Hi Janek,
>
> There was some documentation added about it here:
> https://docs.ceph.com/en/pacific/cephfs/health-messages/
>
> There is a description of what it means, and it's tied to an MDS config
> option.
>
> On Mon, Sep 18, 2023 at 10:51 AM Janek Bevendorff <
> janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
>> Hey all,
>>
>> Since the upgrade to Ceph 16.2.14, I keep seeing the following warning:
>>
>> 10 client(s) laggy due to laggy OSDs
>>
>> ceph health detail shows it as:
>>
>> [WRN] MDS_CLIENTS_LAGGY: 10 client(s) laggy due to laggy OSDs
>>      mds.***(mds.3): Client *** is laggy; not evicted because some
>> OSD(s) is/are laggy
>>      more of this...
>>
>> When I restart the client(s) or the affected MDS daemons, the message
>> goes away and then comes back after a while. ceph osd perf does not list
>> any laggy OSDs (a few with 10-60ms ping, but overwhelmingly < 1ms), so
>> I'm at a total loss as to what this even means.
>>
>> I have never seen this message before nor was I able to find anything
>> about it. Do you have any idea what this message actually means and how
>> I can get rid of it?
>>
>> Thanks
>> Janek
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage <https://ceph.io>
>
> Chicago, IL
>
> lflores@xxxxxxx | lflores@xxxxxxxxxx <lflores@xxxxxxxxxx>
> M: +17087388804
>
>
> --
> Bauhaus-Universität Weimar
> Bauhausstr. 9a, R308
> 99423 Weimar, Germany
>
> Phone: +49 3643 58 3577
> www.webis.de
>


-- 
Cheers,
Venky



