Re: CephFS warning: clients laggy due to laggy OSDs




Hi Janek,

On Tue, Sep 19, 2023 at 4:44 PM Janek Bevendorff <
janek.bevendorff@xxxxxxxxxxxxx> wrote:

> Hi Venky,
>
> As I said: There are no laggy OSDs. The maximum ping I have for any OSD in
> ceph osd perf is around 60ms (just a handful, probably aging disks). The
> vast majority of OSDs have ping times of less than 1ms. Same for the host
> machines, yet I'm still seeing this message. It seems that the affected
> hosts are usually the same, but I have absolutely no clue why.
>

It's possible that you are running into a bug where the MDS does not clear
the list of laggy clients that it sends to the monitors via beacons. Could
you help us out by setting debug_mds=20 on the active MDS for around 15-20
seconds and sharing the resulting logs? Please reset the log level once
done, since debug logging at this level can hurt performance.

# ceph config set mds.<> debug_mds 20

and reset via

# ceph config rm mds.<> debug_mds
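
If you would rather not risk forgetting the reset, the two commands above can
be wrapped so that the override is always removed. This is only a minimal
sketch, not an official tool: the function name capture_mds_debug, the
mds_name argument and the injectable run/sleep parameters are assumptions
made for illustration; only the two ceph config commands come from this mail.

```python
import subprocess
import time


def capture_mds_debug(mds_name, seconds=20, run=subprocess.run, sleep=time.sleep):
    """Raise debug_mds to 20 for `seconds`, then always remove the override.

    `mds_name` is whatever follows "mds." for your active MDS (hypothetical
    parameter); `run` and `sleep` are injectable so the logic can be dry-run.
    """
    target = f"mds.{mds_name}"
    run(["ceph", "config", "set", target, "debug_mds", "20"], check=True)
    try:
        sleep(seconds)
    finally:
        # Reset even on Ctrl-C or an error, since level 20 can hurt performance.
        run(["ceph", "config", "rm", target, "debug_mds"], check=True)
```

Calling capture_mds_debug("<name>", seconds=20) on a node with admin access
would keep the debug window to roughly the 15-20 seconds asked for above.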


> Janek
>
>
> On 19/09/2023 12:36, Venky Shankar wrote:
>
> Hi Janek,
>
> On Mon, Sep 18, 2023 at 9:52 PM Janek Bevendorff <
> janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
>> Thanks! However, I still don't really understand why I am seeing this.
>>
>
> This is due to a change that was merged recently in Pacific
>
>         https://github.com/ceph/ceph/pull/52270
>
> With this change, the MDS does not evict laggy clients if OSDs are reported
> as laggy. Laggy OSDs can prevent cephfs clients from flushing dirty data
> (during cap revokes by the MDS), which makes the clients show up as laggy
> and, previously, got them evicted by the MDS. Since the behaviour changed,
> you now get warnings that some clients are laggy, but they are not evicted
> because the OSDs are laggy.
>
>
>> The first time I had this, one of the clients was a remote user dialling
>> in via VPN, which could indeed be laggy. But I am also seeing it from
>> neighbouring hosts that are on the same physical network with reliable ping
>> times way below 1ms. How is that considered laggy?
>>
> Are some of your OSDs reporting laggy? This can be checked via `perf dump`
>
> > ceph tell mds.<> perf dump
> (search for op_laggy/osd_laggy)
>
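
The search suggested above can also be scripted. The following is a rough
sketch that assumes the perf dump output has been saved as JSON and is a
nested object of counters; the function name find_laggy_counters is
hypothetical, and only the op_laggy/osd_laggy counter names come from this
thread.

```python
def find_laggy_counters(node, path=""):
    """Return {dotted.path: value} for every counter whose name contains 'laggy'."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if "laggy" in key and not isinstance(value, (dict, list)):
                # Leaf counter such as op_laggy or osd_laggy.
                found[here] = value
            else:
                # Descend into nested sections; non-dict values yield nothing.
                found.update(find_laggy_counters(value, here))
    return found
```

After saving the output of the command above to a file, loading it with
json.load() and passing the result to find_laggy_counters() lists only the
laggy-related counters, so non-zero values stand out.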
>
>> On 18/09/2023 18:07, Laura Flores wrote:
>>
>> Hi Janek,
>>
>> There was some documentation added about it here:
>> https://docs.ceph.com/en/pacific/cephfs/health-messages/
>>
>> There is a description of what it means, and it's tied to an mds
>> configurable.
>>
>> On Mon, Sep 18, 2023 at 10:51 AM Janek Bevendorff <
>> janek.bevendorff@xxxxxxxxxxxxx> wrote:
>>
>>> Hey all,
>>>
>>> Since the upgrade to Ceph 16.2.14, I keep seeing the following warning:
>>>
>>> 10 client(s) laggy due to laggy OSDs
>>>
>>> ceph health detail shows it as:
>>>
>>> [WRN] MDS_CLIENTS_LAGGY: 10 client(s) laggy due to laggy OSDs
>>>      mds.***(mds.3): Client *** is laggy; not evicted because some
>>> OSD(s) is/are laggy
>>>      more of this...
>>>
>>> When I restart the client(s) or the affected MDS daemons, the message
>>> goes away and then comes back after a while. ceph osd perf does not list
>>> any laggy OSDs (a few with 10-60ms ping, but overwhelmingly < 1ms), so
>>> I'm at a total loss as to what this even means.
>>>
>>> I have never seen this message before nor was I able to find anything
>>> about it. Do you have any idea what this message actually means and how
>>> I can get rid of it?
>>>
>>> Thanks
>>> Janek
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
>>
>>
>> --
>>
>> Laura Flores
>>
>> She/Her/Hers
>>
>> Software Engineer, Ceph Storage <https://ceph.io>
>>
>> Chicago, IL
>>
>> lflores@xxxxxxx | lflores@xxxxxxxxxx <lflores@xxxxxxxxxx>
>> M: +17087388804
>>
>>
>> --
>> Bauhaus-Universität Weimar
>> Bauhausstr. 9a, R308
>> 99423 Weimar, Germany
>>
>> Phone: +49 3643 58 3577
>> www.webis.de
>>
>>
>
>
> --
> Cheers,
> Venky
>
>

-- 
Cheers,
Venky