What does "writefull" actually mean? I haven't found anything in the Ceph docs, just a flowchart via Google that isn't really clear. :/ This is when it started, and I've been getting messages like this since:

2022-02-18T08:47:46.694274+0700 osd.148 (osd.148) 1701 : cluster [WRN] slow request osd_op(client.524665199.0:984004907 24.74es0 24:72fb53ba:::9213182a-14ba-48ad-bde9-289a1c0c0de8.21813595.2__shadow_.6taCcYVmTAl87Ai0ZggPc-b_el2M4A7_3:head [writefull 0~4194304 in=4194304b] snapc 0=[] ondisk+write+known_if_redirected e63105) initiated 2022-02-18T08:47:15.758941+0700 currently started

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Anthony D'Atri <anthony.datri@xxxxxxxxx>
Sent: Friday, February 18, 2022 1:44 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Subject: Re: Slow ops on 1 host

Look at the OSD logs and figure out which OSD. Correlate with syslog / dmesg and smartctl etc. SATA? If it's all OSDs on the node, it's likely the HBA. If it's just one, likely the drive.

> On Feb 17, 2022, at 9:44 PM, Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:
>
> Hi,
>
> Is there a way to identify what is stuck, and where, on one host that is causing slow operations?
> If I shut down the host the slow ops are coming from, the cluster returns to normal operation; if I start it back up, the slow ops come back.
>
> I'm planning to purge all OSDs on that host, but I'm a bit afraid the slow ops will just move to another host.
>
> Thank you
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
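On the "writefull" question: in an osd_op log line, `writefull` is the RADOS op that replaces an object's entire contents in a single write, and the `0~4194304` part is the offset~length extent, so the op above is rewriting a whole 4 MiB object (RGW stores multipart data in `__shadow_` objects of roughly that size). A minimal sketch for pulling the op name and extent out of such a line; the shortened sample `line` below is an assumption based on the log quoted above:

```shell
# Shortened sample of the slow-request line above (assumed shape;
# real lines also carry the client id and full object name).
line='osd_op(... [writefull 0~4194304 in=4194304b] snapc 0=[] ondisk+write+known_if_redirected e63105)'

# Pull "op offset~length" out of the bracketed op description.
# writefull OFF~LEN replaces the whole object with LEN bytes starting
# at offset OFF -- here a full 4 MiB (4194304-byte) object rewrite.
op=$(printf '%s\n' "$line" \
  | sed -n 's/.*\[\([a-z_]*\) \([0-9]*\)~\([0-9]*\).*/\1 offset=\2 len=\3/p')
echo "$op"   # -> writefull offset=0 len=4194304
```

The same pattern works against a whole OSD log piped through it, which makes it easy to see whether the slow requests are all large full-object writes (pointing at disk/HBA write latency, as suggested above) or a mix of op types.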