Re: Slow ops on OSDs

Yes, some disks are spiking near 100%... The latency I see in iostat
(r_await) seems to be synchronised with the delays between the
queued_for_pg and reached_pg events.
The NVMe disks are not spiking, only the spinner disks.
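
For reference, this is roughly how I'm correlating the two (osd.12 and the
interval are just examples):

  # per-device latency/utilisation on an OSD host
  iostat -x 5

  # per-event timestamps of recent slow ops on a single OSD
  ceph daemon osd.12 dump_historic_ops | grep -E 'queued_for_pg|reached_pg|time'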

I know that RocksDB only partially fits on the NVMe. The OS-level
read-ahead for the spinner disks is also 128 kB. As we are dealing with
rather small files, this might also hurt performance.
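
For the record, the per-device value can be checked and adjusted like this
(sdb and the value 32 are just examples for testing):

  # current OS-level read-ahead of a spinner, in kB
  cat /sys/block/sdb/queue/read_ahead_kb

  # temporarily lower it to see whether small reads behave better
  echo 32 > /sys/block/sdb/queue/read_ahead_kb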

I'm still investigating, but I'm wondering whether the system is also
reading from disk to find the KV pairs.
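
If I understand it correctly, the BlueFS perf counters should show whether
DB data has spilled over from the NVMe to the slow device; something along
these lines (osd.12 again being a placeholder):

  # db_used_bytes vs slow_used_bytes per OSD
  ceph daemon osd.12 perf dump bluefs | grep -E 'db_used_bytes|slow_used_bytes'

If slow_used_bytes is non-zero, at least part of the RocksDB reads would be
hitting the spinners.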



On Tue, 6 Oct 2020 at 11:23, Igor Fedotov <ifedotov@xxxxxxx> wrote:

> Hi Kristof,
>
> are you seeing high (around 100%) utilization on the OSDs' disks (main
> or DB ones) along with the slow ops?
>
>
> Thanks,
>
> Igor
>
> On 10/6/2020 11:09 AM, Kristof Coucke wrote:
> > Hi all,
> >
> > We have a Ceph cluster which has been expanded from 10 to 16 nodes.
> > Each node has between 14 and 16 OSDs, of which 2 are NVMe disks.
> > Most disks (except the NVMe's) are 16 TB in size.
> >
> > The expansion to 16 nodes went ok, but we had configured the system to
> > prevent automatic rebalancing towards the new disks (their weight was
> > set to 0) so we could control the expansion.
> >
> > We started by adding 6 disks last week (1 disk on each new node), which
> > didn't give a lot of issues.
> > When the Ceph status indicated that the degraded PGs were almost
> > recovered, we added another 2 disks on each node.
> >
> > All seemed to go fine, till yesterday morning... IOs towards the system
> > were slowing down.
> >
> > Diving into the nodes, we could see that the OSD daemons are consuming
> > the CPU, resulting in average CPU loads going near 10 (!).
> >
> > Neither the RGWs, nor the monitors, nor the other involved servers are
> > having CPU issues (except for the management server, which is fighting
> > with Prometheus), so the latency seems to be related to the OSD hosts.
> > All of the hosts are interconnected with 25 Gbit links, and no
> > bottlenecks are being hit on the network either.
> >
> > Important piece of information: we are using erasure coding (6/3), and
> > we do have a lot of small files...
> > The current health detail indicates degraded data redundancy, with
> > 1192911/103387889228 objects degraded (1 pg degraded, 1 pg undersized).
> >
> > Diving into the historic ops of an OSD, we can see that the main
> > latency is found between the "queued_for_pg" and "reached_pg" events
> > (averaging +/- 3 secs).
> >
> > As the system load is quite high, I assume the systems are busy
> > recalculating the erasure code chunks for the new disks we've added
> > (though I'm not sure), but I was wondering how I can better fine-tune
> > the system or pinpoint the exact bottleneck.
> > Latency towards the disks doesn't seem to be an issue at first sight...
> >
> > We are running Ceph 14.2.11
> >
> > Who can give me some thoughts on how I can better pinpoint the
> > bottleneck?
> >
> > Thanks
> >
> > Kristof
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


