Re: Please discuss about Slow Peering

We are using the read-intensive Kioxia drives (octopus cluster) in RBD pools and are very happy with them. I don't think it's the drives.

The last possibility I can think of is CPU. We run 4 OSDs per 1.92TB Kioxia drive to utilize their performance (a single OSD per disk doesn't cut it at all) and have 2x 16-core Intel(R) Xeon(R) Silver 4216 CPUs @ 2.10GHz per server. During normal operations the CPU is only lightly loaded, but during peering the load peaks at or above 100%. If not enough CPU power is available, peering will be hit very badly. What you could check:

- number of cores: at least 1 hyper-thread per OSD, better 1 physical core per OSD.
- C-states disabled: we run with the virtualization-performance profile and the CPU is basically always at all-core boost (3.2GHz).
- sufficient RAM: we run these OSDs with a 6G memory limit, that's 24G per disk! Still, the servers sit at about 50% OSD RAM utilisation and 50% buffers, so there is enough headroom for fast peak allocations during peering.
- check vm.min_free_kbytes: the default is way too low for OSD hosts; we use vm.min_free_kbytes=4194304 (4G). A too-low value can have a latency impact on network connections.
- swap disabled: disable swap entirely on OSD hosts.
- sysctl network tuning: check that your network parameters are appropriate for your network cards; the kernel defaults are still sized for 1G connections. There are good tuning guides online; here are some of our settings for 10G NICs:

# Increase autotuning TCP buffer limits
# 10G fiber/64MB buffers (67108864)
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 22500 218450 67108864
net.ipv4.tcp_wmem = 22500 81920 67108864

- last check: are you using WPQ or mclock? The mclock scheduler still has serious issues and switching to WPQ might help; a quick way to check and switch is sketched below.
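
A minimal sketch of how such a check could look from a shell on an OSD host (output formats and sysfs paths can differ per distribution, and the commented "set" lines are examples, not recommendations for your cluster):

# logical cores vs. number of OSD daemons on this host
nproc
ps -e | grep -c ceph-osd

# C-states and clocks: which idle states exist, what the CPU currently runs at
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
cpupower frequency-info

# OSD memory limit and kernel memory reserve
ceph config get osd osd_memory_target
sysctl vm.min_free_kbytes
# sysctl -w vm.min_free_kbytes=4194304   # or put it in /etc/sysctl.d/ and run: sysctl --system

# swap should be off
swapon --show   # empty output means no active swap

# op queue scheduler (wpq vs. mclock_scheduler)
ceph config get osd osd_op_queue
# ceph config set osd osd_op_queue wpq   # only takes effect after an OSD restart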

If none of these help, I'm out of ideas. For us the Kioxia drives work like a charm; it's the pool that is easiest to manage and maintain, with super-fast recovery and really good sustained performance.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: 서민우 <smw940219@xxxxxxxxx>
Sent: Tuesday, May 21, 2024 11:25 AM
To: Anthony D'Atri
Cc: Frank Schilder; ceph-users@xxxxxxx
Subject: Re: Please discuss about Slow Peering

We used the Kioxia KCD6XVUL3T20 model.
Is there anything infamous known about this model?

On Fri, May 17, 2024 at 2:58 AM, Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
If using jumbo frames, also ensure that they're consistently enabled on all OS instances and network devices.
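
A quick way to verify end-to-end is a non-fragmenting ping sized for the jumbo MTU, e.g. for MTU 9000 an 8972-byte payload (9000 minus 28 bytes of IP/ICMP header):

ping -M do -s 8972 <address of a peer OSD host>

If any device on the path has a smaller MTU, the ping fails instead of being silently fragmented.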

> On May 16, 2024, at 09:30, Frank Schilder <frans@xxxxxx> wrote:
>
> This is a long shot: if you are using octopus, you might be hit by this pglog-dup problem: https://docs.clyso.com/blog/osds-with-unlimited-ram-growth/. They don't mention slow peering explicitly in the blog, but it's also a consequence, because the up+acting OSDs need to go through the PG log during peering.
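>
> A rough way to see whether a PG is affected is to stop the OSD and dump the PG log with ceph-objectstore-tool, then count the dups entries; this is only a sketch, the data path and the exact JSON layout depend on your deployment and version:
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
>     --pgid <pgid> --op log | jq '.pg_log_t.dups | length'
>
> Counts far beyond osd_pg_log_dups_tracked (default 3000) would point to the problem described in the blog.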
>
> We are also using octopus and I'm not sure if we have ever seen slow ops caused by peering alone. It usually happens when a disk cannot handle the load during peering. We have, unfortunately, disks that show random latency spikes (firmware update pending). You can try to monitor OP latencies for your drives during peering and look for something that sticks out. People on this list were reporting quite bad results for certain infamous NVMe brands. If you state your model numbers, someone else might recognize them.
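>
> For the latency monitoring, plain iostat from sysstat is usually enough; run it during peering and look for a device whose await values stick out, e.g.:
>
> iostat -x 1
>
> (r_await/w_await are in milliseconds; on healthy NVMe drives they should mostly stay well below a few milliseconds.)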
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: 서민우 <smw940219@xxxxxxxxx>
> Sent: Thursday, May 16, 2024 7:39 AM
> To: ceph-users@xxxxxxx
> Subject: Please discuss about Slow Peering
>
> Env:
> - OS: Ubuntu 20.04
> - Ceph Version: Octopus 15.0.0.1
> - OSD Disk: 2.9TB NVMe
> - BlockStorage (Replication 3)
>
> Symptom:
> - Peering when an OSD node comes up is very slow. Peering speed varies from
> PG to PG, and some PGs may even take 10 seconds. But there is no log output
> during those 10 seconds.
> - I checked the effect on client VMs. Slow MySQL queries actually occur at
> the same time.
>
> Here are the Ceph OSD logs for both the best and the worst case.
>
> Best Peering Case (0.5 Seconds)
> 2024-04-11T15:32:44.693+0900 7f108b522700  1 osd.7 pg_epoch: 27368 pg[6.8]
> state<Start>: transitioning to Primary
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
> state<Started/Primary/Peering>: Peering, affected_by_map, going to Reset
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
> start_peering_interval up [7,6,11] -> [6,11], acting [7,6,11] -> [6,11],
> acting_primary 7 -> 6, up_primary 7 -> 6, role 0 -> -1, features acting
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
> state<Start>: transitioning to Primary
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
> start_peering_interval up [6,11] -> [7,6,11], acting [6,11] -> [7,6,11],
> acting_primary 6 -> 7, up_primary 6 -> 7, role -1 -> 0, features acting
>
> Worst Peering Case (11.6 Seconds)
> 2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377 pg[30.20]
> state<Start>: transitioning to Stray
> 2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377 pg[30.20]
> start_peering_interval up [0,1] -> [0,7,1], acting [0,1] -> [0,7,1],
> acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting
> 2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378 pg[30.20]
> state<Start>: transitioning to Stray
> 2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378 pg[30.20]
> start_peering_interval up [0,7,1] -> [0,7,1], acting [0,7,1] -> [0,1],
> acting_primary 0 -> 0, up_primary 0 -> 0, role 1 -> -1, features acting
> 2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390 pg[30.20]
> state<Start>: transitioning to Stray
> 2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390 pg[30.20]
> start_peering_interval up [0,7,1] -> [0,7,1], acting [0,1] -> [0,7,1],
> acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting
>
> *I wish to know:*
> - Why some PGs take 10 seconds until peering finishes.
> - Why the Ceph log is quiet during peering.
> - Whether this behavior is intended in Ceph.
>
> *And please give some advice:*
> - Is there any way to improve peering speed?
> - Or, is there a way to avoid affecting clients while peering occurs?
>
> P.S.
> - I have seen the same symptoms in the following environments:
> -> Octopus, Reef, Cephadm, Ceph-Ansible
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
