Re: Please discuss about Slow Peering

This is a long shot: if you are using Octopus, you might be hit by this pglog-dup problem: https://docs.clyso.com/blog/osds-with-unlimited-ram-growth/. They don't mention slow peering explicitly in the blog, but it's also a consequence, because the up+acting OSDs need to go through the PG log during peering.
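For reference, the check described there essentially boils down to counting the dup entries in a PG's log on a stopped OSD. Below is a rough, untested sketch of that idea in Python. It assumes that "ceph-objectstore-tool --op log" emits JSON containing a pg_log_t/dups array; the exact field names may differ between releases, so treat it as a starting point only.

#!/usr/bin/env python3
# Rough sketch (untested): count pg_log dup entries for one PG on a STOPPED OSD.
# Assumes `ceph-objectstore-tool --op log` prints JSON with "pg_log_t" -> "dups";
# field names may differ by Ceph release, so verify against your own output.
import json
import subprocess
import sys

def count_dups(osd_path: str, pgid: str) -> int:
    out = subprocess.run(
        ["ceph-objectstore-tool", "--data-path", osd_path,
         "--pgid", pgid, "--op", "log"],
        check=True, capture_output=True, text=True,
    ).stdout
    log = json.loads(out)
    # Fall back to an empty list if the structure is not what we expect.
    return len(log.get("pg_log_t", {}).get("dups", []))

if __name__ == "__main__":
    # e.g. ./count_dups.py /var/lib/ceph/osd/ceph-7 6.8
    osd_path, pgid = sys.argv[1], sys.argv[2]
    print(f"{pgid}: {count_dups(osd_path, pgid)} dup entries")

A PG with millions of dup entries would point at the problem from the blog; a few thousand is normal.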

We are also using Octopus, and I'm not sure we have ever seen slow ops caused by peering alone. It usually happens when a disk cannot handle the load during peering. We unfortunately have disks that show random latency spikes (a firmware update is pending). You could try to monitor the OP latencies of your drives during peering and look for anything that sticks out. People on this list have reported quite bad results for certain infamous NVMe brands; if you state your model numbers, someone else might recognize them.
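By monitoring I mean something along these lines: poll "ceph osd perf" while the node comes up and flag OSDs whose latency sticks out. This is a minimal sketch; the JSON field names ("osdstats", "osd_perf_infos", "perf_stats", "commit_latency_ms") are from memory and may differ in your release, so adjust them to your "ceph osd perf -f json" output. The 100 ms threshold is an arbitrary example.

#!/usr/bin/env python3
# Minimal sketch: poll `ceph osd perf` and print OSDs with high commit/apply latency.
# Field names below are assumptions; check them against `ceph osd perf -f json`.
import json
import subprocess
import time

THRESHOLD_MS = 100  # arbitrary example threshold

def poll_once():
    out = subprocess.run(["ceph", "osd", "perf", "-f", "json"],
                         check=True, capture_output=True, text=True).stdout
    data = json.loads(out)
    # Newer releases wrap the list in "osdstats"; older ones put it at top level.
    infos = data.get("osdstats", {}).get("osd_perf_infos",
                                         data.get("osd_perf_infos", []))
    for osd in infos:
        stats = osd.get("perf_stats", {})
        commit = stats.get("commit_latency_ms", 0)
        apply_ = stats.get("apply_latency_ms", 0)
        if commit > THRESHOLD_MS or apply_ > THRESHOLD_MS:
            print(f"osd.{osd.get('id')}: commit={commit}ms apply={apply_}ms")

if __name__ == "__main__":
    while True:
        poll_once()
        time.sleep(5)

Correlating such spikes with the slow-peering PGs would tell you whether it is really a drive problem rather than peering itself.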

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: 서민우 <smw940219@xxxxxxxxx>
Sent: Thursday, May 16, 2024 7:39 AM
To: ceph-users@xxxxxxx
Subject:  Please discuss about Slow Peering

Env:
- OS: Ubuntu 20.04
- Ceph Version: Octopus 15.0.0.1
- OSD Disk: 2.9TB NVMe
- BlockStorage (Replication 3)

Symptom:
- Peering is very slow when an OSD's node comes up. Peering speed varies from
PG to PG, and some PGs take as long as 10 seconds. But there is no log output
during those 10 seconds.
- I checked the effect on client VMs: slow MySQL queries actually occur at the
same time.

Below are the Ceph OSD logs for both the best and worst cases.

Best Peering Case (0.5 Seconds)
2024-04-11T15:32:44.693+0900 7f108b522700  1 osd.7 pg_epoch: 27368 pg[6.8]
state<Start>: transitioning to Primary
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
state<Started/Primary/Peering>: Peering, affected_by_map, going to Reset
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
start_peering_interval up [7,6,11] -> [6,11], acting [7,6,11] -> [6,11],
acting_primary 7 -> 6, up_primary 7 -> 6, role 0 -> -1, features acting
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
state<Start>: transitioning to Primary
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
start_peering_interval up [6,11] -> [7,6,11], acting [6,11] -> [7,6,11],
acting_primary 6 -> 7, up_primary 6 -> 7, role -1 -> 0, features acting

Worst Peering Case (11.6 Seconds)
2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377 pg[30.20]
state<Start>: transitioning to Stray
2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377 pg[30.20]
start_peering_interval up [0,1] -> [0,7,1], acting [0,1] -> [0,7,1],
acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting
2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378 pg[30.20]
state<Start>: transitioning to Stray
2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378 pg[30.20]
start_peering_interval up [0,7,1] -> [0,7,1], acting [0,7,1] -> [0,1],
acting_primary 0 -> 0, up_primary 0 -> 0, role 1 -> -1, features acting
2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390 pg[30.20]
state<Start>: transitioning to Stray
2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390 pg[30.20]
start_peering_interval up [0,7,1] -> [0,7,1], acting [0,1] -> [0,7,1],
acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting

*I wish to know:*
- Why some PGs take 10 seconds until peering finishes.
- Why the Ceph log is quiet during peering.
- Whether this behavior is intended in Ceph.

*And please give some advice:*
- Is there any way to improve peering speed?
- Or is there a way to avoid affecting clients when peering occurs?

P.S.
- I checked the symptoms in the following environments:
-> Octopus version, Reef version, cephadm, ceph-ansible
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx