Please discuss about Slow Peering

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Env:
- OS: Ubuntu 20.04
- Ceph Version: Octopus 15.0.0.1
- OSD Disk: 2.9TB NVMe
- BlockStorage (Replication 3)

Symptom:
- Peering when OSD's node up is very slow. Peering speed varies from PG to
PG, and some PG may even take 10 seconds. But, there is no log for 10
seconds.
- I checked the effect of client VM's. Actually, Slow queries of mysql
occur at the same time.

There are Ceph OSD logs of both Best and Worst.

Best Peering Case (0.5 Seconds)
2024-04-11T15:32:44.693+0900 7f108b522700  1 osd.7 pg_epoch: 27368 pg[6.8]
state<Start>: transitioning to Primary
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
state<Started/Primary/Peering>: Peering, affected_by_map, going to Reset
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
start_peering_interval up [7,6,11] -> [6,11], acting [7,6,11] -> [6,11],
acting_primary 7 -> 6, up_primary 7 -> 6, role 0 -> -1, features acting
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
state<Start>: transitioning to Primary
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
start_peering_interval up [6,11] -> [7,6,11], acting [6,11] -> [7,6,11],
acting_primary 6 -> 7, up_primary 6 -> 7, role -1 -> 0, features acting

Worst Peering Case (11.6 Seconds)
2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377 pg[30.20]
state<Start>: transitioning to Stray
2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377 pg[30.20]
start_peering_interval up [0,1] -> [0,7,1], acting [0,1] -> [0,7,1],
acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting
2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378 pg[30.20]
state<Start>: transitioning to Stray
2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378 pg[30.20]
start_peering_interval up [0,7,1] -> [0,7,1], acting [0,7,1] -> [0,1],
acting_primary 0 -> 0, up_primary 0 -> 0, role 1 -> -1, features acting
2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390 pg[30.20]
state<Start>: transitioning to Stray
2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390 pg[30.20]
start_peering_interval up [0,7,1] -> [0,7,1], acting [0,1] -> [0,7,1],
acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting

*I wish to know about*
- Why some PG's take 10 seconds until Peering finishes.
- Why Ceph log is quiet during peering.
- Is this symptom intended in Ceph.

*And please give some advice,*
- Is there any way to improve peering speed?
- Or, Is there a way to not affect the client when peering occurs?

P.S
- I checked the symptoms in the following environments.
-> Octopus Version, Reef Version, Cephadm, Ceph-Ansible
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux