Hi All,
Tonight we had an incident in our hyperconverged Proxmox cluster running
Ceph v16.2.6.
At 04:56:08, OSD.12 on node1 started to report slow ops.
At 04:56:09, OSD.15 on node4 started to report slow ops.
Here is our OSD tree:
ID   CLASS  WEIGHT    TYPE NAME       STATUS  REWEIGHT  PRI-AFF
 -1         16.98720  root default
-13          3.56488      host node1
  2    hdd   0.90919          osd.2       up   1.00000  1.00000
  3    hdd   0.90919          osd.3       up   1.00000  1.00000
 12    ssd   1.74649          osd.12      up   1.00000  1.00000
-15          4.47406      host node2
  0    hdd   0.90919          osd.0       up   1.00000  1.00000
  4    hdd   0.90918          osd.4       up   1.00000  1.00000
  9    hdd   0.90919          osd.9       up   1.00000  1.00000
 13    ssd   1.74649          osd.13      up   1.00000  1.00000
 -6          4.47408      host node3
  8    hdd   0.90919          osd.8       up   1.00000  1.00000
 10    hdd   0.90919          osd.10      up   1.00000  1.00000
 11    hdd   0.90919          osd.11      up   1.00000  1.00000
 14    ssd   1.74649          osd.14      up   1.00000  1.00000
 -5          4.47418      host node4
  1    hdd   0.90919          osd.1       up   1.00000  1.00000
  6    hdd   0.90919          osd.6       up   1.00000  1.00000
  7    hdd   0.90919          osd.7       up   1.00000  1.00000
 15    ssd   1.74660          osd.15      up   1.00000  1.00000
The Ceph network is 10Gbps. We have one SSD replicated pool (size=3,
min_size=2) and one HDD replicated pool (size=3, min_size=1). PG
autoscaling is in warn mode. Both pools are about 25% occupied.
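(The exact pool settings and occupancy figures behind that summary can be
checked with the standard commands; we have not pasted the output here:

  ceph osd pool ls detail
  ceph osd pool autoscale-status
  ceph df
)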
We can see that node1's inbound network bandwidth peaked at a stable
~83 Mb/s around that time.
We can see that node4's outbound network bandwidth peaked at a stable
~85 Mb/s around that time.
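In case it helps anyone debugging something similar: an ad-hoc throughput
test between the two nodes can rule out the 10Gbps link itself. A minimal
sketch, where 10.10.10.1 stands in for node1's Ceph-network IP
(hypothetical address):

  # on node1:
  iperf3 -s
  # on node4, test for 30 seconds:
  iperf3 -c 10.10.10.1 -t 30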
The issue was resolved by restarting OSD.15. Within about a minute,
bandwidth went back to normal and the slow ops disappeared. Some VMs had
to be rebooted, but so far so good :)
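(For completeness: the restart is just the standard systemd unit restart;
Proxmox names the units ceph-osd@<id>:

  systemctl restart ceph-osd@15
)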
It looks as if OSD.15 was sending data to OSD.12 but it was being refused? :-)
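In hindsight, dumping the stuck ops via the admin socket before restarting
anything would have shown exactly which sub ops were pending and on which
peer (run on the respective nodes):

  ceph daemon osd.15 dump_ops_in_flight
  ceph daemon osd.12 dump_ops_in_flight
  ceph daemon osd.15 dump_historic_slow_ops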
Both OSDs have an SSD backing disk, with WAL/DB/data on the same disk.
All SSD OSDs are BlueStore, as are some of the HDD OSDs.
No disk issues are reported in syslog or dmesg for those two SSDs.
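(Beyond syslog/dmesg, drive health can also be cross-checked via SMART and
Ceph's device tracking; /dev/sdX and <devid> below are placeholders:

  smartctl -a /dev/sdX
  ceph device ls
  ceph device get-health-metrics <devid>
)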
Logs for OSDs:
--- OSD.15 node4
2022-01-12T04:46:40.737+0100 7fd28344a700 4 rocksdb:
[db_impl/db_impl.cc:849] ------- DUMPING STATS -------
2022-01-12T04:46:40.737+0100 7fd28344a700 4 rocksdb:
[db_impl/db_impl.cc:851]
[... stripped stats ...]
2022-01-12T04:56:09.491+0100 7fd28e460700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976492 7.3b
7:dcd73c34:::rbd_data.a001436b8b4567.0000000000000062:head [write
1855488~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.232963+0100 currently waiting for
sub ops
2022-01-12T04:56:09.491+0100 7fd28e460700 -1 osd.15 44032
get_health_metrics reporting 1 slow ops, oldest is
osd_op(client.612312938.0:57976492 7.3b
7:dcd73c34:::rbd_data.a001436b8b4567.0000000000000062:head [write
1855488~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected e44032)
2022-01-12T04:56:10.499+0100 7fd28e460700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976492 7.3b
7:dcd73c34:::rbd_data.a001436b8b4567.0000000000000062:head [write
1855488~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.232963+0100 currently waiting for
sub ops
2022-01-12T04:56:10.499+0100 7fd28e460700 -1 osd.15 44032
get_health_metrics reporting 1 slow ops, oldest is
osd_op(client.612312938.0:57976492 7.3b
7:dcd73c34:::rbd_data.a001436b8b4567.0000000000000062:head [write
1855488~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected e44032)
2022-01-12T04:56:11.519+0100 7fd28e460700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976492 7.3b
7:dcd73c34:::rbd_data.a001436b8b4567.0000000000000062:head [write
1855488~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.232963+0100 currently waiting for
sub ops
2022-01-12T04:56:11.519+0100 7fd28e460700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312662.0:8092600 7.4d
7:b36b6fe0:::rbd_data.9fe7186b8b4567.0000000000001c22:head [write
2367488~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:40.907230+0100 currently waiting for
sub ops
2022-01-12T04:56:11.519+0100 7fd28e460700 -1 osd.15 44032
get_health_metrics reporting 2 slow ops, oldest is
osd_op(client.612312938.0:57976492 7.3b
7:dcd73c34:::rbd_data.a001436b8b4567.0000000000000062:head [write
1855488~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected e44032)
2022-01-12T04:56:12.515+0100 7fd28e460700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976492 7.3b
7:dcd73c34:::rbd_data.a001436b8b4567.0000000000000062:head [write
1855488~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.232963+0100 currently waiting for
sub ops
2022-01-12T04:56:12.515+0100 7fd28e460700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312662.0:8092600 7.4d
7:b36b6fe0:::rbd_data.9fe7186b8b4567.0000000000001c22:head [write
2367488~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:40.907230+0100 currently waiting for
sub ops
2022-01-12T04:56:12.515+0100 7fd28e460700 -1 osd.15 44032
get_health_metrics reporting 2 slow ops, oldest is
osd_op(client.612312938.0:57976492 7.3b
7:dcd73c34:::rbd_data.a001436b8b4567.0000000000000062:head [write
1855488~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected e44032)
2022-01-12T04:56:13.491+0100 7fd28e460700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976492 7.3b
7:dcd73c34:::rbd_data.a001436b8b4567.0000000000000062:head [write
1855488~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.232963+0100 currently waiting for
sub ops
2022-01-12T04:56:13.491+0100 7fd28e460700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312662.0:8092600 7.4d
7:b36b6fe0:::rbd_data.9fe7186b8b4567.0000000000001c22:head [write
2367488~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:40.907230+0100 currently waiting for
sub ops
2022-01-12T04:56:13.491+0100 7fd28e460700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312785.0:96260005 7.45
7:a37c8881:::rbd_data.a002336b8b4567.0000000000000475:head [write
3395584~53248 in=53248b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:43.436038+0100 currently waiting for
sub ops
[...]
--- OSD.12 node1
2022-01-12T04:53:55.440+0100 7f1f79d13700 0 log_channel(cluster) log
[DBG] : 7.71 scrub ok
2022-01-12T04:56:08.999+0100 7f1f8f53e700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612313472.0:156390670 7.56
7:6ad61629:::rbd_data.6cd97d6b8b4567.00000000000001a0:head [write
1015808~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:38.959449+0100 currently waiting for
sub ops
2022-01-12T04:56:08.999+0100 7f1f8f53e700 -1 osd.12 44032
get_health_metrics reporting 1 slow ops, oldest is
osd_op(client.612313472.0:156390670 7.56
7:6ad61629:::rbd_data.6cd97d6b8b4567.00000000000001a0:head [write
1015808~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected e44032)
2022-01-12T04:56:09.999+0100 7f1f8f53e700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612313472.0:156390670 7.56
7:6ad61629:::rbd_data.6cd97d6b8b4567.00000000000001a0:head [write
1015808~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:38.959449+0100 currently waiting for
sub ops
2022-01-12T04:56:09.999+0100 7f1f8f53e700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976478 7.73
7:ce752f82:::rbd_data.a001436b8b4567.0000000000000064:head [write
1855488~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.231786+0100 currently waiting for
sub ops
2022-01-12T04:56:09.999+0100 7f1f8f53e700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976558 7.73
7:ce752f82:::rbd_data.a001436b8b4567.0000000000000064:head [write
425984~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.233680+0100 currently waiting for
sub ops
2022-01-12T04:56:09.999+0100 7f1f8f53e700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976590 7.73
7:ce752f82:::rbd_data.a001436b8b4567.0000000000000064:head [write
3506176~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.234269+0100 currently waiting for
sub ops
2022-01-12T04:56:09.999+0100 7f1f8f53e700 -1 osd.12 44032
get_health_metrics reporting 4 slow ops, oldest is
osd_op(client.612313472.0:156390670 7.56
7:6ad61629:::rbd_data.6cd97d6b8b4567.00000000000001a0:head [write
1015808~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected e44032)
2022-01-12T04:56:10.951+0100 7f1f8f53e700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612313472.0:156390670 7.56
7:6ad61629:::rbd_data.6cd97d6b8b4567.00000000000001a0:head [write
1015808~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:38.959449+0100 currently waiting for
sub ops
2022-01-12T04:56:10.951+0100 7f1f8f53e700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976478 7.73
7:ce752f82:::rbd_data.a001436b8b4567.0000000000000064:head [write
1855488~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.231786+0100 currently waiting for
sub ops
2022-01-12T04:56:10.951+0100 7f1f8f53e700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976558 7.73
7:ce752f82:::rbd_data.a001436b8b4567.0000000000000064:head [write
425984~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.233680+0100 currently waiting for
sub ops
2022-01-12T04:56:10.951+0100 7f1f8f53e700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612312938.0:57976590 7.73
7:ce752f82:::rbd_data.a001436b8b4567.0000000000000064:head [write
3506176~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:39.234269+0100 currently waiting for
sub ops
2022-01-12T04:56:10.951+0100 7f1f8f53e700 0 log_channel(cluster) log
[WRN] : slow request osd_op(client.612313472.0:156390838 7.77
7:efe2ab8b:::rbd_data.6cd97d6b8b4567.0000000000000299:head [write
3948544~16384 in=16384b] snapc 0=[] ondisk+write+known_if_redirected
e44032) initiated 2022-01-12T04:55:40.502724+0100 currently waiting for
sub ops
[...]
Now everything is fine and stable, but this incident affected our service.
Does this look like a known issue, and/or can we provide additional info?
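For instance, we can easily attach the output of the usual status commands:

  ceph -s
  ceph health detail
  ceph versions
  ceph config dump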
OSD.15 was recreated about 40 days ago because of another issue (DB
corruption; see an earlier thread by me and Igor).
Thanks
Eneko Lacunza
Technical Director
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/