Slow OSDs

Hi,

We are having slow OSDs... a hot topic to search for. I've tried to dig in as
deep as I can, but I need to know which debug setting will help me dig even
deeper.

Okay, the situation:
- After an expansion, lots of backfill operations are running, spread over the
OSDs.
- max_backfills is set to 1 (meaning only 1 read and 1 write backfill
operation may take place at the same time per OSD)
- Disks are monitored and a manual compaction is performed as soon as the load
goes over 80%
- Requests through RGW go fine for some objects and less fine for others.
When it goes fine, requests are returned within a few ms; other requests
vary from a few ms to >10s.
- Ceph version 14.2.11
- Erasure coding configured, 6+3 (k=6, m=3)
- Backfill is running at an incredibly slow pace. I assume it's all related.
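
For reference, those settings were applied/checked roughly along these lines;
this is just a sketch, with osd.156 as an example OSD id and placeholder
pool/profile names:

  ceph tell 'osd.*' injectargs '--osd-max-backfills 1'    # throttle backfill to 1 per OSD
  ceph osd pool get <rgw data pool> erasure_code_profile  # look up the pool's EC profile
  ceph osd erasure-code-profile get <profile>             # should report k=6 m=3
  ceph daemon osd.156 compact                             # manual compaction, run on that OSD's host
  ceph versions                                           # confirms 14.2.11 across the cluster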


Okay, what have I done to narrow this down:
1. Removed 1 RGW from the load balancer to increase logging and dig into it.
2. Using s3cmd to request the object teaches me that the delay is between
the request from the RGW to the OSD and the response.
3. Investigating the OSD using iostat doesn't reveal an issue. Load is very
low.
4. Next I increased logging on the OSD (debug_osd 20/20) and launched a
request (the debug setting was reverted immediately afterwards; the exact
commands are sketched after the log excerpt below).
5. First you see that it starts with the block for getting all available
shards... so far so good.
6. Next we see multiple blocks like:

2020-11-19 11:53:21.938 7fe90f79c700 15 osd.156 45341 *enqueue_op*
0x559f94d43ce0 prio 127 cost 0 latency 0.000032 epoch 45341
*MOSDECSubOpRead*(1.1e2s0 45341/45313 ECSubRead(tid=191,
to_read={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=0,28672,0},
subchunks={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=[0,1]},
attrs_to_read=)) v3
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 45341 *dequeue_op*
0x559f94d43ce0 prio 127 cost 0 latency 0.000135 *MOSDECSubOpRead*(1.1e2s0
45341/45313 ECSubRead(tid=191,
to_read={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=0,28672,0},
subchunks={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=[0,1]},
attrs_to_read=)) v3 pg pg[1.1e2s0( v 45295'4373744
(34740'4370700,45295'4373744] local-lis/les=45313/45317 n=1842926
ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0 45313/45313/45313)
[156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313 crt=45295'4373744
lcod 0'0 mlcod 0'0 active+clean]
2020-11-19 11:53:21.938 7fe90678a700 20 osd.156 45341 share_map osd.156 v2:
192.168.34.57:6892/1464289 45341
2020-11-19 11:53:21.938 7fe90678a700 20 osd.156 45341 should_share_map
osd.156 v2:192.168.34.57:6892/1464289 45341
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 pg_epoch: 45341 pg[1.1e2s0(
v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317
n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0
45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313
crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] _handle_message:
MOSDECSubOpRead(1.1e2s0 45341/45313 ECSubRead(tid=191,
to_read={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=0,28672,0},
subchunks={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=[0,1]},
attrs_to_read=)) v3
2020-11-19 11:53:21.938 7fe90678a700 20 osd.156 pg_epoch: 45341 pg[1.1e2s0(
v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317
n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0
45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313
crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] handle_sub_read read
request=28672 r=28672 len=28672
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 pg_epoch: 45341 pg[1.1e2s0(
v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317
n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0
45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313
crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] get_hash_info: Getting
attr on
1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 pg_epoch: 45341 pg[1.1e2s0(
v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317
n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0
45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313
crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] get_hash_info: not in
cache
1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 pg_epoch: 45341 pg[1.1e2s0(
v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317
n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0
45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313
crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] get_hash_info: found on
disk, size 28672
2020-11-19 11:53:21.938 7fe90678a700 20 osd.156 pg_epoch: 45341 pg[1.1e2s0(
v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317
n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0
45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313
crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] handle_sub_read:
Checking hash of
1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head
2020-11-19 11:53:21.938 7fe90678a700 20 osd.156 45341 share_map_peer
0x559f71ee3400 don't know epoch, doing nothing
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 45341 dequeue_op
0x559f94d43ce0 finish
.... This goes on...
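
(The debug toggle from step 4 was done roughly like this; osd.156 is the OSD
from the trace, and the last command assumes the default log level of 1/5:)

  ceph tell osd.156 injectargs '--debug_osd 20/20'   # raise OSD logging just for this test
  # ... launch the single s3cmd request against the RGW ...
  ceph tell osd.156 injectargs '--debug_osd 1/5'     # revert immediately afterwards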

The latency sits between such blocks. E.g.:
Last line of a block of logging:
...
2020-11-19 11:53:*22.150* 7fe90678a700 10 osd.156 45341 dequeue_op
0x559f96a5f600 finish
2020-11-19 11:53:*22.150* 7fe90678a700 20 osd.156 op_wq(2) _process empty
q, waiting
...
Next block of logging:
2020-11-19 11:53:*27.089* 7fe925251700 15 osd.156 45341 enqueue_op
0x559f8f9618c0 prio 127 cost 0 latency 0.000019 epoch 45341
*MOSDECSubOpReadReply*(1.1e2s0 45341/45313 *ECSubReadReply*(tid=191,
attrs_read=0)) v2
2020-11-19 11:53:27.089 7fe925251700 20 osd.156 op_wq(2) _enqueue
OpQueueItem(1.1e2s0 PGOpItem(op=*MOSDECSubOpReadReply*(1.1e2s0 45341/45313
ECSubReadReply(tid=191, attrs_read=0)) v2) prio 127 cost 0 e45341)
2020-11-19 11:53:27.089 7fe90678a700 20 osd.156 op_wq(2) _process 1.1e2s0
to_process <> waiting <> waiting_peering {}
2020-11-19 11:53:27.089 7fe90678a700 20 osd.156 op_wq(2) _process
OpQueueItem(1.1e2s0 PGOpItem(op=*MOSDECSubOpReadReply*(1.1e2s0 45341/45313
ECSubReadReply(tid=191, attrs_read=0)) v2) prio 127 cost 0 e45341) queued
2020-11-19 11:53:27.089 7fe90678a700 20 osd.156 op_wq(2) _process 1.1e2s0
to_process <OpQueueItem(1.1e2s0 PGOpItem(op=MOSDECSubOpReadReply(1.1e2s0
45341/45313 ECSubReadReply(tid=191, attrs_read=0)) v2) prio 127 cost 0
e45341)> waiting <> waiting_peering {}
2020-11-19 11:53:27.089 7fe90678a700 20 osd.156 op_wq(2) _process
OpQueueItem(1.1e2s0 PGOpItem(op=*MOSDECSubOpReadReply*(1.1e2s0 45341/45313
*ECSubReadReply*(tid=191, attrs_read=0)) v2) prio 127 cost 0 e45341) pg
0x559f8b5e5000
2020-11-19 11:53:27.089 7fe90678a700 10 osd.156 45341 dequeue_op
0x559f8f9618c0 prio 127 cost 0 latency 0.000092
MOSDECSubOpReadReply(1.1e2s0 45341/45313 ECSubReadReply(tid=191,
attrs_read=0)) v2 pg pg[1.1e2s0( v 45295'4373744
(34740'4370700,45295'4373744] local-lis/les=45313/45317 n=1842926
ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0 45313/45313/45313)
[156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313 crt=45295'4373744
lcod 0'0 mlcod 0'0 active+clean]
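
A quick way to spot such gaps is to scan the OSD log for the time between a
"dequeue_op ... finish" line and the next "enqueue_op" line. A rough awk
sketch, with an illustrative log path and a 1-second threshold:

  awk '
    /dequeue_op .* finish/ { split($2, t, ":"); prev = t[1]*3600 + t[2]*60 + t[3]; have = 1 }
    /enqueue_op/ && have   { split($2, t, ":"); gap = t[1]*3600 + t[2]*60 + t[3] - prev;
                             if (gap > 1) print $1, $2, "gap:", gap "s"; have = 0 }
  ' /var/log/ceph/ceph-osd.156.log

In the excerpt above the gap is 11:53:27.089 - 11:53:22.150, i.e. roughly 4.9
seconds.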

Now my question is:
What is triggering the enqueue_op? And why does it take 5 seconds before it is
triggered? Which debug setting can help me find that out?

Many thanks for the feedback,

Kristof
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


