Hi,

We are having slow OSDs... a hot topic to search for... I've tried to dive as deep as I can, but I need to know which debug setting will help me dive even deeper.

Okay, the situation:
- After an expansion, lots of backfill operations are running, spread over the OSDs.
- max_backfills is set to 1 (meaning only 1 read and 1 write operation may take place at the same time for the backfill operation).
- Disks are monitored, and manual compaction is performed as soon as load goes over 80%.
- Requests through RGW go okay for some objects and less okay for others: the fine ones return within a few ms, the others vary from a few ms to >10 s.
- Ceph version 14.2.11.
- Erasure coding configured, 6+3 (k=6, m=3).
- Backfill is running at an incredibly slow pace. I assume it's all related.

Okay, what I have done to narrow it down:
1. Removed 1 RGW from the load balancer so I could increase its logging and dive into it.
2. Requesting the object with s3cmd teaches me that the delay is between the request from the RGW to the OSD and the response.
3. Investigating the OSD with iostat doesn't reveal an issue; load is very low.
4. Next I increased logging on the OSD (debug_osd 20/20) and launched a request; the debug setting was reverted immediately afterwards. (Roughly how I did steps 2 and 4 is sketched right after this list.)
5. In the log you first see the block for getting all available shards... so far so good.
6. Next there are multiple blocks like the one pasted below.
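For reference, the sketch below is roughly what I mean by steps 2 and 4; BUCKET/OBJECT are placeholders, osd.156 is simply the primary OSD from the log further down, and 1/5 is just our normal debug_osd level:

    # time a single GET through the RGW that was taken out of the load balancer
    time s3cmd get --force s3://BUCKET/OBJECT /tmp/object.out

    # crank up debug logging on the primary OSD, repeat the request, then revert it
    ceph tell osd.156 injectargs '--debug_osd 20/20'
    # ... repeat the s3cmd request here ...
    ceph tell osd.156 injectargs '--debug_osd 1/5'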
One such block looks like this:

2020-11-19 11:53:21.938 7fe90f79c700 15 osd.156 45341 *enqueue_op* 0x559f94d43ce0 prio 127 cost 0 latency 0.000032 epoch 45341 *MOSDECSubOpRead*(1.1e2s0 45341/45313 ECSubRead(tid=191, to_read={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=0,28672,0}, subchunks={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=[0,1]}, attrs_to_read=)) v3
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 45341 *dequeue_op* 0x559f94d43ce0 prio 127 cost 0 latency 0.000135 *MOSDECSubOpRead*(1.1e2s0 45341/45313 ECSubRead(tid=191, to_read={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=0,28672,0}, subchunks={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=[0,1]}, attrs_to_read=)) v3 pg pg[1.1e2s0( v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317 n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0 45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313 crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean]
2020-11-19 11:53:21.938 7fe90678a700 20 osd.156 45341 share_map osd.156 v2: 192.168.34.57:6892/1464289 45341
2020-11-19 11:53:21.938 7fe90678a700 20 osd.156 45341 should_share_map osd.156 v2:192.168.34.57:6892/1464289 45341
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 pg_epoch: 45341 pg[1.1e2s0( v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317 n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0 45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313 crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] _handle_message: MOSDECSubOpRead(1.1e2s0 45341/45313 ECSubRead(tid=191, to_read={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=0,28672,0}, subchunks={1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head=[0,1]}, attrs_to_read=)) v3
2020-11-19 11:53:21.938 7fe90678a700 20 osd.156 pg_epoch: 45341 pg[1.1e2s0( v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317 n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0 45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313 crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] handle_sub_read read request=28672 r=28672 len=28672
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 pg_epoch: 45341 pg[1.1e2s0( v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317 n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0 45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313 crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] get_hash_info: Getting attr on 1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 pg_epoch: 45341 pg[1.1e2s0( v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317 n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0 45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313 crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] get_hash_info: not in cache 1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 pg_epoch: 45341 pg[1.1e2s0( v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317 n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0 45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313 crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] get_hash_info: found on disk, size 28672
2020-11-19 11:53:21.938 7fe90678a700 20 osd.156 pg_epoch: 45341 pg[1.1e2s0( v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317 n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0 45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313 crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean] handle_sub_read: Checking hash of 1:47c1b09f:::44a57332-bad8-47b4-85d6-6d34be960d5f.51962.196_CT%2f20201001%2f82078cfd-a111-4b58-8b79-88a860915c56.PNG:head
2020-11-19 11:53:21.938 7fe90678a700 20 osd.156 45341 share_map_peer 0x559f71ee3400 don't know epoch, doing nothing
2020-11-19 11:53:21.938 7fe90678a700 10 osd.156 45341 dequeue_op 0x559f94d43ce0 finish

This goes on... The latency shows up between such blocks. For example, the last lines of one block:

2020-11-19 11:53:*22.150* 7fe90678a700 10 osd.156 45341 dequeue_op 0x559f96a5f600 finish
2020-11-19 11:53:*22.150* 7fe90678a700 20 osd.156 op_wq(2) _process empty q, waiting ...

And the next block of logging:

2020-11-19 11:53:*27.089* 7fe925251700 15 osd.156 45341 enqueue_op 0x559f8f9618c0 prio 127 cost 0 latency 0.000019 epoch 45341 *MOSDECSubOpReadReply*(1.1e2s0 45341/45313 *ECSubReadReply*(tid=191, attrs_read=0)) v2
2020-11-19 11:53:27.089 7fe925251700 20 osd.156 op_wq(2) _enqueue OpQueueItem(1.1e2s0 PGOpItem(op=*MOSDECSubOpReadReply*(1.1e2s0 45341/45313 ECSubReadReply(tid=191, attrs_read=0)) v2) prio 127 cost 0 e45341)
2020-11-19 11:53:27.089 7fe90678a700 20 osd.156 op_wq(2) _process 1.1e2s0 to_process <> waiting <> waiting_peering {}
2020-11-19 11:53:27.089 7fe90678a700 20 osd.156 op_wq(2) _process OpQueueItem(1.1e2s0 PGOpItem(op=*MOSDECSubOpReadReply*(1.1e2s0 45341/45313 ECSubReadReply(tid=191, attrs_read=0)) v2) prio 127 cost 0 e45341) queued
2020-11-19 11:53:27.089 7fe90678a700 20 osd.156 op_wq(2) _process 1.1e2s0 to_process <OpQueueItem(1.1e2s0 PGOpItem(op=MOSDECSubOpReadReply(1.1e2s0 45341/45313 ECSubReadReply(tid=191, attrs_read=0)) v2) prio 127 cost 0 e45341)> waiting <> waiting_peering {}
2020-11-19 11:53:27.089 7fe90678a700 20 osd.156 op_wq(2) _process OpQueueItem(1.1e2s0 PGOpItem(op=*MOSDECSubOpReadReply*(1.1e2s0 45341/45313 *ECSubReadReply*(tid=191, attrs_read=0)) v2) prio 127 cost 0 e45341) pg 0x559f8b5e5000
2020-11-19 11:53:27.089 7fe90678a700 10 osd.156 45341 dequeue_op 0x559f8f9618c0 prio 127 cost 0 latency 0.000092 MOSDECSubOpReadReply(1.1e2s0 45341/45313 ECSubReadReply(tid=191, attrs_read=0)) v2 pg pg[1.1e2s0( v 45295'4373744 (34740'4370700,45295'4373744] local-lis/les=45313/45317 n=1842926 ec=5869/714 lis/c 45313/45313 les/c/f 45317/45322/0 45313/45313/45313) [156,64,146,256,98,67,46,31,100]p156(0) r=0 lpr=45313 crt=45295'4373744 lcod 0'0 mlcod 0'0 active+clean]
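For completeness: if it is useful, I can also dump the op tracker from the OSD admin socket while reproducing this; I would expect something like the following (run on the node hosting osd.156, default op tracker settings assumed) to show timestamped events for the slow op:

    # ops currently being processed by this OSD, with a timestamped event list per op
    ceph daemon osd.156 dump_ops_in_flight

    # recently completed ops, same format
    ceph daemon osd.156 dump_historic_ops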
Now my question is: what is triggering that enqueue_op, and what makes it take so long? The previous block finishes at 11:53:22.150 and the MOSDECSubOpReadReply is only enqueued at 11:53:27.089, i.e. about 5 seconds later. Which debug setting can help me find that out?

Many thanks for the feedback,

Kristof
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx