>> output follows:
>> # pvs -o pe_start /dev/rbd1p1
>>   1st PE
>>   4.00m
>> # cat /sys/block/rbd1/queue/minimum_io_size
>> 4194304
>> # cat /sys/block/rbd1/queue/optimal_io_size
>> 4194304
>
> Well, the parameters are being set at least. Mike, is it possible that
> having minimum_io_size set to 4m is causing some read amplification
> in LVM, translating a small read into a complete fetch of the PE (or
> something along those lines)?
>
> Ugis, if your cluster is on the small side, it might be interesting to see
> what requests the client is generating in the LVM and non-LVM case by
> setting 'debug ms = 1' on the osds (e.g., ceph tell osd.* injectargs
> '--debug-ms 1') and then looking at the osd_op messages that appear in
> /var/log/ceph/ceph-osd*.log. It may be obvious that the IO pattern is
> different.
>

Sage, the debug output follows. I am no pro at reading this, but it seems the read block sizes differ (or what is the number following the ~ sign)?

OSD.2 read with LVM:
2013-10-20 16:59:05.307159 7f95acfa5700 1 -- x.x.x.x:6804/1944 --> x.x.x.y:0/269199468 -- osd_op_reply(176566434 rbd_data.3ad974b0dc51.0000000000007cef [read 4083712~4096] ondisk = 0) v4 -- ?+0 0xdc35c00 con 0xd9e4840
2013-10-20 16:59:05.307655 7f95b27b0700 1 -- x.x.x.x:6804/1944 <== client.38069 x.x.x.y:0/269199468 5548 ==== osd_op(client.38069.1:176566435 rbd_data.3ad974b0dc51.0000000000007cef [read 4087808~4096] 4.5672f053 e6870) v4 ==== 177+0+0 (1554835253 0 0) 0x12593d80 con 0xd9e4840
2013-10-20 16:59:05.307824 7f95ac7a4700 1 -- x.x.x.x:6804/1944 --> x.x.x.y:0/269199468 -- osd_op_reply(176566435 rbd_data.3ad974b0dc51.0000000000007cef [read 4087808~4096] ondisk = 0) v4 -- ?+0 0xe24fc00 con 0xd9e4840
2013-10-20 16:59:05.308316 7f95b27b0700 1 -- x.x.x.x:6804/1944 <== client.38069 x.x.x.y:0/269199468 5549 ==== osd_op(client.38069.1:176566436 rbd_data.3ad974b0dc51.0000000000007cef [read 4091904~4096] 4.5672f053 e6870) v4 ==== 177+0+0 (3467296840 0 0) 0xe28f6c0 con 0xd9e4840
2013-10-20 16:59:05.308499 7f95acfa5700 1 -- x.x.x.x:6804/1944 --> x.x.x.y:0/269199468 -- osd_op_reply(176566436 rbd_data.3ad974b0dc51.0000000000007cef [read 4091904~4096] ondisk = 0) v4 -- ?+0 0xdc35a00 con 0xd9e4840
2013-10-20 16:59:05.308985 7f95b27b0700 1 -- x.x.x.x:6804/1944 <== client.38069 x.x.x.y:0/269199468 5550 ==== osd_op(client.38069.1:176566437 rbd_data.3ad974b0dc51.0000000000007cef [read 4096000~4096] 4.5672f053 e6870) v4 ==== 177+0+0 (3104591620 0 0) 0xe0b46c0 con 0xd9e4840

OSD.2 read without LVM:
2013-10-20 17:03:13.730881 7f95ac7a4700 1 -- x.x.x.x:6804/1944 --> x.x.x.y:0/269199468 -- osd_op_reply(176708854 rb.0.967b.238e1f29.000000000071 [read 2359296~131072] ondisk = 0) v4 -- ?+0 0x1019d200 con 0xd9e4840
2013-10-20 17:03:13.731318 7f95b27b0700 1 -- x.x.x.x:6804/1944 <== client.38069 x.x.x.y:0/269199468 18232 ==== osd_op(client.38069.1:176708855 rb.0.967b.238e1f29.000000000071 [read 2490368~131072] 4.c0d1e4cb e6870) v4 ==== 170+0+0 (1987168552 0 0) 0x171a7480 con 0xd9e4840
2013-10-20 17:03:13.731664 7f95acfa5700 1 -- x.x.x.x:6804/1944 --> x.x.x.y:0/269199468 -- osd_op_reply(176708855 rb.0.967b.238e1f29.000000000071 [read 2490368~131072] ondisk = 0) v4 -- ?+0 0x12b81200 con 0xd9e4840
2013-10-20 17:03:13.733112 7f95b27b0700 1 -- x.x.x.x:6804/1944 <== client.38069 x.x.x.y:0/269199468 18233 ==== osd_op(client.38069.1:176708856 rb.0.967b.238e1f29.000000000071 [read 2621440~131072] 4.c0d1e4cb e6870) v4 ==== 170+0+0 (527551382 0 0) 0x12593d80 con 0xd9e4840
2013-10-20 17:03:13.733393 7f95ac7a4700 1 -- x.x.x.x:6804/1944 --> x.x.x.y:0/269199468 -- osd_op_reply(176708856 rb.0.967b.238e1f29.000000000071 [read 2621440~131072] ondisk = 0) v4 -- ?+0 0xeba9000 con 0xd9e4840
2013-10-20 17:03:13.733741 7f95b27b0700 1 -- x.x.x.x:6804/1944 <== client.38069 x.x.x.y:0/269199468 18234 ==== osd_op(client.38069.1:176708857 rb.0.967b.238e1f29.000000000071 [read 2752512~131072] 4.c0d1e4cb e6870) v4 ==== 170+0+0 (178955972 0 0) 0xe0b4d80 con 0xd9e4840

How to proceed with tuning read performance on LVM?
Is there some change needed in the code of Ceph/LVM, or does my config need to be tuned?
If the logs mean that reads are issued in 4k blocks in the LVM case, then it seems I need to tell LVM (or does XFS on top of LVM dictate the read block size?) that the IO block size should rather be 4m?

Ugis
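
For comparing the two cases over a whole log rather than a handful of lines, here is a minimal Python sketch that tallies read sizes. It assumes the `[read A~B]` field in an osd_op message is offset~length in bytes, which is how I understand the format; under that reading the excerpts above show 4096-byte reads with LVM and 131072-byte reads without. The function name `read_sizes` and the shortened sample lines are only for illustration.

```python
import re
from collections import Counter

# Match client requests only ("osd_op(", not "osd_op_reply("), capturing the
# object name, byte offset and byte length of each read.
# Assumption: the field after "read" is offset~length.
READ_OP = re.compile(r"osd_op\(\S+ (\S+) \[read (\d+)~(\d+)\]")


def read_sizes(lines):
    """Return a Counter mapping read length in bytes to number of requests."""
    sizes = Counter()
    for line in lines:
        match = READ_OP.search(line)
        if match:
            sizes[int(match.group(3))] += 1
    return sizes


# Shortened excerpts of the messages quoted in this mail (illustrative only).
SAMPLE = [
    "osd_op(client.38069.1:176566435 rbd_data.3ad974b0dc51.0000000000007cef "
    "[read 4087808~4096] 4.5672f053 e6870) v4",
    "osd_op_reply(176566435 rbd_data.3ad974b0dc51.0000000000007cef "
    "[read 4087808~4096] ondisk = 0) v4",
    "osd_op(client.38069.1:176708855 rb.0.967b.238e1f29.000000000071 "
    "[read 2490368~131072] 4.c0d1e4cb e6870) v4",
]

if __name__ == "__main__":
    for length, count in sorted(read_sizes(SAMPLE).items()):
        print(f"{length:>8} bytes: {count} request(s)")
```

Replies are skipped on purpose so each read is counted once. To run it against a real log, pass an open file object for one of the /var/log/ceph/ceph-osd*.log files to `read_sizes` instead of `SAMPLE`.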