On Fri, Dec 18, 2015 at 9:24 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
> Hi Ilya,
>
> On Fri, Dec 18, 2015 at 11:46 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> On Fri, Dec 18, 2015 at 5:40 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>> > Hi Ilya
>> >
>> > On Fri, Dec 18, 2015 at 6:50 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> >>
>> >> On Fri, Dec 18, 2015 at 10:55 AM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>> >> > I hope this can help anyone who is running into the same issue as us -
>> >> > kernels 4.1.x appear to have terrible RBD sequential write performance.
>> >> > Kernels before and after are great.
>> >> >
>> >> > I tested with 4.1.6 and 4.1.15 on Ubuntu 14.04.3, ceph hammer 0.94.5 -
>> >> > a simple dd test yields this result:
>> >> >
>> >> > dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000
>> >> > 1000+0 records in
>> >> > 1000+0 records out
>> >> > 1048576000 bytes (1.0 GB) copied, 46.3618 s, 22.6 MB/s
>> >> >
>> >> > On 3.19 and 4.2.8, quite another story:
>> >> >
>> >> > dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000
>> >> > 1000+0 records in
>> >> > 1000+0 records out
>> >> > 1048576000 bytes (1.0 GB) copied, 2.18914 s, 479 MB/s
>> >>
>> >> This is due to an old regression in blk-mq. rbd was switched to the
>> >> blk-mq infrastructure in 4.0, and the regression in the blk-mq core was
>> >> fixed in 4.2 by commit e6c4438ba7cb "blk-mq: fix plugging in
>> >> blk_sq_make_request". It's outside of rbd and wasn't backported, so we
>> >> are kind of stuck with it.
>> >
>> > Thank you for answering that question, this was a huge puzzle for us.
>> > So the fix is in 4.2 - is the earliest stable kernel 3.18?
>>
>> The problem was in the blk-mq code. rbd started interfacing with it in
>> 4.0, so anything before 4.0 wouldn't have this particular issue.
>
> Thanks again - one last question - this would not affect the OSD nodes
> at all, correct?

It affects all devices which use the blk-mq infrastructure but have only a
single hardware (or virtual) queue. The bug was basically that the queue in
this case wasn't plugged, leaving little chance to merge any requests. With
locally attached storage that's not the end of the world, but with rbd,
which has to go over the network, you see this kind of performance drop.

IIRC you still have to opt in to scsi_mq, so if you are using the usual
SCSI drivers on your OSD nodes you shouldn't be affected.

Thanks,

                Ilya
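
A rough way to check whether a given device is exposed through blk-mq with a
single hardware queue, and is therefore potentially hit by the unfixed
plugging regression on 4.0/4.1 kernels. This is only a sketch: it assumes the
device is rbd0 and the sysfs layout typical of 4.x kernels.

    # A blk-mq device has an mq/ directory with one subdirectory per
    # hardware context; a count of 1 means single-queue blk-mq.
    ls -d /sys/block/rbd0/mq/*/ 2>/dev/null | wc -l

    # Kernel version: 4.0 and 4.1 map rbd onto blk-mq but lack the
    # blk_sq_make_request plugging fix that landed in 4.2.
    uname -r

    # On OSD nodes, SCSI devices go through blk-mq only if scsi_mq was
    # explicitly enabled (this parameter may not exist on every build).
    cat /sys/module/scsi_mod/parameters/use_blk_mq 2>/dev/null

If the first command prints 1 on a 4.0 or 4.1 kernel, sequential writes to
that device are likely to show the poor request merging described above.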