Re: Kernel 4.1.x RBD very slow on writes

On Fri, Dec 18, 2015 at 9:24 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
> Hi Ilya,
>
> On Fri, Dec 18, 2015 at 11:46 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> On Fri, Dec 18, 2015 at 5:40 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>
>> wrote:
>> > Hi Ilya
>> >
>> > On Fri, Dec 18, 2015 at 6:50 AM, Ilya Dryomov <idryomov@xxxxxxxxx>
>> > wrote:
>> >>
>> >> On Fri, Dec 18, 2015 at 10:55 AM, Alex Gorbachev
>> >> <ag@xxxxxxxxxxxxxxxxxxx>
>> >> wrote:
>> >> > I hope this can help anyone who is running into the same issue as
>> >> > us - kernels 4.1.x appear to have terrible RBD sequential write
>> >> > performance.  Kernels before and after are great.
>> >> >
>> >> > I tested with 4.1.6 and 4.1.15 on Ubuntu 14.04.3, ceph hammer
>> >> > 0.94.5 - a simple dd test yields this result:
>> >> >
>> >> > dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000
>> >> > 1000+0 records in
>> >> > 1000+0 records out
>> >> > 1048576000 bytes (1.0 GB) copied, 46.3618 s, 22.6 MB/s
>> >> >
>> >> > On 3.19 and 4.2.8, quite another story:
>> >> >
>> >> > dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000
>> >> > 1000+0 records in
>> >> > 1000+0 records out
>> >> > 1048576000 bytes (1.0 GB) copied, 2.18914 s, 479 MB/s
>> >>
>> >> This is due to an old regression in blk-mq.  rbd was switched to the
>> >> blk-mq infrastructure in 4.0; the regression in blk-mq core was fixed
>> >> in 4.2 by commit e6c4438ba7cb "blk-mq: fix plugging in
>> >> blk_sq_make_request".  The regression is outside of rbd and the fix
>> >> wasn't backported, so we are kind of stuck with it.
>> >
>> >
>> > Thank you for answering that question - this was a huge puzzle for us.
>> > So the fix is in 4.2; is the earliest stable kernel to use 3.18?
>>
>> The problem was in blk-mq code.  rbd started interfacing with it in
>> 4.0, so anything before 4.0 wouldn't have this particular issue.
>
>
> Thanks again - one last question - this would not affect the OSD nodes at
> all, correct?

It affects all devices that use the blk-mq infrastructure but have only
a single hardware (or virtual) queue.  The bug was basically that the
queue in this case wasn't plugged, leaving little chance to merge any
requests before dispatch.  With locally attached storage that's not the
end of the world, but with rbd, where every request has to go over the
network, you see this kind of performance drop.
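
If you want to see the effect for yourself, one rough way (assuming the
sysstat iostat tool and the rbd0 device from the test above) is to watch
the write-merge counters while the dd runs:

  # run the same sequential write test in one terminal
  dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000

  # in another terminal, watch write merges on rbd0; wrqm/s is
  # "write requests merged per second" - near zero on an affected
  # 4.1.x kernel, much higher on 3.19 or 4.2+
  iostat -dx 1 rbd0

  # the cumulative write-merge count is field 6 of the sysfs stat file
  cat /sys/block/rbd0/stat

Few merges means lots of small writes hitting the OSDs instead of a
handful of large ones, which is exactly where the extra network round
trips hurt.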

IIRC you still have to opt in to scsi_mq, so if you are using the usual
scsi drivers on your OSD nodes you shouldn't be affected.
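
A rough way to double-check a given node (device names are just
examples - rbd0 is from the test above, sda stands in for whatever your
OSD data disks are, and the use_blk_mq parameter is only exposed if your
kernel has scsi_mq built in):

  # rbd on 4.0+ always uses blk-mq, so this directory exists:
  ls /sys/block/rbd0/mq

  # a classic (non-mq) SCSI disk has no mq/ directory:
  ls /sys/block/sda/mq 2>/dev/null || echo "sda is not on blk-mq"

  # scsi_mq opt-in status (Y means SCSI devices go through blk-mq):
  cat /sys/module/scsi_mod/parameters/use_blk_mq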

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


