Re: Anyone have experience debugging nfsd going into D state with Ceph RBD?

We are running Linux kernel 5.4.

On Fri, Jul 23, 2021 at 2:17 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
On Fri, Jul 23, 2021 at 6:02 AM Xiaolong Jiang <xiaolong302@xxxxxxxxx> wrote:
>
> Hi Ilya,
>
> We are using Bionic, and I am sure it's included in Bionic and later. I think you are right; I didn't go back and check which Linux version first shipped this patch.

wbt (writeback throttling) was added in 4.10.  But I was wondering
which kernel *you* were running, just as a data point.

>
> As for the many small 4K I/Os being issued, I haven't figured out which code path is doing this. One observation is that fio had already finished transferring all the bytes to the server, with the file sitting in the page cache.

Ah, so it's not clear who to blame for the splitting here.  It would be
good to track that down because most users aren't aware of wbt_lat_usec
and wouldn't go this deep...
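For readers who hit the same problem: writeback throttling is controlled per block device through sysfs. Per the kernel's queue-sysfs documentation, /sys/block/<dev>/queue/wbt_lat_usec shows the target minimum read latency in microseconds, writing 0 disables throttling, and writing -1 restores the default. Below is a minimal Python sketch for inspecting and changing that knob. The device name rbd0 and the helper function names are hypothetical, and writing the file requires root.

```python
from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")


def wbt_path(dev: str, root: Path = SYSFS_BLOCK) -> Path:
    """Location of the writeback-throttling latency knob for `dev`."""
    return root / dev / "queue" / "wbt_lat_usec"


def read_wbt_lat_usec(dev: str, root: Path = SYSFS_BLOCK) -> int:
    """Current target latency in microseconds; 0 means throttling is off."""
    return int(wbt_path(dev, root).read_text().strip())


def set_wbt_lat_usec(dev: str, value: int, root: Path = SYSFS_BLOCK) -> None:
    """Write a new target: 0 disables wbt, -1 restores the kernel default.

    Needs root. The value does not persist across reboots or device re-maps.
    """
    if value < -1:
        raise ValueError("wbt_lat_usec must be -1, 0 or a positive latency")
    wbt_path(dev, root).write_text(f"{value}\n")
```

Usage, as root: read_wbt_lat_usec("rbd0") to inspect the current value, set_wbt_lat_usec("rbd0", 0) to turn throttling off for that device.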

Thanks,

                Ilya

>
> On Thu, Jul 22, 2021 at 3:18 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> On Thu, Jul 22, 2021 at 6:39 AM Xiaolong Jiang <xiaolong302@xxxxxxxxx> wrote:
>> >
>> > Hi Zizon,
>> >
>> > Thanks for responding. I spent a lot of time debugging this and eventually nailed it down. I believe this is a common problem, so I wrote an article about it, which I'm sharing here:
>> >
>> > https://medium.com/@xiaolongjiang/ceph-rbd-performance-debug-journey-212f3b6f39aa
>>
>> Hi Xiaolong,
>>
>> Thanks for sharing!
>>
>> What kernel is this on?  You are saying ubuntu bionic+ in the article,
>> but it can mean anything from 4.15 in the original 18.04 release to 5.4
>> in 18.04.5 (HWE kernel from linux-generic-hwe-18.04 package).
>>
>> The reason I'm asking is that a quick search turns up recommendations
>> to disable writeback throttling on kernels older than 4.19, e.g. [1].
>>
>> Also, I'm a bit surprised to learn that writeback throttling resulted
>> in big 1M I/Os being split into small 4k I/Os.  IIRC the core idea was
>> to preserve big writes potentially limiting their number [2]:
>>
>>   ... The how is basically throttling background writeback. We still
>>   want to issue big writes from the vm side of things, so we get nice
>>   and big extents on the file system end. But we don't need to flood
>>   the device with THOUSANDS of requests for background writeback.  For
>>   most devices, we don't need a whole lot to get decent throughput.
>>
>> Did you happen to identify where exactly in the block layer the nfsd
>> writes were getting split?
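One coarse way to check whether writes reach the RBD device as big or small requests is to compare the block-layer counters in /sys/block/<dev>/stat over an interval. Per the kernel's block/stat documentation, field 5 is write I/Os completed and field 7 is sectors written, counted in 512-byte units. The Python sketch below computes the average bytes per completed write; the device name and helper names are hypothetical. An average near 4096 points at small requests, but an aggregate like this cannot show where they were split; blktrace can trace individual requests, including split events.

```python
import time
from pathlib import Path
from typing import Tuple

SECTOR_BYTES = 512  # /sys/block/<dev>/stat always counts 512-byte sectors


def parse_write_counters(stat_line: str) -> Tuple[int, int]:
    """Return (write I/Os completed, sectors written) from a stat line."""
    fields = stat_line.split()
    # Field order: read I/Os, read merges, read sectors, read ticks,
    #              write I/Os, write merges, write sectors, write ticks, ...
    return int(fields[4]), int(fields[6])


def avg_write_bytes(before: str, after: str) -> float:
    """Average bytes per completed write request between two samples."""
    ios0, sec0 = parse_write_counters(before)
    ios1, sec1 = parse_write_counters(after)
    d_ios = ios1 - ios0
    if d_ios == 0:
        return 0.0  # no writes completed during the interval
    return (sec1 - sec0) * SECTOR_BYTES / d_ios


def sample_avg_write_bytes(dev: str, interval: float = 5.0) -> float:
    """Read /sys/block/<dev>/stat twice, `interval` seconds apart."""
    stat = Path("/sys/block") / dev / "stat"
    before = stat.read_text()
    time.sleep(interval)
    after = stat.read_text()
    return avg_write_bytes(before, after)
```

Usage: run sample_avg_write_bytes("rbd0") while the NFS clients are writing, once with throttling enabled and once with it disabled, and compare the two averages.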
>>
>> [1] https://open-cas.github.io/guide_advanced_options.html
>> [2] https://lore.kernel.org/lkml/1459350477-16404-1-git-send-email-axboe@xxxxxx/
>>
>>                 Ilya
>>
>> >
>> > On Sat, Jul 17, 2021 at 12:33 AM Zizon Qiu <zzdtsv@xxxxxxxxx> wrote:
>> >>
>> >> Maybe there is a connection issue between RBD and the OSDs/MONs.
>> >> As you already pointed out, D (uninterruptible sleep) means the I/O was issued and is still in progress but has not finished, which could mean no response from
>> >> the remote end or being stuck retrying.
>> >>
>> >> You can cat /proc/<pid>/stack or use tools like strace to investigate further (why it is stuck).
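Picking up the /proc/<pid>/stack suggestion: the Python sketch below lists tasks whose state in /proc/<pid>/stat is D and prints each one's kernel stack. Reading /proc/<pid>/stack normally requires root and a kernel built with stack-trace support (CONFIG_STACKTRACE); the helper names are hypothetical. nfsd threads are kernel threads, so each one has its own /proc/<pid> entry.

```python
from pathlib import Path
from typing import Iterator, Tuple

PROC = Path("/proc")


def parse_stat(stat_text: str) -> Tuple[str, str]:
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm sits in parentheses and may itself contain spaces or ')',
    # so locate it via the first '(' and the last ')'.
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    comm = stat_text[lpar + 1 : rpar]
    state = stat_text[rpar + 2]  # single letter right after ") "
    return comm, state


def d_state_tasks(proc: Path = PROC) -> Iterator[Tuple[int, str]]:
    """Yield (pid, comm) for every /proc/<pid> entry in uninterruptible sleep.

    Only top-level entries are scanned; threads listed under
    /proc/<pid>/task/ are not.
    """
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            comm, state = parse_stat((entry / "stat").read_text())
        except OSError:
            continue  # the task exited while we were scanning
        if state == "D":
            yield int(entry.name), comm


def kernel_stack(pid: int, proc: Path = PROC) -> str:
    """Return the task's kernel stack, or a note if it cannot be read."""
    try:
        return (proc / str(pid) / "stack").read_text()
    except OSError as exc:  # e.g. not root, or no CONFIG_STACKTRACE
        return f"<unavailable: {exc}>"
```

Usage, as root while the hang is happening: for each (pid, comm) from d_state_tasks(), print the pair followed by kernel_stack(pid). Repeating this a few times shows whether the nfsd threads are parked in the same place.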
>> >>
>> >> On Sat, Jul 17, 2021 at 2:29 PM Xiaolong Jiang <xiaolong302@xxxxxxxxx> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >> >>> We are using Ceph RBD: we format the image as an XFS file system and then export it over NFS.
>> >> >>> When clients connect over NFS and write data to it, the nfsd daemon goes into D state.
>> >> >>>
>> >> >>> My understanding is that when I/O is slow or blocked, the NFS daemon gets stuck in D state waiting for I/O. But the metrics look fine, and ceph -s doesn't show any slow ops.
>> >> >>>
>> >> >>> Does anyone have similar experience and can share some hints on where to debug further?
>> >>>
>> >>> Thanks in advance.
>> >>>
>> >>> Xiaolong
>> >>> _______________________________________________
>> >>> Dev mailing list -- dev@xxxxxxx
>> >>> To unsubscribe send an email to dev-leave@xxxxxxx
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > Xiaolong Jiang
>> >
>> > Senior Software Engineer at Netflix
>> > Columbia University

