Re: [PATCH RFC] Block/blk-wbt: do not let background writes block sync writes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 28, 2017 at 11:45:31AM -0600, Liu Bo wrote:
> On Sun, Aug 27, 2017 at 11:14:20AM -0600, Jens Axboe wrote:
> > On 08/25/2017 06:14 PM, Liu Bo wrote:
> > > While using blk-wbt, sometimes sync writes are blocked by background
> > > writes, for example,
> > > 
> > > a) a background write reaches the (background) limit returned by
> > > get_limit(), so it's added into the rqw->wait (non kswapd's rqw in
> > > this case) and goes to sleep.
> > > 
> > > b) then a sync write gets queued and goes to sleep when finding that
> > > waitqueue_active() returns true and someone else is already on the
> > > waiting list.
> > > 
> > > Thus, the sync write will get its rq after the background write
> > > getting rq.
> > > 
> > > With this, only background writes will check waitqueue's status and
> > > sync writes will only be throttled by the (max) limit returned by
> > > get_limit().
> > 
> > Curious how you ran into this. Is this just speculation, or is it
> > observed behavior?
> >
> 
> I got that from a simple test,
> 
> xfs_io -f -c "falloc 0 4K" /mnt/btrfs/foobar
> xfs_io -f -c "falloc 0 5G" /mnt/btrfs/dummy
> xfs_io -f -c "pwrite -b 128M 0 5G" /mnt/btrfs/dummy &
> 
> off=0
> while true; do
>         date
> 	let $((off = off + 1))
>         xfs_io -s -c "pwrite $off 1"  /mnt/btrfs/foobar > /dev/null
>         sleep 1
> done
> 
> ---
> 
> So a lot of writeback is happening in background, and a write with
> O_SYNC is issued every 1 sec.
> 
> My ftrace observation shows that, write with O_SYNC is waiting in
> may_queue() because of the running writeback writes.
> 
>    kworker/u16:2-137   [003] ....  3104.193996: block_bio_queue: 8,64 W 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194003: may_queue.part.12: inflight 23 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.194003: block_getrq: 8,64 W 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194004: blk_mq_flush_plug_list <-blk_flush_plug_list
>    kworker/u16:2-137   [003] ....  3104.194004: block_unplug: [kworker/u16:2] 1
>    kworker/u16:2-137   [003] ....  3104.194005: block_rq_insert: 8,64 W 1048576 () 8749264 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194006: block_rq_issue: 8,64 W 1048576 () 8749264 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194017: block_plug: [kworker/u16:2]
>    kworker/u16:2-137   [003] d.h.  3104.194454: block_rq_complete: 8,64 W () 8704208 + 2048 [0]
>    kworker/u16:2-137   [003] ....  3104.194978: block_bio_queue: 8,64 W 8753360 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194985: may_queue.part.12: inflight 23 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.194985: block_getrq: 8,64 W 8753360 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194986: blk_mq_flush_plug_list <-blk_flush_plug_list
>    kworker/u16:2-137   [003] ....  3104.194987: block_unplug: [kworker/u16:2] 1
>    kworker/u16:2-137   [003] ....  3104.194987: block_rq_insert: 8,64 W 1048576 () 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194990: block_rq_issue: 8,64 W 1048576 () 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.195001: block_plug: [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.195654: block_bio_queue: 8,64 W 8755408 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.195661: may_queue.part.12: inflight 24 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.195662: may_queue.part.12: inflight 24 limit 24 wait ffffc900007cb500
>                                                                    ^^^^^^^^^^^^^^^
> 								   (wb write puts itself on rqw->wait when reaching the limit)
> 								   
>    kworker/u16:2-137   [003] ....  3104.195662: io_schedule <-wbt_wait
>    kworker/u16:2-137   [003] ....  3104.195662: blk_mq_flush_plug_list <-blk_flush_plug_list
>    kworker/u16:2-137   [003] ....  3104.195663: block_unplug: [kworker/u16:2] 1
>    kworker/u16:2-137   [003] ....  3104.195663: block_rq_insert: 8,64 W 1048576 () 8753360 + 2048 [kworker/u16:2]
>     kworker/3:1H-162   [003] ....  3104.195680: block_rq_issue: 8,64 W 1048576 () 8753360 + 2048 [kworker/3:1H]
>           <idle>-0     [003] d.h.  3104.200533: block_rq_complete: 8,64 W () 8706256 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.205473: block_rq_complete: 8,64 W () 8708304 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.211451: block_rq_complete: 8,64 W () 8710352 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.216163: block_rq_complete: 8,64 W () 8712400 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.222072: block_rq_complete: 8,64 W () 8714448 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.226931: block_rq_complete: 8,64 W () 8716496 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.232915: block_rq_complete: 8,64 W () 8718544 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.238851: block_rq_complete: 8,64 W () 8720592 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.243593: block_rq_complete: 8,64 W () 8722640 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.249667: block_rq_complete: 8,64 W () 8724688 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.254248: block_rq_complete: 8,64 W () 8726736 + 2048 [0]
>           xfs_io-1045  [007] ....  3104.254347: btrfs_sync_file: 77102a7b-fc17-42cb-afb7-ac57020b6bfb: root=5(FS_TREE) ino=257 parent=256 datasync=0
>           xfs_io-1045  [007] ....  3104.254384: cow_file_range.isra.58: ino 257 start 0x0 disk_num_bytes 0x1000
>           xfs_io-1045  [007] ....  3104.254402: btrfs_ordered_extent_add: 77102a7b-fc17-42cb-afb7-ac57020b6bfb: root=5(FS_TREE) ino=257 file_offset=0 start=12615680 len=4096 disk_len=4096 truncated_len=18446744073709551615 bytes_left=4096 flags= compress_type=0 refs=1
>           xfs_io-1045  [007] ....  3104.254441: block_bio_queue: 8,64 WS 24640 + 8 [xfs_io]
>           xfs_io-1045  [007] ....  3104.254446: wbt_wait: someone is waiting ffffc900007cb500
>           xfs_io-1045  [007] ....  3104.254448: wbt_wait: someone is waiting ffffc900007cb500
>           xfs_io-1045  [007] ....  3104.254449: io_schedule <-wbt_wait
> 	                                                    ^^^^^^^^^^^^^^^^^^^
> 							    (write with O_SYNC waits for wb write,
> 							    while the sync limit is more than 24)
> 
>           <idle>-0     [003] d.h.  3104.260393: block_rq_complete: 8,64 W () 8728784 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.266310: block_rq_complete: 8,64 W () 8730832 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.271152: block_rq_complete: 8,64 W () 8732880 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.277141: block_rq_complete: 8,64 W () 8734928 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.281908: block_rq_complete: 8,64 W () 8736976 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.287744: block_rq_complete: 8,64 W () 8739024 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.292584: block_rq_complete: 8,64 W () 8741072 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.298541: block_rq_complete: 8,64 W () 8743120 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.303266: block_rq_complete: 8,64 W () 8745168 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.309273: block_rq_complete: 8,64 W () 8747216 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.315312: block_rq_complete: 8,64 W () 8749264 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.320060: block_rq_complete: 8,64 W () 8751312 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.326148: block_rq_complete: 8,64 W () 8753360 + 2048 [0]
>    kworker/u16:2-137   [003] ....  3104.326534: may_queue.part.12: inflight 0 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.326537: block_getrq: 8,64 W 8755408 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.326543: block_plug: [kworker/u16:2]
>           xfs_io-1045  [007] ....  3104.326546: may_queue.part.12: inflight 1 limit 96 wait ffffc90000ab7890
>           xfs_io-1045  [007] ....  3104.326548: block_getrq: 8,64 WS 24640 + 8 [xfs_io]
>           xfs_io-1045  [007] ....  3104.326553: block_rq_insert: 8,64 WS 4096 () 24640 + 8 [xfs_io]
> 	                                        ^^^^^^^^^^^^^^^^^
> 						(wb write and write with O_SYNC got waked up one by one)
> 
> 
> > I don't think the patch makes sense, conceptually. If you end up in
> > may_queue(), you are (by definition) in a class of writes that
> > should be throttled. Skipping the wait line would be unfair.
> >
> 
> OK, got it.
> 
> > One modification would be to check the depth, as we assign
> > different allowances to different types of writes. If the sync
> > write isn't above it's limit, we should not sleep.
> >
> 
> Yeah, that seems better, so it'd be like
> 
> if ((rw & REQ_HIPRIO) &&
>     atomic_inc_below(&rqw->inflight, get_limit(rwb, rw)))
>     return true;
> 
> if (waitqueue_active(&rqw->wait) &&
>     rqw->wait.head.next != &wait->entry)
> 	return false;
> 
> 
> What about kswapd?  It's also a high priority write, isn't it?

I think it would be fine without checking kswapd as it has a different
%rqw.

I'll send a updated one if the above code looks OK.

Thanks,

-liubo



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux