Hello Dave,
Thanks for reviewing this.
On 8/21/20 4:41 AM, Dave Chinner wrote:
> On Wed, Aug 19, 2020 at 03:58:41PM +0530, Anju T Sudhakar wrote:
>> From: Ritesh Harjani <riteshh@xxxxxxxxxxxxx>
>> __bio_try_merge_page() may return same_page = 1 and merged = 0.
>> This could happen when bio->bi_iter.bi_size + len > UINT_MAX.
> Ummm, silly question, but exactly how are we getting a bio that
> large in ->writepages getting built? Even with 64kB pages, that's a
> bio with 2^16 pages attached to it. We shouldn't be building single
> bios in writeback that large - what storage hardware is allowing
> such huge bios to be built? (i.e. can you dump all the values in
> /sys/block/<dev>/queue/* for that device for us?)
Please correct me if I am wrong, but as far as I can see, a bio checks
only these two limits when a page is added to it. It does not check the
limits in /sys/block/<dev>/queue/*, does it? I guess those could then be
checked by the block layer below, before the bio is submitted?
static inline bool bio_full(struct bio *bio, unsigned len)
{
	if (bio->bi_vcnt >= bio->bi_max_vecs)
		return true;

	if (bio->bi_iter.bi_size > UINT_MAX - len)
		return true;

	return false;
}
This issue was first observed during a fio run on a system with a large
amount of memory. We then found an easy way to trigger it almost every
time using a loop device on my VM setup. All the details are provided
below.
<cmds to trigger it fairly quickly>
===================================
echo 99999999 > /proc/sys/vm/dirtytime_expire_seconds
echo 99999999 > /proc/sys/vm/dirty_expire_centisecs
echo 90 > /proc/sys/vm/dirty_ratio
echo 90 > /proc/sys/vm/dirty_background_ratio
echo 0 > /proc/sys/vm/dirty_writeback_centisecs
sudo perf probe -s ~/host_shared/src/linux/ -a \
  '__bio_try_merge_page:10 bio page page->index bio->bi_iter.bi_size len same_page[0]'
sudo perf record -e probe:__bio_try_merge_page_L10 -a \
  --filter 'bi_size > 0xff000000' \
  sudo fio --rw=write --bs=1M --numjobs=1 \
  --name=/mnt/testfile --size=24G --ioengine=libaio
# on the second run it gets hit every time on my setup
sudo perf record -e probe:__bio_try_merge_page_L10 -a \
  --filter 'bi_size > 0xff000000' \
  sudo fio --rw=write --bs=1M --numjobs=1 \
  --name=/mnt/testfile --size=24G --ioengine=libaio
Perf o/p from above filter causing overflow
===========================================
<...>
fio 25194 [029] 70471.559084:
probe:__bio_try_merge_page_L10: (c000000000aa054c)
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d
bi_size=0xffff8000 len=0x1000 same_page=0x1
fio 25194 [029] 70471.559087:
probe:__bio_try_merge_page_L10: (c000000000aa054c)
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d
bi_size=0xffff9000 len=0x1000 same_page=0x1
fio 25194 [029] 70471.559090:
probe:__bio_try_merge_page_L10: (c000000000aa054c)
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d
bi_size=0xffffa000 len=0x1000 same_page=0x1
fio 25194 [029] 70471.559093:
probe:__bio_try_merge_page_L10: (c000000000aa054c)
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d
bi_size=0xffffb000 len=0x1000 same_page=0x1
fio 25194 [029] 70471.559095:
probe:__bio_try_merge_page_L10: (c000000000aa054c)
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d
bi_size=0xffffc000 len=0x1000 same_page=0x1
fio 25194 [029] 70471.559098:
probe:__bio_try_merge_page_L10: (c000000000aa054c)
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d
bi_size=0xffffd000 len=0x1000 same_page=0x1
fio 25194 [029] 70471.559101:
probe:__bio_try_merge_page_L10: (c000000000aa054c)
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d
bi_size=0xffffe000 len=0x1000 same_page=0x1
fio 25194 [029] 70471.559104:
probe:__bio_try_merge_page_L10: (c000000000aa054c)
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d
bi_size=0xfffff000 len=0x1000 same_page=0x1
^^^^^^ (this could cause an overflow)
loop dev
=========
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE    DIO LOG-SEC
/dev/loop1         0      0         0  0 /mnt1/filefs   0     512
mount o/p
=========
/dev/loop1 on /mnt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/sys/block/<dev>/queue/*
========================
setup:/run/perf$ cat /sys/block/loop1/queue/max_segments
128
setup:/run/perf$ cat /sys/block/loop1/queue/max_segment_size
65536
setup:/run/perf$ cat /sys/block/loop1/queue/max_hw_sectors_kb
1280
setup:/run/perf$ cat /sys/block/loop1/queue/logical_block_size
512
setup:/run/perf$ cat /sys/block/loop1/queue/max_sectors_kb
1280
setup:/run/perf$ cat /sys/block/loop1/queue/hw_sector_size
512
setup:/run/perf$ cat /sys/block/loop1/queue/discard_max_bytes
4294966784
setup:/run/perf$ cat /sys/block/loop1/queue/discard_max_hw_bytes
4294966784
setup:/run/perf$ cat /sys/block/loop1/queue/discard_zeroes_data
0
setup:/run/perf$ cat /sys/block/loop1/queue/discard_granularity
4096
setup:/run/perf$ cat /sys/block/loop1/queue/chunk_sectors
0
setup:/run/perf$ cat /sys/block/loop1/queue/max_discard_segments
1
setup:/run/perf$ cat /sys/block/loop1/queue/read_ahead_kb
128
setup:/run/perf$ cat /sys/block/loop1/queue/rotational
1
setup:/run/perf$ cat /sys/block/loop1/queue/physical_block_size
512
setup:/run/perf$ cat /sys/block/loop1/queue/write_same_max_bytes
0
setup:/run/perf$ cat /sys/block/loop1/queue/write_zeroes_max_bytes
4294966784