On Tue, Feb 26, 2019 at 10:03:04AM +0800, Ming Lei wrote: > On Mon, Feb 25, 2019 at 9:26 PM Carlos Maiolino <cmaiolino@xxxxxxxxxx> wrote: > > > > Hi, > > > > On Sat, Feb 23, 2019 at 07:33:21AM +0800, Ming Lei wrote: > > > Hi Carlos, > > > > > > Cc block list given it is related with interface between fs and block layer. > > > > > > On Fri, Feb 22, 2019 at 10:14 PM Carlos Maiolino <cmaiolino@xxxxxxxxxx> wrote: > > > > > > > > guard_bio_eod() can truncate a segment in bio to allow it to do IO on > > > > odd last sectors of a device. > > > > > > > > It already checks if the IO starts past EOD, but it does not consider > > > > the possibility of an IO request starting within device boundaries can > > > > contain more than one segment past EOD. > > > > > > > > In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will > > > > underflow bvec->bv_len. > > > > > > It can cause memory corruption even for < PAGE_SIZE, also it can be correct > > > to see > PAGE_SIZE truncated_bytes: > > > > > > - xfs is going to support big block size which may be 64k > > > - multi-page bvec patches have been in block tree, and it is planned to land > > > v5.1, so > PAGE_SIZE truncating may come because .bv_len can be much > > > bigger than PAGE_SIZE > > > > > > - suppose fs block size is 4k, bio sector is 1022 and size is 4k, and > > > disk size is > > > 1024, last bvec is (0, 1024), then 'truncated_bytes' will be 3k, so > > > the check can't > > > capture this case, and memory corruption still may be caused. > > > > > > > > > > > Fix this by checking if truncated_bytes is lower than PAGE_SIZE. > > > > > > > > This situation has been found on filesystems such as isofs and vfat, > > > > which doesn't check the device size before mount, if the device is > > > > smaller than the filesystem itself, a readahead on such filesystem, > > > > which spans EOD, can trigger this situation, leading a call to > > > > zero_user() with a wrong size possibly corrupting memory. > > > > > > > > I didn't see any crash, or didn't let the system run long enough to > > > > check if memory corruption will be hit somewhere, but adding > > > > instrumentation to guard_bio_end() to check truncated_bytes size, was > > > > enough to see the error. > > > > > > > > The following script can trigger the error. > > > > > > > > MNT=/mnt > > > > IMG=./DISK.img > > > > DEV=/dev/loop0 > > > > > > > > mkfs.vfat $IMG > > > > mount $IMG $MNT > > > > cp -R /etc $MNT &> /dev/null > > > > umount $MNT > > > > > > > > losetup -D > > > > > > > > losetup --find --show --sizelimit 16247280 $IMG > > > > mount $DEV $MNT > > > > > > > > find $MNT -type f -exec cat {} + >/dev/null > > > > > > BTW, which kernel is for this issue? Did you investigate why the bad bio > > > is made? > > > > The problem was reproduced, and the patch also wrote based on linux-next tree, > > tag next-20190207. > > > > The bad bio is made while attempting to read the end of a filesystem, which goes > > beyond EOD, as described in the patch description itself, it may happen for > > filesystems which doesn't check the device size before mounting. > > OK. > > > > > Despite the lack of validation of the block size from the filesystem part, the > > block layer should avoid and deny such invalid IO requests. The fact is, the > > block layer already validates such IO request via > > generic_make_request_checks()->bio_check_eod(), but, before the IO layer could > > reach there, guard_bio_eod() has already corrupted the bio. > > > > > > > > > > > > > Kudos to Eric Sandeen for coming up with the reproducer above > > > > > > > > Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx> > > > > --- > > > > > > > > P.S. I'm not 100% proficient in bio internals, so I'm not sure if this is the > > > > right fix, so comments are much appreciated. > > > > Thanks > > > > > > > > fs/buffer.c | 7 +++++++ > > > > 1 file changed, 7 insertions(+) > > > > > > > > diff --git a/fs/buffer.c b/fs/buffer.c > > > > index 784de3dbcf28..32dc5cd2f6ba 100644 > > > > --- a/fs/buffer.c > > > > +++ b/fs/buffer.c > > > > @@ -3063,6 +3063,13 @@ void guard_bio_eod(int op, struct bio *bio) > > > > /* Uhhuh. We've got a bio that straddles the device size! */ > > > > truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9); > > > > > > > > + /* > > > > + * The bio contains more than one segment which spans EOD, just return > > > > + * and let IO layer turn it into an EIO > > > > + */ > > > > + if (truncated_bytes > PAGE_SIZE) > > > > + return; > > > > + > > > > > > The correct check should be: > > > > > > + if (truncated_bytes > bvec->bv_len) > > > + return; > > > > Thanks, I'll test it. > > > > > > > > Also it should be helpful to warn on this bad bio, but it is fine to not do it > > > here cause block layer logs this bad bio too. > > > > > > BTW, this is just a workaround, not one real fix on the reported problem. > > > > A suggestion of what would be the real fix is much appreciated then. > > Seems this issue can't be reproduced in ISO, maybe it is related with > the specific truncated size. > > In theory, it is better to return failure during mounting because this fs image > is broken, or the fs code should check the bio during making the bio given > fs has the disk size info. However, maybe either one is hard to implement. I do agree the filesystem should the checking it, however, I do also believe the bio layer should still validate it. Notice though, this was not reproduced using any crafted filesystem image, all it needs is a filesystem which does not validate the underlying block device size, and the same block device was shrank. The filesystem does not necessarily need to be corrupted. > > So I think this patch is good, if the check is fixed. > Ok, I'll resubmit the patch with your suggestion. Thanks > Thanks, > Ming Lei -- Carlos