Hi Jason,

On 2023-03-31 at 12:32:26 -0300, Jason Gunthorpe wrote:
> If batch->end is 0 then setting npfns[0] before computing the new value of
> pfns will fail to adjust the pfn and result in various page accounting
> corruptions. It should be ordered after.
>
> This seems to result in various kinds of page meta-data corruption related
> failures:
>
>   WARNING: CPU: 1 PID: 527 at mm/gup.c:75 try_grab_folio+0x503/0x740
>   Modules linked in:
>   CPU: 1 PID: 527 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755+ #1
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
>   RIP: 0010:try_grab_folio+0x503/0x740
>   Code: e3 01 48 89 de e8 6d c1 dd ff 48 85 db 0f 84 7c fe ff ff e8 4f bf dd ff 49 8d 47 ff 48 89 45 d0 e9 73 fe ff ff e8 3d bf dd ff <0f> 0b 31 db e9 d0 fc ff ff e8 2f bf dd ff 48 8b 5d c8 31 ff 48 89
>   RSP: 0018:ffffc90000f37908 EFLAGS: 00010046
>   RAX: 0000000000000000 RBX: 00000000fffffc02 RCX: ffffffff81504c26
>   RDX: 0000000000000000 RSI: ffff88800d030000 RDI: 0000000000000002
>   RBP: ffffc90000f37948 R08: 000000000003ca24 R09: 0000000000000008
>   R10: 000000000003ca00 R11: 0000000000000023 R12: ffffea000035d540
>   R13: 0000000000000001 R14: 0000000000000000 R15: ffffea000035d540
>   FS:  00007fecbf659740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00000000200011c3 CR3: 000000000ef66006 CR4: 0000000000770ee0
>   PKRU: 55555554
>   Call Trace:
>    <TASK>
>    internal_get_user_pages_fast+0xd32/0x2200
>    pin_user_pages_fast+0x65/0x90
>    pfn_reader_user_pin+0x376/0x390
>    pfn_reader_next+0x14a/0x7b0
>    pfn_reader_first+0x140/0x1b0
>    iopt_area_fill_domain+0x74/0x210
>    iopt_table_add_domain+0x30e/0x6e0
>    iommufd_device_selftest_attach+0x7f/0x140
>    iommufd_test+0x10ff/0x16f0
>    iommufd_fops_ioctl+0x206/0x330
>    __x64_sys_ioctl+0x10e/0x160
>    do_syscall_64+0x3b/0x90
>    entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages")
> Reported-by: Pengfei Xu <pengfei.xu@xxxxxxxxx>
> Link: https://lore.kernel.org/r/ZBExkEW/On0ue68q@xxxxxxxxxxxxxxxx
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> ---
>  drivers/iommu/iommufd/pages.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
> index b11aace836542d..3c47846cc5efe8 100644
> --- a/drivers/iommu/iommufd/pages.c
> +++ b/drivers/iommu/iommufd/pages.c
> @@ -294,9 +294,9 @@ static void batch_clear_carry(struct pfn_batch *batch, unsigned int keep_pfns)
>                         batch->npfns[batch->end - 1] < keep_pfns);
>
>         batch->total_pfns = keep_pfns;
> -       batch->npfns[0] = keep_pfns;
>         batch->pfns[0] = batch->pfns[batch->end - 1] +
>                          (batch->npfns[batch->end - 1] - keep_pfns);
> +       batch->npfns[0] = keep_pfns;
>         batch->end = 0;
>  }
>
> --

I tested the reproducer on a kernel with all 3 fix patches applied.
Syzkaller reproducer:
https://github.com/xupengfe/syzkaller_logs/blob/main/230313_234302_try_grab_folio/repro.c

The issue no longer reproduces, so it is fixed.

Thanks!
BR.

> 2.40.0
>
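
For anyone following the thread, here is a small userspace sketch of why the ordering in the hunk above matters. It is not the kernel code: struct demo_batch and the demo_* helpers are hypothetical, simplified stand-ins for struct pfn_batch and batch_clear_carry(), and the PFN values are made up. The case it exercises is a batch holding a single run, where batch->end - 1 is 0, so npfns[batch->end - 1] and npfns[0] are the same slot and writing npfns[0] first destroys the value the pfns[0] computation still needs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct pfn_batch. */
struct demo_batch {
	unsigned long pfns[4];   /* first PFN of each contiguous run */
	uint32_t npfns[4];       /* number of PFNs in each run */
	unsigned int end;        /* number of valid runs */
	unsigned int total_pfns; /* sum of npfns[] over valid runs */
};

/* Pre-patch ordering: npfns[0] is overwritten before it is read back. */
static void demo_clear_carry_old(struct demo_batch *b, unsigned int keep)
{
	assert(b->end > 0 && b->npfns[b->end - 1] >= keep);
	b->total_pfns = keep;
	b->npfns[0] = keep;
	/* With end == 1 this reads the just-written npfns[0], so the offset is 0. */
	b->pfns[0] = b->pfns[b->end - 1] + (b->npfns[b->end - 1] - keep);
	b->end = 0;
}

/* Post-patch ordering: compute pfns[0] first, then store npfns[0]. */
static void demo_clear_carry_new(struct demo_batch *b, unsigned int keep)
{
	assert(b->end > 0 && b->npfns[b->end - 1] >= keep);
	b->total_pfns = keep;
	b->pfns[0] = b->pfns[b->end - 1] + (b->npfns[b->end - 1] - keep);
	b->npfns[0] = keep;
	b->end = 0;
}

/*
 * Build a batch with one run of 8 PFNs starting at PFN 100, carry the
 * last 2 PFNs, and return the start PFN of the carried run.
 */
static unsigned long demo_carry_start(int use_new_ordering)
{
	struct demo_batch b = {
		.pfns = { 100 },
		.npfns = { 8 },
		.end = 1,
		.total_pfns = 8,
	};

	if (use_new_ordering)
		demo_clear_carry_new(&b, 2);
	else
		demo_clear_carry_old(&b, 2);
	return b.pfns[0];
}
```

The last 2 PFNs of the run 100..107 are 106 and 107, so the carried run should start at 106. In this sketch the old ordering leaves pfns[0] at 100, which points at the wrong pages, while the new ordering gives 106.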