On Wed 29-12-21 01:13:20, Matthew Wilcox wrote:
> On Tue, Dec 28, 2021 at 10:00:53PM +0100, August Wikerfors wrote:
> > (resending from gmail due to bounce with outlook)
> >
> > Hi, I ran into a bug with a very similar call trace, also when copying files
> > with rsync from a filesystem mounted using ntfs3. I was able to reproduce it
> > on both the default Arch Linux kernel (5.15.11-arch2-1) and on mainline
> > 5.16-rc7.
>
> Hi August! This is very helpful; thank you for putting in the work to
> figure this out. I am still a little baffled:
>
> > [ 486.361177] RIP: 0010:0xffffff8306d925ff
> > [ 486.361192] Code: Unable to access opcode bytes at RIP 0xffffff8306d925d5.
> > [ 486.361214] RSP: 0018:ffffaa9ec0f8fb37 EFLAGS: 00010246
> > [ 486.361232] RAX: 0000000000000000 RBX: 00000000000002ab RCX: 0000000000000000
> > [ 486.361255] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > [ 486.361279] RBP: ffaa9ec0f8fbf800 R08: 0000000000000000 R09: 0000000000000000
> > [ 486.361302] R10: 0000000000000000 R11: 0000000000000000 R12: ff99687f5746e000
> > [ 486.361324] R13: 00000001112ccaff R14: fffcbb8097368000 R15: 00000000000001ff
> > [ 486.361349] ? page_cache_ra_unbounded+0x1c5/0x250
> > [ 486.361369] ? filemap_get_pages+0x117/0x730
> > [ 486.361386] ? make_kuid+0xf/0x20
> > [ 486.361401] ? generic_permission+0x27/0x210
> > [ 486.361419] ? walk_component+0x11d/0x1c0
> > [ 486.361435] ? filemap_read+0xb9/0x360
> > [ 486.361451] ? new_sync_read+0x159/0x1f0
> > [ 486.361467] ? vfs_read+0xff/0x1a0
> > [ 486.361489] ? ksys_read+0x67/0xf0
> > [ 486.361503] ? do_syscall_64+0x5c/0x90
> >
> > $ scripts/faddr2line vmlinux.5.15.11-arch2-1 page_cache_ra_unbounded+0x1c5/0x250
> > page_cache_ra_unbounded+0x1c5/0x250:
> > filemap_invalidate_unlock_shared at include/linux/fs.h:853
> > (inlined by) page_cache_ra_unbounded at mm/readahead.c:240
>
> So ... Jan added this code in commit 730633f0b7f9, but I don't see how
> it could be buggy:

I don't think the problem is with my code. The address
page_cache_ra_unbounded+0x1c5/0x250 comes from the stack, which means it is
the return address for the function that is currently executing or was just
about to be called, presumably from read_pages(). And note that we crashed
because we tried to call / jump to an invalid address. So most likely
aops->readpage(), aops->readahead(), or aops->readpages() was the bogus
address 0xffffff8306d925d5. I don't know how it got there, but I'd look
closely at the ntfs3 driver...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
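
For anyone following along, here is a rough sketch of how the numbers in the quoted oops can be read. It only does arithmetic on values copied from the trace. The helper names are made up for illustration, and the address range used for "kernel text or modules" is an assumption taken from the documented x86-64 layout (it can differ with KASLR or other configurations), so treat the result as a hint rather than proof.

```python
import re

# Values copied verbatim from the quoted oops.
RIP = 0xffffff8306d925ff
CODE_LINE_ADDR = 0xffffff8306d925d5
TRACE_ENTRY = "page_cache_ra_unbounded+0x1c5/0x250"

# ASSUMPTION: kernel text and module mappings on x86-64 live somewhere in
# this window. The exact split depends on the kernel configuration.
ASSUMED_TEXT_START = 0xffffffff80000000
ASSUMED_MODULES_END = 0xfffffffffeffffff


def parse_trace_entry(entry):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size).

    For a stack entry, the offset is where execution would resume after a
    call returns, so the call instruction itself sits just before it.
    """
    m = re.fullmatch(r"(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)", entry.strip())
    if m is None:
        raise ValueError(f"not a symbol+offset/size entry: {entry!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def in_assumed_text_or_modules(addr):
    """True if addr falls inside the assumed text/module window."""
    return ASSUMED_TEXT_START <= addr <= ASSUMED_MODULES_END


if __name__ == "__main__":
    symbol, offset, size = parse_trace_entry(TRACE_ENTRY)
    print(f"return address is {offset} bytes into {symbol} ({size} bytes long)")
    print(f"'Code:' address is {RIP - CODE_LINE_ADDR} bytes below RIP")
    print(f"RIP inside assumed text/module window: {in_assumed_text_or_modules(RIP)}")
```

Two things fall out of this. The address in the "Code:" line is simply a fixed distance below RIP, so both numbers describe the same bad jump target. And RIP lies outside the window where executable kernel code would normally be mapped, which fits the theory that a function pointer was called with a garbage value rather than that the code at the return address is wrong.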