Re: [PATCH 24/31] ext4: Convert ext4_mpage_readpages() to work on folios

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:

> On Thu, Jan 26, 2023 at 08:15:04PM -0800, Eric Biggers wrote:
>> On Thu, Jan 26, 2023 at 08:24:08PM +0000, Matthew Wilcox (Oracle) wrote:
>> >  int ext4_mpage_readpages(struct inode *inode,
>> > -		struct readahead_control *rac, struct page *page)
>> > +		struct readahead_control *rac, struct folio *folio)
>> >  {
>> >  	struct bio *bio = NULL;
>> >  	sector_t last_block_in_bio = 0;
>> > @@ -247,16 +247,15 @@ int ext4_mpage_readpages(struct inode *inode,
>> >  		int fully_mapped = 1;
>> >  		unsigned first_hole = blocks_per_page;
>> >
>> > -		if (rac) {
>> > -			page = readahead_page(rac);
>> > -			prefetchw(&page->flags);
>> > -		}
>> > +		if (rac)
>> > +			folio = readahead_folio(rac);
>> > +		prefetchw(&folio->flags);
>>
>> Unlike readahead_page(), readahead_folio() puts the folio immediately.  Is that
>> really safe?
>
> It's safe until we unlock the page.  The page cache holds a refcount,
> and truncation has to lock the page before it can remove it from the
> page cache.
>
> Putting the refcount in readahead_folio() is a transitional step; once
> all filesystems are converted to use readahead_folio(), I'll hoist the
> refcount put to the caller.  Having ->readahead() and ->read_folio()
> with different rules for who puts the folio is a long-standing mistake.
>
>> > @@ -299,11 +298,11 @@ int ext4_mpage_readpages(struct inode *inode,
>> >
>> >  				if (ext4_map_blocks(NULL, inode, &map, 0) < 0) {
>> >  				set_error_page:
>> > -					SetPageError(page);
>> > -					zero_user_segment(page, 0,
>> > -							  PAGE_SIZE);
>> > -					unlock_page(page);
>> > -					goto next_page;
>> > +					folio_set_error(folio);
>> > +					folio_zero_segment(folio, 0,
>> > +							  folio_size(folio));
>> > +					folio_unlock(folio);
>> > +					continue;
>>
>> This is 'continuing' the inner loop, not the outer loop as it should.
>
> Oops.  Will fix.  I didn't get any extra failures from xfstests
> with this bug, although I suspect I wasn't testing with block size <
> page size, which is probably needed to make a difference.

I am still reviewing the rest of the series. But just wanted to paste
this failure with generic/574 with 4k blocksize on x86 system.

The fix is the same which Eric pointed out.


[  208.818910] fsverity_msg: 3 callbacks suppressed
[  208.818927] fs-verity (loop7, inode 12): FILE CORRUPTED! pos=0, level=0, want_hash=sha256:5d55504690cf24b26f46d577f874d2d4c6
[  208.835984] ------------[ cut here ]------------
[  208.839047] WARNING: CPU: 2 PID: 2370 at fs/verity/verify.c:277 verify_data_blocks+0xc5/0x1b0
[  208.844648] Modules linked in:
[  208.846986] CPU: 2 PID: 2370 Comm: cat Not tainted 6.2.0-xfstests-13498-ga1825ad035c0 #29
[  208.852746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/4
[  208.860155] RIP: 0010:verify_data_blocks+0xc5/0x1b0
[  208.863491] Code: 89 e7 e8 8e 32 e0 ff 4c 89 e2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 bf 00 00 00 49 e
[  208.875434] RSP: 0018:ffff8881b8867688 EFLAGS: 00010246
[  208.878903] RAX: 0110000000000110 RBX: 0000000000001000 RCX: ffffffff81cd8a92
[  208.883539] RDX: 1ffffd4000a69bb0 RSI: 0000000000000008 RDI: ffffea000534dd80
[  208.888246] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffea000534dd87
[  208.892932] R10: fffff94000a69bb0 R11: ffffffff86d90cb3 R12: ffffea000534dd80
[  208.897570] R13: ffff8881444381c8 R14: 0000000000000000 R15: ffff88810dd0cea8
[  208.901848] FS:  00007ffff7fb3740(0000) GS:ffff8883eb800000(0000) knlGS:0000000000000000
[  208.904643] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  208.906858] CR2: 00007ffff7f91000 CR3: 00000001f9202006 CR4: 0000000000170ee0
[  208.909510] Call Trace:
[  208.910545]  <TASK>
[  208.911469]  fsverity_verify_blocks+0xc7/0x140
[  208.913364]  ext4_mpage_readpages+0x545/0xe50
[  208.915211]  ? __pfx_ext4_mpage_readpages+0x10/0x10
[  208.917051]  ? find_held_lock+0x2d/0x120
[  208.918753]  ? kvm_clock_read+0x14/0x30
[  208.920316]  ? kvm_sched_clock_read+0x9/0x20
[  208.922074]  ? local_clock+0xf/0xd0
[  208.923436]  ? __lock_release+0x480/0x940
[  208.925071]  ? __pfx___lock_release+0x10/0x10
[  208.926723]  read_pages+0x190/0xb60
[  208.928134]  ? folio_add_lru+0x334/0x630
[  208.929746]  ? lock_release+0xff/0x2c0
[  208.931190]  ? folio_add_lru+0x355/0x630
[  208.932904]  ? __pfx_read_pages+0x10/0x10
[  208.934450]  page_cache_ra_unbounded+0x2cc/0x510
[  208.936249]  filemap_get_pages+0x233/0x7c0
[  208.937851]  ? __pfx_filemap_get_pages+0x10/0x10
[  208.939674]  ? __lock_acquire+0x7e1/0x1120
[  208.941229]  filemap_read+0x2dd/0xa20
[  208.942763]  ? __pfx_filemap_read+0x10/0x10
[  208.944522]  ? do_anonymous_page+0x58b/0x12e0
[  208.946333]  ? do_raw_spin_unlock+0x14d/0x1f0
[  208.948279]  ? _raw_spin_unlock+0x2d/0x50
[  208.949899]  ? do_anonymous_page+0x58b/0x12e0
[  208.951631]  vfs_read+0x512/0x750
[  208.953018]  ? __pfx_vfs_read+0x10/0x10
[  208.954480]  ? local_clock+0xf/0xd0
[  208.955964]  ? __pfx___lock_release+0x10/0x10
[  208.957744]  ? __fget_light+0x51/0x230
[  208.959408]  ksys_read+0xfd/0x1d0
[  208.960719]  ? __pfx_ksys_read+0x10/0x10
[  208.962327]  ? syscall_enter_from_user_mode+0x21/0x50
[  208.964180]  do_syscall_64+0x3f/0x90
[  208.965732]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  208.967618] RIP: 0033:0x7ffff7d0ccf1
[  208.969181] Code: 31 c0 e9 b2 fe ff ff 50 48 8d 3d b2 0a 0b 00 e8 65 29 02 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d ed 18 0f 00 4
[  208.975395] RSP: 002b:00007fffffffccc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  208.978052] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7d0ccf1
[  208.980657] RDX: 0000000000020000 RSI: 00007ffff7f92000 RDI: 0000000000000003
[  208.983256] RBP: 00007ffff7f92000 R08: 00000000ffffffff R09: 0000000000000000
[  208.985874] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000
[  208.988480] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[  208.991004]  </TASK>
[  208.992250] irq event stamp: 6759
[  208.993616] hardirqs last  enabled at (6769): [<ffffffff81528362>] __up_console_sem+0x52/0x60
[  208.996681] hardirqs last disabled at (6780): [<ffffffff81528347>] __up_console_sem+0x37/0x60
[  208.999764] softirqs last  enabled at (6278): [<ffffffff8445c3d6>] __do_softirq+0x546/0x87f
[  209.002772] softirqs last disabled at (6273): [<ffffffff813d0d64>] irq_exit_rcu+0x124/0x1a0
[  209.005992] ---[ end trace 0000000000000000 ]---
[  209.007743] page:ffffea000534dd80 refcount:1 mapcount:0 mapping:ffff88814476db30 index:0x0 pfn:0x14d376
[  209.011119] memcg:ffff8881800f9000
[  209.012564] aops:ext4_da_aops ino:c dentry name:"file.fsv"
[  209.014614] flags: 0x110000000000110(error|lru|node=0|zone=2)
[  209.016839] raw: 0110000000000110 ffffea0005404388 ffffea0005411588 ffff88814476db30
[  209.019657] raw: 0000000000000000 0000000000000000 00000001ffffffff ffff8881800f9000
[  209.022464] page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
[  209.025086] ------------[ cut here ]------------
[  209.026790] kernel BUG at mm/filemap.c:1529!
[  209.028620] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
[  209.030944] CPU: 2 PID: 2370 Comm: cat Tainted: G        W          6.2.0-xfstests-13498-ga1825ad035c0 #29
[  209.034067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/4
[  209.038883] RIP: 0010:folio_unlock+0x6a/0x80
[  209.040434] Code: ec 1c 00 f0 80 65 00 fe 78 06 5d c3 cc cc cc cc 48 89 ef 5d 31 f6 e9 15 f6 ff ff 48 c7 c6 a0 9c 93 84 48 0
[  209.048850] RSP: 0018:ffff8881b8867700 EFLAGS: 00010246
[  209.050702] RAX: 000000000000003f RBX: 0000000000000001 RCX: 0000000000000000
[  209.054327] RDX: 0000000000000000 RSI: ffffffff84cd7440 RDI: 0000000000000001
[  209.056733] RBP: ffffea000534dd80 R08: 0000000000000001 R09: ffff8881b886751f
[  209.060241] R10: ffffed103710cea3 R11: 0000000000000000 R12: 0000000000000000
[  209.062776] R13: 0000000000000001 R14: ffffea000534dd80 R15: dffffc0000000000
[  209.065462] FS:  00007ffff7fb3740(0000) GS:ffff8883eb800000(0000) knlGS:0000000000000000
[  209.068635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  209.070598] CR2: 00007ffff7f91000 CR3: 00000001f9202006 CR4: 0000000000170ee0
[  209.072954] Call Trace:
[  209.073922]  <TASK>
[  209.074887]  ext4_mpage_readpages+0x731/0xe50
[  209.076506]  ? __pfx_ext4_mpage_readpages+0x10/0x10
[  209.078372]  ? find_held_lock+0x2d/0x120
[  209.079776]  ? kvm_clock_read+0x14/0x30
[  209.081165]  ? kvm_sched_clock_read+0x9/0x20
[  209.082883]  ? local_clock+0xf/0xd0
[  209.084158]  ? __lock_release+0x480/0x940
[  209.085591]  ? __pfx___lock_release+0x10/0x10
[  209.087127]  read_pages+0x190/0xb60
[  209.088433]  ? folio_add_lru+0x334/0x630
[  209.089843]  ? lock_release+0xff/0x2c0
[  209.091188]  ? folio_add_lru+0x355/0x630
[  209.092574]  ? __pfx_read_pages+0x10/0x10
[  209.093991]  page_cache_ra_unbounded+0x2cc/0x510
[  209.095580]  filemap_get_pages+0x233/0x7c0
[  209.097019]  ? __pfx_filemap_get_pages+0x10/0x10
[  209.098607]  ? __lock_acquire+0x7e1/0x1120
[  209.100032]  filemap_read+0x2dd/0xa20
[  209.101357]  ? __pfx_filemap_read+0x10/0x10
[  209.102809]  ? do_anonymous_page+0x58b/0x12e0
[  209.104321]  ? do_raw_spin_unlock+0x14d/0x1f0
[  209.105853]  ? _raw_spin_unlock+0x2d/0x50
[  209.107254]  ? do_anonymous_page+0x58b/0x12e0
[  209.108770]  vfs_read+0x512/0x750
[  209.109997]  ? __pfx_vfs_read+0x10/0x10
[  209.111350]  ? local_clock+0xf/0xd0
[  209.112604]  ? __pfx___lock_release+0x10/0x10
[  209.114138]  ? __fget_light+0x51/0x230
[  209.115469]  ksys_read+0xfd/0x1d0
[  209.116676]  ? __pfx_ksys_read+0x10/0x10
[  209.118189]  ? syscall_enter_from_user_mode+0x21/0x50
[  209.119894]  do_syscall_64+0x3f/0x90
[  209.121201]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  209.122901] RIP: 0033:0x7ffff7d0ccf1
[  209.124159] Code: 31 c0 e9 b2 fe ff ff 50 48 8d 3d b2 0a 0b 00 e8 65 29 02 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d ed 18 0f 00 4
[  209.129836] RSP: 002b:00007fffffffccc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000

-ritesh



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux