Re: [PATCH 5/5] xfs: Convert kmem_alloc() users

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Nov 24, 2019 at 02:02:56PM -0800, Darrick J. Wong wrote:
> On Fri, Nov 22, 2019 at 02:30:48PM -0800, Darrick J. Wong wrote:
> > On Fri, Nov 22, 2019 at 07:57:56AM -0800, Darrick J. Wong wrote:
> > > On Wed, Nov 20, 2019 at 11:44:25AM +0100, Carlos Maiolino wrote:
> > > > Use kmalloc() directly.
> > 
> > <snip all this>
> > 
> > > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > > index 5423171e0b7d..7bb53fbf32f6 100644
> > > > --- a/fs/xfs/xfs_log_recover.c
> > > > +++ b/fs/xfs/xfs_log_recover.c
> > > > @@ -1962,7 +1962,7 @@ xlog_recover_buffer_pass1(
> > > >  		}
> > > >  	}
> > > >  
> > > > -	bcp = kmem_alloc(sizeof(struct xfs_buf_cancel), 0);
> > > > +	bcp = kmalloc(sizeof(struct xfs_buf_cancel), GFP_KERNEL | __GFP_NOFAIL);
> > > >  	bcp->bc_blkno = buf_f->blf_blkno;
> > > >  	bcp->bc_len = buf_f->blf_len;
> > > >  	bcp->bc_refcount = 1;
> > > > @@ -2932,7 +2932,8 @@ xlog_recover_inode_pass2(
> > > >  	if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) {
> > > >  		in_f = item->ri_buf[0].i_addr;
> > > >  	} else {
> > > > -		in_f = kmem_alloc(sizeof(struct xfs_inode_log_format), 0);
> > > > +		in_f = kmalloc(sizeof(struct xfs_inode_log_format),
> > > > +			       GFP_KERNEL | __GFP_NOFAIL);
> > > >  		need_free = 1;
> > > >  		error = xfs_inode_item_format_convert(&item->ri_buf[0], in_f);
> > > >  		if (error)
> > > > @@ -4271,7 +4272,7 @@ xlog_recover_add_to_trans(
> > > >  		return 0;
> > > >  	}
> > > >  
> > > > -	ptr = kmem_alloc(len, 0);
> > > > +	ptr = kmalloc(len, GFP_KERNEL | __GFP_NOFAIL);
> > > >  	memcpy(ptr, dp, len);
> > > >  	in_f = (struct xfs_inode_log_format *)ptr;
> > > 
> > > I noticed that kmalloc is generating warnings with generic/049 when 16k
> > > directories (-n size=16k) are enabled.  I /think/ this is because it's
> > > quite possible to write out an xlog_op_header with a length of more than
> > > a single page; log recovery will then try to allocate a huge memory
> > > buffer to recover the transaction; and so we try to do a huge NOFAIL
> > > allocation, which makes the mm unhappy.
> > > 
> > > The one thing I've noticed with this conversion series is that the flags
> > > translation isn't 100% 1-to-1.  Before, kmem_flags_convert didn't
> > > explicitly set __GFP_NOFAIL anywhere; we simply took the default
> > > behavior.  IIRC that means that small allocations actually /are/
> > > guaranteed to succeed, but multipage allocations certainly aren't.
> > > This seems to be one place where we could have asked for a lot of
> > > memory, failed to get it, and crashed.
> > > 
> > > Now that we explicitly set NOFAIL in all the places where we don't also
> > > check for a null return, I think we're just uncovering latent bugs
> > > lurking in the code base.  The kernel does actually fulfill the
> > > allocation request, but it's clearly not happy.
> > 
> > FWIW I ran with various dirsizes and options and it looks like this is
> > the only place where we screw this up... patches soon.
> 
> I rescind that statement -- there's enough places in this series where I
> can't 100% tell that a k{mzre}alloc call asks for a small enough amount
> of memory to qualify for __GFP_NOFAIL.
> 
> I really want this cleanup to start with a straightforward removal of
> the kmem.c wrappers without any behavior changes.  Only after that's
> done should we move on to things like adding __GFP_NOFAIL to allocations
> or deciding if/where we can substitute kfree for kvfree.
> 
> Munging them together means I can't easily tell if something is
> seriously broken here (but the WARN_ONs suggest this) and I'd forgotten
> that the merge window is opening the week of a major US holiday, so I
> choose to defer this series to 5.6.

Thanks, this is the best option IMO too. I'll think a bit about it and check
what should we do about it.

> 
> --D
> 
> > --D
> > 
> > > --D
> > > 
> > > Relevant snippet of dmesg; everything else was normal:
> > > 
> > >  XFS (sdd): Mounting V5 Filesystem
> > >  XFS (sdd): Starting recovery (logdev: internal)
> > >  ------------[ cut here ]------------
> > >  WARNING: CPU: 1 PID: 459342 at mm/page_alloc.c:3275 get_page_from_freelist+0x434/0x1660
> > >  Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac bfq ip_set nfnetlink ip6table_filter ip6_tables iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > >  CPU: 1 PID: 459342 Comm: mount Not tainted 5.4.0-rc3-djw #rc3
> > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
> > >  RIP: 0010:get_page_from_freelist+0x434/0x1660
> > >  Code: e6 00 00 00 00 48 89 84 24 a0 00 00 00 0f 84 08 fd ff ff f7 84 24 c0 00 00 00 00 80 00 00 74 0c 83 bc 24 84 00 00 00 01 76 02 <0f> 0b 49 8d 87 10 05 00 00 48 89 c7 48 89 84 24 88 00 00 00 e8 03
> > >  RSP: 0018:ffffc900035d3918 EFLAGS: 00010202
> > >  RAX: ffff88803fffb680 RBX: 0000000000002968 RCX: ffffea0000c8e108
> > >  RDX: ffff88803fffbba8 RSI: ffff88803fffb870 RDI: 0000000000000000
> > >  RBP: 0000000000000002 R08: 0000000000000201 R09: 000000000002ff81
> > >  R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> > >  R13: 0000000000048cc0 R14: 0000000000000001 R15: ffff88803fffb680
> > >  FS:  00007fcfdf89a080(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000
> > >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >  CR2: 000055b202ec48c8 CR3: 000000003bfdc005 CR4: 00000000001606a0
> > >  Call Trace:
> > >   ? kvm_clock_read+0x14/0x30
> > >   __alloc_pages_nodemask+0x172/0x3a0
> > >   kmalloc_order+0x18/0x80
> > >   kmalloc_order_trace+0x1d/0x130
> > >   xlog_recover_add_to_trans+0x4b/0x340 [xfs]
> > >   xlog_recovery_process_trans+0xe9/0xf0 [xfs]
> > >   xlog_recover_process_data+0x9e/0x1f0 [xfs]
> > >   xlog_do_recovery_pass+0x3a9/0x7c0 [xfs]
> > >   xlog_do_log_recovery+0x72/0x150 [xfs]
> > >   xlog_do_recover+0x43/0x2a0 [xfs]
> > >   xlog_recover+0xdf/0x170 [xfs]
> > >   xfs_log_mount+0x2e3/0x300 [xfs]
> > >   xfs_mountfs+0x4e7/0x9f0 [xfs]
> > >   xfs_fc_fill_super+0x2f8/0x520 [xfs]
> > >   ? xfs_fs_destroy_inode+0x4f0/0x4f0 [xfs]
> > >   get_tree_bdev+0x198/0x270
> > >   vfs_get_tree+0x23/0xb0
> > >   do_mount+0x87e/0xa20
> > >   ksys_mount+0xb6/0xd0
> > >   __x64_sys_mount+0x21/0x30
> > >   do_syscall_64+0x50/0x180
> > >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >  RIP: 0033:0x7fcfdf15d3ca
> > >  Code: 48 8b 0d c1 8a 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8e 8a 2c 00 f7 d8 64 89 01 48
> > >  RSP: 002b:00007fff0af10a58 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
> > >  RAX: ffffffffffffffda RBX: 000055b202ec1970 RCX: 00007fcfdf15d3ca
> > >  RDX: 000055b202ec1be0 RSI: 000055b202ec1c20 RDI: 000055b202ec1c00
> > >  RBP: 0000000000000000 R08: 000055b202ec1b80 R09: 0000000000000000
> > >  R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055b202ec1c00
> > >  R13: 000055b202ec1be0 R14: 0000000000000000 R15: 00007fcfdf67e8a4
> > >  irq event stamp: 18398
> > >  hardirqs last  enabled at (18397): [<ffffffff8123738f>] __slab_alloc.isra.83+0x6f/0x80
> > >  hardirqs last disabled at (18398): [<ffffffff81001d8a>] trace_hardirqs_off_thunk+0x1a/0x20
> > >  softirqs last  enabled at (18158): [<ffffffff81a003af>] __do_softirq+0x3af/0x4a4
> > >  softirqs last disabled at (18149): [<ffffffff8106528c>] irq_exit+0xbc/0xe0
> > >  ---[ end trace 3669c914fa8ccac6 ]---
> > > 
> > > AFAICT this is because inode buffers are 32K on this system
> > > 
> > > >  
> > > > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> > > > index a2664afa10c3..2993af4a9935 100644
> > > > --- a/fs/xfs/xfs_qm.c
> > > > +++ b/fs/xfs/xfs_qm.c
> > > > @@ -988,7 +988,8 @@ xfs_qm_reset_dqcounts_buf(
> > > >  	if (qip->i_d.di_nblocks == 0)
> > > >  		return 0;
> > > >  
> > > > -	map = kmem_alloc(XFS_DQITER_MAP_SIZE * sizeof(*map), 0);
> > > > +	map = kmalloc(XFS_DQITER_MAP_SIZE * sizeof(*map),
> > > > +		      GFP_KERNEL | __GFP_NOFAIL);
> > > >  
> > > >  	lblkno = 0;
> > > >  	maxlblkcnt = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> > > > diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> > > > index 7f03b4ab3452..dfd419d402ea 100644
> > > > --- a/fs/xfs/xfs_rtalloc.c
> > > > +++ b/fs/xfs/xfs_rtalloc.c
> > > > @@ -962,7 +962,7 @@ xfs_growfs_rt(
> > > >  	/*
> > > >  	 * Allocate a new (fake) mount/sb.
> > > >  	 */
> > > > -	nmp = kmem_alloc(sizeof(*nmp), 0);
> > > > +	nmp = kmalloc(sizeof(*nmp), GFP_KERNEL | __GFP_NOFAIL);
> > > >  	/*
> > > >  	 * Loop over the bitmap blocks.
> > > >  	 * We will do everything one bitmap block at a time.
> > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > index cc1933dc652f..eee831681e9c 100644
> > > > --- a/fs/xfs/xfs_super.c
> > > > +++ b/fs/xfs/xfs_super.c
> > > > @@ -1739,7 +1739,7 @@ static int xfs_init_fs_context(
> > > >  {
> > > >  	struct xfs_mount	*mp;
> > > >  
> > > > -	mp = kmem_alloc(sizeof(struct xfs_mount), KM_ZERO);
> > > > +	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL);
> > > >  	if (!mp)
> > > >  		return -ENOMEM;
> > > >  
> > > > -- 
> > > > 2.23.0
> > > > 
> 

-- 
Carlos





[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux