Re: [PATCH v28 00/15] xfs: Log Attribute Replay

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2022-03-08 at 19:53 -0800, Darrick J. Wong wrote:
> On Tue, Mar 01, 2022 at 01:39:36PM -0700, Allison Henderson wrote:
> > 
> > On 2/28/22 7:29 PM, Darrick J. Wong wrote:
> > > On Mon, Feb 28, 2022 at 12:51:32PM -0700, Allison Henderson
> > > wrote:
> > > > Hi all,
> > > > 
> > > > This set is a subset of a larger series parent pointers.
> > > > Delayed attributes allow
> > > > attribute operations (set and remove) to be logged and
> > > > committed in the same
> > > > way that other delayed operations do. This allows more complex
> > > > operations (like
> > > > parent pointers) to be broken up into multiple smaller
> > > > transactions. To do
> > > > this, the existing attr operations must be modified to operate
> > > > as a delayed
> > > > operation.  This means that they cannot roll, commit, or finish
> > > > transactions.
> > > > Instead, they return -EAGAIN to allow the calling function to
> > > > handle the
> > > > transaction.  In this series, we focus on only the delayed
> > > > attribute portion.
> > > > We will introduce parent pointers in a later set.
> > > > 
> > > > The set as a whole is a bit much to digest at once, so I
> > > > usually send out the
> > > > smaller sub series to reduce reviewer burn out.  But the entire
> > > > extended series
> > > > is visible through the included github links.
> > > > 
> > > > Updates since v27:
> > > > xfs: don't commit the first deferred transaction without
> > > > intents
> > > >    Comment update
> > > 
> > > I applied this to 5.16-rc6, and turned on larp mode.  generic/476
> > > tripped over something, and this is what kasan had to say:
> > > 
> > > [  835.381655] run fstests generic/476 at 2022-02-28 18:22:04
> > > [  838.008485] XFS (sdb): Mounting V5 Filesystem
> > > [  838.035529] XFS (sdb): Ending clean mount
> > > [  838.040528] XFS (sdb): Quotacheck needed: Please wait.
> > > [  838.050866] XFS (sdb): Quotacheck: Done.
> > > [  838.092369] XFS (sdb): EXPERIMENTAL logged extended attributes
> > > feature added. Use at your own risk!
> > > [  838.092938] general protection fault, probably for non-
> > > canonical address 0xe012f573e6000046: 0000 [#1] PREEMPT SMP KASAN
> > > [  838.099085] KASAN: maybe wild-memory-access in range
> > > [0x0097cb9f30000230-0x0097cb9f30000237]
> > > [  838.101148] CPU: 2 PID: 4403 Comm: fsstress Not tainted
> > > 5.17.0-rc5-djwx #rc5 63f7e400b85b2245f2d4d3033e82ec8bc95c49fd
> > > [  838.103757] Hardware name: QEMU Standard PC (Q35 + ICH9,
> > > 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> > > [  838.105811] RIP: 0010:xlog_cil_commit+0x2f9/0x2800 [xfs]
> > > 
> > > 
> > > FWIW, gdb says this address is:
> > > 
> > > 0xffffffffa06e0739 is in xlog_cil_commit
> > > (fs/xfs/xfs_log_cil.c:237).
> > > 232
> > > 233                     /*
> > > 234                      * if we have no shadow buffer, or it is
> > > too small, we need to
> > > 235                      * reallocate it.
> > > 236                      */
> > > 237                     if (!lip->li_lv_shadow ||
> > > 238                         buf_size > lip->li_lv_shadow-
> > > >lv_size) {
> > > 239                             /*
> > > 240                              * We free and allocate here as a
> > > realloc would copy
> > > 241                              * unnecessary data. We don't use
> > > kvzalloc() for the
> > > 
> > > I don't know what this is about, but my guess is that we freed
> > > something
> > > we weren't supposed to...?
> > > 
> > > (An overnight fstests run with v27 and larp=0 ran fine,
> > > though...)
> > > 
> > > --D
> > 
> > Hmm, ok, I will dig into this then.  I dont see anything between
> > v27 and v28
> > that would have cause this though, so I'm thinking what ever it is
> > must by
> > intermittent.  I'll stick it in a loop and see if I can get a
> > recreate
> > today.  Thanks!
> 
> I think I've figured out two of the problems here --
> 
> The biggest problem is that xfs_attri_init isn't fully initializing
> the
> xattr log item structure, which is why the CIL would crash on my
> system
> when it tried to resize what it thought was the lv_shadow buffer
> attached to the log item.  I changed it to kmem_cache_zalloc and the
> problems went away; you might want to check if your kernel has some
> debugging kconfig feature enabled that auto-zeroes everything.
> 
Ah, ok.  When ever you get a moment, would you mind shooting me a copy
of your kernel config?  That might be an easy way to bring together
some of the differences between our testing environments.

> The other KASAN report has to do with the log iovec code -- it
> assumes
> that any buffer passed in has a size that is congruent with 4(?)
> bytes.
> This isn't necessarily true for the xattr name (and in principle also
> the value) buffer that we get from the VFS; if either is (say) 37
> bytes
> long, you'll get 37 bytes, and KASAN will expect you to stick to
> that.
> I think with the way the slab works this isn't a real memory
> corruption
> vector, but I wouldn't put it past slob or someone to actually pack
> things in tightly.
> 
> Also, I'm not sure what's going on with the name/value buffers in the
> relog function.  I patched it to try to track the buffer references,
> but
> maybe you have a better idea for why we allocate new name/value
> buffers
> for a log item relog?  I wasn't sure where those new buffers ever got
> freed, so I <cough> ran it over with a backhoe, and it shows. :/
> 
> The following shabby patch helps us to pass generic/475 and
> generic/624
> in LARP mode, though I have no idea if it's correct.  You might want
> to
> pay attention to the "XXX:" comments I added, since that's where I
> got
> stuck.
> 
> --D
> 

Ok, I'll dig through what you have here.  Most of what youre saying
makes sense, though you shouldnt have to go through all this leg work
with the iovec sizes.  That's supposed to be handled in
xfs_attri_item_format.  What ever arbitrary sized buffers are in the
xfs_attri_log_item are supposed to be adjusted there.

In most cases, the name/val buffers are owned by the calling function,
and in fact usually point back to the name/val pointers in the
xfs_da_args. So in that case, the larp code doesnt own that.  The idea
is that the calling function just passes the args struct into the
xfs_attr_(set/remove)_deferred which will then create a new
xfs_attr_item, and stores a refrence to the xfs_da_args there.  Once
the calling routine finishes up the transaction, it is the callers
responsibility clean up where ever the name/value pair comes from.

When the calling function invokes the defer finish routines, the defer
ops code creates the intent (via the xfs_defer_op_type) which calls
xfs_attr_create_intent. THis initializes and populates the
xfs_attri_log_item with the name/value pointers in the xfs_attr_item

Later the iop code creates the vectors using the xfs_attri_item_size
(referenced through the xfs_item_ops).  Once created, it then uses the
xfs_attri_item_format callback to appropriately populate those vectors.

So, it's complex, but mostly I'm just trying to adhere to what the
interfaces expect


NOW.... during a replay or relog this changes slightly.  These routines
need to recreate the log items, but unlike the calling function in the
above example, they dont span the transaction from creation to
completion.  THey just recreate one item and exit.  SO they cant just
use a xfs_da_args on the stack, nor will they be around to free
anything they allocate.  In this case, we simply allocate extra space
off the end of the xfs_attri_log_item, store the name/val buffers
there, and adjust the name/val pointers to point to this
location.  This is really the only reason why xfs_attri_init has an
extra buffsize parameter at all. So in this case, the name/val buffers
are released at the same time as the item through xfs_attri_item_free.

At least thats how it's supposed to work.  Clearly something isnt
following the theory, so I'll go through what you have here and maybe
shoot you back another patch.  Thanks for all your help here :-)

Allison

> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 41404d35c76c..7a00a43610d5 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -678,6 +678,9 @@ xfs_attr_set(
>  {
>  	struct xfs_inode	*dp = args->dp;
>  	struct xfs_mount	*mp = dp->i_mount;
> +	void			*fakename = NULL, *fakeval = NULL;
> +	const void		*oldname = args->name;
> +	void			*oldval = args->value;
>  	struct xfs_trans_res	tres;
>  	bool			rsvd = (args->attr_filter &
> XFS_ATTR_ROOT);
>  	int			error, local;
> @@ -692,6 +695,34 @@ xfs_attr_set(
>  	if (error)
>  		return error;
>  
> +	/*
> +	 * XXX: the log iovec code requires that every buffer it's
> given has a
> +	 * size that's an even multiple of four.  If this isn't the
> case, create
> +	 * a shadow buffers so that we don't read past the end of the
> name and
> +	 * value buffers and trip KASAN.
> +	 */
> +	if (xfs_has_larp(mp) && (args->namelen & 3)) {
> +		fakename = kzalloc(roundup(args->namelen, 4),
> +				GFP_KERNEL | GFP_NOFS);
> +		if (!fakename)
> +			return -ENOMEM;
> +		memcpy(fakename, args->name, args->namelen);
> +		args->name = fakename;
> +	}
> +
> +	if (xfs_has_larp(mp) && (args->valuelen & 3)) {
> +		fakeval = kzalloc(roundup(args->valuelen, 4),
> +				GFP_KERNEL | GFP_NOFS);
> +		if (!fakeval) {
> +			if (fakename)
> +				kvfree(fakename);
> +			return -ENOMEM;
> +		}
> +		memcpy(fakeval, args->value, args->valuelen);
> +		args->value = fakeval;
> +	}
> +
> +	//trace_printk("start name 0x%px len 0x%x val 0x%px len 0x%x",
> args->name, args->namelen, args->value, args->valuelen);
>  	args->geo = mp->m_attr_geo;
>  	args->whichfork = XFS_ATTR_FORK;
>  	args->hashval = xfs_da_hashname(args->name, args->namelen);
> @@ -797,6 +828,13 @@ xfs_attr_set(
>  drop_incompat:
>  	if (delayed)
>  		xlog_drop_incompat_feat(mp->m_log);
> +	//trace_printk("end name 0x%px val 0x%px", args->name, args-
> >value);
> +	if (fakename)
> +		kvfree(fakename);
> +	if (fakeval)
> +		kvfree(fakeval);
> +	args->value = oldval;
> +	args->name = oldname;
>  	return error;
>  
>  out_trans_cancel:
> @@ -852,6 +890,7 @@ xfs_attr_item_init(
>  	new->xattri_op_flags = op_flags;
>  	new->xattri_da_args = args;
>  
> +	//trace_printk("name 0x%px val 0x%px", args->name, args-
> >value);
>  	*attr = new;
>  	return 0;
>  }
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 5aa7a764d95e..8e365583e265 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -43,6 +43,12 @@ STATIC void
>  xfs_attri_item_free(
>  	struct xfs_attri_log_item	*attrip)
>  {
> +	if (attrip->attri_own_buffers) {
> +		kvfree(attrip->attri_name);
> +		if (attrip->attri_value)
> +			kvfree(attrip->attri_value);
> +		attrip->attri_own_buffers = false;
> +	}
>  	kmem_free(attrip->attri_item.li_lv_shadow);
>  	kmem_free(attrip);
>  }
> @@ -114,6 +120,8 @@ xfs_attri_item_format(
>  	if (attrip->attri_value_len > 0)
>  		attrip->attri_format.alfi_size++;
>  
> +	//trace_printk("fmt 0x%px name 0x%px val 0x%px", &attrip-
> >attri_format, attrip->attri_name, attrip->attri_value);
> +
>  	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
>  			&attrip->attri_format,
>  			sizeof(struct xfs_attri_log_format));
> @@ -158,19 +166,39 @@ xfs_attri_item_release(
>  STATIC struct xfs_attri_log_item *
>  xfs_attri_init(
>  	struct xfs_mount		*mp,
> -	int				buffer_size)
> +	unsigned int			namelen,
> +	unsigned int			valuelen)
>  
>  {
>  	struct xfs_attri_log_item	*attrip;
> +	unsigned int			buffer_size;
>  
> -	if (buffer_size) {
> -		attrip = kmem_alloc(sizeof(struct xfs_attri_log_item) +
> -				    buffer_size, KM_NOFS);
> -		if (attrip == NULL)
> +	attrip = kmem_cache_zalloc(xfs_attri_cache, GFP_NOFS |
> __GFP_NOFAIL);
> +
> +	if (valuelen)
> +		ASSERT(namelen != 0);
> +
> +	if (namelen) {
> +		buffer_size = roundup(namelen, 4);
> +		attrip->attri_name = kzalloc(buffer_size,
> +				GFP_KERNEL | GFP_NOFS |
> __GFP_RETRY_MAYFAIL);
> +		if (!attrip->attri_name) {
> +			kmem_cache_free(xfs_attri_cache, attrip);
> +			return NULL;
> +		}
> +		attrip->attri_own_buffers = true;
> +	}
> +
> +	if (valuelen) {
> +		buffer_size = roundup(valuelen, 4);
> +		attrip->attri_value = kzalloc(buffer_size,
> +				GFP_KERNEL | GFP_NOFS
> |__GFP_RETRY_MAYFAIL);
> +		if (!attrip->attri_value) {
> +			if (attrip->attri_name)
> +				kvfree(attrip->attri_name);
> +			kmem_cache_free(xfs_attri_cache, attrip);
>  			return NULL;
> -	} else {
> -		attrip = kmem_cache_alloc(xfs_attri_cache,
> -					  GFP_NOFS | __GFP_NOFAIL);
> +		}
>  	}
>  
>  	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
> @@ -343,6 +371,8 @@ xfs_attr_log_item(
>  	attrip->attri_value = attr->xattri_da_args->value;
>  	attrip->attri_name_len = attr->xattri_da_args->namelen;
>  	attrip->attri_value_len = attr->xattri_da_args->valuelen;
> +
> +	//trace_printk("lip 0x%px fmt 0x%px name 0x%px val 0x%px",
> &attrip->attri_item, &attrip->attri_format, attrip->attri_name,
> attrip->attri_value);
>  }
>  
>  /* Get an ATTRI. */
> @@ -362,7 +392,7 @@ xfs_attr_create_intent(
>  	if (!xfs_sb_version_haslogxattrs(&mp->m_sb))
>  		return NULL;
>  
> -	attrip = xfs_attri_init(mp, 0);
> +	attrip = xfs_attri_init(mp, 0, 0);
>  	if (attrip == NULL)
>  		return NULL;
>  
> @@ -434,6 +464,13 @@ xfs_attri_item_committed(
>  	 * committed so these fields are no longer accessed. Clear them
> out of
>  	 * caution since we're about to free the xfs_attr_item.
>  	 */
> +	//trace_printk("fmt 0x%px name 0x%px val 0x%px", &attrip-
> >attri_format, attrip->attri_name, attrip->attri_value);
> +	if (attrip->attri_own_buffers) {
> +		kvfree(attrip->attri_name);
> +		if (attrip->attri_value)
> +			kvfree(attrip->attri_value);
> +		attrip->attri_own_buffers = false;
> +	}
>  	attrip->attri_name = NULL;
>  	attrip->attri_value = NULL;
>  
> @@ -582,17 +619,15 @@ xfs_attri_item_relog(
>  	struct xfs_attri_log_item	*new_attrip;
>  	struct xfs_attri_log_format	*new_attrp;
>  	struct xfs_attri_log_format	*old_attrp;
> -	int				buffer_size;
>  
>  	old_attrip = ATTRI_ITEM(intent);
>  	old_attrp = &old_attrip->attri_format;
> -	buffer_size = old_attrp->alfi_value_len + old_attrp-
> >alfi_name_len;
>  
>  	tp->t_flags |= XFS_TRANS_DIRTY;
>  	attrdp = xfs_trans_get_attrd(tp, old_attrip);
>  	set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
>  
> -	new_attrip = xfs_attri_init(tp->t_mountp, buffer_size);
> +	new_attrip = xfs_attri_init(tp->t_mountp, 0, 0);
>  	new_attrp = &new_attrip->attri_format;
>  
>  	new_attrp->alfi_ino = old_attrp->alfi_ino;
> @@ -602,19 +637,30 @@ xfs_attri_item_relog(
>  	new_attrp->alfi_attr_flags = old_attrp->alfi_attr_flags;
>  
>  	new_attrip->attri_name_len = old_attrip->attri_name_len;
> -	new_attrip->attri_name = ((char *)new_attrip) +
> -				 sizeof(struct xfs_attri_log_item);
> -	memcpy(new_attrip->attri_name, old_attrip->attri_name,
> -		new_attrip->attri_name_len);
> +	new_attrip->attri_name = old_attrip->attri_name;
>  
>  	new_attrip->attri_value_len = old_attrip->attri_value_len;
> -	if (new_attrip->attri_value_len > 0) {
> -		new_attrip->attri_value = new_attrip->attri_name +
> -					  new_attrip->attri_name_len;
> +	if (new_attrip->attri_value_len > 0)
> +		new_attrip->attri_value = old_attrip->attri_value;
>  
> -		memcpy(new_attrip->attri_value, old_attrip-
> >attri_value,
> -		       new_attrip->attri_value_len);
> -	}
> +	/*
> +	 * XXX Err... so who owns the name/value buffer references,
> anyway?
> +	 * Is it safe to drop them during ->iop_commmitted?  Do we ever
> want to
> +	 * relog a log item after _committed, in which case we no
> longer have
> +	 * the buffer attached to the new log item.
> +	 *
> +	 * If we don't need the name and value after the first commit
> of the
> +	 * attrip then it's ok to drop the references in _committed...
> but that
> +	 * doesn't make sense, since most of the tx rolling is to
> prepare the
> +	 * xattr structure to receive the name and value.
> +	 *
> +	 * OTOH -- _committed is called to write things into the
> AIL.  By that
> +	 * time we are well past formatting things into buffers, right?
> +	 *
> +	 * <zzzz tired, brain dead>
> +	 */
> +	new_attrip->attri_own_buffers = old_attrip->attri_own_buffers;
> +	old_attrip->attri_own_buffers = false;
>  
>  	xfs_trans_add_item(tp, &new_attrip->attri_item);
>  	set_bit(XFS_LI_DIRTY, &new_attrip->attri_item.li_flags);
> @@ -633,10 +679,7 @@ xlog_recover_attri_commit_pass2(
>  	struct xfs_mount                *mp = log->l_mp;
>  	struct xfs_attri_log_item       *attrip;
>  	struct xfs_attri_log_format     *attri_formatp;
> -	char				*name = NULL;
> -	char				*value = NULL;
>  	int				region = 0;
> -	int				buffer_size;
>  
>  	attri_formatp = item->ri_buf[region].i_addr;
>  
> @@ -646,11 +689,9 @@ xlog_recover_attri_commit_pass2(
>  		return -EFSCORRUPTED;
>  	}
>  
> -	buffer_size = attri_formatp->alfi_name_len +
> -		      attri_formatp->alfi_value_len;
> -
>  	/* memory alloc failure will cause replay to abort */
> -	attrip = xfs_attri_init(mp, buffer_size);
> +	attrip = xfs_attri_init(mp, attri_formatp->alfi_name_len,
> +			attri_formatp->alfi_value_len);
>  	if (attrip == NULL)
>  		return -ENOMEM;
>  
> @@ -662,11 +703,10 @@ xlog_recover_attri_commit_pass2(
>  	attrip->attri_name_len = attri_formatp->alfi_name_len;
>  	attrip->attri_value_len = attri_formatp->alfi_value_len;
>  	region++;
> -	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
> -	memcpy(name, item->ri_buf[region].i_addr, attrip-
> >attri_name_len);
> -	attrip->attri_name = name;
> +	memcpy(attrip->attri_name, item->ri_buf[region].i_addr,
> +			attrip->attri_name_len);
>  
> -	if (!xfs_attr_namecheck(name, attrip->attri_name_len)) {
> +	if (!xfs_attr_namecheck(attrip->attri_name, attrip-
> >attri_name_len)) {
>  		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
>  		error = -EFSCORRUPTED;
>  		goto out;
> @@ -674,11 +714,8 @@ xlog_recover_attri_commit_pass2(
>  
>  	if (attrip->attri_value_len > 0) {
>  		region++;
> -		value = ((char *)attrip) + sizeof(struct
> xfs_attri_log_item) +
> -				attrip->attri_name_len;
> -		memcpy(value, item->ri_buf[region].i_addr,
> +		memcpy(attrip->attri_value, item-
> >ri_buf[region].i_addr,
>  				attrip->attri_value_len);
> -		attrip->attri_value = value;
>  	}
>  
>  	/*
> @@ -706,7 +743,7 @@ xfs_trans_get_attrd(struct xfs_trans		
> *tp,
>  
>  	ASSERT(tp != NULL);
>  
> -	attrdp = kmem_cache_alloc(xfs_attrd_cache, GFP_NOFS |
> __GFP_NOFAIL);
> +	attrdp = kmem_cache_zalloc(xfs_attrd_cache, GFP_NOFS |
> __GFP_NOFAIL);
>  
>  	xfs_log_item_init(tp->t_mountp, &attrdp->attrd_item,
> XFS_LI_ATTRD,
>  			  &xfs_attrd_item_ops);
> diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
> index c3b779f82adb..2b690b1d3228 100644
> --- a/fs/xfs/xfs_attr_item.h
> +++ b/fs/xfs/xfs_attr_item.h
> @@ -31,6 +31,9 @@ struct xfs_attri_log_item {
>  	void				*attri_name;
>  	void				*attri_value;
>  	struct xfs_attri_log_format	attri_format;
> +
> +	/* Do we actually own attri_name/value? */
> +	bool				attri_own_buffers;
>  };
>  
>  /*




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux