Re: [PATCH] xfs: reinit btree pointer on attr tree inactivation walk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 06, 2017 at 01:47:51PM -0700, Darrick J. Wong wrote:
> On Fri, Oct 06, 2017 at 04:07:40PM -0400, Brian Foster wrote:
> > xfs_attr3_root_inactive() walks the attr fork tree to invalidate the
> > associated blocks. xfs_attr3_node_inactive() recursively descends
> > from internal blocks to leaf blocks, caching block address values
> > along the way to revisit parent blocks, locate the next entry and
> > descend down that branch of the tree.
> > 
> > The code that attempts to reread the parent block is unsafe because
> > it assumes that the local xfs_da_node_entry pointer remains valid
> > after an xfs_trans_brelse() and re-read of the parent buffer. Under
> > heavy memory pressure, it is possible that the buffer has been
> > reclaimed and reallocated by the time the parent block is reread.
> > This means that 'btree' can point to an invalid memory address, lead
> > to a random/garbage value for child_fsb and cause the subsequent
> > read of the attr fork to go off the rails and return a NULL buffer
> > for an attr fork offset that is most likely not allocated.
> > 
> > Note that this problem can be manufactured by setting
> > XFS_ATTR_BTREE_REF to 0 to prevent LRU caching of attr buffers,
> > creating a file with a multi-level attr fork and removing it to
> > trigger inactivation.
> > 
> > To address this problem, reinit the node/btree pointers to the
> > parent buffer after it has been re-read. This ensures btree points
> > to a valid record and allows the walk to proceed.
> > 
> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> 
> Looks ok,
> Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> /me wonders if this is a good enough reason to introduce a new errortag
> that turns xfs_buf_set_ref into a no-op and fills bp->b_addr with
> garbage prior to releasing the memory to weed out any other dangling
> pointers?
> 

I was starting to think about if/how we could test for this but hadn't
come up with anything concrete yet. An errortag that forces
_buf_set_ref() to set ->b_lru_ref to 0 of all buffer types might be
simple and effective enough to do the trick. I'll play around with that
next week.

Note that with my roughly equivalent hack in place, I didn't have to
overwrite the buffer content to observe the tree walk to go off the
rails. I haven't confirmed why, but perhaps the memory just happens to
be reused fairly quickly for the next buffer (we have a zone for
buffers, after all). I don't have poisoning or anything like that
enabled that I know of.

We may need a better way to know whether the test failed, however, since
the kernel now handles the !bpp case without crashing (as it should).
Perhaps we can assert that we don't hit a hole in this case, but that
has me wondering why we pass -2 to _da3_node_read() and skip over the
!child_bp case in xfs_attr3_node_inactive() in the first place. Is there
some legitimate case where we can hit a hole here that we don't know
about? I suppose adding an assert might be a more cautious first step
than switching it to -1 and potentially subjecting real users to
corruption errors. Thoughts?

> > ---
> > 
> > I suspect this is the cause of the NULL buf problem down in
> > xfs_attr_inactive(). I can manufacture an instance of that problem as
> > noted above. We have a customer who's hitting that problem and will
> > attempt to validate this fix, but there is no confirmation as of yet.
> > I'm posting this for review in the meantime because this seems like a
> > legit fix regardless of whether they are hitting this or something else.
> 
> Let me know what they report back.
> 

Will do, thanks.

Brian

> --D
> 
> > Brian
> > 
> >  fs/xfs/xfs_attr_inactive.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> > index ebd66b1..e3a950e 100644
> > --- a/fs/xfs/xfs_attr_inactive.c
> > +++ b/fs/xfs/xfs_attr_inactive.c
> > @@ -302,6 +302,8 @@ xfs_attr3_node_inactive(
> >  						 &bp, XFS_ATTR_FORK);
> >  			if (error)
> >  				return error;
> > +			node = bp->b_addr;
> > +			btree = dp->d_ops->node_tree_p(node);
> >  			child_fsb = be32_to_cpu(btree[i + 1].before);
> >  			xfs_trans_brelse(*trans, bp);
> >  		}
> > -- 
> > 2.9.5
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux