Re: [PATCH] stable: restart busy extent search after node removal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/12/11 7:12 PM, Dave Chinner wrote:
> On Tue, Jul 12, 2011 at 05:03:38PM -0500, Eric Sandeen wrote:
>> Sending this for review prior to stable submission...
>>
>> A user on #xfs reported that a log replay was oopsing in
>> __rb_rotate_left() with a null pointer deref.
>>
>> I traced this down to the fact that in xfs_alloc_busy_insert(),
>> we erased a node with rb_erase() when the new node overlapped,
>> but left it specified as the parent node for the new insertion.
>>
>> So when we try to insert a new node with an erased node as
>> its parent, obviously things go very wrong.
>>
>> Upstream,
>> 97d3ac75e5e0ebf7ca38ae74cebd201c09b97ab2 xfs: exact busy extent tracking
>> actually fixed this, but as part of a much larger change.  Here's
>> the relevant bit:
>>
>>                 * We also need to restart the busy extent search from the
>>                 * tree root, because erasing the node can rearrange the
>>                 * tree topology.
>>                 */
>>                rb_erase(&busyp->rb_node, &pag->pagb_tree);
>>                busyp->length = 0;
>>                return false;
>>
>> We can do essentially the same thing to older codebases by restarting
>> the search after the erase.
>>
>> This should apply to .35 through .39, and was tested on .39
>> with the oopsing replay reproducer.
>>
>> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
>> ---
>>
>> Index: linux-2.6/fs/xfs/xfs_alloc.c
>> ===================================================================
>> --- linux-2.6.orig/fs/xfs/xfs_alloc.c
>> +++ linux-2.6/fs/xfs/xfs_alloc.c
>> @@ -2664,6 +2664,12 @@ restart:
>>  					new->bno + new->length) -
>>  				min(busyp->bno, new->bno);
>>  		new->bno = min(busyp->bno, new->bno);
>> +		/*
>> +		 * Start the search over from the tree root, because
>> +		 * erasing the node can rearrange the tree topology.
>> +		 */
>> +		spin_unlock(&pag->pagb_lock);
>> +		goto restart;
>>  	} else
>>  		busyp = NULL;
> 
> Looks good.
> 
> I'm guessing that the only case I was able to hit during testing of
> this code originally was the "overlap with exact start block match",
> otherwise I would have seen this. I'm not sure that there really is
> much we can do to improve the test coverage of this code, though.
> Hell, just measuring our test coverage so we know what we aren't
> testing would probably be a good start. :/

Apparently the original oops, and the subsequent replay oopses,
were on a filesystem VERY busy with torrents.

Might be a testcase ;)

> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

Thanks,
-Eric


> 
> Cheers,
> 
> Dave.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs


[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux