Re: xfs: pass total block res. as total xfs_bmapi_write() parameter

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03/15/2017, 01:24 PM, Nikolay Borisov wrote:
> From: Brian Foster <bfoster@xxxxxxxxxx>
> 
> The total field from struct xfs_alloc_arg is a bit of an unknown
> commodity. It is documented as the total block requirement for the
> transaction and is used in this manner from most call sites by virtue of
> passing the total block reservation of the transaction associated with
> an allocation. Several xfs_bmapi_write() callers pass hardcoded values
> of 0 or 1 for the total block requirement, which is a historical oddity
> without any clear reasoning.
> 
> The xfs_iomap_write_direct() caller, for example, passes 0 for the total
> block requirement. This has been determined to cause problems in the
> form of ABBA deadlocks of AGF buffers due to incorrect AG selection in
> the block allocator. Specifically, the xfs_alloc_space_available()
> function incorrectly selects an AG that doesn't actually have sufficient
> space for the allocation. This occurs because the args.total field is 0
> and thus the remaining free space check on the AG doesn't actually
> consider the size of the allocation request. This locks the AGF buffer,
> the allocation attempt proceeds and ultimately fails (in
> xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next
> AG. In turn, this can lead to incorrect AG locking order (if the
> allocator wraps around, attempting to lock AG 0 after acquiring AG N)
> and thus deadlock if racing with another operation. This problem has
> been reproduced via generic/299 on smallish (1GB) ramdisk test devices.
> 
> To avoid this problem, replace the undocumented hardcoded total
> parameters from the iomap and utility callers to pass the block
> reservation used for the associated transaction. This is consistent with
> other xfs_bmapi_write() callers throughout XFS. The assumption is that
> the total field allows the selection of an AG that can handle the entire
> operation rather than simply the allocation/range being requested (e.g.,
> resulting btree splits, etc.). This addresses the aforementioned
> generic/299 hang by ensuring AG selection only occurs when the
> allocation can be satisfied by the AG.
> 
> Reported-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
> Signed-off-by: Nikolay Borisov <nborisov@xxxxxxxx>
> ---
> 
> Here is the backport for 3.12 

Applied, thanks!

-- 
js
suse labs



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]