On Mon, Sep 24, 2012 at 01:09:04PM -0500, Mark Tinguely wrote: > > From bpm@xxxxxxx Mon Sep 24 12:11:59 2012 > > Date: Mon, 24 Sep 2012 12:11:59 -0500 > > From: Ben Myers <bpm@xxxxxxx> > > To: <tinguely@xxxxxxx> > > Subject: Re: [PATCH 0/3] xfs: allocation worker causes freelist buffer lock > > hang > > Cc: <xfs@xxxxxxxxxxx> > > > > Hi Mark, > > > > On Wed, Sep 19, 2012 at 11:31:33AM -0500, tinguely@xxxxxxx wrote: > > ... > > > I traced the callers of xfs_alloc_vextent(), noting transaction creation, > > > commits and cancels; I noted loops in the callers and which were marked > > > as metadata/userdata. I can send this document to reviewers. > > > > I'd like to see the doc of xfs_alloc_vextent callers and which of them loop. > > Can you post your doc to the list? > > > > Regards, > > Ben > > > Here are the Linux 3.6.x callers of xfs_alloc_vextent() stopping at the > transaction commit/cancel level. > > XFS_BMAPI_METADATA *not* being set implies user data. > > Userdata being set is the community designated indication that an allocate > worker is needed to prevent the overflow of the kernel stack. > > Calling xfs_alloc_vextent() several times with the same transaction can cause > a dead lock if a new allocation worker can not be allocated. I noted where the > loops occur. xfs_alloc_vextent() can call itself, those calls must be in the > same allocation worker. > > As a bonus, consolidating the loops into one worker actually gives a slight > performance advantage. Can you quantify it? > Sorry this is wider than 80 characters wide. > --- > xfs_bmap_btalloc(xfs_bmalloca_t) > ! xfs_bmap_alloc(xfs_bmalloca_t) > ! ! xfs_bmapi_allocate(xfs_bmalloca_t, ...) > ! ! ! xfs_bmapi_write(xfs_trans_t ...) loops over above > ! ! ! ! xfs_attr_rmtval_set(xfs_da_args_t) loops over bmapi_write (xfs_attr_set_int) **XFS_BMAPI_METADATA** > ! ! ! ! xfs_da_grow_inode_int(xfs_da_args_t) loops over bmapi_write **XFS_BMAPI_METADATA** > ! ! ! ! xfs_qm_dqalloc(xfs_trans_t ...) does a xfs_bmap_finish/cancel **XFS_BMAPI_METADATA** > ! ! ! ! xfs_iomap_write_direct(...) alloc trans, xfs_trans_commit/cancel > ! ! ! ! xfs_iomap_write_allocate(...) alloc trans, xfs_trans_commit/cancel safe loop > ! ! ! ! xfs_iomap_write_unwritten(..) alloc trans, xfs_trans_commit/cancel safe loop > ! ! ! ! xfs_growfs_rt_alloc(..) alloc trans, xfs_trans_commit/cancel safe loop > ! ! ! ! xfs_symlink(...) allocates trans does a xfs_trans_commit/cancel > ! ! ! ! xfs_alloc_file_space(...) alloc trans, xfs_trans_commit/cancel safe loop So the only data path callers though here are xfs_iomap_write_direct(), xfs_iomap_write_allocate() and xfs_iomap_write_unwritten() and xfs_alloc_file_space(). Everything else is metadata, so won't use > xfs_bmap_extents_to_btree(xfs_trans_t ...) <- set userdata to 0 (patch 3) > ! xfs_bmap_add_attrfork_extents(xfs_trans_t ...) > ! ! xfs_bmap_add_attrfork(...) allocates trans does a xfs_trans_commit/cancel > ! xfs_bmap_add_extent_delay_real(fs_bmalloca) > ! ! xfs_bmapi_allocate(xfs_bmalloca_t, ...) > ! ! ! <see above> > ! xfs_bmap_add_extent_unwritten_real(xfs_trans_t ...) > ! ! xfs_bmapi_convert_unwritten(xfs_bmalloca ...) > ! ! ! xfs_bmapi_write(xfs_trans ...) calls xfs_bmapi_convert_unwritten in loop XFS_BMAPI_METADATA > ! ! ! ! ... <see above> ..... > ! xfs_bmap_add_extent_hole_real(xfs_bmalloca ...) > ! ! xfs_bmapi_allocate(xfs_bmalloca_t, ...) > ! ! ! ... <see above> So it's bmbt modification that looks to be the problem here, right? > xfs_bmap_local_to_extents(xfs_trans_t ...) <- set userdata to 0 (patch 3) > ! xfs_bmap_add_attrfork_local(xfs_trans_t ..) > ! ! xfs_bmap_add_attrfork(...) trans alloc, bmap_finish trans_commit/cancel > ! xfs_bmapi_write(xfs_trans_t ...) XFS_BMAPI_METADATA > ! ! ... <see above> Same here. That's all I can see as problematic - maybe I read the output wrong and there's others? i.e. all other xfs_alloc_vextent() callers are either in metadata context (so don't use the workqueue) or commit the transaction directly after xfs_bmapi_write returns so will unlock the AGF buffer before calling into xfs_bmapi_write a second time. If these are the only loops, then patch 3 is all that is necessary to avoid the problem of blocking on workqueue resource while we are already on the workqueue, right? i.e. bmbt allocation is a metadata allocation, even though the context is a data allocation, and ensuring it is metadata means that the current transaction won't get blocked waiting for workqueue resources... What am I missing? Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs