On Thu, Mar 03, 2016 at 10:17:30AM -0500, Brian Foster wrote:
> On Wed, Feb 24, 2016 at 09:20:11AM +0100, Christoph Hellwig wrote:
> > This patch implements two closely related changes: First it embeds a
> > bio in the ioend structure so that we don't have to allocate one separately.
> > Second it uses the block layer bio chaining mechanism to chain additional
> > bios off this first one if needed instead of manually accounting for
> > multiple bio completions in the ioend structure. Together this removes a
> > memory allocation per ioend and greatly simplifies the ioend setup and
> > I/O completion path.
> >
> > Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> > ---
> >  fs/xfs/xfs_aops.c  | 217 ++++++++++++++++++++---------------------------------
> >  fs/xfs/xfs_aops.h  |  15 ++--
> >  fs/xfs/xfs_super.c |  26 ++-----
> >  3 files changed, 93 insertions(+), 165 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index fc4fed6..1ea4167 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
...
> >  	if (xfs_bio_add_buffer(ioend->io_bio, bh) != bh->b_size) {
> > -		xfs_submit_ioend_bio(wbc, ioend, ioend->io_bio);
> > -		ioend->io_bio = NULL;
> > +		/*
> > +		 * No space left in the bio.
> > +		 *
> > +		 * Allocate a new one, and chain the old bio to the new one.
> > +		 * Note that we have to perform the chaining in this
> > +		 * unintuitive order so that the bi_private linkage is set up
> > +		 * in the right direction for the traversal in
> > +		 * xfs_destroy_ioend().
> > +		 */
> > +		new = bio_alloc(GFP_NOFS, BIO_MAX_PAGES);
> > +		new->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
> > +		new->bi_bdev = bh->b_bdev;
> > +
> > +		bio_chain(ioend->io_bio, new);
> > +		bio_get(ioend->io_bio);		/* for xfs_destroy_ioend */
> > +		submit_bio(wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE,
> > +			   ioend->io_bio);
> > +		ioend->io_bio = new;
>
> I'm trying to make sure I grok how this works without knowing much about
> the block layer. So we chain the current bio to the new one, the latter
> becoming the parent, and submit the old one. It looks to me like this
> could result in bio_endio() on the parent, which seems fishy... what am
> I missing? IOW, is it safe to submit bios like this before the entire
> chain is created?
>

Responding to my own question... Dave pointed out on irc yesterday that
the bio chaining does some reference counting, similar to the xfs_ioend,
to ensure that bi_end_io() of the parent is never called until the chain
is complete.

I took a closer look at the code and managed to convince myself that this
should work fine. When the old bio is chained to the new one, the new bio
becomes the parent and its remaining I/O count is elevated. If the child
bio happens to complete while the parent is still being built, bio_endio()
on the child propagates its error to the parent and calls
bio_remaining_done() on the parent, which drops the extra reference it
held. The parent can still subsequently be chained to another bio itself
or have the XFS bi_end_io() callback set.

One thing I'm still a bit suspicious about is whether the error
propagation is racy. For example, consider that we've created two chained
bios A and B, such that A is the parent; the remaining I/O counts are thus
A(2) and B(1). Suppose bio A happens to complete first with an error.
A->bi_error is set and bio_endio(A) is called, which IIUC basically just
does A(2)->A(1). If bio B then completes successfully, B->bi_error
presumably remains set to 0 and bio_endio(B) is called. The latter checks
that B->bi_end_io == bio_chain_endio, propagates B->bi_error to
A->bi_error unconditionally, then walks up to the parent bio to drop its
reference and finally calls A->bi_end_io(). Doesn't this mean that we can
potentially lose errors in the chain?
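To make that concrete, here is roughly the chaining path as I read it.
This is paraphrased and abridged from my reading of block/bio.c rather
than quoted verbatim (in particular, the real bio_endio() walks up the
chain iteratively to avoid recursion), so treat it as a sketch of the
semantics, not the literal code:

/*
 * Only the completion that drops __bi_remaining to zero gets past the
 * check below; earlier completions of a chained bio just drop their
 * reference and return.
 */
void bio_endio(struct bio *bio)
{
	if (!bio_remaining_done(bio))
		return;

	if (bio->bi_end_io)
		bio->bi_end_io(bio);
}

static void bio_chain_endio(struct bio *bio)
{
	struct bio *parent = bio->bi_private;

	/*
	 * The child's error is copied to the parent unconditionally, so a
	 * child that completes last with bi_error == 0 looks like it could
	 * overwrite an error already recorded in the parent, per the
	 * scenario above.
	 */
	parent->bi_error = bio->bi_error;
	bio_endio(parent);
	bio_put(bio);
}

void bio_chain(struct bio *bio, struct bio *parent)
{
	/* the child finds the parent through bi_private at completion time */
	bio->bi_private = parent;
	bio->bi_end_io = bio_chain_endio;
	bio_inc_remaining(parent);	/* bump parent's __bi_remaining */
}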
I could easily still be missing something here...

Brian

> > 		goto retry;
> >  	}
> >
> ...
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> ...
> > @@ -1755,10 +1749,8 @@ xfs_init_zones(void)
> >  	kmem_zone_destroy(xfs_bmap_free_item_zone);
> >   out_destroy_log_ticket_zone:
> >  	kmem_zone_destroy(xfs_log_ticket_zone);
> > - out_destroy_ioend_pool:
> > -	mempool_destroy(xfs_ioend_pool);
> > - out_destroy_ioend_zone:
> > -	kmem_zone_destroy(xfs_ioend_zone);
> > + out_free_ioend_bioset:
> > + 	bioset_free(xfs_ioend_bioset);
>
> Space before tab ^.
>
> >   out:
> >  	return -ENOMEM;
> >  }
> > @@ -1784,9 +1776,7 @@ xfs_destroy_zones(void)
> >  	kmem_zone_destroy(xfs_btree_cur_zone);
> >  	kmem_zone_destroy(xfs_bmap_free_item_zone);
> >  	kmem_zone_destroy(xfs_log_ticket_zone);
> > -	mempool_destroy(xfs_ioend_pool);
> > -	kmem_zone_destroy(xfs_ioend_zone);
> > -
> > + 	bioset_free(xfs_ioend_bioset);
>
> Space before tab ^.
>
> Brian
>
> >  }
> >
> >  STATIC int __init
> > --
> > 2.1.4

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs