Re: [PATCH 2/3] xfs: go straight to real allocations for direct I/O COW writes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 24, 2017 at 04:09:34PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 24, 2017 at 09:08:55PM +0100, Christoph Hellwig wrote:
> > On Tue, Jan 24, 2017 at 12:43:18PM -0500, Brian Foster wrote:
> > > Hmm, that's not what I'm seeing (not that it really matters, but I'm
> > > curious if I'm missing something):
> > 
> > Yeah, I can reproduce this on mainline.  Turns out the it was done
> > by the align call in xfs_bmap_btalloc that even my before run had
> > removed.
> > 
> > Took me some time to spin my head around this.
> > 
> > Btw, I think we have a nasty race in the current DIO code that might
> > expose stale data, but this is just the same kind of head spinning
> > exercise for now:
> > 
> > Thread 1 writes a range from B to c
> > 
> >                     B --------- C
> 
>           A --------- B --------- C
>                ^           ^
>                d           e
> 
> I'm assuming B-C has no shared blocks, d-B has shared blocks, and that
> both d & e are cowextsize boundaries.
> 
> > a little later thread 2 writes from A to B
> > 
> >         A --------- B
> > but the code preallocates beyond B into the range where thread
> > 1 has just written, but ->end_io hasn't been called yet.
> > But once ->end_io is called thread 2 has already allocated
> > up to the extent size hint into the write range of thread 1,
> > so the end_io handler will splice the unintialized blocks from
> > that preallocation back into the file right after B.
> 
> I think you're right about there being a dio race here.  I'm not sure
> what the solution here is -- certainly we could disregard the cowextsize
> hint, though that has a fragmentation cost that we already know about.
> 
> We could also change the dio write setup to extend the range that it
> checks for shared blocks up and down to the nearest cowextsize boundary
> and set up the delalloc reservations in the cow fork as necessary.  If
> our thread2 comes along then it'll find the reservations already set
> up for a cow so that we avoid the situation where B-C changes between
> iomap_begin and dio_write_end_io does the wrong thing.  That's more in
> the spirit of cowextsize since we'd promote future writes to CoW.
> However it's more wasteful of blocks since we have no idea if those
> future writes are ever actually going to happen, and it also adds more
> bmap records if we don't use all of the reservation.
> 
> Ugh, my head hurts, I'm going to go for a walk to ponder this some more,
> and to try to work out whether the buffered path is all right.

I think I came up with a better idea overnight: take advantage of the
unwritten bit.  Currently, all extents in the cow fork are either
delalloc or normal extents.  When we allocate the blocks to fill a cow
extent, we'll flag the extent as unwritten.  When we go to write the
data to disk, we convert the unwritten extent to a real extent, and the
post-write remap (since it only takes a file offset and length) will
be changed to remap only real, written extents into the data fork.

--D

> 
> --D
> 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux