Re: Uninitialized extent races

On Fri 21-12-12 13:02:43, Ted Tso wrote:
> On Fri, Dec 21, 2012 at 05:19:29PM +0100, Jan Kara wrote:
> >   No, I'm speaking about merging currently uninitialized extents. I.e.
> > suppose someone does the following on a filesystem with dioread_nolock so
> > that writeback happens via unwritten extents:
> >   fd = open("file", O_RDWR);
> >   pwrite(fd, buf, 4096, 0);
> > 					flusher thread starts writing
> > 					we create uninitialized extent for
> > 					  range 0-4096
> >   fallocate(fd, 0, 4096, 4096);
> >     - we merge extents and now have just 1 uninitialized extent for range
> >       0-8192
> > 					ext4_convert_unwritten_extents() now
> > 					  has to split the extent to finish
> > 					  the IO.
> 
> Ah, I see.  Disabling the merging that might take place as a
> result of the fallocate.  Yes, I agree that's a completely sane thing
> to do.
  OK, I'll write some patches.
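
  To make the sequence above concrete, here is a minimal userspace sketch of
it. It is only a toy model under stated assumptions: the toy_* types and
functions are hypothetical and are not ext4 code. It shows that if the
fallocate range is merged into the extent under writeback, the IO completion
has to split the extent, and that with merging disabled no split is needed.

```c
/* Toy userspace model; none of these types or functions are ext4 code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_EXTENTS 8

struct toy_extent {
	unsigned long start;	/* offset of the first byte */
	unsigned long len;	/* length in bytes */
	bool uninit;		/* true while the extent is unwritten */
};

struct toy_file {
	struct toy_extent ext[TOY_MAX_EXTENTS];
	size_t count;
	unsigned splits;	/* splits forced during conversion */
};

/* Append an uninitialized extent, merging with the previous one if allowed. */
static void toy_add_uninit(struct toy_file *f, unsigned long start,
			   unsigned long len, bool allow_merge)
{
	if (allow_merge && f->count > 0) {
		struct toy_extent *prev = &f->ext[f->count - 1];

		if (prev->uninit && prev->start + prev->len == start) {
			prev->len += len;
			return;
		}
	}
	assert(f->count < TOY_MAX_EXTENTS);
	f->ext[f->count++] = (struct toy_extent){ start, len, true };
}

/* IO completion for [start, start + len): mark the range as written,
 * splitting the covering extent if it extends past the range. */
static void toy_convert(struct toy_file *f, unsigned long start,
			unsigned long len)
{
	for (size_t i = 0; i < f->count; i++) {
		struct toy_extent *e = &f->ext[i];

		if (!e->uninit || e->start != start || e->len < len)
			continue;
		if (e->len > len) {
			/* Make room and keep the tail as a separate
			 * uninitialized extent. */
			assert(f->count < TOY_MAX_EXTENTS);
			for (size_t j = f->count; j > i + 1; j--)
				f->ext[j] = f->ext[j - 1];
			f->ext[i + 1] = (struct toy_extent){
				start + len, e->len - len, true };
			f->count++;
			f->splits++;
			e->len = len;
		}
		e->uninit = false;
		return;
	}
}

/* Replay the sequence from the mail; returns how many splits were needed. */
static unsigned toy_replay(bool allow_merge)
{
	struct toy_file f = { .count = 0, .splits = 0 };

	toy_add_uninit(&f, 0, 4096, false);		/* flusher thread */
	toy_add_uninit(&f, 4096, 4096, allow_merge);	/* fallocate */
	toy_convert(&f, 0, 4096);			/* IO completion */
	return f.splits;
}
```

  Calling toy_replay(true) models the merging behaviour and needs one split
at IO completion; toy_replay(false) models the proposed change and needs
none.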

> The alternate approach would be to add a flag in the extent status
> tree indicating that an unwritten conversion is pending, but that
> would add more complexity.
> 
> Hmmm.... do we need that complexity anyway?  What happens if we have a
> race between a punch (or truncate) and the flusher thread, so there is
> a pending write.  There are two things that would be of concern.  (1)
> Will convert_unwritten_extents do the right thing if the extent in
> question has disappeared, and (2) what if the block gets reused for
> some other inode in the interim?
> 
> I _think_ we're OK in the case of (2), since we're not using FUA writes
> for anything other than the commit block, so there shouldn't be any way
> that a write for the new inode could complete before the pending write
> finishes up.  And (1) should be OK, although it may end up triggering a
> WARN_ON and a scary ext4_msg() in ext4_convert_unwritten_extents().
> But it made me stop and think....
  It's actually simpler than that. We wait for any pending DIO using
inode_dio_wait(), and i_mutex prevents new writes from being submitted. So
that takes care of one possibility. truncate_inode_pages() waits for the
PageWriteback bit, so that handles waiting for the IO itself. After I change
ext4 to convert extents before clearing PageWriteback, this will also take
care of extent conversion. For now, the call to ext4_flush_unwritten_io() in
ext4_ext_truncate() resolves the problem. It's called after invalidating the
page cache, so we know all the pending IO for the truncated / punched area
is finished; only a conversion may still be pending.
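
  The ordering can be sketched as a small userspace model. This is an
illustration only: the toy_* names are hypothetical stand-ins for
inode_dio_wait(), truncate_inode_pages() and ext4_flush_unwritten_io(), the
counters are made up, and nothing here is kernel code. The asserts encode
the invariant each step relies on.

```c
/* Toy userspace model of the truncate ordering; not kernel code. */
#include <assert.h>
#include <stddef.h>

enum toy_step {
	TOY_DIO_WAIT,
	TOY_TRUNCATE_PAGES,
	TOY_FLUSH_UNWRITTEN,
	TOY_REMOVE_EXTENTS,
};

struct toy_inode {
	int pending_dio;		/* direct IO still in flight */
	int pages_under_writeback;	/* pages with writeback in progress */
	int pending_conversions;	/* unwritten extents not yet converted */
	enum toy_step log[4];		/* order in which steps ran */
	size_t nlog;
};

static void toy_log(struct toy_inode *ino, enum toy_step step)
{
	assert(ino->nlog < 4);
	ino->log[ino->nlog++] = step;
}

/* Stand-in for waiting on pending direct IO. */
static void toy_dio_wait(struct toy_inode *ino)
{
	ino->pending_dio = 0;
	toy_log(ino, TOY_DIO_WAIT);
}

/* Stand-in for invalidating the page cache, which waits for writeback.
 * Conversions may still be pending afterwards. */
static void toy_truncate_pages(struct toy_inode *ino)
{
	ino->pages_under_writeback = 0;
	toy_log(ino, TOY_TRUNCATE_PAGES);
}

/* Stand-in for flushing pending conversions; the IO must be done first. */
static void toy_flush_unwritten(struct toy_inode *ino)
{
	assert(ino->pages_under_writeback == 0);
	ino->pending_conversions = 0;
	toy_log(ino, TOY_FLUSH_UNWRITTEN);
}

/* Extents may be removed only once nothing refers to them any more. */
static void toy_remove_extents(struct toy_inode *ino)
{
	assert(ino->pending_dio == 0);
	assert(ino->pages_under_writeback == 0);
	assert(ino->pending_conversions == 0);
	toy_log(ino, TOY_REMOVE_EXTENTS);
}

/* The whole sequence, assumed to run with new writes excluded. */
static void toy_truncate(struct toy_inode *ino)
{
	toy_dio_wait(ino);
	toy_truncate_pages(ino);
	toy_flush_unwritten(ino);
	toy_remove_extents(ino);
}
```

  Moving the toy_flush_unwritten() call before toy_truncate_pages() would
trip the assert in it, which is the model's way of saying the conversion
flush only makes sense once the IO itself has finished.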

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR