Re: Uninitialized extent races

On Fri, Dec 21, 2012 at 05:19:29PM +0100, Jan Kara wrote:
>   No, I'm speaking about merging currently uninitialized extents. I.e.
> suppose someone does the following on a filesystem with dioread_nolock so
> that writeback happens via unwritten extents:
>   fd = open("file", O_RDWR);
>   pwrite(fd, buf, 4096, 0);
> 					flusher thread starts writing
> 					we create uninitialized extent for
> 					  range 0-4096
>   fallocate(fd, 0, 4096, 4096);
>     - we merge extents and now have just 1 uninitialized extent for range
>       0-8192
> 					ext4_convert_unwritten_extents() now
> 					  has to split the extent to finish
> 					  the IO.

Ah, I see.  Disabling the merging that might take place as a
result of the fallocate.  Yes, I agree that's a completely sane thing
to do.
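
To make the sequence concrete, here is a toy userspace model of it.
This is only a sketch and not the kernel code: struct toy_extent and
the toy_* helpers are names invented for illustration, and the
allow_unwritten_merge switch is an assumed stand-in for "merging of
uninitialized extents is enabled".  It replays the pwrite/fallocate
ordering from the example above and reports whether the end-of-IO
conversion would have to split an extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: this is NOT the kernel's struct ext4_extent. */
struct toy_extent {
	uint64_t start;		/* first byte covered */
	uint64_t len;		/* number of bytes covered */
	bool unwritten;		/* "uninitialized" in this thread's terms */
};

/* Hypothetical merge predicate: adjacent extents in the same state may
 * merge, unless they are unwritten and that kind of merge is disabled. */
static bool toy_can_merge(const struct toy_extent *a,
			  const struct toy_extent *b,
			  bool allow_unwritten_merge)
{
	if (a->unwritten != b->unwritten)
		return false;
	if (a->start + a->len != b->start)
		return false;
	if (a->unwritten && !allow_unwritten_merge)
		return false;
	return true;
}

/* True when converting [start, start + len) to written cannot be done
 * by flipping the whole extent, i.e. the extent must be split first. */
static bool toy_conversion_needs_split(const struct toy_extent *ex,
					uint64_t start, uint64_t len)
{
	assert(start >= ex->start && start + len <= ex->start + ex->len);
	return ex->start != start || ex->len != len;
}

/* Replay: writeback creates 0-4096, fallocate adds 4096-8192, then the
 * IO for 0-4096 completes and its range has to be converted. */
static bool toy_replay(bool allow_unwritten_merge)
{
	struct toy_extent io = { 0, 4096, true };	/* from writeback */
	struct toy_extent fa = { 4096, 4096, true };	/* from fallocate */
	struct toy_extent cur = io;

	if (toy_can_merge(&io, &fa, allow_unwritten_merge))
		cur.len += fa.len;	/* one extent for 0-8192 */

	return toy_conversion_needs_split(&cur, io.start, io.len);
}
```

In this model, toy_replay(true) reports that a split is needed at IO
completion, while toy_replay(false) does not, which is the whole point
of refusing the merge.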

The alternate approach would be to add a flag in the extent status
tree indicating that an unwritten conversion is pending, but that
would add more complexity.

Hmmm.... do we need that complexity anyway?  What happens if we have a
race between a punch (or truncate) and the flusher thread, so there is
a pending write?  There are two things that would be of concern.  (1)
Will convert_unwritten_extents do the right thing if the extent in
question has disappeared, and (2) what if the block gets reused for
some other inode in the interim?

I _think_ we're OK in the case of (2), since we're not using FUA
writes for anything other than the commit block, so there shouldn't be
any way that a write for the new inode could complete before the
pending write finishes up.  And (1) should be OK, although it may end
up triggering a WARN_ON and a scary ext4_msg() in
ext4_convert_unwritten_extents().   But it made me stop and think....

> And I regarding more merging, that could be done (obviously), just we might
> need to postpone that after writeback is finished (PageWriteback is
> cleared) because there extent estimates are not clear. And I need to know
> necessary number of extents well in advance to be able to reserve credits
> in the journal. OTOH maybe we could use jbd2_journal_extend() to get more
> credits if we need them for merging. And when that fails, bad luck but we
> can cope... Anyway, this is a different problem.

Yeah, using jbd2_journal_extend() was what I was thinking about doing
where we could do some opportunistic merging if there's room in the
journal to allow that.  But I agree that's a different problem....
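
Roughly the shape I have in mind, as a toy userspace sketch rather than
real jbd2 code.  Everything here is hypothetical: struct toy_handle,
toy_extend_credits() and toy_try_merge() are made-up names, and
toy_extend_credits() is only a stand-in for the idea of asking the
journal for more credits; it does not mirror the actual signature or
return convention of jbd2_journal_extend().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: NOT the kernel's handle_t. */
struct toy_handle {
	int credits;		/* credits currently held by this handle */
	int journal_free;	/* credits the journal could still hand out */
};

/* Hypothetical stand-in for extending a handle's reservation.
 * Succeeds only if the journal has enough room left. */
static bool toy_extend_credits(struct toy_handle *h, int extra)
{
	if (extra > h->journal_free)
		return false;
	h->journal_free -= extra;
	h->credits += extra;
	return true;
}

/* Opportunistic merge: do it if we hold enough credits or can get the
 * shortfall from the journal; otherwise skip it.  Skipping only leaves
 * the extents unmerged, it does not affect correctness. */
static bool toy_try_merge(struct toy_handle *h, int merge_cost)
{
	assert(merge_cost >= 0);

	if (h->credits < merge_cost &&
	    !toy_extend_credits(h, merge_cost - h->credits))
		return false;	/* bad luck, but we can cope */

	h->credits -= merge_cost;
	return true;
}
```

The key property is that a failed extension leaves the handle's credits
untouched, so the caller just carries on without the merge.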

	   	 	      	    - Ted