On Fri 21-12-12 13:02:43, Ted Tso wrote:
> On Fri, Dec 21, 2012 at 05:19:29PM +0100, Jan Kara wrote:
> >   No, I'm speaking about merging currently uninitialized extents. I.e.
> > suppose someone does the following on a filesystem with dioread_nolock so
> > that writeback happens via unwritten extents:
> >   fd = open("file", O_RDWR);
> >   pwrite(fd, buf, 4096, 0);
> >                               flusher thread starts writing
> >                                 we create uninitialized extent for
> >                                 range 0-4096
> >   fallocate(fd, 0, 4096, 4096);
> >     - we merge extents and now have just 1 uninitialized extent for range
> >       0-8192
> >                               ext4_convert_unwritten_extents() now
> >                               has to split the extent to finish
> >                               the IO.
>
> Ah, I see.  Disabling the merging that might take place as a
> result of the fallocate.  Yes, I agree that's a completely sane thing
> to do.
  OK, I'll write some patches.

> The alternate approach would be to add a flag in the extent status
> tree indicating that an unwritten conversion is pending, but that
> would add more complexity.
>
> Hmmm.... do we need that complexity anyway?  What happens if we have a
> race between a punch (or truncate) and the flusher thread, so there is
> a pending write?  There are two things that would be of concern.  (1)
> Will convert_unwritten_extents do the right thing if the extent in
> question has disappeared, and (2) what if the block gets reused for
> some other inode in the interim?
>
> I _think_ we're OK in the case of (2), since we're not using FUA writes
> for anything other than the commit block, so there shouldn't be any way
> that a write for the new inode could complete before the pending write
> finishes up.  And (1) should be OK, although it may end up triggering a
> WARN_ON and a scary ext4_msg() in ext4_convert_unwritten_extents().
> But it made me stop and think....
  It's actually simpler than that. We wait for any pending DIO using
inode_dio_wait(), and i_mutex prevents new writes from being submitted. So
that takes care of one possibility.
truncate_inode_pages() waits for the PageWriteback bit to clear, so that
handles waiting for the IO itself. After I change ext4 to convert extents
before clearing PageWriteback, this will also take care of extent
conversion. Currently, a call to ext4_flush_unwritten_io() in
ext4_ext_truncate() resolves the problem. It is called after invalidating
the page cache, so we know all the pending IO for the truncated / punched
area has finished; only a conversion may still be pending.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
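
For anyone who wants to poke at the merge scenario quoted above from
userspace, here is a minimal sketch of the syscall side only. It is an
illustration, not a reliable reproducer: the function name
write_then_preallocate() is made up, O_CREAT and the sync_file_range()
call are additions that are not in the quoted sequence, and nothing here
controls when the flusher thread actually runs, so whether the fallocate
lands while the IO is still in flight is up to timing. It also assumes
Linux with glibc.

```c
#define _GNU_SOURCE            /* for fallocate() and sync_file_range() */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Userspace half of the quoted scenario (hypothetical helper name).
 * Returns 0 on success, -1 on error with errno left from the failing call.
 */
int write_then_preallocate(const char *path)
{
	char buf[4096];
	int fd, saved_errno, ret = -1;

	memset(buf, 'a', sizeof(buf));

	/* O_CREAT is an addition so the sketch also works on a fresh file. */
	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return -1;

	/* Dirty the first block: range 0-4096. */
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;

	/*
	 * Addition, best effort: ask the kernel to start writeback of the
	 * dirty range without waiting for it. The result is ignored.
	 */
	(void)sync_file_range(fd, 0, sizeof(buf), SYNC_FILE_RANGE_WRITE);

	/* Preallocate the adjacent block: range 4096-8192. */
	if (fallocate(fd, 0, 4096, 4096) != 0)
		goto out;

	ret = 0;
out:
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return ret;
}
```

Calling write_then_preallocate("file") on an ext4 filesystem mounted with
dioread_nolock gives the pwrite + fallocate pair from the example; on
filesystems without fallocate support it is expected to fail with
EOPNOTSUPP.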