Re: [patch] RFC directio: partial writes support

On Thu, 25 Feb 2010 15:45:58 +0300
Dmitry Monakhov <dmonakhov@xxxxxxxxxx> wrote:

> Can someone please explain to me why direct I/O denies partial writes?
> For example, if someone tries to write 100Mb but the file system has less
> free space, it hits ENOSPC in the middle of block allocation.
> All allocated blocks will be truncated (it may be 100Mb - 4k) and
> ENOSPC will be returned. As far as I remember direct_io has always acted
> like this, but I never asked why.
> Why do we have to give up all the progress we made?
> In fact partial writes are possible in case of holes, when we
> fall back to buffered write. XFS implemented partial writes.

The problem with direct-io writes is that the writes don't necessarily
complete in file-offset-ascending order.  So if we've issued 50 write
BIOs and then hit an EIO on a BIO then we could have a hunk of
unwritten data with newly-written data on either side of it.  If we get a
bunch of discontiguous EIO BIOs coming in then the problem gets even
messier - we have a span of disk which has a random mix of
correctly-written and not-correctly-written runs of sectors.  What do
we do with that?

The code _could_ perhaps go back and crawl through the request and
identify the number of successfully-written bytes between
start-of-request and first-EIO and then return that.  But we didn't
bother.
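
To make that "crawl" concrete, here is a minimal userspace sketch, not
kernel code.  It assumes the per-BIO completion results have been
collected into an array; struct seg_result and
contiguous_bytes_written() are hypothetical names invented for this
illustration and do not exist in fs/direct-io.c.  It sorts the
completions by file offset and counts bytes from start-of-request up to
the first failed or missing segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of one completed write segment (not a kernel type). */
struct seg_result {
    long long offset;  /* file offset where the segment starts */
    size_t    len;     /* number of bytes the segment covers */
    int       error;   /* 0 on success, negative errno (e.g. -EIO) on failure */
};

/* Order segments by ascending file offset. */
static int seg_cmp(const void *a, const void *b)
{
    const struct seg_result *x = a, *y = b;

    return (x->offset > y->offset) - (x->offset < y->offset);
}

/*
 * Return the number of bytes written successfully and contiguously,
 * counting from `start`.  The array is sorted in place because the
 * completions may have arrived in any order.
 */
size_t contiguous_bytes_written(struct seg_result *segs, size_t n,
                                long long start)
{
    long long pos = start;
    size_t i;

    assert(segs != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(segs, n, sizeof(*segs), seg_cmp);

    for (i = 0; i < n; i++) {
        /* Stop at the first hole or the first failed segment. */
        if (segs[i].offset != pos || segs[i].error != 0)
            break;
        pos += (long long)segs[i].len;
    }
    return (size_t)(pos - start);
}
```

With four 4k segments at offsets 0, 4096, 8192 and 12288, where only
the one at 4096 failed with -EIO, this returns 4096: the two later
segments did reach the disk, but they lie past the first error and so
cannot be counted in the return value of the write.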


ENOSPC errors are handled via the same code path and hence got
deoptimised due to this EIO handling.  We could perhaps improve the
ENOSPC handling along the lines you propose, as long as we
appropriately take care of EIO considerations.  Which, afaict, your
patch didn't do.

The presence of the opt-in DIO_PARTIAL_WRITE flag is rather unfortunate -
it would be better to make this change for all filesystems in one hit.
But I guess DIO_PARTIAL_WRITE permits us to migrate filesystems
one-at-a-time as testing permits.  But the aim should be to remove
DIO_PARTIAL_WRITE altogether once all the conversion and testing is
completed.
