Re: [PATCH v6 00/11] ext4: port direct I/O to iomap infrastructure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 31, 2019 at 05:54:16PM +0100, Jan Kara wrote:
> On Thu 31-10-19 20:16:41, Matthew Bobrowski wrote:
> > On Wed, Oct 30, 2019 at 12:39:18PM +0100, Jan Kara wrote:
> > > On Wed 30-10-19 12:26:52, Jan Kara wrote:
> > > Hum, actually no. This write from fsx output:
> > > 
> > > 24( 24 mod 256): WRITE    0x23000 thru 0x285ff  (0x5600 bytes)
> > > 
> > > should have allocated blocks to where the failed write was going (0x24000).
> > > But still I'd expect some interaction between how buffered writes to holes
> > > interact with following direct IO writes... One of the subtle differences
> > > we have introduced with iomap conversion is that the old code in
> > > __generic_file_write_iter() did fsync & invalidate written range after
> > > buffered write fallback and we don't seem to do that now (probably should
> > > be fixed regardless of relation to this bug).
> > 
> > After performing some debugging this afternoon, I quickly realised
> > that the fix for this is rather trivial. Within the previous direct
> > I/O implementation, we passed EXT4_GET_BLOCKS_CREATE to
> > ext4_map_blocks() for any writes to inodes without extents. I seem to
> > have missed that here and consequently block allocation for a write
> > wasn't performing correctly in such cases.
> 
> No, this is not correct. For inodes without extents we used
> ext4_dio_get_block() and we pass DIO_SKIP_HOLES to __blockdev_direct_IO().
> Now DIO_SKIP_HOLES means that if starting block is within i_size, we pass
> 'create == 0' to get_blocks() function and thus ext4_dio_get_block() uses
> '0' argument to ext4_map_blocks() similarly to what you do.

Ah right, I missed that part. :(

> And indeed for inodes without extents we must fallback to buffered IO for
> filling holes inside a file to avoid stale data exposure (racing DIO read
> could read block contents before data is written to it if we used
> EXT4_GET_BLOCKS_CREATE).

Well in this case I'm pretty sure I know exactly where the problem
resides. I seem to be falling back to buffered I/O from
ext4_dio_write_iter() without actually taking into account any of the
data that may have partially been written by the direct I/O. So, when
returning the bytes written back to userspace it's whatever actually
is returned by ext4_buffered_write_iter(), which may not necessarily
be the amount of bytes that were expected, so it should rather be
ext4_dio_write_iter() + ext4_buffered_write_iter()...

> > Also, I agree, the fsync + page cache invalidation bits need to be
> > implemented. I'm just thinking to branch out within
> > ext4_buffered_write_iter() and implement those bits there i.e.
> > 
> > 	...
> > 	ret = generic_perform_write();
> > 
> > 	if (ret > 0 && iocb->ki_flags & IOCB_DIRECT) {
> > 	   	err = filemap_write_and_wait_range();
> > 
> > 		if (!err)
> > 			invalidate_mapping_pages();
> > 	...
> > 
> > AFAICT, this would be the most appropriate place to put it? Or, did
> > you have something else in mind?
> 
> Yes, either this, or maybe in ext4_dio_write_iter() after returning from
> ext4_buffered_write_iter() would be even more logical.

Yes, let's stick with doing it within ext4_dio_write_iter().

--<M>--




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux