Re: [PATCH 0/3] Revert parallel dio reads

Hello,

>> Before doing this, you might want to have a chat and co-ordinate
>> with the folks that are currently trying to port the ext4 direct IO
>> code to use the iomap infrastructure:
>>
>> https://lore.kernel.org/linux-ext4/20190827095221.GA1568@xxxxxxxxxxxxxxxxxxxxxx/T/#t
>>
>> That is going to need the shared locking on read and will work just
>> fine with shared locking on write, too (it's the code that XFS uses
>> for direct IO). So it might be best here if you work towards shared
>> locking on the write side rather than just revert the shared locking
>> on the read side....

> Yeah, after converting ext4 DIO path to iomap infrastructure, using shared
> inode lock for all aligned non-extending DIO writes will be easy so I'd
> prefer if we didn't have to redo the iomap conversion patches due to these
> reverts.

I was looking into this to see what is required to convert the DIO write side to shared locking. Why do you say shared inode lock only for *aligned non-extending DIO writes*?
Non-extending means the overwrite case only, which even in today's code does not run under the exclusive inode_lock (except for the orphan handling and the truncate handling in case of an error after the IO has completed). Yet the performance problem is reported even with the current code, right?

If we look at today's code (including the new iomap code, where we downgrade the lock in the overwrite case), we call inode_unlock() for an overwrite and then do the IO, since we don't have to allocate any blocks in that case:


	/*
	 * Make all waiters for direct IO properly wait also for extent
	 * conversion. This also disallows race between truncate() and
	 * overwrite DIO as i_dio_count needs to be incremented under
	 * i_mutex.
	 */
	inode_dio_begin(inode);
	/* If we do an overwrite dio, i_mutex locking can be released */
	overwrite = *((int *)iocb->private);
	if (overwrite)
		inode_unlock(inode);

	/* ... write the data (via __blockdev_direct_IO) ... */

	inode_dio_end(inode);
	/* take i_mutex locking again if we do an overwrite dio */
	if (overwrite)
		inode_lock(inode);



What can we do next:

Earlier, DIO reads did not take any inode lock at all.
IIUC, the inode_lock_shared() in the DIO read path was added to protect
against races that can expose uninitialized/stale data, e.g. against
truncate or writeback.

So, as Dave suggested, shouldn't we add full shared locking in the DIO write cases as well (except for unaligned writes, since those may require zeroing of partial blocks)? A rough, hypothetical sketch of what I mean is below.
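
To make this concrete, here is a minimal sketch. The helper name
ext4_dio_write_can_use_shared_lock() and the exact checks are my own
invention, not existing ext4 code; it only illustrates taking
inode_lock_shared() for aligned, non-extending DIO writes and falling
back to the exclusive lock otherwise.

	/*
	 * Hypothetical sketch (not existing ext4 code): decide whether a
	 * DIO write is aligned and non-extending, so it could run under
	 * the shared inode lock; everything else keeps the exclusive lock.
	 */
	static bool ext4_dio_write_can_use_shared_lock(struct kiocb *iocb,
						       struct iov_iter *from)
	{
		struct inode *inode = file_inode(iocb->ki_filp);
		loff_t end = iocb->ki_pos + iov_iter_count(from);
		unsigned int blkmask = (1 << inode->i_blkbits) - 1;

		/* Unaligned writes may need partial block zeroing. */
		if ((iocb->ki_pos | iov_iter_count(from)) & blkmask)
			return false;

		/* Extending writes update i_size, keep them exclusive. */
		if (end > i_size_read(inode))
			return false;

		return true;
	}

	/* ... and in the DIO write path ... */
	if (ext4_dio_write_can_use_shared_lock(iocb, from))
		inode_lock_shared(inode);
	else
		inode_lock(inode);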


Could you please help me understand how we can go about it?
I was wondering whether we could create uninitialized blocks beyond EOF, so that
any parallel read would only read zeroes from that area and we might not
require the exclusive inode_lock. The block creation is anyway protected
by i_data_sem in ext4_map_blocks().


I do see that for buffered IO writes we use ext4_get_block_unwritten()
in the dioread_nolock case, which should also hold for writes extending
beyond EOF. This creates unwritten (uninitialized) and mapped blocks only,
even beyond EOF. But all of this happens under the exclusive inode_lock()
in ext4_file_write_iter().


Whereas for DIO writes extending beyond EOF, we pass
EXT4_GET_BLOCKS_CREATE to ext4_map_blocks(), which may allocate
initialized & mapped blocks.
Do you know why that is? A small sketch of the contrast follows.
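
To illustrate the contrast I mean, a minimal sketch of the two
ext4_map_blocks() requests is below. handle/inode/lblk/len are placeholders
and error handling is omitted; the flags are the existing ones, but my
reading of their written vs. unwritten behaviour is an assumption I'd like
confirmed.

	struct ext4_map_blocks map = {
		.m_lblk = lblk,
		.m_len  = len,
	};
	int ret;

	/*
	 * Buffered dioread_nolock path (via ext4_get_block_unwritten()):
	 * request unwritten extents, so concurrent readers see zeroes
	 * until the extents are converted after IO completion.
	 */
	ret = ext4_map_blocks(handle, inode, &map,
			      EXT4_GET_BLOCKS_IO_CREATE_EXT);

	/*
	 * DIO write path extending beyond EOF today: request normal
	 * written extents, which (as I understand it) is what forces
	 * the exclusive inode lock to avoid exposing stale data.
	 */
	ret = ext4_map_blocks(handle, inode, &map,
			      EXT4_GET_BLOCKS_CREATE);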


Do you think we can create uninitialized blocks in the dioread_nolock AIO/non-AIO DIO write cases as well, while holding only the shared inode lock?
Do you see any problems with this?
Or do you have any other suggestion?


In the case of XFS:
I see that it promotes/demotes the shared lock (XFS_IOLOCK_*) in
xfs_file_aio_write_checks(). But I don't know whether, after that, it
only holds the shared lock while doing the actual DIO.
And how does it protect against parallel DIO reads which could potentially expose stale data? My simplified reading is sketched below.
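
For reference, this is my (possibly incomplete) reading of the XFS lock
selection, paraphrased as a simplified sketch rather than the verbatim
code in fs/xfs/xfs_file.c; unaligned_io is a placeholder for the
(pos | count) & block-mask check done there.

	/*
	 * Simplified paraphrase of the xfs_file_dio_aio_write() locking,
	 * from my reading of fs/xfs/xfs_file.c (details omitted and
	 * possibly approximate).
	 */
	int iolock = XFS_IOLOCK_SHARED;

	if (unaligned_io) {
		/*
		 * Unaligned DIO may need sub-block zeroing, so take the
		 * iolock exclusively and serialise against other DIO.
		 */
		iolock = XFS_IOLOCK_EXCL;
	}
	xfs_ilock(ip, iolock);

	/*
	 * xfs_file_aio_write_checks() can then upgrade to the exclusive
	 * iolock if EOF zeroing is needed; aligned, non-extending writes
	 * stay under the shared iolock for the actual IO.
	 */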


Appreciate any help & inputs from FS developers.

-ritesh



