Hello Ted/Jan,
On 1/7/20 10:52 PM, Jan Kara wrote:
On Tue 07-01-20 12:11:09, Theodore Y. Ts'o wrote:
Hmm..... There's actually an even more radical option we could use,
given that Ritesh has made dioread_nolock work on block sizes < page
size. We could make dioread_nolock the default, until we can revamp
ext4_writepages() to write the data blocks first....
Agreed. I guess it should be a straightforward change to make.
It should be just removing test_opt(inode->i_sb, DIOREAD_NOLOCK)
condition from ext4_should_dioread_nolock().
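
To make the suggested change concrete, here is a rough user-space sketch of
it. This is not the kernel code: struct model_inode and both function names
are hypothetical, and the remaining checks (regular file, extent-mapped, no
data journalling) are an assumption about what ext4_should_dioread_nolock()
otherwise tests.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the inode and superblock state. */
struct model_inode {
    bool opt_dioread_nolock; /* models test_opt(inode->i_sb, DIOREAD_NOLOCK) */
    bool is_regular;         /* models S_ISREG(inode->i_mode) */
    bool uses_extents;       /* models the EXT4_INODE_EXTENTS inode flag */
    bool journals_data;      /* models ext4_should_journal_data(inode) */
};

/* Proposed behaviour: the mount option is no longer consulted. */
static bool should_dioread_nolock_proposed(const struct model_inode *inode)
{
    if (!inode->is_regular)
        return false;
    if (!inode->uses_extents)
        return false;
    if (inode->journals_data)
        return false;
    return true;
}

/* Current behaviour: same checks, but only when the mount option is set. */
static bool should_dioread_nolock_current(const struct model_inode *inode)
{
    if (!inode->opt_dioread_nolock)
        return false;
    return should_dioread_nolock_proposed(inode);
}
```

In this model, a regular extent-mapped file without data journalling takes
the dioread_nolock path under the proposed variant even when the mount
option is not set, so the option stops having any effect.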
Yes, that's a good point. And I'm not opposed to that if it makes life
simpler. But I'd like to see some performance numbers showing how much
slower writeback using unwritten extents is, so that we don't introduce
too big a regression with this...
Yes, let me try to get some performance numbers with dioread_nolock as
the default option for buffered writes on my setup.
AFAIU this should also fix the stale data exposure race between DIO
read and ext4_page_mkwrite, since we will by default be using unwritten
extents.
Currently I am testing the patch to fix this race, which is based on our
previous discussion. I will post that anyway. After that I can also
collect the performance numbers for this suggested option (making
dioread_nolock the default).
-ritesh