Re: [PATCH v2] ext4: fix race condition between buffer write and page_mkwrite

On 2024/4/15 20:34, Jan Kara wrote:
On Mon 15-04-24 12:28:01, Baokun Li wrote:
On 2023/6/5 23:08, Jan Kara wrote:
On Mon 05-06-23 15:55:35, Matthew Wilcox wrote:
On Mon, Jun 05, 2023 at 02:21:41PM +0200, Jan Kara wrote:
On Mon 05-06-23 11:16:55, Jan Kara wrote:
Yeah, I agree, that is also the conclusion I have arrived at when thinking
about this problem now. We should be able to just remove the conversion
from ext4_page_mkwrite() and rely on write(2) or truncate(2) doing it when
growing i_size.
OK, thinking more about this and searching through the history, I've
realized why the conversion is originally in ext4_page_mkwrite(). The
problem is described in commit 7b4cc9787fe35b ("ext4: evict inline data
when writing to memory map") but essentially it boils down to the fact that
ext4 writeback code does not expect a dirty page for a file with inline data
because ext4_write_inline_data_end() should have copied the data into the
inode and cleared the folio's dirty flag.

Indeed messing with xattrs from the writeback path to copy page contents
into inline data xattr would be ... interesting. Hum, out of good ideas for
now :-|.
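The invariant described above (writeback never expects a dirty folio on an inode that still holds inline data) can be illustrated with a small userspace model. This is a hypothetical sketch, not ext4 code: the names `model_inode`, `model_page_mkwrite()`, `model_page_mkwrite_no_convert()` and `model_writeback_ok()` are invented for illustration, and two booleans stand in for the real inode and folio state. It shows why dropping the conversion from ext4_page_mkwrite() would hand writeback exactly the case it does not expect.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one inode with a single page-cache folio.
 * Not ext4 code: the fields only mirror the states discussed in the thread. */
struct model_inode {
    bool has_inline_data; /* file data stored in the inode/xattr, not in blocks */
    bool folio_dirty;     /* the page-cache folio is marked dirty */
};

/* Stands in for ext4_write_inline_data_end(): the data is copied into the
 * inode, so the folio is left clean. */
static void model_write_inline_end(struct model_inode *inode)
{
    inode->folio_dirty = false;
}

/* Stands in for the current ext4_page_mkwrite(): evict inline data first,
 * then let the mapped write dirty the folio. */
static void model_page_mkwrite(struct model_inode *inode)
{
    if (inode->has_inline_data)
        inode->has_inline_data = false; /* convert to block-backed data */
    inode->folio_dirty = true;
}

/* Hypothetical variant with the conversion removed from page_mkwrite. */
static void model_page_mkwrite_no_convert(struct model_inode *inode)
{
    inode->folio_dirty = true;
}

/* Writeback's expectation: never a dirty folio on an inline-data inode. */
static bool model_writeback_ok(const struct model_inode *inode)
{
    return !(inode->has_inline_data && inode->folio_dirty);
}

/* Exercises both page_mkwrite variants against the writeback expectation. */
static void model_inline_selfcheck(void)
{
    struct model_inode a = { .has_inline_data = true, .folio_dirty = false };
    model_write_inline_end(&a);
    assert(model_writeback_ok(&a));  /* write(2) path keeps the folio clean */

    model_page_mkwrite(&a);
    assert(!a.has_inline_data && a.folio_dirty);
    assert(model_writeback_ok(&a));  /* converted before the folio is dirtied */

    struct model_inode b = { .has_inline_data = true, .folio_dirty = false };
    model_page_mkwrite_no_convert(&b);
    assert(!model_writeback_ok(&b)); /* the state writeback does not expect */
}
```

In this model the only thing keeping writeback safe is the conversion step inside page_mkwrite, which is why removing it needs some other place to perform the conversion.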
Is it so bad?  Now that we don't have writepage in ext4, only
writepages, it seems like we have a considerably more benign locking
environment to work in.
Well, yes, without ->writepage() it might be *possible*. But still rather
ugly. The problem is that in ->writepages() i_size is not stable. Thus also
whether the inode data is inline or not is not stable. We'd need inode_lock
for that but that is not easily doable in the writeback path - inode lock
would then become fs_reclaim unsafe...

								Honza
Hi Honza!
Hi Ted!
Hi Matthew!

After a long time I came back to this, because while I was discussing another
similar ABBA problem with Hou Tao, he mentioned VM_FAULT_RETRY, and I
realized that this could be used to solve this problem as well.

The general idea is that if we see a file with inline data in
ext4_page_mkwrite(), we release the mmap_lock and grab the inode_lock to
convert the inline data, and then return VM_FAULT_RETRY so that the fault is
retried and takes the mmap_lock again.

The code implementation is as follows; do you have any thoughts?
So the problem with this is that VM_FAULT_RETRY is not always an option -
in particular the caller has to set FAULT_FLAG_ALLOW_RETRY to indicate it
is prepared to handle VM_FAULT_RETRY return. See how
maybe_unlock_mmap_for_io() is carefully checking this.
Yes, at least we need to check for FAULT_FLAG_RETRY_NOWAIT.
There are callers (most notably some get_user_pages() users) that don't set
FAULT_FLAG_ALLOW_RETRY, so the escape through VM_FAULT_RETRY is sadly not a
reliable solution.
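The limitation discussed above can be sketched as a small userspace decision model. Everything here is hypothetical: the `MODEL_FAULT_FLAG_*` macros and their bit values are assumptions made for this model (the real `FAULT_FLAG_*` values are defined in the kernel headers), and `model_mkwrite_decide()` is not the code Baokun posted. The retry check is modeled on the shape of the test in maybe_unlock_mmap_for_io(), which only drops the mmap_lock when retry is allowed, this is the first attempt, and FAULT_FLAG_RETRY_NOWAIT is clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values are assumptions for this model only. */
#define MODEL_FAULT_FLAG_ALLOW_RETRY  (1u << 2)
#define MODEL_FAULT_FLAG_RETRY_NOWAIT (1u << 3)
#define MODEL_FAULT_FLAG_TRIED        (1u << 5)

/* Hypothetical outcomes of the proposed ext4_page_mkwrite() logic. */
enum model_mkwrite_action {
    MODEL_MKWRITE_PROCEED,        /* no inline data: handle the fault as usual */
    MODEL_MKWRITE_DROP_AND_RETRY, /* drop mmap_lock, convert under inode_lock,
                                     return VM_FAULT_RETRY */
    MODEL_MKWRITE_NO_ESCAPE,      /* inline data, but the caller cannot take
                                     VM_FAULT_RETRY */
};

/* Modeled on maybe_unlock_mmap_for_io(): retry is allowed, this is the
 * first attempt, and the caller is willing to wait. */
static bool model_may_drop_mmap_lock(unsigned int flags)
{
    bool allow_retry_first = (flags & MODEL_FAULT_FLAG_ALLOW_RETRY) &&
                             !(flags & MODEL_FAULT_FLAG_TRIED);
    return allow_retry_first && !(flags & MODEL_FAULT_FLAG_RETRY_NOWAIT);
}

/* Hypothetical decision for a write fault on an mmap()ed ext4 file. */
static enum model_mkwrite_action model_mkwrite_decide(bool has_inline_data,
                                                      unsigned int flags)
{
    if (!has_inline_data)
        return MODEL_MKWRITE_PROCEED;
    if (model_may_drop_mmap_lock(flags))
        return MODEL_MKWRITE_DROP_AND_RETRY;
    return MODEL_MKWRITE_NO_ESCAPE;
}

static void model_retry_selfcheck(void)
{
    /* Regular page fault that allows retry: the escape works. */
    assert(model_mkwrite_decide(true, MODEL_FAULT_FLAG_ALLOW_RETRY) ==
           MODEL_MKWRITE_DROP_AND_RETRY);
    /* A get_user_pages()-style caller without ALLOW_RETRY: no escape. */
    assert(model_mkwrite_decide(true, 0) == MODEL_MKWRITE_NO_ESCAPE);
    /* RETRY_NOWAIT: the caller will not wait for the inode_lock. */
    assert(model_mkwrite_decide(true, MODEL_FAULT_FLAG_ALLOW_RETRY |
                                      MODEL_FAULT_FLAG_RETRY_NOWAIT) ==
           MODEL_MKWRITE_NO_ESCAPE);
    /* Second attempt (TRIED set): this check no longer permits dropping. */
    assert(model_mkwrite_decide(true, MODEL_FAULT_FLAG_ALLOW_RETRY |
                                      MODEL_FAULT_FLAG_TRIED) ==
           MODEL_MKWRITE_NO_ESCAPE);
    /* No inline data: the flags do not matter. */
    assert(model_mkwrite_decide(false, 0) == MODEL_MKWRITE_PROCEED);
}
```

The NO_ESCAPE branch is the gap pointed out above: callers that never accept VM_FAULT_RETRY would still need some other way to get the inline data converted.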
It is indeed sad.  I'm going to go learn more about the code for
FAULT_FLAG_ALLOW_RETRY.
My long-term wish is that we were always allowed to use VM_FAULT_RETRY, and that
was actually what motivated some get_user_pages() cleanups I did a couple of
years ago. But dealing with all the cases in various drivers was too
difficult and I ran out of time. Now maybe it would be worth it to
revisit this since things have changed noticeably and maybe now it would be
easier to achieve the goal...

								Honza
That sounds like a great idea. I will try to dig into the history of this and
then come back.

Thank you very much for your patient explanation!
--
With Best Regards,
Baokun Li