On Mon 05-06-23 02:58:15, Matthew Wilcox wrote: > On Sun, Jun 04, 2023 at 05:08:21PM -0400, Theodore Ts'o wrote: > > On Sat, Jun 03, 2023 at 11:04:45PM -0400, Theodore Ts'o wrote: > > > I tried testing to see if this fixed [1], and it appears to be > > > triggering a lockdep warning[2] at this line in the patch: > > > > > > [1] https://syzkaller.appspot.com/bug?extid=f4582777a19ec422b517 > > > [2] https://syzkaller.appspot.com/x/report.txt?x=17260843280000 > > > > Looking at this more closely, the fundamental problem is by the time > > ext4_file_mmap() is called, the mm layer has already taken > > current->mm->mmap_lock, and when we try to take the inode_lock, this > > causes locking ordering problems with how buffered write path works, > > which take the inode_lock first, and then in some cases, may end up > > taking the mmap_lock if there is a page fault for the buffer used for > > the buffered write. > > > > If we're going to stick with the approach in this patch, I think what > > we would need is to add a pre_mmap() function to file_operations > > struct, which would get called by the mmap path *before* taking > > current->mm->mmap_lock, so we can do the inline conversion before we > > take the mmap_lock. > > > > I'm not sure how the mm folks would react to such a proposal, though. > > I could be seen as a bit hacky, and it's not clear that any file > > system other than ext4 would need something like this. Willy, as > > someone who does a lot of work in both mm and fs worlds --- I'm > > curious what you think about this idea? > > I'm probably missing something here, but why do we need to convert inline > data in page_mkwrite? mmap() can't change i_size (stores past i_size are > discarded), so we should be able to simply copy the data from the page > cache into the inode and write the inode when it comes to writepages() > time. > > Unless somebody does a truncate() or write() that expands i_size, but we > should be able to do the conversion then without the mmap_lock held. No? > I'm not too familiar with inline data. Yeah, I agree, that is also the conclusion I have arrived at when thinking about this problem now. We should be able to just remove the conversion from ext4_page_mkwrite() and rely on write(2) or truncate(2) doing it when growing i_size. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR