On Wed 14-04-21 10:01:13, Dave Chinner wrote:
> On Tue, Apr 13, 2021 at 01:28:46PM +0200, Jan Kara wrote:
> >  *
> >  *  ->mmap_lock
> >  *    ->i_mmap_rwsem
> > @@ -85,7 +86,8 @@
> >  *      ->i_pages lock (arch-dependent flush_dcache_mmap_lock)
> >  *
> >  *  ->mmap_lock
> > - *    ->lock_page (access_process_vm)
> > + *    ->i_mapping_sem (filemap_fault)
> > + *      ->lock_page (filemap_fault, access_process_vm)
> >  *
> >  *  ->i_rwsem (generic_perform_write)
> >  *    ->mmap_lock (fault_in_pages_readable->do_page_fault)
> > @@ -2276,16 +2278,28 @@ static int filemap_update_page(struct kiocb *iocb,
> >  {
> >  	int error;
> >
> > +	if (iocb->ki_flags & IOCB_NOWAIT) {
> > +		if (!down_read_trylock(&mapping->host->i_mapping_sem))
> > +			return -EAGAIN;
> > +	} else {
> > +		down_read(&mapping->host->i_mapping_sem);
> > +	}
>
> We really need a lock primitive for this. The number of times this
> exact lock pattern is being replicated all through the IO path is
> getting out of hand.
>
> static inline bool
> down_read_try_or_lock(struct rw_semaphore *sem, bool try)
> {
> 	if (try) {
> 		if (!down_read_trylock(sem))
> 			return false;
> 	} else {
> 		down_read(sem);
> 	}
> 	return true;
> }
>
> and the callers become:
>
> 	if (!down_read_try_or_lock(sem, (iocb->ki_flags & IOCB_NOWAIT)))
> 		return -EAGAIN;
>
> We can do the same with mutex_try_or_lock(), down_try_or_lock(), etc.
> and we don't need to rely on cargo cult knowledge to propagate this
> pattern anymore. Because I'm betting relatively few people actually
> know why the code is written this way, because the only place it is
> documented is in an XFS commit message....
>
> Doing this is a separate cleanup, though, and not something that
> needs to be done in this patchset.

Yep, good idea but let's do it in a separate patch set.

> > index c5b0457415be..ac5bb50b3a4c 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -192,6 +192,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> >  	 */
> >  	unsigned int nofs = memalloc_nofs_save();
> >
> > +	down_read(&mapping->host->i_mapping_sem);
> >  	/*
> >  	 * Preallocate as many pages as we will need.
> >  	 */
>
> I can't say I'm a great fan of having the mapping reach back up to
> the host to lock the host. This seems the wrong way around to me
> given that most of the locking in the IO path is in "host locks
> mapping" and "mapping locks internal mapping structures" order...
>
> I also come back to the naming confusion here, in that when we look
> at this in longhand from the inode perspective, this chain actually
> looks like:
>
> 	lock(inode->i_mapping->inode->i_mapping_sem)
>
> i.e. the mapping is reaching back up outside its scope to lock
> itself against other inode->i_mapping operations. Smells of layering
> violations to me.
>
> So, next question: should this truncate semaphore actually be part
> of the address space, not the inode? This patch is actually moving
> the page fault serialisation from the inode into the address space
> operations when page faults and page cache operations are done, so
> maybe the lock should also make that move? That would help clear up
> the naming problem, because now we can name it based around what it
> serialises in the address space, not the address space as a whole...

I think that moving the lock to address_space makes some sense, although
the lock actually protects consistency of inode->i_mapping->i_pages with
whatever the filesystem has in its file_offset->disk_block mapping
structures (which are generally associated with the inode). So it is not
only about inode->i_mapping contents, but I agree that struct
address_space is probably a bit more logical place than struct inode.
Regarding the name: how about i_pages_rwsem? The lock is protecting
invalidation of mapping->i_pages and needs to be held until insertion of
pages into i_pages is safe again...
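
Just to illustrate (purely a sketch, not a real patch - the i_pages_rwsem
name and the down_read_try_or_lock() helper are only the suggestions from
this thread), the filemap_update_page() hunk above would then look
something like:

static inline bool down_read_try_or_lock(struct rw_semaphore *sem, bool try)
{
	if (try)
		return down_read_trylock(sem);
	down_read(sem);
	return true;
}

	/* In filemap_update_page(): lock the mapping directly, no ->host */
	if (!down_read_try_or_lock(&mapping->i_pages_rwsem,
				   iocb->ki_flags & IOCB_NOWAIT))
		return -EAGAIN;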

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR