On Tue, Jun 21, 2022 at 11:59 AM Jan Kara <jack@xxxxxxx> wrote: > > On Tue 21-06-22 10:49:48, Amir Goldstein wrote: > > > How exactly do you imagine the synchronization of buffered read against > > > buffered write would work? Lock all pages for the read range in the page > > > cache? You'd need to be careful to not bring the machine OOM when someone > > > asks to read a huge range... > > > > I imagine that the atomic r/w synchronisation will remain *exactly* as it is > > today by taking XFS_IOLOCK_SHARED around generic_file_read_iter(), > > when reading data into user buffer, but before that, I would like to issue > > and wait for read of the pages in the range to reduce the probability > > of doing the read I/O under XFS_IOLOCK_SHARED. > > > > The pre-warm of page cache does not need to abide to the atomic read > > semantics and it is also tolerable if some pages are evicted in between > > pre-warn and read to user buffer - in the worst case this will result in > > I/O amplification, but for the common case, it will be a big win for the > > mixed random r/w performance on xfs. > > > > To reduce risk of page cache thrashing we can limit this optimization > > to a maximum number of page cache pre-warm. > > > > The questions are: > > 1. Does this plan sound reasonable? > > Ah, I see now. So essentially the idea is to pull the readahead (which is > currently happening from filemap_read() -> filemap_get_pages()) out from under > the i_rwsem. It looks like a fine idea to me. Great! Anyone doesn't like the idea or has another suggestion? > > > 2. Is there a ready helper (force_page_cache_readahead?) that > > I can use which takes the required page/invalidate locks? > > page_cache_sync_readahead() should be the function you need. It does take > care to lock invalidate_lock internally when creating & reading pages. I Thanks, I'll try that. > just cannot comment on whether calling this without i_rwsem does not break > some internal XFS expectations for stuff like reflink etc. relink is done under xfs_ilock2_io_mmap => filemap_invalidate_lock_two so it should not be a problem. pNFS leases I need to look into. Thanks, Amir.