Re: [PATCH v2] xfs: enable WQ_MEM_RECLAIM on m_sync_workqueue

"NeilBrown" <neilb@xxxxxxx> · Wed, 10 Jul 2024 09:39:04 +1000

On Sat, 06 Jul 2024, Christoph Hellwig wrote:
> Btw, one issue with using direct I/O is that need to synchronize with
> page cache access from the server itself.  For pNFS we can do that as
> we track outstanding layouts.  Without layouts it will be more work
> as we'll need a different data structure tracking grant for bypassing
> the server.  Or just piggy back on layouts anyway as that's what they
> are doing.
> 

I'm missing something here.

Certainly if localio or nfsd were to choose to use direct I/O we would
need to ensure proper synchronisation with page cache access.

Does VFS/MM already provide enough synchronisation?  A quick look at the
code suggests:
- before an O_DIRECT read any dirty pages that overlap are flushed to
  the device.
- after an O_DIRECT write, any pages that overlap are invalidated.

So as long as IO requests don't overlap we should have adequate
synchronisation.

If they do overlap we should expect inconsistent results.  Maybe we
would expect reads to only "tear" on a page boundary, and writes to only
interleave in whole pages, and probably using O_DIRECT would not give
any whole-page guarantees.  So maybe that is a problem.

If it is a problem, I think it can only be fixed by keeping track of which
pages are under direct IO, and preventing access to the page-cache for
those regions.  This could be done in the page-cache itself, or in a
separate extent-tree.  I don't think the VFS/MM supports this - does any
filesystem?
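
To make the "separate extent-tree" idea concrete, here is a minimal
userspace sketch of the bookkeeping it would need.  All names
(dio_tracker, dio_range_begin, dio_range_busy, ...) are hypothetical
and are not an existing VFS/MM interface.  A fixed-size table stands in
for a real extent tree, and a real implementation would need locking
and would make page-cache users wait rather than just report a
collision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch only: none of these names exist in the kernel. */
#define DIO_MAX_RANGES 16

struct dio_range {
	unsigned long long start;	/* first byte under direct I/O */
	unsigned long long end;		/* one past the last byte */
	bool in_use;
};

struct dio_tracker {
	struct dio_range ranges[DIO_MAX_RANGES];
};

/* Half-open ranges [s1,e1) and [s2,e2) overlap iff each starts before the other ends. */
static bool ranges_overlap(unsigned long long s1, unsigned long long e1,
			   unsigned long long s2, unsigned long long e2)
{
	return s1 < e2 && s2 < e1;
}

/* Would a page-cache access to [start,end) collide with in-flight direct I/O? */
static bool dio_range_busy(const struct dio_tracker *t,
			   unsigned long long start, unsigned long long end)
{
	assert(start < end);
	for (size_t i = 0; i < DIO_MAX_RANGES; i++) {
		const struct dio_range *r = &t->ranges[i];

		if (r->in_use && ranges_overlap(start, end, r->start, r->end))
			return true;
	}
	return false;
}

/* Record a direct I/O over [start,end); returns a slot index, or -1 if the table is full. */
static int dio_range_begin(struct dio_tracker *t,
			   unsigned long long start, unsigned long long end)
{
	assert(start < end);
	for (size_t i = 0; i < DIO_MAX_RANGES; i++) {
		if (!t->ranges[i].in_use) {
			t->ranges[i].start = start;
			t->ranges[i].end = end;
			t->ranges[i].in_use = true;
			return (int)i;
		}
	}
	return -1;
}

/* Drop the record once the direct I/O has completed. */
static void dio_range_end(struct dio_tracker *t, int slot)
{
	assert(slot >= 0 && slot < DIO_MAX_RANGES);
	t->ranges[slot].in_use = false;
}
```

A page-cache read or fill would call dio_range_busy() first and hold
off while it returns true; the direct I/O path would bracket its
submission with dio_range_begin() and dio_range_end().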

(or we could prevent adding any new pages to the page-cache for an inode
 with i_dio_count > 0 - but that would likely hurt performance.)

I can see that pNFS extents could encode the information to enforce
this, but I don't see how that is mapped to filesystems in Linux at
present.

What am I missing?

Thanks,
NeilBrown