Splitting dirty fs folios




At the moment, when we truncate (or holepunch, etc.) a file,
the VFS attempts to split any large folios which overlap the newly
created boundary in the file.  See mm/truncate.c for the callers of
->invalidate_folio.

We need the filesystem and the MM to cooperate on splitting a folio
because there's FS metadata attached to folio->private.  We have per-folio
state (uptodate, dirty); the filesystem tracks that same state per block
and uses the folio flags as a summary (if every block is uptodate,
the folio is uptodate; if any block is dirty, the folio is dirty).
If we just throw away that extra per-block state, we risk not writing
back blocks which are dirty, or losing buffered writes as we re-read
blocks which were more uptodate in memory than on disk.  There's no
safe state to set the folio to.
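A minimal userspace model of the per-block versus per-folio relationship described above, to show why the summary is lossy. The struct and helper names (`fs_folio_state`, `summary_uptodate()`, `summary_dirty()`) are hypothetical, not the real iomap or buffer_head structures, and the sketch assumes at most 64 blocks per folio so that one `uint64_t` bitmap per flag suffices.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-folio FS state: one uptodate bit and one dirty bit per
 * filesystem block. Assumes nr_blocks <= 64 so a uint64_t bitmap suffices.
 */
struct fs_folio_state {
	unsigned int nr_blocks;	/* blocks in this folio */
	uint64_t uptodate;	/* bit n set: block n is at least as new as disk */
	uint64_t dirty;		/* bit n set: block n must be written back */
};

/* Mask covering the bits for nr_blocks blocks. */
static uint64_t block_mask(unsigned int nr_blocks)
{
	return nr_blocks >= 64 ? ~0ULL : (1ULL << nr_blocks) - 1;
}

/* Summary rule: the folio is uptodate only if every block is uptodate. */
static bool summary_uptodate(const struct fs_folio_state *s)
{
	uint64_t mask = block_mask(s->nr_blocks);

	return (s->uptodate & mask) == mask;
}

/* Summary rule: the folio is dirty if any block is dirty. */
static bool summary_dirty(const struct fs_folio_state *s)
{
	return (s->dirty & block_mask(s->nr_blocks)) != 0;
}
```

For instance, a 64-block folio whose uptodate and dirty bitmaps are both `0xfffffffffffffff8` summarises to dirty and !uptodate. From those two bits alone, a split cannot tell which of the new, smaller folios hold dirty data and which would need to be re-read, which is why the per-block state has to survive the split.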

This is fine if the entire folio is uptodate, and it generally is today
because large folios are only created through readahead, which will
bring the entire folio uptodate unless there is a read error.  But when
creating a large folio in the write path, we can end up with large folios
which are not uptodate under various circumstances.  For example, I've
captured one where we write to pos:0x2a0e5f len:0xf1a1, which ends at
0x2b0000.  Because this is on a 1kB block size filesystem with a 64kB
folio starting at 0x2a0000, the write begins partway into the fourth
block, so we leave the first three blocks in the folio unread and the
uptodate bits are fffffffffffffff8.  That means that the folio as a
whole is not uptodate.
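The arithmetic behind that example, as a small sketch. It assumes a naturally aligned 64kB folio (64 blocks of 1kB, matching the 64-bit bitmap) and that the partially overwritten fourth block is read in before the write, so every block the write overlaps ends up uptodate. `written_blocks()` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Return a bitmap with bit n set for every block of the folio that the
 * write [pos, pos + len) overlaps. Assumes folio_size and block_size are
 * powers of two, folio_size / block_size <= 64, and the write stays
 * within a single folio.
 */
static uint64_t written_blocks(uint64_t pos, uint64_t len,
			       uint64_t folio_size, uint64_t block_size)
{
	uint64_t folio_start = pos & ~(folio_size - 1);
	uint64_t first, last, bits = 0;

	if (len == 0)
		return 0;
	first = (pos - folio_start) / block_size;
	last = (pos + len - 1 - folio_start) / block_size;
	for (uint64_t n = first; n <= last; n++)
		bits |= 1ULL << n;
	return bits;
}
```

`written_blocks(0x2a0e5f, 0xf1a1, 0x10000, 0x400)` sets bits 3 through 63, i.e. `0xfffffffffffffff8`; blocks 0 to 2 (0x2a0000-0x2a0bff) are never touched, so the folio cannot be marked uptodate.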

Option 1: Read the start of the folio so we can set the whole folio
uptodate.  In this case, we're already submitting a read for bytes
0x2a0c00-0x2a0fff (so we can overwrite the end of that block).  We could
expand that to read 0x2a0000-0x2a0fff instead.  This could get tricky;
at the moment we're guaranteed to have the iomap that covers the start
of the block, but we might have to do a lookup to find the iomap(s)
that cover the start of the folio.
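Option 1 expressed as byte ranges, as a sketch: the read issued today covers only the block containing `pos`, and the expanded read starts at the beginning of the folio instead. `struct byte_range`, `read_partial_block()` and `read_folio_head()` are hypothetical names. The sketch assumes power-of-two sizes and a `pos` that is not block-aligned, and it deliberately leaves out the hard part mentioned above, finding the iomap(s) for the extended range.

```c
#include <stdint.h>

/* Inclusive byte range in the file. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/* Current behaviour: read only the block containing pos. */
static struct byte_range read_partial_block(uint64_t pos, uint64_t block_size)
{
	uint64_t start = pos & ~(block_size - 1);

	return (struct byte_range){ start, start + block_size - 1 };
}

/*
 * Option 1: keep the same end, but extend the read back to the start of
 * the folio so the whole head of the folio becomes uptodate.
 */
static struct byte_range read_folio_head(uint64_t pos, uint64_t block_size,
					 uint64_t folio_size)
{
	struct byte_range r = read_partial_block(pos, block_size);

	r.start = pos & ~(folio_size - 1);
	return r;
}
```

For pos 0x2a0e5f with 1kB blocks and a 64kB folio, these return 0x2a0c00-0x2a0fff and 0x2a0000-0x2a0fff respectively, matching the two ranges in the paragraph above.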

Option 2: In the invalidate_folio implementation, writeback the folio
so it is no longer dirty.  I'm not sure we have all the information we
need to start writeback, and it'll annoy the filesystem as it has to
allocate space if it wasn't already allocated.

Option 3: Figure out a more complicated dance between the FS and the MM
that allows the FS to attach state to the newly created folios before
finally freeing the original folio.

Option 4: Stop splitting folios on holepunch / truncate.  Folio splits
can fail, so we all have to cope with folios that substantially overhang
a hole/data/EOF boundary.  We don't attempt to split folios on readahead
when we discover we're trying to read from a hole, we just zero the
appropriate chunks of the folio.  We do attempt to not allocate folios
which extend more than one page past EOF, but that's subject to change
anyway.
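A sketch of the zeroing that Option 4 relies on, using a plain buffer to stand in for the folio's memory: a folio that overhangs EOF is left whole and the part past EOF is zeroed, rather than splitting it. `zero_past_eof()` is a hypothetical helper; in the kernel, helpers such as `folio_zero_range()` do the actual zeroing.

```c
#include <stdint.h>
#include <string.h>

/*
 * Zero the portion of a folio that lies at or beyond EOF (i_size).
 * 'data' models the folio's contents; folio_pos is the file offset of
 * its first byte. Returns the number of bytes zeroed.
 */
static uint64_t zero_past_eof(unsigned char *data, uint64_t folio_pos,
			      uint64_t folio_size, uint64_t i_size)
{
	uint64_t offset;

	if (i_size >= folio_pos + folio_size)
		return 0;	/* folio lies entirely before EOF */
	offset = i_size > folio_pos ? i_size - folio_pos : 0;
	memset(data + offset, 0, folio_size - offset);
	return folio_size - offset;
}
```

For a 64kB folio at 0x2a0000 and an i_size of 0x2a8000, the second half of the folio (0x8000 bytes) is zeroed and the first half is left untouched.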

Option 5: If the folio is both dirty and !uptodate, just refuse to split
it, like if somebody else had a reference on it.  A less extreme version
of #4.
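Option 5 reduces to one extra check before attempting the split. This is a userspace model: `struct split_candidate` and `may_split()` are hypothetical, and in the kernel the two flags would come from `folio_test_dirty()` and `folio_test_uptodate()`.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state that matters for the split decision. */
struct split_candidate {
	bool dirty;
	bool uptodate;
	bool extra_refs;	/* somebody else holds a reference */
};

/*
 * Returns true if the split may proceed. A dirty, !uptodate folio is
 * refused exactly like a folio with an extra reference.
 */
static bool may_split(const struct split_candidate *f)
{
	if (f->extra_refs)
		return false;
	if (f->dirty && !f->uptodate)
		return false;
	return true;
}
```

Clean folios and fully uptodate dirty folios still split as before; only the dirty, partially uptodate case, like the write-path example above, is left whole, which callers must already tolerate because splits can fail for other reasons.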

I may have missed some other option.  Option 5 seems like the least
amount of work.


