Re: [PATCH RFC 0/8] xfs: introduce inode data inline feature

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 6 Jul 2018 15:42:50 +1000

On Fri, Jul 06, 2018 at 11:12:21AM +0800, Shan Hai wrote:
> 
> This series implements xfs inode data inlining feature.

Cool. I was actually thinking about this earlier this week. :)

> Refered below link during development:
> https://marc.info/?l=linux-xfs&m=120493585731509&w=2
> 
> 
> How it works:
> - the data inlining happens at:
>   write_iter: for DIO/DAX write
>   writeback: for buffered write
> - extents to local format conversion is done in writeback but not in write_iter
> - local to extents format conversion is done in the write_iter and writeback

The problem with the way local to extents conversion is imlpemented
in this patchset is that we do not synchronise data writeback with
journal commits and log recovery.  The local to extents conversion
appears to be done like so:

----
writeback arrives in local format, data in extent cached page
convert inode to extent format
allocate new block
commit allocation and inode format conversion
submit data IO to newly allocated block.
----

This works when nothing goes wrong, but it's not failsafe. The
problem is that we've logged the inode data fork conversion before
we've completed writing the data to the new block. IOWs, we can end
up in the state where the active journal recoery window does not contain
the data in the inode and the data has not yet reached the new
allocated block on disk, either.

If we crash in this window, we lose the data that was in the inode -
log recovery finishes with the inode in extent form pointing to
uninitialised data blocks. i.e. not only is it a data loss event,
it's also a security issue because it exposes stale data.

This has been the unsolved problem from all previous attempts to
implement inline data in XFS. I actually outlined this problem and
ways to solve it in the mailing list post linked above, but it
doesn't appear that either mechanism was implemented in this
patchset.

However, unlike that post from 2008, we now have infrastructure that
can be used to solve to this problem: the data path copy-on-write
mechanism.  The COW mechanism is essentially a generic
implementation of the "Method 1: use an intent/done transaction
pair" solution I describe in the above post.

This mechanism solves the above problem by storing the newly
allocated block in the in-memory COW fork and doesn't modify the
data fork until after the data write IO completes. IOWs, we do
allocation before the write IO, and do the data fork and BMBT
manipulation after the IO completes. i.e. we do the local->extent
data fork modification at IO completion using the extent that was
stored in the in memory COW fork.

Hence I think the inline data write path needs to piggy back on the
iomap COW path that we use for writing to shared extents if the
write would cause a data fork format change.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html