Re: [PATCH 06/12] iomap: Introduce read_inline() function hook

Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> · Fri, 11 Oct 2024 11:28:42 +0800

Hi Dave,

On 2024/10/11 08:43, Dave Chinner wrote:
On Thu, Oct 10, 2024 at 02:10:25PM -0400, Goldwyn Rodrigues wrote:

...

.... there is specific ordering needed.

For writes, the ordering is:

	1. pre-write data compression - requires data copy
	2. pre-write data encryption - requires data copy
	3. pre-write data checksums - data read only
	4. write the data
	5. post-write metadata updates

We cannot usefully perform compression after encryption -
random data doesn't compress - and the checksum must match what is
written to disk, so it has to come after all other transformations
have been done.

For reads, the order is:

	1. read the data
	2. verify the data checksum
	3. decrypt the data - requires data copy
	4. decompress the data - requires data copy
	5. place the plain text data in the page cache

Just random stuffs for for reference, currently fsverity makes
markle tree for the plain text, but from the on-disk data
security/authentication and integrity perspective, I guess the
order you mentioned sounds more saner to me, or:

 	1. read the data
 	2. pre-verify the encoded data checksum (optional,
                                       dm-verity likewise way)
 	3. decrypt the data - requires data copy
 	4. decompress the data - requires data copy
	5. post-verify the decoded (plain) checksum (optional)
 	6. place the plain text data in the page cache

2,5 may apply to different use cases though.

...

Compression is where using xattrs gets interesting - the xattrs can
have a fixed "offset" they blong to, but can store variable sized
data records for that offset.

If we say we have a 64kB compression block size, we can store the
compressed data for a 64k block entirely in a remote xattr even if
compression fails (i.e. we can store the raw data, not the expanded
"compressed" data). The remote xattr can store any amount of smaller
data, and we map the compressed data directly into the page cache at
a high offset. Then decompression can run on the high offset pages
with the destination being some other page cache offset....

but compressed data itself can also be multiple reference (reflink
likewise), so currently EROFS uses a seperate pseudo inode if it
decides with physical addresses as indexes.

On the write side, compression can be done directly into the high
offset page cache range for that 64kb offset range, then we can
map that to a remote xattr block and write the xattr. The xattr
naturally handles variable size blocks.

Also different from plain text, each compression fses may keep
different encoded data forms (e.g. fses could add headers or
trailers to the on-disk compressed data or add more informations
to extent metadata) for their own needs.  So unlike the current
unique plain text process, an unique encoder/decoder may not sound
quite flexible though.

Thanks,
Gao Xiang