Re: [PATCH 1/2] ext4: Handle casefolding with encryption

Andreas Dilger <adilger@xxxxxxxxx> · Thu, 25 Feb 2021 22:14:07 -0700

On Feb 18, 2021, at 4:21 PM, Daniel Rosenberg <drosen@xxxxxxxxxx> wrote:
> 
> On Wed, Feb 17, 2021 at 2:48 PM Andreas Dilger <adilger@xxxxxxxxx> wrote:
>> 
>> On Feb 17, 2021, at 9:08 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
>>> 
>>> The problem is in how the space after the filename in a directory is
>>> encoded.  The dirdata format is (mildly) expandable, supporting up to
>>> 4 different metadata chunks after the filename, using a very
>>> compatctly encoded TLV (or moral equivalent) scheme.  For directory
>>> inodes that have both the encyption and compression flags set, we have
>>> a single blob which gets used as the IV for the crypto.
>>> 
>>> So it's the difference between a simple blob that is only used for one
>>> thing in this particular case, and something which is the moral
>>> equivalent of simple ASN.1 or protobuf encoding.
>>> 
>>> Currently, datadata has defined uses for 2 of the 4 "chunks", which is
>>> used in Lustre servers.  The proposal which Andreas has suggested is
>>> if the dirdata feature is supported, then the 3rd dirdata chunk would
>>> be used for the case where we currently used by the
>>> encrypted-casefolded extension, and the 4th would get reserved for a
>>> to-be-defined extension mechanism.
>>> 
>>> If there ext4 encrypted/casefold is not yet in use, and we can get the
>>> changes out to all potential users before they release products out
>>> into the field, then one approach would be to only support
>>> encrypted/casefold when dirdata is also enabled.
>>> 
>>> If ext4 encrypted/casefold is in use, my suggestion is that we support
>>> both encrypted/casefold && !dirdata as you have currently implemented
>>> it, and encrypted/casefold && dirdata as Andreas has proposed.
>>> 
>>> IIRC, supporting that Andreas's scheme essentially means that we use
>>> the top four bits in the rec_len field to indicate which chunks are
>>> present, and then for each chunk which is present, there is a 1 byte
>>> length followed by payload.  So that means in the case where it's
>>> encrypted/casefold && dirdata, the required storage of the directory
>>> entry would take one additional byte, plus setting a bit indicating
>>> that the encrypted/casefold dirdata chunk was present.
>> 
>> I think your email already covers pretty much all of the points.
>> 
>> One small difference between current "raw" encrypted/casefold hash vs.
>> dirdata is that the former is 4-byte aligned within the dirent, while
>> dirdata is packed.  So in 3/4 cases dirdata would take the same amount
>> of space (the 1-byte length would use one of the 1-3 bytes of padding
>> vs. the raw format), since the next dirent needs to be aligned anyway.
>> 
>> The other implication here is that the 8-byte hash may need to be
>> copied out of the dirent into a local variable before use, due to
>> alignment issues, but I'm not sure if that is actually needed or not.
>> 
>>> So, no, they aren't incompatible ultimatly, but it might require a
>>> tiny bit more work to integrate the combined support for dirdata plus
>>> encrypted/casefold.  One way we can do this, if we have to support the
>>> current encrypted/casefold format because it's out there in deployed
>>> implementations already, is to integrate encrypted/casefold &&
>>> !dirdata first upstream, and then when we integrate dirdata into
>>> upstream, we'll have to add support for the encrypted/casefold &&
>>> dirdata case.  This means that we'll have two variants of the on-disk
>>> format to test and support, but I don't think it's the going to be
>>> that difficult.
>> 
>> It would be possible to detect if the encrypted/casefold+dirdata
>> variant is in use, because the dirdata variant would have the 0x40
>> bit set in the file_type byte.  It isn't possible to positively
>> identify the "raw" non-dirdata variant, but the assumption would be
>> if (rec_len >= round_up(name_len, 4) + 8) in an encrypted+casefold
>> directory that the "raw" hash must be present in the dirent.
> 
> So sounds like we're going with the combined version. Andreas, do you
> have any suggestions for changes to the casefolding patch to ease the
> eventual merging with dirdata? A bunch of the changes are already
> pretty similar, so some of it is just calling essentially the same
> functions different things.

One thing I would suggest is to change the "is_fake_entry()" from using
offsets in the leaf block to using the content of the dirent to make
that decision.  Comparing entries against "." and ".." is trivial (and
already done in many places), and the checksum entry/tail has a "magic"
file type that can be used.  This will avoid potential problems if e.g.
encrypted entries are stored inline with the inode, and/or dirdata that
also adds fields to "." and "..".

Also, the patch adds the use of "lblk" all around the code, but that
wouldn't be needed if is_fake_entry() was updated as above?

Note in find_group_orlov() the filename hash doesn't strictly need to
match the actual hash used in the directory.  That is only for finding
a suitable group for allocating the inode, so it can be any relatively
uniform hash function and could remain DX_HASH_HALF_MD4.

Cheers, Andreas

Attachment:
signature.asc

Description: Message signed with OpenPGP