Re: [PATCH -v2] ext4: introduce linear search for dentries

"Theodore Ts'o" <tytso@xxxxxxx> · Thu, 20 Feb 2025 09:46:11 -0500

On Wed, Feb 19, 2025 at 06:58:00PM -0700, Andreas Dilger wrote:
> Sure, my suggestions are aimed at minimizing the impact of this extra
> (and very expensive) fallback mechanism.  If there is a direct way to
> determine which filenames were impacted by the earlier bug, and then
> do only two lookups (one with the "buggy" casefolded name, one with the
> "good" casefolded name) then this would be (at worst) a 2x slowdown for
> the lookup, instead of a 1000x slowdown (or whatever, for large directories).
> 
> Also, since the number of users affected by this bug is relatively small
> (only users running kernels >= v6.12-rc2-1-g5c26d2f1d3f5 where the broken
> patch landed and v6.13-rc2-36-g231825b2e1ff when it was reverted), but the
> workaround by default affects everyone using the casefold feature, it
> behooves us to minimize the performance impact of the workaround.

This is why I added a new encoding flag, SB_ENC_NO_COMPAT_FALLBACK_FL,
so if the system administrator is sure that the device never had that
alternate encoding, we don't have to pay that performance penalty.
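
As a rough illustration of that trade-off, here is a toy model in Python.
It is a sketch under stated assumptions, not the ext4 code: the class
ToyDir, the two fold functions, and the small IGNORABLE set are all
hypothetical, and a dict keyed by the folded name stands in for the hashed
directory index.  The only name taken from this thread is the idea behind
SB_ENC_NO_COMPAT_FALLBACK_FL, modelled here as a boolean.

```python
from typing import Callable, Dict, List, Optional

# Illustrative subset of invisible code points; not the kernel's real table.
IGNORABLE = {"\u200b", "\u200c", "\u200d", "\ufe0f"}


def fold_drop(name: str) -> str:
    """Hypothetical encoding A: casefold and drop the ignorable code points."""
    return "".join(c for c in name.casefold() if c not in IGNORABLE)


def fold_keep(name: str) -> str:
    """Hypothetical encoding B: casefold but keep every code point."""
    return name.casefold()


class ToyDir:
    """Toy directory: an index keyed by folded name plus an optional full scan."""

    def __init__(self, no_compat_fallback: bool = False) -> None:
        # Stands in for SB_ENC_NO_COMPAT_FALLBACK_FL being set.
        self.no_compat_fallback = no_compat_fallback
        self.index: Dict[str, List[str]] = {}
        self.scanned = 0  # entries examined by the linear fallback

    def create(self, name: str, fold: Callable[[str], str] = fold_drop) -> None:
        # The entry is filed under whatever encoding was in use at creation.
        self.index.setdefault(fold(name), []).append(name)

    def lookup(self, name: str) -> Optional[str]:
        want = fold_drop(name)
        # Fast path: a single indexed probe.
        for stored in self.index.get(want, []):
            if fold_drop(stored) == want:
                return stored
        if self.no_compat_fallback:
            return None  # the admin vouched that no alternate encoding exists
        # Slow path: examine every entry in the directory.
        for bucket in self.index.values():
            for stored in bucket:
                self.scanned += 1
                if fold_drop(stored) == want:
                    return stored
        return None
```

In this model a name filed under the other encoding is only found by the
full scan, whose cost grows with the directory size; with the flag set, the
miss is reported immediately and the entry stays unreachable.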

The problem is that the original reason for making the change was a
"security vulnerability": if a directory named ".git" had one of these
invisible zero-width characters in its name, someone using git on a
case-folded directory was vulnerable to an attack in which pulling from
a repository controlled by a malicious entity could result in an
overwrite of .git/config.  So it was a relatively narrow range in
Linus's tree, but the "security fix" was backported into LTS kernels,
and pushed out to a large number of Android handsets which do use case
folding.

This is why the default is to do the fallback; other than Android
handsets, the number of users of case folding is mercifully quite
small.  And the problem was detected on Android machines, where users
who had files that included characters such as '❤️' or '❤' could no
longer access them; fixing that regression had to take priority.
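
For readers who cannot tell the two hearts apart on screen, the short
Python sketch below spells out the code points.  It assumes the first
string in the paragraph above is U+2764 followed by U+FE0F (the usual
emoji-style form) and the second is U+2764 alone; the helper names
describe() and drop_vs16() are made up for this example.

```python
import unicodedata

EMOJI_HEART = "\u2764\ufe0f"  # heart followed by a variation selector
TEXT_HEART = "\u2764"         # heart on its own


def describe(s: str) -> list:
    """List each code point in s with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


def drop_vs16(s: str) -> str:
    """Remove U+FE0F, the invisible selector that separates the two forms."""
    return s.replace("\ufe0f", "")


for label, s in (("emoji", EMOJI_HEART), ("text", TEXT_HEART)):
    print(label, describe(s), len(s.encode("utf-8")), "bytes")
```

The two names differ only by one invisible code point, so whether an
encoding drops or keeps that code point decides whether they are the same
file name or two different ones.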

> We have been looking at adding casefold support to Lustre, in order to
> improve Samba export performance (which also has a "scan all entries"
> fallback), and we cannot control how many files are in a single directory.

For Lustre, if you know that no one is going to be using kernels with
the changed encoding, you could just always set
SB_ENC_NO_COMPAT_FALLBACK_FL and just be happy.  If you think that
Lustre users might actually use git, and you are worried about this
"security vulnerability", we could ask the git project to fix it at
their level.  I personally don't care that much, since I'm not sure
how many people would really want to be doing development using git on
an Android handset using Termux.  :-)

> It seems likely that systems have been using casefold directly on ext4
> for Samba as well.  If the performance impact of "scan all entries" is
> noticeable for Samba, then it would be noticeable for this fallback.

I'm not sure how many Samba installations actually do use it, but if
they do, it might not be that bad, since we do have negative
dentries for the common misses in a search path (for example).  And if
it is safe, we can provide utilities to make it easier to
set SB_ENC_NO_COMPAT_FALLBACK_FL.

> One option would be to have the kernel re-hash any entries that it finds
> with the old filename, so that the directories repair themselves, and the
> workaround could be removed after some time.  Also, have e2fsck re-hash
> the filenames in this case, so that there is a long-term solution after
> the kernel workaround is removed.

There are quite a lot of things that can be done, but quite frankly,
I'm not really that excited to invest the time to do something more
complicated.  If someone does want to spend more time, including
changes that might involve another encoding bit so we could actually
safely eliminate the "dangerous" zero-width characters while allowing
'❤️' or '❤' to be distinct and without breaking file systems that have
those characters, I'm certainly willing to entertain patches.

                                                - Ted