On Tue, Nov 28, 2023 at 02:16:44PM +0000, Luis Henriques wrote:
> >
> > Yeah, looking closer it makes sense. Sorry for the noise. I'm currently
> > investigating a test failure (which I can't reproduce locally) where
> > 'orig_key' unexpectedly is set to '1' and makes the test fail because it
> > was supposed to be '0'. That's when this caught my attention. Anyway,
> > I'll go look somewhere else.
>
> OK, I'm not 100% sure yet, but I've an idea about what's going on with
> this test failure.
>
> I _think_ that even after the following is done in the test:
>
> 	_user_do_rm_enckey $SCRATCH_MNT $keyid
> 	_scratch_cycle_mount
>
> the key garbage collector may not have finish running. And then, when we
> read '/proc/key-users', we can race against key_user_put(), which needs
> key_user_lock, which is also grabbed while the proc file seq_operations
> are run.
>
> Eric, does this make any sense? There is a loop in the test to wait for
> invalidated keys, but I believe it's not relevant anymore since commit
> d7e7b9af104c ("fscrypt: stop using keyrings subsystem for
> fscrypt_master_key"). But I might be misunderstanding the code.

Thanks for looking into this! I had noticed this test is still flaky on
arm64 but haven't had a chance to look into it.

Yes, it's probably related to the key garbage collector again. The test
needs to wait for the fscrypt "user" keys (key_type_fscrypt_user in the
kernel) to be released from the quota. I think that loop in the test does
not have the intended effect because it waits for "invalidated" keys, but
the fscrypt "user" keys (which are charged to the quota) are never
invalidated; they're just released normally. There used to be another key
(in the "keyrings" subsystem sense of the word "key") associated with each
fscrypt key, and that key was indeed invalidated, but that's no longer the
case.

Maybe there's a better way to wait for the key garbage collector to finish.
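
As a rough, untested-against-the-real-test sketch of one such way: the test
could poll /proc/key-users until the number of keys charged to the user's
quota drops back to the expected value, instead of waiting for invalidated
keys. The helper names get_quota_keys and wait_for_quota_keys are
hypothetical (they are not existing xfstests helpers), and the sketch
assumes the documented /proc/key-users line format and that a short polling
timeout is acceptable for the test.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of xfstests.

# Print the number of keys currently charged to the quota of uid $1.
# Each /proc/key-users line is assumed to have the documented form:
#   <uid>: <usage> <nkeys>/<nikeys> <qnkeys>/<maxkeys> <qnbytes>/<maxbytes>
# $2 optionally overrides the file to read (handy for testing the parser).
get_quota_keys()
{
	local uid=$1 file=${2:-/proc/key-users}

	awk -v uid="$uid:" '$1 == uid { split($4, f, "/"); print f[1] }' "$file"
}

# Poll until uid $1 has at most $2 keys charged to its quota, or until
# roughly $3 seconds (default 10) have passed. Returns 0 on success and
# 1 on timeout.
wait_for_quota_keys()
{
	local uid=$1 expected=$2 timeout=${3:-10} file=${4:-/proc/key-users}
	local tries=$((timeout * 10)) n

	while ((tries-- > 0)); do
		n=$(get_quota_keys "$uid" "$file")
		# No entry for the uid is treated as zero keys charged.
		[ "${n:-0}" -le "$expected" ] && return 0
		sleep 0.1
	done
	return 1
}
```

In the test this might be called as something like
"wait_for_quota_keys $(id -u fsgqa) $orig_keys" after the cycle mount, where
the user name and the variable are placeholders. Polling only narrows the
race window from the test's side; it does not make the quota release itself
synchronous.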
Or the kernel could be changed to make releasing the key quota synchronous.
Unfortunately the keyrings subsystem doesn't seem to work that way, and
fscrypt is tying into the key quota from the keyrings subsystem, so it is
subject to its limitations. But maybe there's a way to do it.

- Eric