Eric Biggers <ebiggers@xxxxxxxxxx> writes:

> On Tue, Nov 28, 2023 at 02:16:44PM +0000, Luis Henriques wrote:
>> >
>> > Yeah, looking closer it makes sense. Sorry for the noise. I'm currently
>> > investigating a test failure (which I can't reproduce locally) where
>> > 'orig_key' unexpectedly is set to '1' and makes the test fail because it
>> > was supposed to be '0'. That's when this caught my attention. Anyway,
>> > I'll go look somewhere else.
>>
>> OK, I'm not 100% sure yet, but I've an idea about what's going on with
>> this test failure.
>>
>> I _think_ that even after the following is done in the test:
>>
>>   _user_do_rm_enckey $SCRATCH_MNT $keyid
>>   _scratch_cycle_mount
>>
>> the key garbage collector may not have finished running. And then, when we
>> read '/proc/key-users', we can race against key_user_put(), which needs
>> key_user_lock, which is also grabbed while the proc file seq_operations
>> are run.
>>
>> Eric, does this make any sense? There is a loop in the test to wait for
>> invalidated keys, but I believe it's not relevant anymore since commit
>> d7e7b9af104c ("fscrypt: stop using keyrings subsystem for
>> fscrypt_master_key"). But I might be misunderstanding the code.
>
> Thanks for looking into this! I had noticed this test is still flaky on arm64
> but haven't had a chance to look into it. Yes, it's probably related to the key
> garbage collector again. The test needs to wait for the fscrypt "user" keys
> (key_type_fscrypt_user in the kernel) to be released from the quota. I think
> that loop in the test does not have the intended effect because it waits for
> "invalidated" keys, but the fscrypt "user" keys (which are charged to the quota)
> are never invalidated; they're just released normally. There used to be another
> key (in the "keyrings" subsystem sense of the word "key") associated with each
> fscrypt key, and that key was indeed invalidated, but that's no longer the case.
>

Awesome, thanks for confirming this.
That loop probably made sense back when keys were invalidated -- that
behaviour was changed by the commit I mentioned, I believe. Anyway, it's
probably better to keep the loop for testing old kernels, as it doesn't
really hurt.

>
> Maybe there's a better way to wait for the key garbage collector to
> finish.
>
> Or the kernel could be changed to make releasing the key quota be synchronous.
> Unfortunately the keyrings subsystem doesn't seem to work that way, and fscrypt
> is tying into the key quota from the keyrings subsystem, so it is subject to its
> limitations. But maybe there's a way to do it.

Hmm... yeah, that requires a closer look at that subsystem to see if
something can be done. I'll try to find something there that would make
that test more reliable.

Cheers,
-- 
Luís
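
P.S. One possible way for the test to wait for the quota to be released, rather than for invalidated keys, would be to poll '/proc/key-users' until the count of keys charged to the user drops back to the expected value. This is only a rough sketch of that idea, not what the test does today: the helper names, the retry count, the sleep interval and the optional file argument (there only so the parsing can be tried on a sample file) are all made up for illustration. It assumes the usual '/proc/key-users' layout, where the fourth field is "qnkeys/maxkeys".

```shell
#!/bin/bash

# Hypothetical helper: print the number of keys charged to UID's quota,
# i.e. the "qnkeys" half of the 4th field of /proc/key-users.
# Prints nothing if UID has no entry in the file.
key_quota_usage()
{
	local uid=$1 file=${2:-/proc/key-users}

	awk -v uid="$uid" \
		'$1 == uid ":" { split($4, a, "/"); print a[1] }' "$file"
}

# Hypothetical helper: poll until the usage for UID is at most EXPECTED,
# giving up after TRIES attempts. Returns 0 on success, 1 on timeout.
wait_for_key_quota()
{
	local uid=$1 expected=$2 tries=${3:-50} file=${4:-/proc/key-users}
	local i usage

	for ((i = 0; i < tries; i++)); do
		usage=$(key_quota_usage "$uid" "$file")
		# A missing entry is treated as zero keys charged.
		[ "${usage:-0}" -le "$expected" ] && return 0
		sleep 0.1
	done
	return 1
}
```

The test could then call something like 'wait_for_key_quota $uid $orig_keys' after _scratch_cycle_mount and before comparing the counts. It is still polling, so it only narrows the race with the garbage collector; it doesn't make the release synchronous.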