On Thu, Mar 02, 2023 at 02:28:00PM -0800, Nathan Huckleberry wrote: > Hey Eric, > > On Sun, Feb 26, 2023 at 12:43 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote: > > > > From: Eric Biggers <ebiggers@xxxxxxxxxx> > > > > Once all I/O using a blk_crypto_key has completed, filesystems can call > > blk_crypto_evict_key(). However, the block layer doesn't call > > blk_crypto_put_keyslot() until the request is being cleaned up, which > > happens after upper layers have been told (via bio_endio()) the I/O has > > completed. This causes a race condition where blk_crypto_evict_key() > > can see 'slot_refs > 0' without there being an actual bug. > > > > This makes __blk_crypto_evict_key() hit the > > 'WARN_ON_ONCE(atomic_read(&slot->slot_refs) != 0)' and return without > > doing anything, eventually causing a use-after-free in > > blk_crypto_reprogram_all_keys(). (This is a very rare bug and has only > > been seen when per-file keys are being used with fscrypt.) > > > > There are two options to fix this: either release the keyslot in > > blk_update_request() just before bio_endio() is called on the request's > > last bio, or just make __blk_crypto_evict_key() ignore slot_refs. Let's > > go with the latter solution for now, since it avoids adding overhead to > > the loop in blk_update_request(). (It does have the disadvantage that > > hypothetical bugs where a key is evicted while still in-use become > > harder to detect. But so far there haven't been any such bugs anyway.) > > I disagree with the proposal to ignore the race condition in > blk_crypto_evict_key(). As you said, ignoring the error could lead to > undetected bugs in the future. Instead, I think we should focus on > fixing the function ordering so that blk_crypto_put_keyslot() is > called before blk_crypto_evict_key(). > > I think the overhead is a necessary trade-off to ensure correctness. > > Thanks, Sure, I'm concerned about pushback on adding something to blk_update_request() that's not critical, but I'll send out that patch for consideration too. - Eric