Re: [PATCH v3] dm-crypt: fix deadlock when swapping to encrypted device

John Dorminy <jdorminy@xxxxxxxxxx> · Mon, 30 Nov 2020 17:29:45 -0500

I am the first to admit I don't understand all the subtleties of the
block layer, but I'm worried about a deadlock in close analogy to that
described in bio_alloc_bioset.
https://elixir.bootlin.com/linux/latest/source/block/bio.c#L463
Hopefully ya'll can clear up my understanding...

Specifically, if a segment of a target blocks in its map function, I
believe that risks deadlock due to memory congestion. Such a action
blocks the rest of the IO being submitted by the current thread, and
the other IO currently being submitted may potentially be to other
segments which may make progress and free memory.

As a example, suppose something submits a bio to a device; the device
submits three child bios in its map / submit_bio function, some with
their own data allocated, to different devices, which add up to all
the reclaimable memory on the system. The first of these bios gets
blocked by the target (perhaps because it needs memory, or is at
capacity). Because submit_bio_noacct() traverses bio submittals
depth-first, the other two child bios will not be submitted to their,
different, devices until the first of these bios finishes being
submitted; although were those other two bios allowed to make
progress, they might complete and free memory. (Admittedly, they might
need to make further memory allocations to make progress. But in
theory the mempool / bio_set reserve should allow forward progress on
these other bios, I think.).

Even this limit only reduces, not eliminates, the problem. Consider a
machine where fewer than 32768 bios exhausts the available memory;
additional IOs will be accepted, but will be blocked by trying to
allocate memory; that memory may be necessary to service the requests
already in progress, causing a similar starvation of other
memory-requiring work on the machine. It may be less likely on a
well-tuned machine for there to be less memory than needed for
dm-crypt to service 32768 IOs, but it's still possible.

I think it would be safer to stash excess bios on a list for future
work, and, when congestion support exists again (I don't think it
currently does, but I haven't kept good track), perhaps use that
mechanism to signal when the device is at capacity. But I am probably
being too paranoid and missing some subtlety above.

Less major, and hopefully clearer, thoughts:

dm-crypt already has a concept of the max number of pages allocatable
for currently active IOs -- specifically DM_CRYPT_MEMORY_PERCENT per
cpu. If you're trying to scale by amount of memory on the system,
perhaps going off DM_CRYPT_MEMORY_PERCENT is safer. I'm somewhat
baffled why that mechanism isn't enough for the observed problem, tbh.

Perhaps it would be better to add a maximum allocatable objects
mechanism, if it's safe, to mempool, dm, or bioset. If it were in
bio_alloc_bioset, it could take advantage of the same rescuer
workqueue to keep from blocking, potentially.

Your patch format doesn't work when I try to apply them via git apply
patchfile for testing, while most patches do... Not sure if it's you
or me, but it seems most patches from git send-email / git
format-patch have some stats about inserts/deletes after the ---
marker, which marker seems missing from your messages. (Also,
traditionally, I think the changes between patch versions go after the
--- marker, so they don't go in the change description of the commit.)

Thanks!

John Dorminy

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel