Hi

On Mon, 30 Nov 2020, John Dorminy wrote:

> I am the first to admit I don't understand all the subtleties of the
> block layer, but I'm worried about a deadlock in close analogy to that
> described in bio_alloc_bioset.
> https://elixir.bootlin.com/linux/latest/source/block/bio.c#L463
> Hopefully y'all can clear up my understanding...
>
> Specifically, if a segment of a target blocks in its map function, I
> believe that risks deadlock due to memory congestion. Such an action
> blocks the rest of the IO being submitted by the current thread, and
> the other IO currently being submitted may potentially be to other
> segments which may make progress and free memory.

All the request-based drivers block in their map function if there are
more than 128 requests in flight (this is tunable via
/sys/block/*/queue/nr_requests). So, many of the device mapper targets
block as well - for example, the linear target just passes a bio down to
an underlying device, so it will block if the underlying device has more
than 128 requests in flight.

If you don't block, kswapd will just send another bio - and that is the
problem.

> As an example, suppose something submits a bio to a device; the device
> submits three child bios in its map / submit_bio function, some with
> their own data allocated, to different devices, which add up to all
> the reclaimable memory on the system. The first of these bios gets
> blocked by the target (perhaps because it needs memory, or is at
> capacity). Because submit_bio_noacct() traverses bio submittals
> depth-first, the other two child bios will not be submitted to their,
> different, devices until the first of these bios finishes being
> submitted; although were those other two bios allowed to make
> progress, they might complete and free memory. (Admittedly, they might
> need to make further memory allocations to make progress. But in
> theory the mempool / bio_set reserve should allow forward progress on
> these other bios, I think.)
> Even this limit only reduces, not eliminates, the problem. Consider a
> machine where fewer than 32768 bios exhausts the available memory;
> additional IOs will be accepted, but will be blocked by trying to
> allocate memory; that memory may be necessary to service the requests
> already in progress, causing a similar starvation of other
> memory-requiring work on the machine. It may be less likely on a
> well-tuned machine for there to be less memory than needed for
> dm-crypt to service 32768 IOs, but it's still possible.
>
> I think it would be safer to stash excess bios on a list for future
> work, and, when congestion support exists again (I don't think it
> currently does, but I haven't kept good track), perhaps use that
> mechanism to signal when the device is at capacity. But I am probably
> being too paranoid and missing some subtlety above.

But if you don't block, you'll just receive more bios - so the list
would grow without bound. And an excessive number of bios in flight will
reduce system responsiveness.

> Less major, and hopefully clearer, thoughts:
>
> dm-crypt already has a concept of the max number of pages allocatable
> for currently active IOs -- specifically DM_CRYPT_MEMORY_PERCENT per
> cpu. If you're trying to scale by amount of memory on the system,
> perhaps going off DM_CRYPT_MEMORY_PERCENT is safer. I'm somewhat
> baffled why that mechanism isn't enough for the observed problem, tbh.

Perhaps we could lower DM_CRYPT_MEMORY_PERCENT, or perhaps we could
remove it and enforce the limit with a semaphore.

> Perhaps it would be better to add a maximum allocatable objects
> mechanism, if it's safe, to mempool, dm, or bioset. If it were in
> bio_alloc_bioset, it could take advantage of the same rescuer
> workqueue to keep from blocking, potentially.
dm-crypt already does this - see:

	ret = mempool_init(&cc->page_pool, BIO_MAX_PAGES, crypt_alloc_page,
			   crypt_free_page, cc);

> Your patch format doesn't work when I try to apply them via git apply
> patchfile for testing, while most patches do... Not sure if it's you
> or me, but it seems most patches from git send-email / git
> format-patch have some stats about inserts/deletes after the ---
> marker, which marker seems missing from your messages. (Also,
> traditionally, I think the changes between patch versions go after the
> --- marker, so they don't go in the change description of the commit.)

I use quilt, not git, to work on my patches.

> Thanks!
>
> John Dorminy

Mikulas

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel