Hi.

I'm presently trying to fix a couple of issues in a kernel module which shares some properties with DM. It's called blktap and is used quite extensively in Xen. It's essentially I/O virtualization in userspace: I/O on a block device is forwarded to a userspace app, which commonly translates the requests onto one or more disk nodes. The common theme is stacking devices, and without precautions that creates a number of deadlock hazards once memory congestion comes into play.

Can someone help me with how this was dealt with in DM? There are a couple of things I couldn't explain from reading the DM code, so I'm wondering whether maybe even DM still has a problem. It's mainly about the mempools involved, not necessarily limited to bio_alloc. I found a couple of DM patches on these matters, e.g.:

  http://www.spinics.net/lists/dm-devel/msg03578.html

One obvious problem is bio allocation above and below the upper-level request queue (that's the one addressed in the patch above). If both ends allocate from the same bio pool, the upper layer exhausts it, and free memory is short, then the lower levels will starve and both get stuck that way. The way this is commonly dealt with is to separate biosets between layers, which ensures that both can always make progress. DM does it, as does blk-core. Cool. (I've appended a minimal sketch of that pattern at the bottom of this mail.)

Now, one potential problem I still see is the following. Imagine a large number of dirty pages over a DM node, and some thread starts queueing those pages. Requests get translated, and the translated requests need to be allocated. When allocating from the pooled objects, I can see the following happen, all inside mempool_alloc (a condensed copy of the 2.6.32 code is appended below as well):

 1. The first iteration runs with __GFP_WAIT cleared.
 2. There is still no memory, so the allocation fails and we fall back to the pool.
 3. pool->curr_nr is 0, so we go to sleep on pool->wait.
 4. I/O was in flight and will complete, so once objects get returned, pool->wait wakes us.

Now the interesting bit:

 5. The next iteration sets __GFP_WAIT again.
 6. The mempool retries the (slab) allocator first, not the pool.

Seen on 2.6.32, but I don't think that code has moved much recently.

So I have two questions.

First, when retrying, is resetting __GFP_WAIT even desirable? It means the calling thread is likely to wait on disk I/O. The pool is known to have just seen a refill through mempool_free, so waiting in pool->alloc can get much slower than wanted. The pool->alloc call itself is fine; it's that wait bit which scares me.

Second, when waiting, how does DM make sure the private bioset allocations never block on a page queued on its own device? That would again be a (potential) deadlock scenario; the simplest case is when writing that page out directly depends on the lower-level object to make progress.

To me this all seems to boil down to that gfp_temp = gfp_mask line in mempool_alloc.

Any good idea on this would be very much appreciated.

Thanks,
Daniel
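
P.S.: Here is the separate-bioset pattern I mentioned above, as I understand it. The struct and function names (my_stack_dev, my_clone_bio, CLONE_POOL_SIZE) are made up for illustration; only bioset_create()/bio_alloc_bioset() are the real 2.6.32-era interfaces, so treat this as a sketch rather than actual blktap or DM code:

#include <linux/bio.h>

#define CLONE_POOL_SIZE 16	/* bios reserved for this layer alone */

struct my_stack_dev {
	struct bio_set *bs;	/* private bioset, not shared with upper layers */
};

static int my_stack_dev_init(struct my_stack_dev *dev)
{
	/*
	 * Reserve a private pool so this layer's forward progress never
	 * depends on the shared fs_bio_set the upper layer may exhaust.
	 */
	dev->bs = bioset_create(CLONE_POOL_SIZE, 0);
	if (!dev->bs)
		return -ENOMEM;
	return 0;
}

static struct bio *my_clone_bio(struct my_stack_dev *dev, struct bio *orig)
{
	/*
	 * GFP_NOIO: we are on the writeback path, so we must not recurse
	 * into I/O; the allocation may still sleep on the private pool.
	 */
	return bio_alloc_bioset(GFP_NOIO, orig->bi_max_vecs, dev->bs);
}

The point is only that the lower layer's bios come out of its own reserve, so the upper layer running the shared pool dry cannot starve it.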
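
And for reference, mempool_alloc as I read it in mm/mempool.c on 2.6.32, condensed and re-commented to match the steps above. Locking and the extra GFP flag munging are trimmed, so don't take it as verbatim kernel code:

void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
{
	void *element;
	wait_queue_t wait;
	gfp_t gfp_temp;

	/* (1) first pass: don't wait, don't start I/O */
	gfp_temp = gfp_mask & ~(__GFP_WAIT | __GFP_IO);

repeat_alloc:
	element = pool->alloc(gfp_temp, pool->pool_data);
	if (element)
		return element;

	/* (2) underlying allocator failed, fall back to the pool */
	if (pool->curr_nr) {			/* pool->lock held in the real code */
		element = remove_element(pool);	/* internal helper in mm/mempool.c */
		return element;
	}

	/* no waiting allowed -> give up */
	if (!(gfp_mask & __GFP_WAIT))
		return NULL;

	/* (3) pool->curr_nr is 0, so sleep on pool->wait ... */
	init_wait(&wait);
	prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
	if (!pool->curr_nr)
		io_schedule_timeout(5*HZ);
	/* (4) ... until mempool_free() returns an element and wakes us */
	finish_wait(&pool->wait, &wait);

	/* (5) the line in question: __GFP_WAIT (and __GFP_IO) come back ... */
	gfp_temp = gfp_mask;
	/* (6) ... and the retry hits pool->alloc first, not the refilled pool */
	goto repeat_alloc;
}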