On Mon 23-02-15 11:45:21, Dave Chinner wrote:
[...]
> A reserve memory pool is no different - every time a memory reserve
> occurs, a watermark is lifted to accommodate it, and the transaction
> is not allowed to proceed until the amount of free memory exceeds
> that watermark. The memory allocation subsystem then only allows
> *allocations* marked correctly to allocate pages from the
> reserve that watermark protects. e.g. only allocations using
> __GFP_RESERVE are allowed to dip into the reserve pool.

The idea is sound. But I am pretty sure we will find many corner cases.
E.g. what if the mere reservation attempt causes the system to go OOM
and trigger the OOM killer? Sure, that wouldn't be too different from an
OOM triggered during the allocation, but there is one major difference.
Reservations need to be estimated, and I expect the estimates to be on
the conservative side, so the OOM might not have happened without them.

> By using watermarks, freeing of memory will automatically top
> up the reserve pool which means that we guarantee that reclaimable
> memory allocated for demand paging during transactions doesn't
> deplete the reserve pool permanently. As a result, when there is
> plenty of free and/or reclaimable memory, the reserve pool
> watermarks will have almost zero impact on performance and
> behaviour.

A typical busy system won't be very far away from the high watermark, so
reclaim would be performed whenever the watermarks are raised (aka during
a reservation), and that might lead to visible performance degradation.
This might be acceptable, but it also adds a certain level of
unpredictability, because performance characteristics might change
suddenly.

> Further, because it's just accounting and behavioural thresholds,
> this allows the mm subsystem to control how the reserve pool is
> accounted internally. e.g. clean, reclaimable pages in the page
> cache could serve as reserve pool pages as they can be immediately
> reclaimed for allocation.

But they can also become hard or impossible to reclaim. Clean pages
might get dirtied, and e.g. swap backed pages might run out of their
backing storage. So I guess we cannot count on those pages without
reclaiming them first and hiding them in the reserve. Which is probably
what you suggest below, but I wasn't really sure...

> This could be achieved by setting reclaim targets first to the reserve
> pool watermark, then the second target is enough pages to satisfy the
> current allocation.
>
> And, FWIW, there's nothing stopping this mechanism from having order
> based reserve thresholds. e.g. IB could really do with a 64k reserve
> pool threshold and hence help solve the long standing problems they
> have with filling the receive ring in GFP_ATOMIC context...
>
> Sure, that's looking further down the track, but my point still
> remains: we need a viable long term solution to this problem. Maybe
> reservations are not the solution, but I don't see anyone else who
> is thinking of how to address this architectural problem at a system
> level right now.

I think the idea is good! It will just be quite tricky to get there
without causing more problems than those being solved. The biggest
question mark so far seems to be the reservation size estimation. If it
is hard for a caller to know the size beforehand (and the estimate would
have to be really close to the actually used size), then the added
complexity in the code sounds like overkill, and asking the
administrator to tune min_free_kbytes seems a better fit (we would still
have to teach the allocator to access the reserves when really
necessary), because the system would behave more predictably (although
some memory would be wasted).

> We need to design and document the model first, then review it, then
> we can start working at the code level to implement the solution we've
> designed.

I have already asked James to add this to the LSF agenda, but nothing has
materialized on the schedule yet. I will poke him again.
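
To make the corner cases discussed above easier to reason about, here is
a toy userspace sketch (Python, not kernel code) of the watermark model
as I understand it. Everything in it is an assumption for illustration:
the ReservePool class and its method names are hypothetical, use_reserve
stands in for the proposed __GFP_RESERVE, and reserve() simply fails
where a real implementation would presumably wait for reclaim.

```python
class ReservePool:
    """Toy model of watermark-based reservations (hypothetical, not a kernel API)."""

    def __init__(self, total_pages, min_watermark):
        self.free_pages = total_pages
        self.min_watermark = min_watermark  # baseline, cf. min_free_kbytes
        self.reserved = 0                   # sum of outstanding reservations

    @property
    def watermark(self):
        # Each reservation lifts the watermark above the baseline.
        return self.min_watermark + self.reserved

    def reserve(self, pages):
        # The caller may proceed only if free memory stays at or above
        # the lifted watermark. Assumption: fail instead of blocking.
        if self.free_pages < self.watermark + pages:
            return False
        self.reserved += pages
        return True

    def unreserve(self, pages):
        # Drop a reservation, lowering the watermark again.
        self.reserved -= min(pages, self.reserved)

    def alloc(self, pages, use_reserve=False):
        # Ordinary allocations must leave free memory at or above the
        # lifted watermark; marked allocations may dip down to the baseline.
        floor = self.min_watermark if use_reserve else self.watermark
        if self.free_pages - pages < floor:
            return False
        self.free_pages -= pages
        return True

    def free(self, pages):
        # Freed pages top the pool back up automatically.
        self.free_pages += pages
```

In this model an over-conservative reserve() can fail (or, in a real
system, trigger reclaim or OOM) even when the reserved pages would never
have been used, which is exactly the estimation problem described above.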
-- 
Michal Hocko
SUSE Labs