Nick Piggin wrote:
So assuming there is no reasonable way to do out-of-core algorithms on the filesystem metadata (and you likely don't want to anyway, because it would mean a significant slowdown or a divergence of code paths), you still only need to reserve one set of those 30-40 pages for the entire kernel. You only ever need to reserve enough memory for a *single* page to be processed. In the worst case, where multiple pages are under writeout but memory cannot be allocated, only one will be allowed access to the reserves and the others will block until it has finished and can unpin them all.
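(For illustration, here is a minimal userspace sketch of that "single writeout gets the reserve" idea, using a pthread mutex to stand in for whatever serialization the kernel would actually use. The names reserve_pool, RESERVE_PAGES, and writeout_one_page() are hypothetical, not kernel APIs.)

/*
 * Sketch only: one pre-pinned reserve for the whole system, and at most
 * one writeout allowed to dip into it at a time.  Everyone else blocks
 * until that writeout completes and the reserve is whole again.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define RESERVE_PAGES 40          /* worst-case pages for one writeout */
#define PAGE_SIZE     4096

static pthread_mutex_t reserve_lock = PTHREAD_MUTEX_INITIALIZER;
static void *reserve_pool[RESERVE_PAGES];   /* pinned up front */

static void *alloc_page(void)
{
	/* Normal allocation path; may fail under memory pressure. */
	return malloc(PAGE_SIZE);
}

static void writeout_one_page(void)
{
	void *scratch = alloc_page();

	if (!scratch) {
		/* Allocation failed: take the single shared reserve. */
		pthread_mutex_lock(&reserve_lock);
		/* ... do the writeout out of reserve_pool[] ... */
		pthread_mutex_unlock(&reserve_lock);
		return;
	}

	/* ... do the writeout with normally allocated memory ... */
	free(scratch);
}

int main(void)
{
	int i;

	/* Pin the reserve up front so it exists when allocation fails. */
	for (i = 0; i < RESERVE_PAGES; i++)
		reserve_pool[i] = malloc(PAGE_SIZE);

	writeout_one_page();
	return 0;
}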
Sure, nobody will mind seeing lots of extra pinned memory ;) Don't forget to add the space for data transforms and raid driver operations in the write stack, and whatever else we may not have thought of. With good engineering we can arrange things so that "we can always make forward progress". But it won't matter, because once a real user drives the system off this cliff there is no practical difference between "hung" and "really slow progress": they are going to crash it and report a hang.
Well, I'm not saying it is an immediate problem, or that it would be a good use of anybody's time to rush out and try to redesign their fs code to fix it ;) But at least for any new core/generic library functionality like fsblock, it would be silly not to close the hole (not least because the problem is simpler there than in a complex fs).
Hey, I appreciate anything you do in the VM to make the ugly dance with filesystems (my area) a little less ugly. I'm sure you also appreciate that every time the VM tries to save 32 bytes, someone else tries to take 32 kilobytes. As they say... memory is cheap :)

jim