Michal Hocko wrote:
> On Wed 23-11-16 23:35:10, Tetsuo Handa wrote:
> > If __alloc_pages_nowmark() called by __GFP_NOFAIL could not find pages
> > with requested order due to fragmentation, __GFP_NOFAIL should invoke
> > the OOM killer. I believe that risking kill all processes and panic the
> > system eventually is better than __GFP_NOFAIL livelock.
>
> I violently disagree. Just imagine a driver which asks for an order-9
> page and cannot really continue without it so it uses GFP_NOFAIL. There
> is absolutely no reason to disrupt or even put the whole system down
> just because of this particular request. It might take for ever to
> continue but that is to be expected when asking for such a hard
> requirement.

Have we found any such in-tree drivers? If there were any, we would likely
already know about them via

	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));

in buffered_rmqueue(). Even if there were such out-of-tree drivers, we don't
need to take care of out-of-tree drivers.

> > Unfortunately, there seems to be cases where the
> > caller needs to use GFP_NOFS rather than GFP_KERNEL due to unclear dependency
> > between memory allocation by system calls and memory reclaim by filesystems.
>
> I do not understand your point here. Syscall is an entry point to the
> kernel where we cannot recurse to the FS code so GFP_NOFS seems wrong
> thing to ask.

Will you look at http://marc.info/?t=120716967100004&r=1&w=2 which led to
commit a02fe13297af26c1 ("selinux: prevent rentry into the FS") and
commit 869ab5147e1eead8 ("SELinux: more GFP_NOFS fixups to prevent selinux
from re-entering the fs code") ? My understanding is that the mkdir() system
call caused a memory allocation for inode creation, and that memory
allocation caused memory reclaim which had to be !__GFP_FS.

And whether we need to use GFP_NOFS at a specific point is very unclear.
For example, security_inode_init_security() calls the call_int_hook() macro,
which calls smack_inode_init_security() if Smack is active.
smack_inode_init_security() uses GFP_NOFS for memory allocation.
security_inode_init_security() also calls evm_inode_init_security(), and
evm_inode_init_security() uses GFP_NOFS for memory allocation. Looks
consistent? Yes. But evm_inode_init_security() also calls evm_init_hmac(),
which in turn calls init_desc(), which uses GFP_KERNEL for memory allocation.
This is not consistent. And security_inode_init_security() also calls the
initxattrs() callback, which is provided by filesystem code. For example,
btrfs_initxattrs() is called if security_inode_init_security() is called by
btrfs, and btrfs_initxattrs() uses GFP_KERNEL for memory allocation. This is
not consistent either. Either we are needlessly using GFP_NOFS with the risk
of a retry-forever loop, or we are wrongly using GFP_KERNEL with the risk of
a memory reclaim deadlock.

Apart from the need to make these GFP_NOFS/GFP_KERNEL usages consistent
(although whether we need to use GFP_NOFS at all is very unclear), I do want
to allow memory allocations from functions which are called by system calls
to invoke the OOM killer (e.g. __GFP_MAY_OOMKILL) rather than risk a
retry-forever loop (or fail that request), even if we need to use GFP_NOFS.
Also, I'm willing to give up memory allocations from functions which are
called by system calls if SIGKILL is pending (i.e. __GFP_KILLABLE).

Did you understand my point?
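
To make the two proposed flags concrete, here is a rough userspace model of
what a stalled allocation might do under __GFP_MAY_OOMKILL and
__GFP_KILLABLE. It is only a sketch and not kernel code: neither flag exists
in the tree, the MODEL_* names and bit values are invented for illustration,
and the rule that a pending SIGKILL takes priority over __GFP_NOFAIL is an
assumption that the message does not settle.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; they do NOT match include/linux/gfp.h. */
#define MODEL_GFP_FS          0x01u /* reclaim may re-enter filesystem code */
#define MODEL_GFP_NOFAIL      0x02u /* caller cannot handle failure */
#define MODEL_GFP_MAY_OOMKILL 0x04u /* hypothetical: OOM killer allowed without __GFP_FS */
#define MODEL_GFP_KILLABLE    0x08u /* hypothetical: give up when SIGKILL is pending */
#define MODEL_GFP_ALL         0x0fu

enum model_action {
	MODEL_RETRY,    /* keep looping in the allocator */
	MODEL_OOM_KILL, /* invoke the OOM killer, then retry */
	MODEL_FAIL      /* return NULL to the caller */
};

/*
 * Decide what a stalled allocation does once reclaim has made no progress.
 * This is a toy decision table, not the kernel's slowpath.
 */
enum model_action model_stalled_action(unsigned int gfp, bool sigkill_pending)
{
	assert((gfp & ~MODEL_GFP_ALL) == 0); /* reject bits the model does not know */

	/* Assumption: a pending SIGKILL wins over everything else. */
	if ((gfp & MODEL_GFP_KILLABLE) && sigkill_pending)
		return MODEL_FAIL;

	/* OOM killer is allowed for FS-capable requests or with the proposed opt-in. */
	if (gfp & (MODEL_GFP_FS | MODEL_GFP_MAY_OOMKILL))
		return MODEL_OOM_KILL;

	/* GFP_NOFS-like request without the opt-in: loop forever or fail. */
	return (gfp & MODEL_GFP_NOFAIL) ? MODEL_RETRY : MODEL_FAIL;
}
```

In this model a GFP_NOFS-like __GFP_NOFAIL request (MODEL_GFP_NOFAIL alone)
keeps retrying, while the same request with MODEL_GFP_MAY_OOMKILL set reaches
the OOM killer instead.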