Michal Hocko wrote:
> On Mon 16-03-15 15:38:43, Andrew Morton wrote:
> > Realistically, I don't think this overall effort will be successful -
> > we'll add the knob, it won't get enough testing and any attempt to
> > alter the default will be us deliberately destabilizing the kernel
> > without knowing how badly :(
>
> Without the knob we do not allow users to test this at all though and
> the transition will _never_ happen. Which is IMHO bad.
>

Even with the knob, very few users will test this. The likely consequence
is that end users rush to customer support centers with obscure bugs. I'm
working at a support center, and such bugs are really annoying.

> > I wonder if we can alter the behaviour only for filesystem code, so we
> > constrain the new behaviour just to that code where we're having
> > problems. Most/all fs code goes via vfs methods so there's a reasonably
> > small set of places where we can call
>
> We are seeing issues with the fs code now because the test cases which
> led to the current discussion exercise FS code. The code which does
> lock(); kmalloc(GFP_KERNEL) is not reduced there though. I am pretty sure
> we can find other subsystems if we try hard enough.

I'm expecting patches which avoid deadlocks caused by
lock(); kmalloc(GFP_KERNEL).

> > static inline void enter_fs_code(struct super_block *sb)
> > {
> >         if (sb->my_small_allocations_can_fail)
> >                 current->small_allocations_can_fail++;
> > }
> >
> > that way (or something similar) we can select the behaviour on a per-fs
> > basis and the rest of the kernel remains unaffected. Other subsystems
> > can opt in as well.
>
> This is basically leading to GFP_MAYFAIL which is completely backwards
> (the hard requirement should be an exception not a default rule).
> I really do not want to end up with stuffing random may_fail annotations
> all over the kernel.
>

I wish that GFP_NOFS / GFP_NOIO regions were annotated with

static inline void enter_fs_code(void)
{
#ifdef CONFIG_DEBUG_GFP_FLAGS
        current->in_fs_code++;
#endif
}

static inline void leave_fs_code(void)
{
#ifdef CONFIG_DEBUG_GFP_FLAGS
        current->in_fs_code--;
#endif
}

static inline void enter_io_code(void)
{
#ifdef CONFIG_DEBUG_GFP_FLAGS
        current->in_io_code++;
#endif
}

static inline void leave_io_code(void)
{
#ifdef CONFIG_DEBUG_GFP_FLAGS
        current->in_io_code--;
#endif
}

so that inappropriate GFP_KERNEL usage inside a GFP_NOFS region becomes
catchable by doing

struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
                        struct zonelist *zonelist, nodemask_t *nodemask)
{
        struct zoneref *preferred_zoneref;
        struct page *page = NULL;
        unsigned int cpuset_mems_cookie;
        int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
        gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
        struct alloc_context ac = {
                .high_zoneidx = gfp_zone(gfp_mask),
                .nodemask = nodemask,
                .migratetype = gfpflags_to_migratetype(gfp_mask),
        };

        gfp_mask &= gfp_allowed_mask;
+#ifdef CONFIG_DEBUG_GFP_FLAGS
+       WARN_ON(current->in_fs_code && (gfp_mask & __GFP_FS));
+       WARN_ON(current->in_io_code && (gfp_mask & __GFP_IO));
+#endif

        lockdep_trace_alloc(gfp_mask);

It is difficult for non-fs developers to determine whether they need to
use GFP_NOFS rather than GFP_KERNEL in their code. An example is seen at
http://marc.info/?l=linux-security-module&m=138556479607024&w=2 .
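
To illustrate how the annotation would be used (example_fs_writepage() and
do_fs_work() below are made-up names, only a sketch of the idea), a
filesystem would bracket a section that must not re-enter the FS like this:

/*
 * Hypothetical sketch: bracket a section that must not recurse into the
 * filesystem. While in_fs_code is non-zero, any allocation that still
 * carries __GFP_FS would trigger the WARN_ON() above when
 * CONFIG_DEBUG_GFP_FLAGS is enabled.
 */
static int example_fs_writepage(struct page *page,
                                struct writeback_control *wbc)
{
        int ret;

        enter_fs_code();                /* entering a GFP_NOFS region */
        ret = do_fs_work(page, wbc);    /* GFP_KERNEL allocation here is caught */
        leave_fs_code();                /* leaving the GFP_NOFS region */
        return ret;
}

This way the "no FS recursion here" rule lives at the region boundary
instead of at every allocation site inside it.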
Moreover, I don't know how GFP flags are managed when stacked like "a swap
file on ext4 on top of LVM (with snapshots) on a RAID array connected over
iSCSI" (quoted from comments on Jon's writeup), but I wish that the
distinction between GFP_KERNEL / GFP_NOFS / GFP_NOIO were removed from
memory allocating function callers by doing

static inline void enter_fs_code(void)
{
        current->in_fs_code++;
}

static inline void leave_fs_code(void)
{
        current->in_fs_code--;
}

static inline void enter_io_code(void)
{
        current->in_io_code++;
}

static inline void leave_io_code(void)
{
        current->in_io_code--;
}

struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
                        struct zonelist *zonelist, nodemask_t *nodemask)
{
        struct zoneref *preferred_zoneref;
        struct page *page = NULL;
        unsigned int cpuset_mems_cookie;
        int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
        gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
        struct alloc_context ac = {
                .high_zoneidx = gfp_zone(gfp_mask),
                .nodemask = nodemask,
                .migratetype = gfpflags_to_migratetype(gfp_mask),
        };

        gfp_mask &= gfp_allowed_mask;
+       if (current->in_fs_code)
+               gfp_mask &= ~__GFP_FS;
+       if (current->in_io_code)
+               gfp_mask &= ~__GFP_IO;

        lockdep_trace_alloc(gfp_mask);

so that GFP flags passed to memory allocations involved in stacking will be
masked appropriately.
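
As a rough sketch of how stacking would then behave (example_blk_request_fn()
and process_requests() are hypothetical names, not existing kernel
functions), a block layer entry point would only need to mark the region
once, and any allocation performed by lower layers, even one written as
kmalloc(size, GFP_KERNEL), would be degraded automatically:

/*
 * Hypothetical sketch: a request handling path marked as a GFP_NOIO
 * region. Allocations anywhere below process_requests(), including in
 * stacked drivers (LVM, RAID, iSCSI), would have __GFP_IO cleared by
 * __alloc_pages_nodemask() regardless of the GFP flags the callers
 * passed; if a filesystem higher in the stack already called
 * enter_fs_code(), __GFP_FS is cleared as well.
 */
static void example_blk_request_fn(struct request_queue *q)
{
        enter_io_code();        /* entering a GFP_NOIO region */
        process_requests(q);    /* nested GFP_KERNEL behaves as GFP_NOIO */
        leave_io_code();        /* leaving the GFP_NOIO region */
}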