On Fri, Aug 21, 2015 at 03:42:21PM +0200, Vlastimil Babka wrote: > On 08/12/2015 12:45 PM, Mel Gorman wrote: > >__GFP_WAIT has been used to identify atomic context in callers that hold > >spinlocks or are in interrupts. They are expected to be high priority and > >have access one of two watermarks lower than "min". __GFP_HIGH users get > >access to the first lower watermark and can be called the "high priority > >reserve". Atomic users and interrupts access yet another lower watermark > >that can be called the "atomic reserve". > > > >Over time, callers had a requirement to not block when fallback options > >were available. Some have abused __GFP_WAIT leading to a situation where > >an optimisitic allocation with a fallback option can access atomic reserves. > > > >This patch uses __GFP_ATOMIC to identify callers that are truely atomic, > >cannot sleep and have no alternative. High priority users continue to use > >__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are > >willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers > >that want to wake kswapd for background reclaim. __GFP_WAIT is redefined > >as a caller that is willing to enter direct reclaim and wake kswapd for > >background reclaim. > > > >This patch then converts a number of sites > > > >o __GFP_ATOMIC is used by callers that are high priority and have memory > > pools for those requests. GFP_ATOMIC uses this flag. Callers with > > interrupts disabled still automatically use the atomic reserves. > > Hm I can't see where the latter happens? In gfp_to_alloc_flags(), > ALLOC_HARDER is set for __GFP_ATOMIC, or rt-tasks *not* in interrupt? What > am I missing? > It was a mistake from an earlier version of the patch that was itself buggy. I forgot to fix the changelog properly. > >o Callers that have a limited mempool to guarantee forward progress use > > __GFP_DIRECT_RECLAIM. bio allocations fall into this category where > > kswapd will still be woken but atomic reserves are not used as there > > is a one-entry mempool to guarantee progress. > > > >o Callers that are checking if they are non-blocking should use the > > helper gfpflags_allows_blocking() where possible. This is because > > A bit subjective but gfpflags_allow_blocking() sounds better to me. > Or shorter gfp_allows_blocking()? > I'll use gfpflags_allow_blocking. > > checking for __GFP_WAIT as was done historically now can trigger false > > positives. Some exceptions like dm-crypt.c exist where the code intent > > is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to > > flag manipulations. > > > >The key hazard to watch out for is callers that removed __GFP_WAIT and > >was depending on access to atomic reserves for inconspicuous reasons. > >In some cases it may be appropriate for them to use __GFP_HIGH. > > Hm we might also have a (non-fatal) hazard of callers that directly combined > __GFP_* flags that didn't include __GFP_WAIT, but did wake up kswapd, and > now might be missing __GFP_KSWAPD_RECLAIM. Did you try checking for those? I > imagine it's not a simple task... > I hadn't searched but there are a small number of callers that potentially care. I fixed them. > >Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> > > > > >diff --git a/Documentation/vm/balance b/Documentation/vm/balance > >index c46e68cf9344..6f1f6fae30f5 100644 > >--- a/Documentation/vm/balance > >+++ b/Documentation/vm/balance > >@@ -1,12 +1,14 @@ > > Started Jan 2000 by Kanoj Sarcar <kanoj@xxxxxxx> > > > >-Memory balancing is needed for non __GFP_WAIT as well as for non > >-__GFP_IO allocations. > >+Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as > >+well as for non __GFP_IO allocations. > > > >-There are two reasons to be requesting non __GFP_WAIT allocations: > >-the caller can not sleep (typically intr context), or does not want > >-to incur cost overheads of page stealing and possible swap io for > >-whatever reasons. > >+The first reason why a caller may avoid reclaim is that the caller can not > >+sleep due to holding a spinlock or is in interrupt context. The second may > >+be that the caller is willing to fail the allocation without incurring the > >+overhead of page stealing. This may happen for opportunistic high-order > > I think "page stealing" has nowadays a different meaning in the > anti-fragmentation context? Should it just say "reclaim"? > Good point, corrected. > >+allocation requests that have order-0 fallback options. In such cases, > >+the caller may also wish to avoid waking kswapd. > > > > __GFP_IO allocation requests are made to prevent file system deadlocks. > > > >diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > >index cba12f34ff77..100d3fbaebae 100644 > >--- a/arch/arm/mm/dma-mapping.c > >+++ b/arch/arm/mm/dma-mapping.c > >@@ -650,7 +650,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, > > > > if (is_coherent || nommu()) > > addr = __alloc_simple_buffer(dev, size, gfp, &page); > >- else if (!(gfp & __GFP_WAIT)) > >+ else if (gfp & __GFP_ATOMIC) > > addr = __alloc_from_pool(size, &page); > > else if (!dev_get_cma_area(dev)) > > addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller, want_vaddr); > >@@ -1369,7 +1369,7 @@ static void *arm_iommu_alloc_attrs(struct device *dev, size_t size, > > *handle = DMA_ERROR_CODE; > > size = PAGE_ALIGN(size); > > > >- if (!(gfp & __GFP_WAIT)) > >+ if (gfp & __GFP_ATOMIC) > > return __iommu_alloc_atomic(dev, size, handle); > > > > /* > >diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c > >index d16a1cead23f..713d963fb96b 100644 > >--- a/arch/arm64/mm/dma-mapping.c > >+++ b/arch/arm64/mm/dma-mapping.c > >@@ -100,7 +100,7 @@ static void *__dma_alloc_coherent(struct device *dev, size_t size, > > if (IS_ENABLED(CONFIG_ZONE_DMA) && > > dev->coherent_dma_mask <= DMA_BIT_MASK(32)) > > flags |= GFP_DMA; > >- if (IS_ENABLED(CONFIG_DMA_CMA) && (flags & __GFP_WAIT)) { > >+ if (IS_ENABLED(CONFIG_DMA_CMA) && (flags & __GFP_DIRECT_RECLAIM)) { > > struct page *page; > > void *addr; > > > >@@ -147,7 +147,7 @@ static void *__dma_alloc(struct device *dev, size_t size, > > > > size = PAGE_ALIGN(size); > > > >- if (!coherent && !(flags & __GFP_WAIT)) { > >+ if (!coherent && (flags & __GFP_ATOMIC)) { > > struct page *page = NULL; > > void *addr = __alloc_from_pool(size, &page, flags); > > > > Hmm these change the lack of __GFP_WAIT to expect __GFP_ATOMIC, so it's > potentially one of those "key hazards" mentioned in the changelog, right? > But here it's not just about using atomic reserves, but using a completely > different allocation function. > E.g. in case of arch/arm/mm/dma-mapping.c:__dma_alloc() I see it can go to > __alloc_remap_buffer -> __dma_alloc_remap -> dma_common_contiguous_remap > which does kmalloc(..., GFP_KERNEL) and has comment "Cannot be used in > non-sleeping contexts". > I completely missed that. It needs to be a check for __GFP_DIRECT_RECLAIM and similar in the other arm file. > So I think callers that cannot sleep and did clear __GFP_WAIT before, are > now dangerous unless they set __GFP_ATOMIC? > > >diff --git a/drivers/firewire/core-cdev.c b/drivers/firewire/core-cdev.c > >index 2a3973a7c441..dc611c8cad10 100644 > >--- a/drivers/firewire/core-cdev.c > >+++ b/drivers/firewire/core-cdev.c > >@@ -486,7 +486,7 @@ static int ioctl_get_info(struct client *client, union ioctl_arg *arg) > > static int add_client_resource(struct client *client, > > struct client_resource *resource, gfp_t gfp_mask) > > { > >- bool preload = !!(gfp_mask & __GFP_WAIT); > >+ bool preload = !!(gfp_mask & __GFP_DIRECT_RECLAIM); > > Use the helper here to avoid !! as a bonus? > Done. > >--- a/drivers/infiniband/core/sa_query.c > >+++ b/drivers/infiniband/core/sa_query.c > >@@ -619,7 +619,7 @@ static void init_mad(struct ib_sa_mad *mad, struct ib_mad_agent *agent) > > > > static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask) > > { > >- bool preload = !!(gfp_mask & __GFP_WAIT); > >+ bool preload = !!(gfp_mask & __GFP_DIRECT_RECLAIM); > > unsigned long flags; > > int ret, id; > > > > Same here. > Done. > >diff --git a/drivers/usb/host/u132-hcd.c b/drivers/usb/host/u132-hcd.c > >index d51687780b61..06badad3ab75 100644 > >--- a/drivers/usb/host/u132-hcd.c > >+++ b/drivers/usb/host/u132-hcd.c > >@@ -2247,7 +2247,7 @@ static int u132_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, > > { > > struct u132 *u132 = hcd_to_u132(hcd); > > if (irqs_disabled()) { > >- if (__GFP_WAIT & mem_flags) { > >+ if (__GFP_DIRECT_RECLAIM & mem_flags) { > > printk(KERN_ERR "invalid context for function that migh" > > "t sleep\n"); > > return -EINVAL; > > And here - no other flag manipulations and it would match the printk. Fixed > >--- a/fs/btrfs/extent_io.c > >+++ b/fs/btrfs/extent_io.c > >@@ -594,7 +594,7 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, > > if (bits & (EXTENT_IOBITS | EXTENT_BOUNDARY)) > > clear = 1; > > again: > >- if (!prealloc && (mask & __GFP_WAIT)) { > >+ if (!prealloc && (mask & __GFP_DIRECT_RECLAIM)) { > > /* > > * Don't care for allocation failure here because we might end > > * up not needing the pre-allocated extent state at all, which > >@@ -850,7 +850,7 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, > > > > bits |= EXTENT_FIRST_DELALLOC; > > again: > >- if (!prealloc && (mask & __GFP_WAIT)) { > >+ if (!prealloc && (mask & __GFP_DIRECT_RECLAIM)) { > > prealloc = alloc_extent_state(mask); > > BUG_ON(!prealloc); > > } > >@@ -1076,7 +1076,7 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, > > btrfs_debug_check_extent_io_range(tree, start, end); > > > > again: > >- if (!prealloc && (mask & __GFP_WAIT)) { > >+ if (!prealloc && (mask & __GFP_DIRECT_RECLAIM)) { > > /* > > * Best effort, don't worry if extent state allocation fails > > * here for the first iteration. We might have a cached state > >@@ -4265,7 +4265,7 @@ int try_release_extent_mapping(struct extent_map_tree *map, > > u64 start = page_offset(page); > > u64 end = start + PAGE_CACHE_SIZE - 1; > > > >- if ((mask & __GFP_WAIT) && > >+ if ((mask & __GFP_DIRECT_RECLAIM) && > > page->mapping->host->i_size > 16 * 1024 * 1024) { > > u64 len; > > while (start <= end) { > > Why not here as well. > Done > >--- a/kernel/smp.c > >+++ b/kernel/smp.c > >@@ -669,7 +669,7 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info), > > cpumask_var_t cpus; > > int cpu, ret; > > > >- might_sleep_if(gfp_flags & __GFP_WAIT); > >+ might_sleep_if(gfp_flags & __GFP_DIRECT_RECLAIM); > > > > if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) { > > preempt_disable(); > >diff --git a/lib/idr.c b/lib/idr.c > >index 5335c43adf46..e5118fc82961 100644 > >--- a/lib/idr.c > >+++ b/lib/idr.c > >@@ -399,7 +399,7 @@ void idr_preload(gfp_t gfp_mask) > > * allocation guarantee. Disallow usage from those contexts. > > */ > > WARN_ON_ONCE(in_interrupt()); > >- might_sleep_if(gfp_mask & __GFP_WAIT); > >+ might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM); > > > > preempt_disable(); > > > >@@ -453,7 +453,7 @@ int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp_mask) > > struct idr_layer *pa[MAX_IDR_LEVEL + 1]; > > int id; > > > >- might_sleep_if(gfp_mask & __GFP_WAIT); > >+ might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM); > > > > /* sanity checks */ > > if (WARN_ON_ONCE(start < 0)) > >diff --git a/lib/radix-tree.c b/lib/radix-tree.c > >index f9ebe1c82060..cc5fdc3fb734 100644 > >--- a/lib/radix-tree.c > >+++ b/lib/radix-tree.c > >@@ -188,7 +188,7 @@ radix_tree_node_alloc(struct radix_tree_root *root) > > * preloading in the interrupt anyway as all the allocations have to > > * be atomic. So just do normal allocation when in interrupt. > > */ > >- if (!(gfp_mask & __GFP_WAIT) && !in_interrupt()) { > >+ if (!(gfp_mask & __GFP_DIRECT_RECLAIM) && !in_interrupt()) { > > struct radix_tree_preload *rtp; > > > > /* > > These too? > Yep > >diff --git a/mm/backing-dev.c b/mm/backing-dev.c > >index dac5bf59309d..2056d16807de 100644 > >--- a/mm/backing-dev.c > >+++ b/mm/backing-dev.c > >@@ -632,7 +632,7 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi, > > { > > struct bdi_writeback *wb; > > > >- might_sleep_if(gfp & __GFP_WAIT); > >+ might_sleep_if(gfp & __GFP_DIRECT_RECLAIM); > > > > if (!memcg_css->parent) > > return &bdi->wb; > > ditto > Indeed. > >--- a/mm/page_alloc.c > >+++ b/mm/page_alloc.c > >@@ -2143,7 +2143,7 @@ static bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order) > > return false; > > if (fail_page_alloc.ignore_gfp_highmem && (gfp_mask & __GFP_HIGHMEM)) > > return false; > >- if (fail_page_alloc.ignore_gfp_wait && (gfp_mask & __GFP_WAIT)) > >+ if (fail_page_alloc.ignore_gfp_wait && (gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM))) > > Should __GFP_ATOMIC really be here? > I felt it was safer because it felt in line with the intent of alloc.ignore_gfp_wait. > >diff --git a/net/sctp/associola.c b/net/sctp/associola.c > >index 197c3f59ecbf..c5fcdd6f85b7 100644 > >--- a/net/sctp/associola.c > >+++ b/net/sctp/associola.c > >@@ -1588,7 +1588,7 @@ int sctp_assoc_lookup_laddr(struct sctp_association *asoc, > > /* Set an association id for a given association */ > > int sctp_assoc_set_id(struct sctp_association *asoc, gfp_t gfp) > > { > >- bool preload = !!(gfp & __GFP_WAIT); > >+ bool preload = !!(gfp & __GFP_DIRECT_RECLAIM); > > int ret; > > > > /* If the id is already assigned, keep it. */ > > helper? > Yes. Thanks very much -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>