Sorry for the late response. I was on a vacation. On Tue, May 02, 2017 at 03:32:29PM +0200, Michal Hocko wrote: > On Tue 02-05-17 13:01:32, Joonsoo Kim wrote: > > On Thu, Apr 27, 2017 at 05:06:36PM +0200, Michal Hocko wrote: > [...] > > > I see this point and I agree that using a specific zone might be a > > > _nicer_ solution in the end but you have to consider another aspects as > > > well. The main one I am worried about is a long term maintainability. > > > We are really out of page flags and consuming one for a rather specific > > > usecase is not good. Look at ZONE_DMA. I am pretty sure that almost > > > no sane HW needs 16MB zone anymore, yet we have hard time to get rid > > > of it and so we have that memory laying around unused all the time > > > and blocking one page flag bit. CMA falls into a similar category > > > AFAIU. I wouldn't be all that surprised if a future HW will not need CMA > > > allocations in few years, yet we will have to fight to get rid of it > > > like we do with ZONE_DMA. And not only that. We will also have to fight > > > finding page flags for other more general usecases in the meantime. > > > > This maintenance problem is inherent. This problem exists even if we > > uses MIGRATETYPE approach. We cannot remove many hooks for CMA if a > > future HW will not need CMA allocation in few years. The only > > difference is that one takes single zone bit only for CMA user and the > > other approach takes many hooks that we need to take care about it all > > the time. > > And I consider this a big difference. Because while hooks are not nice > they will affect CMA users (in a sense of bugs/performance etc.). While > an additional bit consumed will affect potential future and more generic > features. Because these hooks are so tricky and are spread on many places, bugs about these hooks can be made by *non-CMA* user and they hurt *CMA* user. These hooks could also delay non-CMA user's development speed since there are many hooks about CMA and understanding how CMA is managed is rather difficult. I think that this is a big maintenance overhead not only for CMA user but also for non-CMA user. So, I think that it can justify additional bit consumed. > > [...] > > > I believe that the overhead in the hot path is not such a big deal. We > > > have means to make it 0 when CMA is not used by jumplabels. I assume > > > that the vast majority of systems will not use CMA. Those systems which > > > use CMA should be able to cope with some slight overhead IMHO. > > > > Please don't underestimate number of CMA user. Most of android device > > uses CMA. So, there would be more devices using CMA than the server > > not using CMA. They also have a right to experience the best performance. > > This is not a fair comparison, though. Android development model is much > more faster and tend to not care about future maintainability at all. I > do not know about any android device that would run on a clean vanilla > kernel because vendors simply do not care enough (or have time) to put > the code into a proper shape to upstream it. I understand that this > model might work quite well for rapidly changing and moving mobile or > IoT segment but it is not the greatest fit to motivate the core kernel > subsystem development. We are not in the drivers space! > > [...] > > > This looks like a nice clean up. Those ifdefs are ugly as hell. One > > > could argue that some of that could be cleaned up by simply adding some > > > helpers (with a jump label to reduce the overhead), though. But is this > > > really strong enough reason to bring the whole zone in? I am not really > > > convinced to be honest. > > > > Please don't underestimate the benefit of this patchset. > > I have said that we need *more* hooks to fix all the problems. > > > > And, please think that this code removal is not only code removal but > > also concept removal. With this removing, we don't need to consider > > ALLOC_CMA for alloc_flags when calling zone_watermark_ok(). There are > > many bugs on it and it still remains. We don't need to consider > > pageblock migratetype when handling pageblock migratetype. We don't > > need to take a great care about calculating the number of CMA > > freepages. > > And all this can be isolated to CMA specific hooks with mostly minimum > impact to most users. I hear you saying that zone approach is more natural > and I would agree if we wouldn't have to care about the number of zones. I attach a solution about one more bit in page flags although I don't agree with your opinion that additional bit is no-go approach. Just note that we have already used three bits for zone encoding in some configuration due to ZONE_DEVICE. > > > > [...] > > > > > > > > Please do _not_ take this as a NAK from me. At least not at this time. I > > > > > am still trying to understand all the consequences but my intuition > > > > > tells me that building on top of highmem like approach will turn out to > > > > > be problematic in future (as we have already seen with the highmem and > > > > > movable zones) so this needs a very prudent consideration. > > > > > > > > I can understand that you are prudent to this issue. However, it takes more > > > > than two years and many people already expressed that ZONE approach is the > > > > way to go. > > > > > > I can see a single Acked-by and one Reviewed-by. It would be much more > > > convincing to see much larger support. Do not take me wrong I am not > > > trying to undermine the feedback so far but we should be clear about one > > > thing. CMA is mostly motivated by the industry which tries to overcome > > > HW limitations which can change in future very easily. I would rather > > > see good enough solution for something like that than a nicer solution > > > which is pushing additional burden on more general usecases. > > > > First of all, current MIGRATETYPE approach isn't good enough to me. > > They caused too many problems and there are many remanining problems. > > It will causes maintenance issue for a long time. > > > > And, although good enough solution can be better than nicer solution > > in some cases, it looks like current situation isn't that case. > > There is a single reason, saving page flag bit, to support good enough > > solution. > > > > I'd like to ask reversly. Is this a enough reason to make CMA user to > > suffer from bugs? > > No, but I haven't heard any single argument that those bugs are > impossible to fix with the current approach. They might be harder to fix > but if I can chose between harder for CMA and harder for other more > generic HW independent features I will go for the first one. And do not > take me wrong, I have nothing against CMA as such. It solves a real life > problem. I just believe it doesn't deserve to consume a new bit in page > flags because that is just too scarce resource. As I mentioned above, I think that maintenance overhead due to CMA deserves to consume a new bit in page flags. It also provide us extendability to introduce more zones in the future. Anyway, this value-judgement is subjective so I guess that we cannot agree with each other. To solve your concern, I make following solution. Please let me know your opinion about this. This patch can be applied on top of my ZONE_CMA series. Thanks. ------------------->8---------------------- >From bd8baee10fd83121b4172375524b97c3296a61e1 Mon Sep 17 00:00:00 2001 From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> Date: Wed, 10 May 2017 15:12:26 +0900 Subject: [RFC PATCH] mm/cma: don't use one more bit on page flags for ZONE_CMA This is just for showing my idea to solve the page flags problem due to ZONE_CMA. There is some consensus that ZONE_CMA is a nicer solution than MIGRATETYPE approach but there is also a worry about using one more bit on page flags. This patch tries to solve this worry. This patch is a temporary implementation and I will optimize it more if everyone agree with this approach. Adding a new zone (ZONE_CMA) needs one more bit on page flags in some configuration. This resource is very precious so that it's better not to consume it as much as possible. Therefore, this patch implements following tricky magic not to use one more bit. 1. If the number of the zone are five due to the ZONE_CMA, start encoding magic. 2. ZONE_MOVABLE is written on the pages for ZONE_CMA. Since we don't use ZONE_CMA (enum value 4) that needs one more bit for encoding, we can save one bit on page flags. 3. Check ZONE_MOVABLE when retrieving page's zone index. If the zone index is ZONE_MOVABLE, we use a special handler function to get the actual zone index. I found no regression on kernel build test with applying this magic. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> --- include/linux/cma.h | 24 ++++++++++++++++++++++++ include/linux/gfp.h | 24 ++++++++++++++++++++++-- include/linux/mm.h | 8 +++++++- include/linux/page-flags-layout.h | 15 ++++++++++++++- mm/cma.c | 19 +++++++++++++++++++ mm/page_alloc.c | 10 +++++++++- 6 files changed, 95 insertions(+), 5 deletions(-) diff --git a/include/linux/cma.h b/include/linux/cma.h index 2433d5e..d36695c 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -3,6 +3,7 @@ #include <linux/init.h> #include <linux/types.h> +#include <linux/mmzone.h> /* * There is always at least global CMA area and a few optional @@ -34,9 +35,32 @@ extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count); #ifdef CONFIG_CMA +extern unsigned cma_area_count; extern unsigned long cma_get_free(void); +extern bool is_zone_cma_page(const struct page *page); + +static inline enum zone_type page_zonenum_special(const struct page *page, + enum zone_type zone_type) +{ + if (!cma_area_count) + return zone_type; + + if (zone_type != ZONE_MOVABLE) + return zone_type; + + if (is_zone_cma_page(page)) + return ZONE_CMA; + + return ZONE_MOVABLE; +} + #else static inline unsigned long cma_get_free(void) { return 0; } +static inline enum zone_type page_zonenum_special(const struct page *page, + enum zone_type zone_type) +{ + return zone_type; +} #endif #endif diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 15987cc..bc5e443 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -348,13 +348,33 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags) * 0xf => BAD (MOVABLE+DMA32+HIGHMEM+DMA) */ -#if defined(CONFIG_ZONE_DEVICE) && (MAX_NR_ZONES-1) <= 4 +#if MAX_NR_ZONES <= 4 +#define GFP_ZONES_SHIFT ZONES_SHIFT +#elif (MAX_NR_ZONES-1) == 4 + +/* + * ZONE_CMA is encoded as ZONE_MOVABLE and ZONES_SHIFT would be one less + * than what GFP_ZONES_SHIFT usually needs. + */ +#if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_CMA) +#define GFP_ZONES_SHIFT ZONES_SHIFT + /* ZONE_DEVICE is not a valid GFP zone specifier */ -#define GFP_ZONES_SHIFT 2 +#elif defined(CONFIG_ZONE_DEVICE) +#define GFP_ZONES_SHIFT Z + +#elif defined(CONFIG_CMA) +#define GFP_ZONES_SHIFT (ZONES_SHIFT + 1) + +#else +#define GFP_ZONES_SHIFT ZONES_SHIFT +#endif + #else #define GFP_ZONES_SHIFT ZONES_SHIFT #endif + #if !defined(CONFIG_64BITS) && GFP_ZONES_SHIFT > 2 typedef unsigned long long GFP_ZONE_TABLE_TYPE; #else diff --git a/include/linux/mm.h b/include/linux/mm.h index 693dc6e..50228ef 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -23,6 +23,7 @@ #include <linux/page_ext.h> #include <linux/err.h> #include <linux/page_ref.h> +#include <linux/cma.h> struct mempolicy; struct anon_vma; @@ -780,7 +781,12 @@ int finish_mkwrite_fault(struct vm_fault *vmf); static inline enum zone_type page_zonenum(const struct page *page) { - return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK; + enum zone_type zone_type = (page->flags >> ZONES_PGSHIFT) & ZONES_MASK; + + if (ZONE_CMA_IN_PAGE_FLAGS) + return zone_type; + + return page_zonenum_special(page, zone_type); } #ifdef CONFIG_ZONE_DEVICE diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h index 77b078c..8a4ae38 100644 --- a/include/linux/page-flags-layout.h +++ b/include/linux/page-flags-layout.h @@ -17,12 +17,25 @@ #define ZONES_SHIFT 1 #elif MAX_NR_ZONES <= 4 #define ZONES_SHIFT 2 + #elif MAX_NR_ZONES <= 8 -#define ZONES_SHIFT 3 + +#if defined(CONFIG_CMA) && (MAX_NR_ZONES-1) <= 4 +#define ZONES_SHIFT 2 +#define ZONE_CMA_IN_PAGE_FLAGS 0 #else +#define ZONES_SHIFT 3 +#endif + +#else /* MAX_NR_ZONES <= 8 */ + #error ZONES_SHIFT -- too many zones configured adjust calculation #endif +#ifndef ZONE_CMA_IN_PAGE_FLAGS +#define ZONE_CMA_IN_PAGE_FLAGS 1 +#endif + #ifdef CONFIG_SPARSEMEM #include <asm/sparsemem.h> diff --git a/mm/cma.c b/mm/cma.c index adfda1c..1833973 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -110,6 +110,25 @@ static void cma_clear_bitmap(struct cma *cma, unsigned long pfn, mutex_unlock(&cma->lock); } +bool is_zone_cma_page(const struct page *page) +{ + struct cma *cma; + int i; + unsigned long pfn = page_to_pfn(page); + + for (i = 0; i < cma_area_count; i++) { + cma = &cma_areas[i]; + + if (pfn < cma->base_pfn) + continue; + + if (pfn < cma->base_pfn + cma->count) + return true; + } + + return false; +} + static int __init cma_activate_area(struct cma *cma) { int bitmap_size = BITS_TO_LONGS(cma_bitmap_maxno(cma)) * sizeof(long); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cf29227..9399cc4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1596,6 +1596,14 @@ void __init init_cma_reserved_pageblock(struct page *page) unsigned long pfn = page_to_pfn(page); struct page *p = page; int nid = page_to_nid(page); + enum zone_type zone_type; + + /* + * In some cases, we don't have enough place to encode ZONE_CMA + * in page flags. Use ZONE_MOVABLE in this case and do + * post-processing when getting the page's zone. + */ + zone_type = ZONE_CMA_IN_PAGE_FLAGS ? ZONE_CMA : ZONE_MOVABLE; /* * ZONE_CMA will steal present pages from other zones by changing @@ -1609,7 +1617,7 @@ void __init init_cma_reserved_pageblock(struct page *page) set_page_count(p, 0); /* Steal pages from other zones */ - set_page_links(p, ZONE_CMA, nid, pfn); + set_page_links(p, zone_type, nid, pfn); } while (++p, ++pfn, --i); adjust_present_page_count(page, pageblock_nr_pages); -- 2.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>