Re: [RFC PATCH 1/7] x86, mm: ZONE_DEVICE for "device memory"

On Fri, Aug 14, 2015 at 07:11:27PM -0700, Dan Williams wrote:
> On Fri, Aug 14, 2015 at 3:33 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > On Fri, Aug 14, 2015 at 3:06 PM, Jerome Glisse <j.glisse@xxxxxxxxx> wrote:
> >> On Fri, Aug 14, 2015 at 02:52:15PM -0700, Dan Williams wrote:
> >>> On Fri, Aug 14, 2015 at 2:37 PM, Jerome Glisse <j.glisse@xxxxxxxxx> wrote:
> >>> > On Wed, Aug 12, 2015 at 11:50:05PM -0400, Dan Williams wrote:
> > [..]
> >>> > What is the rational for not updating max_pfn, max_low_pfn, ... ?
> >>> >
> >>>
> >>> The idea is that this memory is not meant to be available to the page
> >>> allocator and should not count as new memory capacity.  We're only
> >>> hotplugging it to get struct page coverage.
> >>
> >> But this sounds bogus to me to rely on max_pfn to stay smaller than
> >> first_dev_pfn.  For instance you might plug a device that register
> >> dev memory and then some regular memory might be hotplug, effectively
> >> updating max_pfn to a value bigger than first_dev_pfn.
> >>
> >
> > True.
> >
> >> Also i do not think that the buddy allocator use max_pfn or max_low_pfn
> >> to consider page/zone for allocation or not.
> >
> > Yes, I took it out with no effects.  I'll investigate further whether
> > we should be touching those variables or not for this new usage.
> 
> Although it does not offer perfect protection if device memory is at a
> physically lower address than RAM, skipping the update of these
> variables does seem to be what we want.  For example /dev/mem would
> fail to allow write access to persistent memory if it fails a
> valid_phys_addr_range() check.  Since /dev/mem does not know how to
> write to PMEM in a reliably persistent way, it should not treat a
> PMEM-pfn like RAM.

I have attached a patch that should keep ZONE_DEVICE out of consideration
by the buddy allocator. You might also want to keep the pages reserved
rather than freed into the zone; for that you could replace
generic_online_page() via set_online_page_callback() while hotplugging
device memory.
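
To illustrate the effect of the attached patch outside the kernel, here is a
minimal userspace sketch of the zonelist walk. It is only a model: every
model_* name and type below is a hypothetical stand-in, not the real
struct zone, is_dev_zone(), populated_zone() or build_zonelists_node().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's zone types. */
enum model_zone_type {
	MODEL_ZONE_DMA,
	MODEL_ZONE_NORMAL,
	MODEL_ZONE_DEVICE,
	MODEL_MAX_NR_ZONES
};

struct model_zone {
	enum model_zone_type type;
	unsigned long present_pages;
};

struct model_zonelist {
	const struct model_zone *zones[MODEL_MAX_NR_ZONES];
};

/* Assumption: a device zone is identified purely by its type. */
static bool model_is_dev_zone(const struct model_zone *zone)
{
	return zone->type == MODEL_ZONE_DEVICE;
}

static bool model_populated_zone(const struct model_zone *zone)
{
	return zone->present_pages != 0;
}

/*
 * Walk the node's zones from highest to lowest and record every populated
 * zone, except device zones, which are skipped unconditionally.
 * Returns the number of zones placed in the zonelist.
 */
static int model_build_zonelist(const struct model_zone *node_zones,
				int nr_node_zones,
				struct model_zonelist *zonelist)
{
	int nr_zones = 0;
	int zone_type = nr_node_zones;

	assert(nr_node_zones <= MODEL_MAX_NR_ZONES);

	while (zone_type > 0) {
		const struct model_zone *zone;

		zone_type--;
		zone = &node_zones[zone_type];
		if (model_is_dev_zone(zone))
			continue;
		if (model_populated_zone(zone))
			zonelist->zones[nr_zones++] = zone;
	}
	return nr_zones;
}
```

With one DMA, one normal and one device zone, all populated, the model
returns two entries (normal, then DMA), so an allocator that only walks the
zonelist never sees the device zone.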

Regarding /dev/mem, I would not worry about highmem, as /dev/mem is already
broken with respect to memory holes that might exist (at least that is my
understanding). Alternatively, if you really care about /dev/mem, you could
add an arch-specific valid_phys_addr_range() that checks for a valid zone.
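
As a rough userspace sketch of that kind of check (not the kernel's
valid_phys_addr_range(); the model_* names, the range table and the signature
are all assumptions made for illustration), a range is accepted only when it
lies entirely inside a single RAM range, so both device memory and holes are
rejected:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a physical address range. */
enum model_range_kind { MODEL_RANGE_RAM, MODEL_RANGE_DEVICE };

struct model_range {
	uint64_t start;	/* first byte of the range */
	uint64_t end;	/* one past the last byte of the range */
	enum model_range_kind kind;
};

/*
 * Return true only if [addr, addr + count) fits entirely inside one range
 * of the map and that range is regular RAM. Anything touching a device
 * range, a hole, or more than one range is refused.
 */
static bool model_valid_phys_addr_range(const struct model_range *map,
					size_t nr_ranges,
					uint64_t addr, uint64_t count)
{
	size_t i;

	assert(map != NULL || nr_ranges == 0);

	if (addr + count < addr)	/* reject wrap-around */
		return false;

	for (i = 0; i < nr_ranges; i++) {
		if (addr >= map[i].start && addr + count <= map[i].end)
			return map[i].kind == MODEL_RANGE_RAM;
	}
	return false;			/* not covered by any single range */
}
```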

Cheers,
Jérôme
From 45976e1186eee45ecb277fe5293a7cfa7466d740 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@xxxxxxxxxx>
Date: Mon, 17 Aug 2015 17:31:27 -0400
Subject: [PATCH] mm/ZONE_DEVICE: Keep ZONE_DEVICE out of allocation zonelist.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Memory inside a ZONE_DEVICE should never be considered by the buddy
allocator, and thus any such zone should never be added to any of
the zonelists. This patch does just that.

Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
---
 mm/page_alloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef19f22..f3e26de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3834,6 +3834,13 @@ static int build_zonelists_node(pg_data_t *pgdat, struct zonelist *zonelist,
 	do {
 		zone_type--;
 		zone = pgdat->node_zones + zone_type;
+		/*
+		 * Device zone is special memory and should never be considered
+		 * for regular allocation. It is expected that pages in a device
+		 * zone will be allocated by other means.
+		 */
+		if (is_dev_zone(zone))
+			continue;
 		if (populated_zone(zone)) {
 			zoneref_set_zone(zone,
 				&zonelist->_zonerefs[nr_zones++]);
-- 
1.8.3.1

