Ingo Molnar wrote:
> * Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>
>> Linus Torvalds wrote:
>>> On Thu, 16 Apr 2009, Yinghai Lu wrote:
>>>> please check.
>>>>
>>>> [PATCH] x86/pci: make pci_mem_start to be aligned only -v4
>>> I like the approach. That said, I think that rather than do the "modify
>>> the e820 array" thing, why not just do it in the resource tree, and
>>> do it at "e820_reserve_resources_late()" time?
>>>
>>> IOW, something like this.
>>>
>>> TOTALLY UNTESTED! The point is to take all RAM resources we have, and
>>> _after_ we've added all the resources we've seen in the E820 tree, we then
>>> _also_ try to add fake reserved entries for any "round up to X" at the end
>>> of the RAM resources.
>>>
>>> NOTE! I really didn't want to use "reserve_region_with_split()". I didn't
>>> want to recurse into any conflicting resources, I really wanted to just do
>>> the other failure cases.
>>>
>>> THIS PATCH IS NOT MEANT TO BE USED. Just a rough "almost like this" kind
>>> of thing. That includes the rough draft of how much to round things up to
>>> based on where the end of RAM region is etc. I'm really throwing this out
>>> more as a "wouldn't this be a readable way to handle any missing reserved
>>> entries" kind of thing..
>>>
>>> Linus
>>>
>>> ---
>>>  arch/x86/kernel/e820.c | 34 ++++++++++++++++++++++++++++++++++
>>>  1 files changed, 34 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>>> index ef2c356..e8b8d33 100644
>>> --- a/arch/x86/kernel/e820.c
>>> +++ b/arch/x86/kernel/e820.c
>>> @@ -1370,6 +1370,23 @@ void __init e820_reserve_resources(void)
>>>  	}
>>>  }
>>>
>>> +/* How much should we pad RAM ending depending on where it is? */
>>> +static unsigned long ram_alignment(resource_size_t pos)
>>> +{
>>> +	unsigned long mb = pos >> 20;
>>> +
>>> +	/* To 64kB in the first megabyte */
>>> +	if (!mb)
>>> +		return 64*1024;
>>> +
>>> +	/* To 1MB in the first 16MB */
>>> +	if (mb < 16)
>>> +		return 1024*1024;
>>> +
>>> +	/* To 32MB for anything above that */
>>> +	return 32*1024*1024;
>>> +}
>>> +
>>>  void __init e820_reserve_resources_late(void)
>>>  {
>>>  	int i;
>>> @@ -1381,6 +1398,23 @@ void __init e820_reserve_resources_late(void)
>>>  		insert_resource_expand_to_fit(&iomem_resource, res);
>>>  		res++;
>>>  	}
>>> +
>>> +	/*
>>> +	 * Try to bump up RAM regions to reasonable boundaries to
>>> +	 * avoid stolen RAM
>>> +	 */
>>> +	for (i = 0; i < e820.nr_map; i++) {
>>> +		struct e820entry *entry = &e820_saved.map[i];
>>> +		resource_size_t start, end;
>>> +
>>> +		if (entry->type != E820_RAM)
>>> +			continue;
>>> +		start = entry->addr + entry->size;
>>> +		end = round_up(start, ram_alignment(start));
>>> +		if (start == end)
>>> +			continue;
>>> +		reserve_region_with_split(&iomem_resource, start, end, "RAM buffer");
>>> +	}
>>>  }
>>>
>>>  char *__init default_machine_specific_memory_setup(void)
>> except need to change
>>> +		reserve_region_with_split(&iomem_resource, start, end, "RAM buffer");
>> ==>
>> +		reserve_region_with_split(&iomem_resource, start, end - 1, "RAM buffer");
>>
>> it will make sure the dynamic allocation code will not use those ranges.
>>
>> and could make e820_setup_gap much simpler.
>>
>> ---
>>  arch/x86/kernel/e820.c | 10 ++++------
>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> Index: linux-2.6/arch/x86/kernel/e820.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/kernel/e820.c
>> +++ linux-2.6/arch/x86/kernel/e820.c
>> @@ -635,14 +635,12 @@ __init void e820_setup_gap(void)
>>  #endif
>>
>>  	/*
>> -	 * See how much we want to round up: start off with
>> -	 * rounding to the next 1MB area.
>> +	 * e820_reserve_resources_late will protect stolen RAM
>> +	 * so just round it to 1M
>>  	 */
>>  	round = 0x100000;
>> -	while ((gapsize >> 4) > round)
>> -		round += round;
>> -	/* Fun with two's complement */
>> -	pci_mem_start = (gapstart + round) & -round;
>> +
>> +	pci_mem_start = roundup(gapstart, round);
>>
>>  	printk(KERN_INFO
>>  	       "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n",
>>
>> Ingo, can you put those two patches in tip?
>
> I think the point would be to explore the possibility to have no
> 'gap' logic at all - we should extend the e820 table with Linus's
> scheme to add 'RAM buffer' entries.
>

so you prefer the old one, aka the -v4, and adding a new entry type for the RAM buffer?

YH
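For anyone following the arithmetic, here is a small userspace sketch of the two rounding schemes discussed above. It is not kernel code and has not been run against a real e820 map. round_up_pow2(), ram_buffer_range(), old_pci_mem_start(), new_pci_mem_start() and ram_buffer_selfcheck() are hypothetical names made up for illustration; only the thresholds in ram_alignment() and the two pci_mem_start expressions are taken from the quoted patches. The sample addresses are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's round_up(); assumes align is a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Same thresholds as ram_alignment() in the quoted patch. */
static uint64_t ram_alignment(uint64_t pos)
{
	uint64_t mb = pos >> 20;

	if (!mb)
		return 64 * 1024;		/* first megabyte: 64kB */
	if (mb < 16)
		return 1024 * 1024;		/* below 16MB: 1MB */
	return 32 * 1024 * 1024;		/* everything else: 32MB */
}

/*
 * Compute the "RAM buffer" that would follow a RAM region ending at
 * ram_end (exclusive). Returns 0 if the end is already aligned, else 1
 * with an inclusive [*start, *last] range, i.e. last = end - 1 as in
 * the correction suggested in this thread.
 */
static int ram_buffer_range(uint64_t ram_end, uint64_t *start, uint64_t *last)
{
	uint64_t end = round_up_pow2(ram_end, ram_alignment(ram_end));

	if (end == ram_end)
		return 0;
	*start = ram_end;
	*last = end - 1;
	return 1;
}

/* Old e820_setup_gap() rounding: granularity grows with the gap size. */
static uint64_t old_pci_mem_start(uint64_t gapstart, uint64_t gapsize)
{
	uint64_t round = 0x100000;

	while ((gapsize >> 4) > round)
		round += round;
	/* Always moves past gapstart, even if gapstart is already aligned. */
	return (gapstart + round) & -round;
}

/* Proposed rounding: plain 1MB alignment of the gap start. */
static uint64_t new_pci_mem_start(uint64_t gapstart)
{
	return round_up_pow2(gapstart, 0x100000);
}

/* Worked values for a few made-up addresses. */
static void ram_buffer_selfcheck(void)
{
	uint64_t start, last;

	/* RAM ending at 639kB gets padded to the 640kB boundary. */
	assert(ram_buffer_range(0x9fc00ULL, &start, &last) == 1);
	assert(start == 0x9fc00ULL && last == 0x9ffffULL);

	/* RAM ending just below 2GB gets padded up to 2GB - 1. */
	assert(ram_buffer_range(0x7f6e0000ULL, &start, &last) == 1);
	assert(last == 0x7fffffffULL);

	/* An already aligned end needs no buffer. */
	assert(ram_buffer_range(0xf00000ULL, &start, &last) == 0);

	/* Old scheme skips 128MB here, the new one keeps the gap start. */
	assert(old_pci_mem_start(0x80000000ULL, 0x7ec00000ULL) == 0x88000000ULL);
	assert(new_pci_mem_start(0x80000000ULL) == 0x80000000ULL);
}
```

With RAM ending at 0x7f6e0000, the 1MB rounding alone would put pci_mem_start at 0x7f700000, which is inside the padded range. As I read the thread, that is why the simplified e820_setup_gap() depends on the "RAM buffer" entries being reserved first, and why the end has to be passed as end - 1: resource ranges are inclusive, so passing end would also claim the first byte of the next aligned block.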