* Yinghai Lu <yinghai@xxxxxxxxxx> wrote: > Linus Torvalds wrote: > > > > On Thu, 16 Apr 2009, Yinghai Lu wrote: > >> please check. > >> > >> [PATCH] x86/pci: make pci_mem_start to be aligned only -v4 > > > > I like the approach. That said, I think that rather than do the "modify > > the e820 array" thing, why not just do it in the in the resource tree, and > > do it at "e820_reserve_resources_late()" time? > > > > IOW, something like this. > > > > TOTALLY UNTESTED! The point is to take all RAM resources we haev, and > > _after_ we've added all the resources we've seen in the E820 tree, we then > > _also_ try to add fake reserved entries for any "round up to X" at the end > > of the RAM resources. > > > > NOTE! I really didn't want to use "reserve_region_with_split()". I didn't > > want to recurse into any conflicting resources, I really wanted to just do > > the other failure cases. > > > > THIS PATCH IS NOT MEANT TO BE USED. Just a rough "almost like this" kind > > of thing. That includes the rough draft of how much to round things up to > > based on where the end of RAM region is etc. I'm really throwing this out > > more as a "wouldn't this be a readable way to handle any missing reserved > > entries" kind of thing.. > > > > Linus > > > > --- > > arch/x86/kernel/e820.c | 34 ++++++++++++++++++++++++++++++++++ > > 1 files changed, 34 insertions(+), 0 deletions(-) > > > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > > index ef2c356..e8b8d33 100644 > > --- a/arch/x86/kernel/e820.c > > +++ b/arch/x86/kernel/e820.c > > @@ -1370,6 +1370,23 @@ void __init e820_reserve_resources(void) > > } > > } > > > > +/* How much should we pad RAM ending depending on where it is? */ > > +static unsigned long ram_alignment(resource_size_t pos) > > +{ > > + unsigned long mb = pos >> 20; > > + > > + /* To 64kB in the first megabyte */ > > + if (!mb) > > + return 64*1024; > > + > > + /* To 1MB in the first 16MB */ > > + if (mb < 16) > > + return 1024*1024; > > + > > + /* To 32MB for anything above that */ > > + return 32*1024*1024; > > +} > > + > > void __init e820_reserve_resources_late(void) > > { > > int i; > > @@ -1381,6 +1398,23 @@ void __init e820_reserve_resources_late(void) > > insert_resource_expand_to_fit(&iomem_resource, res); > > res++; > > } > > + > > + /* > > + * Try to bump up RAM regions to reasonable boundaries to > > + * avoid stolen RAM > > + */ > > + for (i = 0; i < e820.nr_map; i++) { > > + struct e820entry *entry = &e820_saved.map[i]; > > + resource_size_t start, end; > > + > > + if (entry->type != E820_RAM) > > + continue; > > + start = entry->addr + entry->size; > > + end = round_up(start, ram_alignment(start)); > > + if (start == end) > > + continue; > > + reserve_region_with_split(&iomem_resource, start, end, "RAM buffer"); > > + } > > } > > > > char *__init default_machine_specific_memory_setup(void) > > except need to change > > + reserve_region_with_split(&iomem_resource, start, end, "RAM buffer"); > ==> > + reserve_region_with_split(&iomem_resource, start, end - 1, "RAM buffer"); > > it will make sure dynmical allocating code will not use those range. > > and could make e820_setup_gap much simple. > > --- > arch/x86/kernel/e820.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > Index: linux-2.6/arch/x86/kernel/e820.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/e820.c > +++ linux-2.6/arch/x86/kernel/e820.c > @@ -635,14 +635,12 @@ __init void e820_setup_gap(void) > #endif > > /* > - * See how much we want to round up: start off with > - * rounding to the next 1MB area. > + * e820_reserve_resources_late will protect stolen RAM > + * so just round it to 1M > */ > round = 0x100000; > - while ((gapsize >> 4) > round) > - round += round; > - /* Fun with two's complement */ > - pci_mem_start = (gapstart + round) & -round; > + > + pci_mem_start = roundup(gapstart, round); > > printk(KERN_INFO > "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n", > > Ingo, can you put those two patches in tip? I think the point would be to explore the possibility to have no 'gap' logic at all - we should extend the e820 table with Linus's scheme to add 'RAM buffer' entries. That way, if you search for a sufficient size hole, it will be correctly aligned straight away - with no rounding/gap logic at all. (Maybe add a debug warning to catch when this is not the case.) Am i missing something? Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html