* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > On Thu, 16 Apr 2009, Yinghai Lu wrote: > > > > please check. > > > > [PATCH] x86/pci: make pci_mem_start to be aligned only -v4 > > I like the approach. That said, I think that rather than do the "modify > the e820 array" thing, why not just do it in the in the resource tree, and > do it at "e820_reserve_resources_late()" time? > > IOW, something like this. > > TOTALLY UNTESTED! The point is to take all RAM resources we haev, and > _after_ we've added all the resources we've seen in the E820 tree, we then > _also_ try to add fake reserved entries for any "round up to X" at the end > of the RAM resources. > > NOTE! I really didn't want to use "reserve_region_with_split()". I didn't > want to recurse into any conflicting resources, I really wanted to just do > the other failure cases. > > THIS PATCH IS NOT MEANT TO BE USED. Just a rough "almost like this" kind > of thing. That includes the rough draft of how much to round things up to > based on where the end of RAM region is etc. I'm really throwing this out > more as a "wouldn't this be a readable way to handle any missing reserved > entries" kind of thing.. > > Linus > > --- > arch/x86/kernel/e820.c | 34 ++++++++++++++++++++++++++++++++++ > 1 files changed, 34 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > index ef2c356..e8b8d33 100644 > --- a/arch/x86/kernel/e820.c > +++ b/arch/x86/kernel/e820.c > @@ -1370,6 +1370,23 @@ void __init e820_reserve_resources(void) > } > } > > +/* How much should we pad RAM ending depending on where it is? */ > +static unsigned long ram_alignment(resource_size_t pos) > +{ > + unsigned long mb = pos >> 20; > + > + /* To 64kB in the first megabyte */ > + if (!mb) > + return 64*1024; Could we perhaps round up to 1MB in this case too? This would mean that we would never dynamically allocate into the 640k..1MB hole - but even if it's free it's probably a good idea to avoid that range - it's usually quite special. So we'd just have two ganularities for round-up: 1MB and 32MB. Nice, predictable scheme. > + > + /* To 1MB in the first 16MB */ > + if (mb < 16) > + return 1024*1024; > + > + /* To 32MB for anything above that */ > + return 32*1024*1024; > +} > + > void __init e820_reserve_resources_late(void) > { > int i; > @@ -1381,6 +1398,23 @@ void __init e820_reserve_resources_late(void) > insert_resource_expand_to_fit(&iomem_resource, res); > res++; > } > + > + /* > + * Try to bump up RAM regions to reasonable boundaries to > + * avoid stolen RAM > + */ > + for (i = 0; i < e820.nr_map; i++) { > + struct e820entry *entry = &e820_saved.map[i]; > + resource_size_t start, end; > + > + if (entry->type != E820_RAM) > + continue; Would it make sense to round up everything that is listed in the E820 map? Just in case the BIOS is not entirely honest about the true extent of that area. It might even make sense to add a small 'guard' area next to such ranges (even if they are aligned well): to prevent the BIOS from accidentally overruning into a device BAR we allocate next to it. > + start = entry->addr + entry->size; > + end = round_up(start, ram_alignment(start)); > + if (start == end) > + continue; Hm, indeed, the continue is needed - reserve_region_with_split() lets zero-sized resources be inserted silently. I'd have missed this case. Do zero-sized memory resources have a special role somewhere? [ Plus i dont see any protection against negative-size resources in kernel/resource.c either. OTOH inserting a negative size resource just locks down the tree for all future resources, so it would be noticed for sure. Zero-size is not an issue it appears, it goes in and prevents only the exactly same-position zero-size resource to be inserted. But it might sense to silently ignore zero-size resources (if we dont rely on them elsewhere), or to WARN() about them and about negative size resources. ] > + reserve_region_with_split(&iomem_resource, start, end, "RAM buffer"); Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html