On Mon, Jun 25, 2012 at 1:59 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote: > On Mon, Jun 25, 2012 at 12:59 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote: >> Some intel new sandybridge or newer two sockets system do support support numa >> for pci devices. But BIOS does not provide _PXM under those root bus in DSDT. >> >> Add boot command line, so user could have chance to input node info before >> BIOS guys figure out to add _PXM. >> >> Fold fix from Ulrich to use ";" instead of ",". >> | The problem is the pci= parameter >> | handling uses ',' to separate parameters and therefore the second PCI >> | root information, separated by a comma, is interpreted as a new pci= >> | parameter. >> >> -v3: According to Bjorn and Ingo, change to use "user input first" policy >> so it could cover wrong _PXM case. >> -v4: Folded change from Ulrich to record pointer of busnum_node string. >> >> Reported-by: Ulrich Drepper <drepper@xxxxxxxxx> >> Tested-by: Ulrich Drepper <drepper@xxxxxxxxx> >> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx> >> >> --- >> Documentation/kernel-parameters.txt | 5 +++++ >> arch/x86/include/asm/pci_x86.h | 3 +++ >> arch/x86/pci/acpi.c | 22 +++++++++++++--------- >> arch/x86/pci/common.c | 36 ++++++++++++++++++++++++++++++++++++ >> 4 files changed, 57 insertions(+), 9 deletions(-) >> >> Index: linux-2.6/Documentation/kernel-parameters.txt >> =================================================================== >> --- linux-2.6.orig/Documentation/kernel-parameters.txt >> +++ linux-2.6/Documentation/kernel-parameters.txt >> @@ -2068,6 +2068,11 @@ bytes respectively. Such letter suffixes >> hardware access methods are allowed. Use this >> if you experience crashes upon bootup and you >> suspect they are caused by the BIOS. >> + busnum_node= [X86] Set node for root bus. >> + Format: >> + <bus>:<node>[;...] >> + Specifies node for bus, will override bios _PXM >> + or probed value from hostbridge. > > I liked the previous argument format that included "pci". Now we're > assuming the only bus type important enough to care about NUMA > information is PCI. which one? > > This should also work on ia64, which also uses ACPI. For that matter, > it'd be nice if it worked on *any* NUMA architecture, though I don't > see any PCI NUMA support at all for anything but x86 and ia64. ia64 platform should be well tested with Linux? > >> conf1 [X86] Force use of PCI Configuration >> Mechanism 1. >> conf2 [X86] Force use of PCI Configuration >> Index: linux-2.6/arch/x86/pci/common.c >> =================================================================== >> --- linux-2.6.orig/arch/x86/pci/common.c >> +++ linux-2.6/arch/x86/pci/common.c >> @@ -494,6 +494,39 @@ int __init pcibios_init(void) >> return 0; >> } >> >> +static const char *busnum_node_param; >> + >> +static void remember_busnum_node(const char *str) >> +{ >> + busnum_node_param = str; > > Can you convince me this is safe? pci_setup() is an early_param, so > it looks to me like we might be saving a pointer to initdata in this > call path: > > setup_arch > parse_early_param > strlcpy(tmp_cmdline, boot_command_line) > parse_early_options(__initdata tmp_cmdline) > parse_args > do_early_param > ... > pci_setup (early_param) > pcibios_setup > remember_busnum_node > > And then we use that saved pointer to parse the string at host bridge > add-time, which might be after initdata has been freed. ok, that will need one separate buffer. > >> +} >> + >> +int get_user_busnum_node(int busnum) >> +{ >> + int bus, node, count; >> + const char *p = busnum_node_param; >> + >> + if (!p) >> + return -1; >> + >> + while (*p) { >> + count = 0; >> + if (sscanf(p, "%x:%x%n", &bus, &node, &count) != 2) { >> + printk(KERN_ERR "PCI: Can't parse busnum_node input: %s\n", >> + p); >> + break; >> + } >> + if (bus == busnum) >> + return node; >> + p += count; >> + if (*p != ';') >> + break; >> + p++; >> + } >> + >> + return -1; >> +} >> + >> char * __devinit pcibios_setup(char *str) >> { >> if (!strcmp(str, "off")) { >> @@ -579,6 +612,9 @@ char * __devinit pcibios_setup(char *st >> } else if (!strcmp(str, "nocrs")) { >> pci_probe |= PCI_ROOT_NO_CRS; >> return NULL; >> + } else if (!strncmp(str, "busnum_node=", 12)) { >> + remember_busnum_node(str + 12); >> + return NULL; >> } else if (!strcmp(str, "earlydump")) { >> pci_early_dump_regs = 1; >> return NULL; >> Index: linux-2.6/arch/x86/include/asm/pci_x86.h >> =================================================================== >> --- linux-2.6.orig/arch/x86/include/asm/pci_x86.h >> +++ linux-2.6/arch/x86/include/asm/pci_x86.h >> @@ -46,6 +46,9 @@ enum pci_bf_sort_state { >> pci_dmi_bf, >> }; >> >> +/* pci-common.c */ >> +int get_user_busnum_node(int busnum); >> + >> /* pci-i386.c */ >> >> void pcibios_resource_survey(void); >> Index: linux-2.6/arch/x86/pci/acpi.c >> =================================================================== >> --- linux-2.6.orig/arch/x86/pci/acpi.c >> +++ linux-2.6/arch/x86/pci/acpi.c >> @@ -453,7 +453,7 @@ struct pci_bus * __devinit pci_acpi_scan >> struct pci_sysdata *sd; >> int node; >> #ifdef CONFIG_ACPI_NUMA >> - int pxm; >> + int pxm = -1; >> #endif >> >> if (domain && !pci_domains_supported) { >> @@ -463,16 +463,20 @@ struct pci_bus * __devinit pci_acpi_scan >> return NULL; >> } >> >> - node = -1; >> + node = get_user_busnum_node(busnum); >> + if (node == -1) { >> #ifdef CONFIG_ACPI_NUMA >> - pxm = acpi_get_pxm(device->handle); >> - if (pxm >= 0) >> - node = pxm_to_node(pxm); >> - if (node != -1) >> - set_mp_bus_to_node(busnum, node); >> - else >> -#endif >> + pxm = acpi_get_pxm(device->handle); >> + if (pxm >= 0) >> + node = pxm_to_node(pxm); > > The code above (everything except the calls to set_mp_bus_to_node() > and get_mp_bus_to_node(), which are x86-specific) should be the same > between x86 and ia64. Can you rationalize them? It'd be better if > they used the same #ifdefs and the same code structure. this patch does not change set_mp_bus_to_node/get_mp_bus_to_node... only let user_busnum_node override them. Yinghai -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html