On Tue, Mar 05, 2019 at 09:46:08AM -0300, Daniel Henrique Barboza wrote: > The NVLink2 support in QEMU implements the detection of NVLink2 > capable devices by verfying the attributes of the VFIO mem region > QEMU allocates for the NVIDIA GPUs. To properly allocate an > adequate amount of memLock, Libvirt needs this information before > a QEMU instance is even created. > > An alternative is presented in this patch. Given a PCI device, > we'll traverse the device tree at /proc/device-tree to check if > the device has a NPU bridge, retrieve the node of the NVLink2 bus, > find the memory-node that is related to the bus and see if it's a > NVLink2 bus by inspecting its 'reg' value. This logic is contained > inside the 'device_is_nvlink2_capable' function, which uses other > new helper functions to navigate and fetch values from the device > tree nodes. > > Signed-off-by: Daniel Henrique Barboza <danielhb413@xxxxxxxxx> > --- This fails with a bunch of compilation errors, please make sure you run make check and make syntax-check on each patch of the series. > src/qemu/qemu_domain.c | 194 +++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 194 insertions(+) > > diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c > index 77548c224c..97de5793e2 100644 > --- a/src/qemu/qemu_domain.c > +++ b/src/qemu/qemu_domain.c > @@ -10343,6 +10343,200 @@ qemuDomainUpdateCurrentMemorySize(virDomainObjPtr vm) > } > > > +/** > + * Reads a phandle file and returns the phandle value. > + */ > +static int > +read_dt_phandle(const char* file) ..we prefer camelCase naming of functions...all functions should have a prefix, e.g. vir,<driver>, etc. > +{ > + unsigned int buf[1]; > + size_t read; > + FILE *f; > + > + f = fopen(file, "r"); > + if (!f) > + return -1; > + > + read = fread(buf, sizeof(unsigned int), 1, f); We already have safe helpers that take care of such operations. > + > + if (!read) { > + VIR_CLOSE(f); You could use VIR_AUTOCLOSE for this > + return 0; > + } > + > + VIR_CLOSE(f); > + return be32toh(buf[0]); Is this part of gnulib? We need to make sure it's portable otherwise we'd need to make the conversion ourselves, just like for le64toh > +} > + > + > +/** > + * Reads a memory reg file and returns the first 4 int values. > + * > + * The caller is responsible for freeing the returned array. > + */ > +static unsigned int * > +read_dt_memory_reg(const char *file) > +{ > + unsigned int *buf; > + size_t read, i; > + FILE *f; > + > + f = fopen(file, "r"); > + if (!f) > + return NULL; > + > + if (VIR_ALLOC_N(buf, 4) < 0) > + return NULL; > + > + read = fread(buf, sizeof(unsigned int), 4, f); > + > + if (!read && read < 4) > + /* shouldn't happen */ > + VIR_FREE(buf); > + else for (i = 0; i < 4; i++) > + buf[i] = be32toh(buf[i]); > + > + VIR_CLOSE(f); > + return buf; Same comments as above... > +} > + > + > +/** > + * This wrapper function receives arguments to be used in a > + * 'find' call to retrieve the file names that matches > + * the criteria inside the /proc/device-tree dir. > + * > + * A 'find' call with '-iname phandle' inside /proc/device-tree > + * provides more than a thousand matches. Adding '-path' to > + * narrow it down further is necessary to keep the file > + * listing sane. > + * > + * The caller is responsible to free the buffer returned by > + * this function. > + */ > +static char * > +retrieve_dt_files_pattern(const char *path_pattern, const char *file_pattern) > +{ > + virCommandPtr cmd = NULL; > + char *output = NULL; > + > + cmd = virCommandNew("find"); > + virCommandAddArgList(cmd, "/proc/device-tree/", "-path", path_pattern, > + "-iname", file_pattern, NULL); > + virCommandSetOutputBuffer(cmd, &output); > + > + if (virCommandRun(cmd, NULL) < 0) > + VIR_FREE(output); > + > + virCommandFree(cmd); > + return output; > +} > + > + > +/** > + * Helper function that receives a listing of file names and > + * calls read_dt_phandle() on each one finding for a match > + * with the given phandle argument. Returns the file name if a > + * match is found, NULL otherwise. > + */ > +static char * > +find_dt_file_with_phandle(char *files, int phandle) > +{ > + char *line, *tmp; > + int ret; > + > + line = strtok_r(files, "\n", &tmp); > + do { > + ret = read_dt_phandle(line); > + if (ret == phandle) > + break; > + } while ((line = strtok_r(NULL, "\n", &tmp)) != NULL); > + > + return line; > +} > + > + > +/** > + * This function receives a string that represents a PCI device, > + * such as '0004:04:00.0', and tells if the device is NVLink2 capable. > + * > + * The logic goes as follows: > + * > + * 1 - get the phandle of a nvlink of the device, reading the 'ibm,npu' > + * attribute; > + * 2 - find the device tree node of the nvlink bus using the phandle > + * found in (1) > + * 3 - get the phandle of the memory region of the nvlink bus > + * 4 - find the device tree node of the memory region using the > + * phandle found in (3) > + * 5 - read the 'reg' value of the memory region. If the value of > + * the second 64 bit value is 0x02 0x00, the device is attached > + * to a NVLink2 bus. > + * > + * If any of these steps fails, the function returns false. > + */ > +static bool > +device_is_nvlink2_capable(const char *device) > +{ > + char *file, *files, *tmp; > + unsigned int *reg; > + int phandle; > + > + if ((virAsprintf(&file, "/sys/bus/pci/devices/%s/of_node/ibm,npu", > + device)) < 0) > + return false; > + > + /* Find phandles of nvlinks: */ > + if ((phandle = read_dt_phandle(file)) == -1) > + return false; > + > + /* Find a DT node for the phandle found */ > + files = retrieve_dt_files_pattern("*device-tree/pci*", "phandle"); > + if (!files) > + return false; > + > + if ((file = find_dt_file_with_phandle(files, phandle)) == NULL) > + goto fail; > + > + /* Find a phandle of the GPU memory region of the device. The > + * file found above ends with '/phandle' - the memory region > + * of the GPU ends with '/memory-region */ > + tmp = strrchr(file, '/'); > + *tmp = '\0'; > + file = strcat(file, "/memory-region"); > + > + if ((phandle = read_dt_phandle(file)) == -1) > + goto fail; > + > + file = NULL; > + VIR_FREE(files); > + > + /* Find the memory node for the phandle found above */ > + files = retrieve_dt_files_pattern("*device-tree/memory*", "phandle"); > + if (!files) > + return false; > + > + if ((file = find_dt_file_with_phandle(files, phandle)) == NULL) > + goto fail; > + > + /* And see its size in the second 64bit value of 'reg'. First, > + * the end of the file needs to be changed from '/phandle' to > + * '/reg' */ > + tmp = strrchr(file, '/'); > + *tmp = '\0'; > + file = strcat(file, "/reg"); > + > + reg = read_dt_memory_reg(file); > + if (reg && reg[2] == 0x20 && reg[3] == 0x00) > + return true; This function is very complex just to find out whether a PCI device is capable of NVLink or not. Feels wrong to do it this way, I believe it would be much easier if NVIDIA exposed a sysfs attribute saying whether a PCI device supports NVLink so that our node-device driver would take care of this during libvirtd startup and then you'd only call a single API from the PPC64 helper you're introducing in the next patch to find out whether you need the alternative formula or not. Honestly, apart from the coding style issues, I don't like this approach and unless there's a really compelling reason for libvirt to do it in a way which involves spawning a 'find' process because of a complex pattern and a bunch of data necessary to filter out, I'd really suggest contacting NVIDIA about this. It's the same for mdevs, NVIDIA exposes a bunch of attributes in sysfs which we're able to read. Thinking about it a bit more, since this is NVIDIA-specific, having an NVIDIA-only sysfs attribute doesn't help the node-device driver I mentioned above. In general, we try to avoid introducing vendor-specific code, so nodedev might not be the right place to check for the attribute (because it's not universal and would require vendor-specific elements), but I think we could have an NVLink helper reading this sysfs attribute(s) elsewhere in libvirt codebase. Erik -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list