On Mon, Feb 13, 2023 at 01:31:17PM -0500, Gregory Price wrote:
> On Mon, Feb 13, 2023 at 01:22:17PM -0500, Gregory Price wrote:
> > On Fri, Feb 10, 2023 at 01:05:21AM -0800, Dan Williams wrote:
> > > Changes since v1: [1]
> > > [... snip ...]
> > [... snip ...]
> > Really i see these decoders and device mappings setup:
> > port1 -> mem2
> > port2 -> mem1
> > port3 -> mem0
>
> small correction:
> port1 -> mem1
> port3 -> mem0
> port2 -> mem2
>
> >
> > Therefore I should expect
> > decoder0.0 -> mem2
> > decoder0.1 -> mem1
> > decoder0.2 -> mem0
> >
>
> this end up mapping this way, which is still further jumbled.
>
> Something feels like there's an off-by-one
>

Currently, the memdev names can come out of order for two reasons.

1. On the kernel side, the cxl_pci driver probes devices asynchronously,
so the memdev names can change across repeated device enumerations, even
within a single OS boot. The pattern can be observed with the following
steps in the guest (cxl_xxx being the CXL module under test):

for i in 1 2 3; do
    modprobe cxl_xxx
    cxl list        # memdev names may change between runs (e.g. mem0 -> mem1)
    rmmod cxl_xxx
done

This behaviour can be avoided by forcing synchronous device probe with
the following change:

--------------------------------------------
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 258004f34281..f3f90fad62b5 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -663,7 +663,7 @@ static struct pci_driver cxl_pci_driver = {
 	.probe			= cxl_pci_probe,
 	.err_handler		= &cxl_error_handlers,
 	.driver	= {
-		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
+		.probe_type	= PROBE_FORCE_SYNCHRONOUS,
 	},
 };
 
-------------------------------------------

With the above patch, the memdev naming is consistent within one OS
boot. However, the order can still differ from what we expect given the
QEMU config options we use, so a change is needed on the QEMU side as
well, as described below.

2. Currently in QEMU, multiple components at the same topology level are
stored in a QLIST, as defined in include/qemu/queue.h. When enqueuing a
component, the current QEMU code uses QLIST_INSERT_HEAD to insert the
item at the head of the list, while QLIST_FOREACH/QLIST_FOREACH_SAFE
also iterate from the head. That is to say, if we enqueue items P1, P2,
P3 in order, iteration yields P3, P2, P1. In a simple test with the code
change below (always inserting at the list tail), the order issue is
fixed.

----------------------------------------------------------------------------
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index e029e7bf66..15491960e1 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -130,7 +130,7 @@ struct {                                                                \
         (listelm)->field.le_prev = &(elm)->field.le_next;               \
 } while (/*CONSTCOND*/0)
 
-#define QLIST_INSERT_HEAD(head, elm, field) do {                        \
+#define QLIST_INSERT_HEAD_OLD(head, elm, field) do {                    \
         if (((elm)->field.le_next = (head)->lh_first) != NULL)          \
                 (head)->lh_first->field.le_prev = &(elm)->field.le_next;\
         (head)->lh_first = (elm);                                       \
@@ -146,6 +146,20 @@ struct {                                                                \
         (elm)->field.le_prev = NULL;                                    \
 } while (/*CONSTCOND*/0)
 
+#define QLIST_INSERT_TAIL(head, elm, field) do {                        \
+        typeof(elm) last_p = (head)->lh_first;                          \
+        while (last_p && last_p->field.le_next)                         \
+            last_p = last_p->field.le_next;                             \
+        if (last_p)                                                     \
+            QLIST_INSERT_AFTER(last_p, elm, field);                     \
+        else                                                            \
+            QLIST_INSERT_HEAD_OLD(head, elm, field);                    \
+} while (/*CONSTCOND*/0)
+
+#define QLIST_INSERT_HEAD(head, elm, field) do {                        \
+        QLIST_INSERT_TAIL(head, elm, field);                            \
+} while (/*CONSTCOND*/0)
+
 /*
  * Like QLIST_REMOVE() but safe to call when elm is not in a list
  */
-----------------------------------------------------------------------------

The memdev naming order can also cause confusion when creating regions
for multiple memdevs under different HBs, because the kernel code
enforces an HB check to ensure that the target position matches the
CFMW configuration. To avoid the confusion, we can use "cxl list -TD" to
find out the target position of each memdev, but having to do that
before creating every region is tedious.