On Fri, Nov 18 2022 at 01:58, Thomas Gleixner wrote: > On Thu, Nov 17 2022 at 15:33, Reinette Chatre wrote: >> When calling pci_ims_alloc_irq(), msi_insert_desc() ends up being >> called twice, first with index = MSI_ANY_INDEX, second with index = 0. >> (domid = 1 both times) > > How so? > >>> } >>> >>> hwsize = msi_domain_get_hwsize(dev, domid); >>> - if (index >= hwsize) { >>> - ret = -ERANGE; >>> - goto fail; >>> - } >>> >>> - desc->msi_index = index; >>> - index += baseidx; >>> - ret = xa_insert(&md->__store, index, desc, GFP_KERNEL); >>> - if (ret) >>> - goto fail; >>> - return 0; >>> + if (index == MSI_ANY_INDEX) { >>> + struct xa_limit limit; >>> + unsigned int index; >>> + >>> + limit.min = baseidx; >>> + limit.max = baseidx + hwsize - 1; >>> >>> + /* Let the xarray allocate a free index within the limits */ >>> + ret = xa_alloc(&md->__store, &index, desc, limit, GFP_KERNEL); >>> + if (ret) >>> + goto fail; >>> + >> >> This path (index == MSI_ANY_INDEX) is followed when msi_insert_desc() >> is called the first time and the xa_alloc() succeeds at index 65536. >> >>> + desc->msi_index = index; >> >> This is problematic with desc->msi_index being a u16, assigning >> 65536 to it becomes 0. > > You are partially right. I need to fix that and make it explicit as it's > a "works by chance or maybe not" construct right now. > > But desc->msi_index is correct to be truncated because it's the index > within the domain space which is zero based. It should obviously do: desc->msi_index = index - baseidx; >>> + return 0; >>> + } else { >>> + if (index >= hwsize) { >>> + ret = -ERANGE; >>> + goto fail; >>> + } >>> + >>> + desc->msi_index = index; >>> + index += baseidx; >>> + ret = xa_insert(&md->__store, index, desc, GFP_KERNEL); >>> + if (ret) >>> + goto fail; >> >> This "else" path is followed when msi_insert_desc() is called the second >> time with "index = 0". The xa_insert() above fails at index 65536 >> (baseidx = 65536) with -EBUSY, trickling up as the return code to >> pci_ims_alloc_irq(). > > Why is it called with index=0 the second time? >>> + desc = msi_alloc_desc(dev, 1, affdesc); >>> + if (!desc) { >>> + map.index = -ENOMEM; >>> + goto unlock; >>> + } >>> + >>> + if (cookie) >>> + desc->data.cookie = *cookie; >>> + >>> + ret = msi_insert_desc(dev, desc, domid, index); >>> + if (ret) { >>> + map.index = ret; >>> + goto unlock; >>> + } >> >> Above is the first call to msi_insert_desc(/* index = MSI_ANY_INDEX */) >> >>> + >>> + map.index = desc->msi_index; >> >> msi_insert_desc() did attempt to set desc->msi_index to 65536 but map.index ends >> up being 0. > > which is kinda correct. > >>> + ret = msi_domain_alloc_irqs_range_locked(dev, domid, map.index, map.index); >> >> Here is where the second call to msi_insert_desc() originates: >> >> msi_domain_alloc_irqs_range_locked() -> msi_domain_alloc_locked() -> \ >> __msi_domain_alloc_locked() -> msi_domain_alloc_simple_msi_descs() -> \ >> msi_domain_add_simple_msi_descs() -> msi_insert_desc() > > but yes, that's bogus because it tries to allocate what is allocated already. > > Too tired to decode this circular dependency right now. Will stare at it > with brain awake in the morning. Duh! Duh. I'm a moron. Of course I "tested" this by flipping default and secondary domain around and doing dynamic allocations from PCI/MSI-X but that won't catch the bug because PCI/MSI-X does not have the ALLOC_SIMPLE_DESCS flag set. Let me fix that. Thanks, tglx