Re: [PATCH] pci/msi: Allow arch code to return the number of MSI-X available

Matthew Wilcox <matthew@xxxxxx> · Tue, 10 Feb 2009 22:53:51 -0700

On Wed, Feb 11, 2009 at 03:37:43PM +1100, Michael Ellerman wrote:
> > Would it be fair to say that those platforms are the same ones which
> > implement arch_msi_check_device(), which already works for this case?
> 
> Yes and yes but ..
> 
> We can do some checking in arch_msi_check_device() and return a
> "could-be-allocated" value from there[1].
> 
> But we still want to be able to return a "could-be-allocated" value from
> arch_setup_msi_irqs(), without going through the business of setting up
> msi_descs.
> 
> Simply because when we do the firmware call to actually allocate the
> MSIs, it might give us less than we ask for. At that point we can just
> return that number.

Hm, yes, OK.

> We /could/ try to allocate in arch_msi_check_device(), see what we get,
> deallocate, and return the number. But it's a) ugly as hell, and b)
> still possible that when we come back to do the real allocation we get a
> different number.

Yes.  I also considered that you could actually do the allocation in
arch_msi_check_device() and then fill in during arch_setup_msi_irqs(),
but you could leak if, say, ioremap_nocache() fails.

So, I'm OK with this patch, given the change I suggested below is
implemented.  Now, onto more fun matters ...

I've been working on support for systems with large numbers of MSI-X
vectors.  Here's the architecture interface I'm currently working with:

extern int arch_msi_check_device(struct pci_dev *, int nvec, int type);
extern int arch_setup_msi_irqs(struct pci_dev *, int nvec);
extern int arch_setup_msix_irq(struct pci_dev *, int irq);
extern void arch_teardown_msi_irqs(struct pci_dev *);

and the MSI-X code looks like this:

        for (i = 0; i < nvec; i++) {
                unsigned irq, table_entry = entries[i].entry;
                struct msi_desc desc;
                ret = arch_setup_msix_irq(dev, table_entry);

I now perceive this would throw a bit of a spanner in the works for ppc.
How about the following API?

/*
 * Set up nvec MSI interrupts.  Returns 0 on success, a positive number if
 * it could have set up fewer interrupts than requested, or -Errno for other
 * errors
 */
int arch_msi_setup_irqs(struct pci_dev *, int nvec);
void arch_msi_free_irqs(struct pci_dev *);

/*
 * Allocate nvec MSI-X interrupts.  Returns 0 on success, a positive number if
 * it could have set up fewer interrupts than requested, or -Errno for other
 * errors
 */
int arch_msix_alloc_irqs(struct pci_dev *, int nvec);

/*
 * Program the @table_entry MSI-X interrupt on the card.  Returns the Linux
 * interrupt number on success or -Errno if an error occurs.
 */
int arch_msix_setup_irq(struct pci_dev *, int table_entry);
void arch_msix_free_irqs(struct pci_dev *);

There's a lot more I can say about the heavily patched msi.c file I have
here ... but perhaps I'll save that for another day.

> > > @@ -43,7 +43,8 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> > >  	list_for_each_entry(entry, &dev->msi_list, list) {
> > >  		ret = arch_setup_msi_irq(dev, entry);
> > >  		if (ret)
> > > -			return ret;
> > > +			/* Ensure the error code is always < 0 */
> > > +			return min(ret, -ret);
> > >  	}
> > >  
> > >  	return 0;
> > 
> > I really don't like this.  If we're returning an error, we should return
> > a sensible error value.  Something like:
> > 
> > 		if (ret < 0)
> > 			return ret;
> > 		if (ret > 0)
> > 			return -ENOSPC;
> 
> Fair enough. My reasoning was that negating it at least left some chance
> that you could work out where it came from. But I'm not fussed, and I
> admit it's a bit ugly.

I have three points which contradict each other.  Bear with me ;-)

First, I just noticed that this was a patch to arch_setup_msi_irqs() so
really we should return a count of how far we got through the list.

Second, arch_setup_msi_irq() isn't permitted to return a number >0, so
we don't really need this hunk at all.

Third, I did push a git tree while on holiday that included eliminating
arch_setup_msi_irqs() from this file and pushing it to all architectures.

I told Jesse, but didn't send email about it (connectivity was
intermittent and I couldn't really send email).  I _intended_ to tell
Jesse to pass it along, but clearly I didn't do that.  Here's the git
tree:

http://git.kernel.org/?p=linux/kernel/git/willy/pci.git;a=shortlog;h=msi-20090126

There's several lines of development in there, so it's a bit messy.  Oh
well.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html