Re: [PATCH] pci/msi: Allow arch code to return the number of MSI-X available

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 11, 2009 at 10:16:28PM +1100, Michael Ellerman wrote:
> > How about the following API?
> > 
> > /*
> >  * Set up nvec MSI interrupts.  Returns 0 on success, a positive number if
> >  * it could have set up fewer interrupts than requested, or -Errno for other
> >  * errors
> >  */
> > int arch_msi_setup_irqs(struct pci_dev *, int nvec);
> > void arch_msi_free_irqs(struct pci_dev *);
> > 
> > /*
> >  * Allocate nvec MSI-X interrupts.  Returns 0 on success, a positive number if
> >  * it could have set up fewer interrupts than requested, or -Errno for other
> >  * errors
> >  */
> > int arch_msix_alloc_irqs(struct pci_dev *, int nvec);
> > 
> > /*
> >  * Program the @table_entry MSI-X interrupt on the card.  Returns the Linux
> >  * interrupt number on success or -Errno if an error occurs.
> >  */
> > int arch_msix_setup_irq(struct pci_dev *, int table_entry);
> > void arch_msix_free_irqs(struct pci_dev *);
> > 
> > There's a lot more I can say about the heavily patched msi.c file I have
> > here ... but perhaps I'll save that for another day.
> 
> I'm not quite sure I follow. Your MSI-X routine is still setting up just
> one irq, that won't work for us.

You would set up nvec irqs in arch_msix_alloc_irqs().  Then you would
program each one in arch_msix_setup_irq().

> I don't understand why you'd want a different interface for MSI vs
> MSI-X, rather than just passing the type as we currently do.

That's probably because you don't understand the horrific pain involved
in multiple MSI --
http://git.kernel.org/?p=linux/kernel/git/willy/pci.git;a=commitdiff;h=ca750d8f036b23a2ed04d3e178d7d3dbee1aa475;hp=db9a0751c60c3845061d239410f402e5c1e0b726
shows some of the pain -- but not too much as I evaded a lot of pain by
basically introducing a completely separate path for MSI and MSI-X.

The requirement that MSI vectors be assigned as a contiguous, aligned
block is absolutely killer, at least on x86.  Maybe on ppc this is much
less of a big deal.

> I take it you're planning on replacing the list of msi_descs with
> something more compact? It seems to me we could do that fairly easily
> without changing the arch interface much at all - just replace the list
> with some per-pci_dev attributes and then an array of per-MSI info.

This is also true.  I'll post the whole patch, but here's the important
data structures currently in my rewrite:

struct pci_dev {
-       unsigned int    msi_enabled:1;
-       unsigned int    msix_enabled:1;
+       unsigned int    irq_type:2;     /* 0 - pin, 1 - MSI, 2 - MSI-X */

 #ifdef CONFIG_PCI_MSI
-       struct list_head msi_list;
+       union {
+               struct msi_ctrl msi;
+               struct msix_ctrl *msix;
+       };
 #endif
}

struct msi_ctrl {
        u8      is_64    : 1;   /* 64-bit addresses supported           */
        u8      maskable : 1;   /* per-vector mask and pending bits     */
        u8      multiple : 3;   /* log2 number of messages allocated    */
        u8      pos;            /* address of the msi capability        */
        u8      mask_pos;       /* address of the mask bits             */
        u32     masked;         /* masked interrupts                    */

        unsigned old_irq;       /* pin-based irq                        */
};

struct msix_ctrl {
        void __iomem *table;
        unsigned irq_count;
        unsigned irqs[0];
};


struct irq_desc {
-       struct msi_desc         *msi_desc;
+       struct msi_desc         msi_desc;
}

struct msi_desc {
#ifdef CONFIG_PCI_MSI
        struct pci_dev *dev;
        unsigned short local_irq;       /* Interrupt offset within this dev */
        unsigned char multiple;         /* 1 for MSI-X, 1-32 for MSI */
#endif
};


While there's still something called an msi_desc, you can see that it's
not quite the same thing any more -- it's allocated as part of the struct
irq_desc.  The 'local_irq' is either the table entry number for MSI-X,
or it's the offset from the base MSI interrupt number for MSI.  'multiple'
is cached here so we don't have to do
	(dev->irq_type == msi) ? (1 << dev->msi->multiple) : 1;
in a couple of places.

The msi_ctrl is a mere 12 bytes which is smaller on 64-bit than the
list_head (larger on 32-bit ... oh well), so I feel justified in sticking
the msi_ctrl directly in the pci_dev.  The msix_ctrl is going to depend
on the number of vectors allocated, so needs to be externally allocated.

I'm not even sure if this version of the msix redesign will compile
on x86, and it certainly won't compile on any other arch.  I've never
tried to boot it, and I doubt it will work.  But it's probably a
good place to start from in having design arguments.  It's against the
http://git.kernel.org/?p=linux/kernel/git/willy/pci.git;a=shortlog;h=msi-20090126
tree.

There's a few misc cleanups in here I should rescue, such as the
__msi_set_enable / msi_set_enable changes.  You know how it gets once
you start hacking on stuff ...

diff --git a/arch/x86/kernel/io_apic.c b/arch/x86/kernel/io_apic.c
index f3ca952..6ad5666 100644
--- a/arch/x86/kernel/io_apic.c
+++ b/arch/x86/kernel/io_apic.c
@@ -568,19 +568,17 @@ set_desc_affinity(struct irq_desc *desc, const struct cpumask *mask)
 {
 	struct irq_cfg *cfg;
 	unsigned int irq;
-	struct msi_desc *msi_desc = desc->msi_desc;
-	unsigned i, count = 1;
-
-	if (msi_desc)
-		count <<= msi_desc->msi_attrib.multiple;
+	struct msi_desc *msi_desc = get_irq_desc_msi(desc);
 
 	if (!cpumask_intersects(mask, cpu_online_mask))
 		return BAD_APICID;
 
 	irq = desc->irq;
 	cfg = desc->chip_data;
-	if (count > 1) {
+	if (msi_desc->multiple > 1) {
+		unsigned count = msi_desc->multiple;
 		/* Multiple MSIs all go to the same destination */
+		irq -= msi_desc->local_irq;
 		if (assign_irq_vector_block(irq, count, mask))
 			return BAD_APICID;
 		for (i = 0; i < count; i++) {
diff --git a/drivers/pci/irq.c b/drivers/pci/irq.c
index de01174..229af57 100644
--- a/drivers/pci/irq.c
+++ b/drivers/pci/irq.c
@@ -35,17 +35,12 @@ static void pci_note_irq_problem(struct pci_dev *pdev, const char *reason)
  */
 enum pci_lost_interrupt_reason pci_lost_interrupt(struct pci_dev *pdev)
 {
-	if (pdev->msi_enabled || pdev->msix_enabled) {
-		enum pci_lost_interrupt_reason ret;
-
-		if (pdev->msix_enabled) {
-			pci_note_irq_problem(pdev, "MSIX routing failure");
-			ret = PCI_LOST_IRQ_DISABLE_MSIX;
-		} else {
-			pci_note_irq_problem(pdev, "MSI routing failure");
-			ret = PCI_LOST_IRQ_DISABLE_MSI;
-		}
-		return ret;
+	if (is_msix(pdev)) {
+		pci_note_irq_problem(pdev, "MSI-X routing failure");
+		return PCI_LOST_IRQ_DISABLE_MSIX;
+	} else if (is_msi(pdev)) {
+		pci_note_irq_problem(pdev, "MSI routing failure");
+		return PCI_LOST_IRQ_DISABLE_MSI;
 	}
 #ifdef CONFIG_ACPI
 	if (!(acpi_disabled || acpi_noirq)) {
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 95e3f65..773d001 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -25,7 +25,7 @@
 
 static int pci_msi_enable = 1;
 
-/* Arch hooks */
+/* Architectures can override this if they need to */
 
 int __attribute__ ((weak))
 arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
@@ -33,7 +33,7 @@ arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
 	return 0;
 }
 
-static void __msi_set_enable(struct pci_dev *dev, int pos, int enable)
+static void msi_set_enable(struct pci_dev *dev, int pos, int enable)
 {
 	u16 control;
 
@@ -46,11 +46,6 @@ static void __msi_set_enable(struct pci_dev *dev, int pos, int enable)
 	}
 }
 
-static void msi_set_enable(struct pci_dev *dev, int enable)
-{
-	__msi_set_enable(dev, pci_find_capability(dev, PCI_CAP_ID_MSI), enable);
-}
-
 static void msix_set_enable(struct pci_dev *dev, int enable)
 {
 	int pos;
@@ -93,17 +88,17 @@ static inline __attribute_const__ u32 msi_enabled_mask(u16 control)
  * Returns 1 if it succeeded in masking the interrupt and 0 if the device
  * doesn't support MSI masking.
  */
-static int msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
+static int msi_mask_irq(struct pci_dev *dev, struct msi_ctrl *ctrl, u32 mask, u32 flag)
 {
-	u32 mask_bits = desc->masked;
+	u32 mask_bits = ctrl->masked;
 
-	if (!desc->msi_attrib.maskbit)
+	if (!ctrl->maskable)
 		return 0;
 
 	mask_bits &= ~mask;
 	mask_bits |= flag;
-	pci_write_config_dword(desc->dev, desc->mask_pos, mask_bits);
-	desc->masked = mask_bits;
+	pci_write_config_dword(dev, ctrl->mask_pos, mask_bits);
+	ctrl->masked = mask_bits;
 
 	return 1;
 }
@@ -115,28 +110,30 @@ static int msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
  * file.  This saves a few milliseconds when initialising devices with lots
  * of MSI-X interrupts.
  */
-static void msix_mask_irq(struct msi_desc *desc, u32 flag)
+static void msix_mask_irq(struct msix_ctrl *ctrl, unsigned entry, u32 flag)
 {
-	u32 mask_bits = desc->masked;
-	unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
+	unsigned offset = entry * PCI_MSIX_ENTRY_SIZE +
 					PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
+	u32 mask_bits = readl(ctrl->table + offset);
 	mask_bits &= ~1;
 	mask_bits |= flag;
-	writel(mask_bits, desc->mask_base + offset);
-	desc->masked = mask_bits;
+	writel(mask_bits, ctrl->table + offset);
 }
 
 static int msi_set_mask_bit(unsigned irq, u32 flag)
 {
 	struct msi_desc *desc = get_irq_msi(irq);
+	struct pci_dev *dev = desc->dev;
 
-	if (desc->msi_attrib.is_msix) {
-		msix_mask_irq(desc, flag);
-		readl(desc->mask_base);		/* Flush write to device */
+	if (is_msix(dev)) {
+		struct msix_ctrl *ctrl = msix_ctrl(dev);
+		msix_mask_irq(ctrl, desc->local_irq, flag);
+		readl(ctrl->table);		/* Flush write to device */
 		return 1;
 	} else {
-		unsigned offset = irq - desc->dev->irq;
-		return msi_mask_irq(desc, 1 << offset, flag << offset);
+		struct msi_ctrl *ctrl = msi_ctrl(dev);
+		unsigned offset = desc->local_irq;
+		return msi_mask_irq(dev, ctrl, 1 << offset, flag << offset);
 	}
 }
 
@@ -152,22 +149,25 @@ void unmask_msi_irq(unsigned int irq)
 
 void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 {
-	struct msi_desc *entry = get_irq_desc_msi(desc);
-	if (entry->msi_attrib.is_msix) {
-		void __iomem *base = entry->mask_base +
-			entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
+	struct msi_desc *msi_desc = get_irq_desc_msi(desc);
+	struct pci_dev *dev = msi_desc->dev;
+
+	if (is_msix(dev)) {
+		struct msix_ctrl *msix = msix_ctrl(dev);
+		void __iomem *base = msix->table +
+			msi_desc->local_irq * PCI_MSIX_ENTRY_SIZE;
 
 		msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
 		msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
 		msg->data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET);
 	} else {
-		struct pci_dev *dev = entry->dev;
-		int pos = entry->msi_attrib.pos;
+		struct msi_ctrl *msi = msi_ctrl(dev);
+		int pos = msi->pos;
 		u16 data;
 
 		pci_read_config_dword(dev, msi_lower_address_reg(pos),
 					&msg->address_lo);
-		if (entry->msi_attrib.is_64) {
+		if (msi->is_64) {
 			pci_read_config_dword(dev, msi_upper_address_reg(pos),
 						&msg->address_hi);
 			pci_read_config_word(dev, msi_data_reg(pos, 1), &data);
@@ -188,11 +188,13 @@ void read_msi_msg(unsigned int irq, struct msi_msg *msg)
 
 void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 {
-	struct msi_desc *entry = get_irq_desc_msi(desc);
-	if (entry->msi_attrib.is_msix) {
-		void __iomem *base;
-		base = entry->mask_base +
-			entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
+	struct msi_desc *msi_desc = get_irq_desc_msi(desc);
+	struct pci_dev *dev = msi_desc->dev;
+
+	if (is_msix(dev)) {
+		struct msix_ctrl *msix = msix_ctrl(dev);
+		void __iomem *base = msix->table +
+			msi_desc->local_irq * PCI_MSIX_ENTRY_SIZE;
 
 		writel(msg->address_lo,
 			base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
@@ -200,18 +202,18 @@ void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 			base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
 		writel(msg->data, base + PCI_MSIX_ENTRY_DATA_OFFSET);
 	} else {
-		struct pci_dev *dev = entry->dev;
-		int pos = entry->msi_attrib.pos;
+		struct msi_ctrl *msi = msi_ctrl(dev);
+		int pos = msi->pos;
 		u16 msgctl;
 
 		pci_read_config_word(dev, msi_control_reg(pos), &msgctl);
 		msgctl &= ~PCI_MSI_FLAGS_QSIZE;
-		msgctl |= entry->msi_attrib.multiple << 4;
+		msgctl |= msi->multiple << 4;
 		pci_write_config_word(dev, msi_control_reg(pos), msgctl);
 
 		pci_write_config_dword(dev, msi_lower_address_reg(pos),
 					msg->address_lo);
-		if (entry->msi_attrib.is_64) {
+		if (msi->is_64) {
 			pci_write_config_dword(dev, msi_upper_address_reg(pos),
 						msg->address_hi);
 			pci_write_config_word(dev, msi_data_reg(pos, 1),
@@ -221,7 +223,6 @@ void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 						msg->data);
 		}
 	}
-	entry->msg = *msg;
 }
 
 void write_msi_msg(unsigned int irq, struct msi_msg *msg)
@@ -231,20 +232,6 @@ void write_msi_msg(unsigned int irq, struct msi_msg *msg)
 	write_msi_msg_desc(desc, msg);
 }
 
-static int msi_free_irqs(struct pci_dev* dev);
-
-static struct msi_desc *alloc_msi_entry(struct pci_dev *dev)
-{
-	struct msi_desc *desc = kzalloc(sizeof(*desc), GFP_KERNEL);
-	if (!desc)
-		return NULL;
-
-	INIT_LIST_HEAD(&desc->list);
-	desc->dev = dev;
-
-	return desc;
-}
-
 static void pci_intx_for_msi(struct pci_dev *dev, int enable)
 {
 	if (!(dev->dev_flags & PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG))
@@ -255,46 +242,46 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
 {
 	int pos;
 	u16 control;
-	struct msi_desc *entry;
+	struct msi_ctrl *ctrl;
 
-	if (!dev->msi_enabled)
+	if (!is_msi(dev))
 		return;
-
-	entry = get_irq_msi(dev->irq);
-	pos = entry->msi_attrib.pos;
+	ctrl = msi_ctrl(dev);
+	pos = ctrl->pos;
 
 	pci_intx_for_msi(dev, 0);
-	msi_set_enable(dev, 0);
-	write_msi_msg(dev->irq, &entry->msg);
+	msi_set_enable(dev, pos, 0);
+	/* MRW fix save/restore MSI state */
+//	write_msi_msg(dev->irq, &entry->msg);
 
 	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &control);
-	msi_mask_irq(entry, msi_capable_mask(control), entry->masked);
+	msi_mask_irq(dev, ctrl, msi_capable_mask(control), ctrl->masked);
 	control &= ~PCI_MSI_FLAGS_QSIZE;
-	control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE;
+	control |= (ctrl->multiple << 4) | PCI_MSI_FLAGS_ENABLE;
 	pci_write_config_word(dev, pos + PCI_MSI_FLAGS, control);
 }
 
 static void __pci_restore_msix_state(struct pci_dev *dev)
 {
 	int pos;
-	struct msi_desc *entry;
+	struct msix_ctrl *ctrl;
 	u16 control;
 
-	if (!dev->msix_enabled)
+	if (!is_msix(dev))
 		return;
+	ctrl = msix_ctrl(dev);
 
 	/* route the table */
 	pci_intx_for_msi(dev, 0);
 	msix_set_enable(dev, 0);
 
-	list_for_each_entry(entry, &dev->msi_list, list) {
-		write_msi_msg(entry->irq, &entry->msg);
-		msix_mask_irq(entry, entry->masked);
-	}
+	/* MRW fix save/restore MSI state */
+//	list_for_each_entry(entry, &dev->msi_list, list) {
+//		write_msi_msg(entry->irq, &entry->msg);
+//		msix_mask_irq(entry, entry->masked);
+//	}
 
-	BUG_ON(list_empty(&dev->msi_list));
-	entry = list_entry(dev->msi_list.next, struct msi_desc, list);
-	pos = entry->msi_attrib.pos;
+	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
 	pci_read_config_word(dev, pos + PCI_MSIX_FLAGS, &control);
 	control &= ~PCI_MSIX_FLAGS_MASKALL;
 	control |= PCI_MSIX_FLAGS_ENABLE;
@@ -308,6 +295,33 @@ void pci_restore_msi_state(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_restore_msi_state);
 
+static void msi_free_irqs(struct pci_dev *dev)
+{
+	if (is_msi(dev)) {
+		struct msi_ctrl *ctrl = msi_ctrl(dev);
+		unsigned i, nvec;
+		nvec = 1 << ctrl->multiple;
+		for (i = 0; i < nvec; i++)
+			BUG_ON(irq_has_action(dev->irq + i));
+	} else {
+		struct msix_ctrl *ctrl = msix_ctrl(dev);
+		unsigned i;
+		for (i = 0; i < ctrl->irq_count; i++) {
+			unsigned irq = ctrl->irqs[i];
+			if (!irq)
+				continue;
+			BUG_ON(irq_has_action(irq));
+		}
+	}
+
+	arch_teardown_msi_irqs(dev);
+
+	if (is_msix(dev)) {
+		kfree(dev->msix);
+		dev->msix = NULL;
+	}
+}
+
 /**
  * msi_capability_init - configure device's MSI capability structure
  * @dev: pointer to the pci_dev data structure of MSI device function
@@ -321,37 +335,33 @@ EXPORT_SYMBOL_GPL(pci_restore_msi_state);
  */
 static int msi_capability_init(struct pci_dev *dev, int nvec)
 {
-	struct msi_desc *entry;
+	struct msi_ctrl *ctrl;
+	struct msi_msg msg;
 	int pos, ret;
 	u16 control;
 	unsigned mask;
 
-	msi_set_enable(dev, 0);	/* Ensure msi is disabled as I set it up */
-
    	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
+
+	msi_set_enable(dev, pos, 0);	/* Disable MSI during set up */
+
 	pci_read_config_word(dev, msi_control_reg(pos), &control);
 	/* MSI Entry Initialization */
-	entry = alloc_msi_entry(dev);
-	if (!entry)
-		return -ENOMEM;
-
-	entry->msi_attrib.is_msix = 0;
-	entry->msi_attrib.is_64 = is_64bit_address(control);
-	entry->msi_attrib.entry_nr = 0;
-	entry->msi_attrib.maskbit = is_mask_bit_support(control);
-	entry->msi_attrib.default_irq = dev->irq;	/* Save IOAPIC IRQ */
-	entry->msi_attrib.pos = pos;
-
-	entry->mask_pos = msi_mask_bits_reg(pos, entry->msi_attrib.is_64);
+	ctrl = &dev->msi;
+
+	ctrl->is_64 = is_64bit_address(control);
+	ctrl->maskable = is_mask_bit_support(control);
+	ctrl->old_irq = dev->irq;	/* Save pin-based IRQ */
+	ctrl->pos = pos;
+
+	ctrl->mask_pos = msi_mask_bits_reg(pos, ctrl->is_64);
 	/* All MSIs are unmasked by default, Mask them all */
-	pci_read_config_dword(dev, entry->mask_pos, &entry->masked);
+	pci_read_config_dword(dev, ctrl->mask_pos, &ctrl->masked);
 	mask = msi_capable_mask(control);
-	msi_mask_irq(entry, mask, mask);
-
-	list_add_tail(&entry->list, &dev->msi_list);
+	msi_mask_irq(dev, ctrl, mask, mask);
 
 	/* Configure MSI capability structure */
-	ret = arch_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI);
+	ret = arch_setup_msi_irqs(dev, nvec, &msg);
 	if (ret) {
 		msi_free_irqs(dev);
 		return ret;
@@ -359,10 +369,9 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
 
 	/* Set MSI enabled bits	 */
 	pci_intx_for_msi(dev, 0);
-	msi_set_enable(dev, 1);
-	dev->msi_enabled = 1;
+	msi_set_enable(dev, pos, 1);
+	dev->irq_type = 1;
 
-	dev->irq = entry->irq;
 	return 0;
 }
 
@@ -379,80 +388,71 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
 static int msix_capability_init(struct pci_dev *dev,
 				struct msix_entry *entries, int nvec)
 {
-	struct msi_desc *entry;
-	int pos, i, j, nr_entries, ret;
-	unsigned long phys_addr;
+	struct msix_ctrl *ctrl;
+	int pos, i, nr_entries, ret;
+	resource_size_t phys_addr;
 	u32 table_offset;
  	u16 control;
 	u8 bir;
 	void __iomem *base;
 
-	msix_set_enable(dev, 0);/* Ensure msix is disabled as I set it up */
+	msix_set_enable(dev, 0);	/* Disable MSI-X during set up */
 
    	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
 	/* Request & Map MSI-X table region */
  	pci_read_config_word(dev, msi_control_reg(pos), &control);
 	nr_entries = multi_msix_capable(control);
 
+	ret = -ENOMEM;
+	ctrl = kmalloc(sizeof(msix_ctrl) + nr_entries * sizeof(int),
+								GFP_KERNEL);
+	if (!ctrl)
+		goto out;
+
  	pci_read_config_dword(dev, msix_table_offset_reg(pos), &table_offset);
-	bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK);
+	bir = table_offset & PCI_MSIX_FLAGS_BIRMASK;
 	table_offset &= ~PCI_MSIX_FLAGS_BIRMASK;
-	phys_addr = pci_resource_start (dev, bir) + table_offset;
+	phys_addr = pci_resource_start(dev, bir) + table_offset;
 	base = ioremap_nocache(phys_addr, nr_entries * PCI_MSIX_ENTRY_SIZE);
 	if (base == NULL)
-		return -ENOMEM;
+		goto free_ctrl;
 
-	/* MSI-X Table Initialization */
-	for (i = 0; i < nvec; i++) {
-		entry = alloc_msi_entry(dev);
-		if (!entry)
-			break;
-
- 		j = entries[i].entry;
-		entry->msi_attrib.is_msix = 1;
-		entry->msi_attrib.is_64 = 1;
-		entry->msi_attrib.entry_nr = j;
-		entry->msi_attrib.default_irq = dev->irq;
-		entry->msi_attrib.pos = pos;
-		entry->mask_base = base;
-		entry->masked = readl(base + j * PCI_MSIX_ENTRY_SIZE +
-					PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
-		msix_mask_irq(entry, 1);
-
-		list_add_tail(&entry->list, &dev->msi_list);
-	}
+	ctrl->table = base;
+	ctrl->irq_count = nr_entries;
+	for (i = 0; i < nr_entries; i++)
+		msix_mask_irq(ctrl, i, 1);
+	dev->msix = ctrl;
 
-	ret = arch_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX);
-	if (ret) {
-		int avail = 0;
-		list_for_each_entry(entry, &dev->msi_list, list) {
-			if (entry->irq != 0) {
-				avail++;
-			}
+	for (i = 0; i < nvec; i++) {
+		unsigned irq, local_irq = entries[i].entry;
+		struct msi_msg msg;
+		struct msi_desc desc;
+		ret = arch_setup_msix_irq(dev, local_irq, &msg);
+		if (ret) {
+			if (i > 0)
+				ret = i;
+			goto free_irqs;
 		}
 
-		msi_free_irqs(dev);
-
-		/* If we had some success report the number of irqs
-		 * we succeeded in setting up.
-		 */
-		if (avail == 0)
-			avail = ret;
-		return avail;
-	}
-
-	i = 0;
-	list_for_each_entry(entry, &dev->msi_list, list) {
-		entries[i].vector = entry->irq;
-		set_irq_msi(entry->irq, entry);
-		i++;
+		irq = ctrl->irqs[local_irq];
+		entries[i].vector = irq;	/* XXX: truncation */
+		desc.dev = dev;
+		desc.local_irq = local_irq;
+		set_irq_msi(irq, &desc);
 	}
 	/* Set MSI-X enabled bits */
 	pci_intx_for_msi(dev, 0);
 	msix_set_enable(dev, 1);
-	dev->msix_enabled = 1;
+	dev->irq_type = 2;
 
 	return 0;
+
+ free_irqs:
+	msi_free_irqs(dev);
+ free_ctrl:
+	kfree(ctrl);
+ out:
+	return ret;
 }
 
 /**
@@ -532,10 +532,10 @@ int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
 	if (status)
 		return status;
 
-	WARN_ON(!!dev->msi_enabled);
+	WARN_ON(is_msi(dev));
 
 	/* Check whether driver already requested MSI-X irqs */
-	if (dev->msix_enabled) {
+	if (is_msix(dev)) {
 		dev_info(&dev->dev, "can't enable MSI "
 			 "(MSI-X already enabled)\n");
 		return -EINVAL;
@@ -548,75 +548,36 @@ EXPORT_SYMBOL(pci_enable_msi_block);
 
 void pci_msi_shutdown(struct pci_dev *dev)
 {
-	struct msi_desc *desc;
+	struct msi_ctrl *ctrl;
 	u32 mask;
-	u16 ctrl;
+	u16 control;
 
-	if (!pci_msi_enable || !dev || !dev->msi_enabled)
+	if (!pci_msi_enable || !dev || !is_msi(dev))
 		return;
 
-	msi_set_enable(dev, 0);
+	ctrl = msi_ctrl(dev);
+	msi_set_enable(dev, ctrl->pos, 0);
 	pci_intx_for_msi(dev, 1);
-	dev->msi_enabled = 0;
 
-	BUG_ON(list_empty(&dev->msi_list));
-	desc = list_first_entry(&dev->msi_list, struct msi_desc, list);
-	pci_read_config_word(dev, desc->msi_attrib.pos + PCI_MSI_FLAGS, &ctrl);
-	mask = msi_capable_mask(ctrl);
-	msi_mask_irq(desc, mask, ~mask);
+	pci_read_config_word(dev, ctrl->pos + PCI_MSI_FLAGS, &control);
+	mask = msi_capable_mask(control);
+	msi_mask_irq(dev, ctrl, mask, ~mask);
 
 	/* Restore dev->irq to its default pin-assertion irq */
-	dev->irq = desc->msi_attrib.default_irq;
+	dev->irq_type = 0;
+	dev->irq = ctrl->old_irq;
 }
 
-void pci_disable_msi(struct pci_dev* dev)
+void pci_disable_msi(struct pci_dev *dev)
 {
-	struct msi_desc *entry;
-
-	if (!pci_msi_enable || !dev || !dev->msi_enabled)
+	if (!pci_msi_enable || !dev || !is_msi(dev))
 		return;
 
 	pci_msi_shutdown(dev);
-
-	entry = list_entry(dev->msi_list.next, struct msi_desc, list);
-	if (entry->msi_attrib.is_msix)
-		return;
-
 	msi_free_irqs(dev);
 }
 EXPORT_SYMBOL(pci_disable_msi);
 
-static int msi_free_irqs(struct pci_dev* dev)
-{
-	struct msi_desc *entry, *tmp;
-
-	list_for_each_entry(entry, &dev->msi_list, list) {
-		int i, nvec;
-		if (!entry->irq)
-			continue;
-		nvec = 1 << entry->msi_attrib.multiple;
-		for (i = 0; i < nvec; i++)
-			BUG_ON(irq_has_action(entry->irq + i));
-	}
-
-	arch_teardown_msi_irqs(dev);
-
-	list_for_each_entry_safe(entry, tmp, &dev->msi_list, list) {
-		if (entry->msi_attrib.is_msix) {
-			writel(1, entry->mask_base + entry->msi_attrib.entry_nr
-				  * PCI_MSIX_ENTRY_SIZE
-				  + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
-
-			if (list_is_last(&entry->list, &dev->msi_list))
-				iounmap(entry->mask_base);
-		}
-		list_del(&entry->list);
-		kfree(entry);
-	}
-
-	return 0;
-}
-
 /**
  * pci_enable_msix - configure device's MSI-X capability structure
  * @dev: pointer to the pci_dev data structure of MSI-X device function
@@ -632,7 +593,7 @@ static int msi_free_irqs(struct pci_dev* dev)
  * of irqs available. Driver should use the returned value to re-send
  * its request.
  **/
-int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec)
+int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
 {
 	int status, pos, nr_entries;
 	int i, j;
@@ -660,10 +621,10 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec)
 				return -EINVAL;	/* duplicate entry */
 		}
 	}
-	WARN_ON(!!dev->msix_enabled);
+	WARN_ON(is_msix(dev));
 
-	/* Check whether driver already requested for MSI irq */
-   	if (dev->msi_enabled) {
+	/* Check whether device already has an MSI irq */
+   	if (is_msi(dev)) {
 		dev_info(&dev->dev, "can't enable MSI-X "
 		       "(MSI IRQ already assigned)\n");
 		return -EINVAL;
@@ -673,28 +634,23 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec)
 }
 EXPORT_SYMBOL(pci_enable_msix);
 
-static void msix_free_all_irqs(struct pci_dev *dev)
-{
-	msi_free_irqs(dev);
-}
-
-void pci_msix_shutdown(struct pci_dev* dev)
+void pci_msix_shutdown(struct pci_dev *dev)
 {
-	if (!pci_msi_enable || !dev || !dev->msix_enabled)
+	if (!pci_msi_enable || !dev || !is_msix(dev))
 		return;
 
 	msix_set_enable(dev, 0);
 	pci_intx_for_msi(dev, 1);
-	dev->msix_enabled = 0;
+	dev->irq_type = 0;
 }
-void pci_disable_msix(struct pci_dev* dev)
+
+void pci_disable_msix(struct pci_dev *dev)
 {
-	if (!pci_msi_enable || !dev || !dev->msix_enabled)
+	if (!pci_msi_enable || !dev || !is_msix(dev))
 		return;
 
 	pci_msix_shutdown(dev);
-
-	msix_free_all_irqs(dev);
+	msi_free_irqs(dev);
 }
 EXPORT_SYMBOL(pci_disable_msix);
 
@@ -712,11 +668,7 @@ void msi_remove_pci_irq_vectors(struct pci_dev* dev)
 	if (!pci_msi_enable || !dev)
  		return;
 
-	if (dev->msi_enabled)
-		msi_free_irqs(dev);
-
-	if (dev->msix_enabled)
-		msix_free_all_irqs(dev);
+	msi_free_irqs(dev);
 }
 
 void pci_no_msi(void)
@@ -735,8 +687,3 @@ int pci_msi_enabled(void)
 	return pci_msi_enable;
 }
 EXPORT_SYMBOL(pci_msi_enabled);
-
-void pci_msi_init_pci_dev(struct pci_dev *dev)
-{
-	INIT_LIST_HEAD(&dev->msi_list);
-}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7ff5420..9ef00a6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -887,9 +887,9 @@ static void pcim_release(struct device *gendev, void *res)
 	struct pci_devres *this = res;
 	int i;
 
-	if (dev->msi_enabled)
+	if (is_msi(dev))
 		pci_disable_msi(dev);
-	if (dev->msix_enabled)
+	if (is_msix(dev))
 		pci_disable_msix(dev);
 
 	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
@@ -2146,7 +2146,7 @@ int pci_reset_function(struct pci_dev *dev)
 	if (r < 0)
 		return r;
 
-	if (!dev->msi_enabled && !dev->msix_enabled && dev->irq != 0)
+	if (dev->irq_type == 0 && dev->irq != 0)
 		disable_irq(dev->irq);
 	pci_save_state(dev);
 
@@ -2155,7 +2155,7 @@ int pci_reset_function(struct pci_dev *dev)
 	r = pci_execute_reset_function(dev);
 
 	pci_restore_state(dev);
-	if (!dev->msi_enabled && !dev->msix_enabled && dev->irq != 0)
+	if (dev->irq_type == 0 && dev->irq != 0)
 		enable_irq(dev->irq);
 
 	return r;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 07c0aa5..d7ef4c8 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -111,10 +111,8 @@ extern unsigned int pci_pm_d3_delay;
 
 #ifdef CONFIG_PCI_MSI
 void pci_no_msi(void);
-extern void pci_msi_init_pci_dev(struct pci_dev *dev);
 #else
 static inline void pci_no_msi(void) { }
-static inline void pci_msi_init_pci_dev(struct pci_dev *dev) { }
 #endif
 
 #ifdef CONFIG_PCIEAER
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 55ec44a..3af4681 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -957,9 +957,6 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 
 static void pci_init_capabilities(struct pci_dev *dev)
 {
-	/* MSI/MSI-X list */
-	pci_msi_init_pci_dev(dev);
-
 	/* Buffers for saving PCIe and PCI-X capabilities */
 	pci_allocate_cap_save_buffers(dev);
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 619bdf1..068dfee 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -20,6 +20,7 @@
 #include <linux/irqreturn.h>
 #include <linux/irqnr.h>
 #include <linux/errno.h>
+#include <linux/msi.h>
 
 #include <asm/irq.h>
 #include <asm/ptrace.h>
@@ -75,7 +76,6 @@ typedef	void (*irq_flow_handler_t)(unsigned int irq,
 #endif
 
 struct proc_dir_entry;
-struct msi_desc;
 
 /**
  * struct irq_chip - hardware interrupt chip descriptor
@@ -169,7 +169,7 @@ struct irq_desc {
 #endif
 	irq_flow_handler_t	handle_irq;
 	struct irq_chip		*chip;
-	struct msi_desc		*msi_desc;
+	struct msi_desc		msi_desc;
 	void			*handler_data;
 	void			*chip_data;
 	struct irqaction	*action;	/* IRQ action list */
@@ -411,12 +411,12 @@ extern int set_irq_msi(unsigned int irq, struct msi_desc *entry);
 #define get_irq_chip(irq)	(irq_to_desc(irq)->chip)
 #define get_irq_chip_data(irq)	(irq_to_desc(irq)->chip_data)
 #define get_irq_data(irq)	(irq_to_desc(irq)->handler_data)
-#define get_irq_msi(irq)	(irq_to_desc(irq)->msi_desc)
+#define get_irq_msi(irq)	(&irq_to_desc(irq)->msi_desc)
 
 #define get_irq_desc_chip(desc)		((desc)->chip)
 #define get_irq_desc_chip_data(desc)	((desc)->chip_data)
 #define get_irq_desc_data(desc)		((desc)->handler_data)
-#define get_irq_desc_msi(desc)		((desc)->msi_desc)
+#define get_irq_desc_msi(desc)		(&(desc)->msi_desc)
 
 #endif /* CONFIG_GENERIC_HARDIRQS */
 
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 5d16e8c..9b044c8 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -1,12 +1,14 @@
 #ifndef LINUX_MSI_H
 #define LINUX_MSI_H
 
-#include <linux/list.h>
+#include <linux/types.h>
+
+struct pci_dev;
 
 struct msi_msg {
 	u32	address_lo;	/* low 32 bits of msi message address */
 	u32	address_hi;	/* high 32 bits of msi message address */
-	u32	data;		/* 16 bits of msi message data */
+	u32	data;		/* msi message data */
 };
 
 /* Helper functions */
@@ -18,33 +20,43 @@ extern void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg);
 extern void read_msi_msg(unsigned int irq, struct msi_msg *msg);
 extern void write_msi_msg(unsigned int irq, struct msi_msg *msg);
 
+struct msi_ctrl {
+	u8	is_64	 : 1;	/* 64-bit addresses supported		*/
+	u8	maskable : 1; 	/* per-vector mask and pending bits	*/
+	u8	multiple : 3;	/* log2 number of messages allocated	*/
+	u8	pos;	 	/* address of the msi capability	*/
+	u8	mask_pos;	/* address of the mask bits		*/
+	u32	masked;		/* masked interrupts			*/
+
+	unsigned old_irq;	/* pin-based irq			*/
+};
+
+struct msix_ctrl {
+	void __iomem *table;
+	unsigned irq_count;
+	unsigned irqs[0];
+};
+
 struct msi_desc {
-	struct {
-		__u8	is_msix	: 1;
-		__u8	multiple: 3;	/* log2 number of messages */
-		__u8	maskbit	: 1; 	/* mask-pending bit supported ?   */
-		__u8	is_64	: 1;	/* Address size: 0=32bit 1=64bit  */
-		__u8	pos;	 	/* Location of the msi capability */
-		__u16	entry_nr;    	/* specific enabled entry 	  */
-		unsigned default_irq;	/* default pre-assigned irq	  */
-	} msi_attrib;
-
-	u32 masked;			/* mask bits */
-	unsigned int irq;
-	struct list_head list;
-
-	union {
-		void __iomem *mask_base;
-		u8 mask_pos;
-	};
+#ifdef CONFIG_PCI_MSI
 	struct pci_dev *dev;
-
-	/* Last set MSI message */
-	struct msi_msg msg;
+	unsigned short local_irq;	/* Interrupt offset within this dev */
+	unsigned char multiple;		/* 1 for MSI-X, 1-32 for MSI */
+#endif
 };
 
-extern int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
-extern void arch_teardown_msi_irqs(struct pci_dev *dev);
-extern int arch_msi_check_device(struct pci_dev* dev, int nvec, int type);
+#ifdef CONFIG_PCI_MSI
+static inline int multiple_msi(struct msi_desc *desc)
+{
+	return desc->multiple;
+}
+#else
+#define multiple_msi(x)	0
+#endif
+
+extern int arch_msi_check_device(struct pci_dev *, int nvec, int type);
+extern int arch_setup_msi_irqs(struct pci_dev *, int nvec, struct msi_msg *);
+extern int arch_setup_msix_irq(struct pci_dev *, int irq, struct msi_msg *);
+extern void arch_teardown_msi_irqs(struct pci_dev *);
 
 #endif /* LINUX_MSI_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1b16d71..126bedf 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -52,6 +52,7 @@
 #include <asm/atomic.h>
 #include <linux/device.h>
 #include <linux/io.h>
+#include <linux/msi.h>
 
 /* Include the ID list */
 #include <linux/pci_ids.h>
@@ -251,8 +252,7 @@ struct pci_dev {
 	unsigned int	block_ucfg_access:1;	/* userspace config space access is blocked */
 	unsigned int	broken_parity_status:1;	/* Device generates false positive parity */
 	unsigned int	irq_reroute_variant:2;	/* device needs IRQ rerouting variant */
-	unsigned int 	msi_enabled:1;
-	unsigned int	msix_enabled:1;
+	unsigned int	irq_type:2;	/* 0 - pin, 1 - MSI, 2 - MSI-X */
 	unsigned int	ari_enabled:1;	/* ARI forwarding */
 	unsigned int	is_managed:1;
 	unsigned int	is_pcie:1;
@@ -267,7 +267,10 @@ struct pci_dev {
 	struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */
 	struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */
 #ifdef CONFIG_PCI_MSI
-	struct list_head msi_list;
+	union {
+		struct msi_ctrl msi;
+		struct msix_ctrl *msix;
+	};
 #endif
 	struct pci_vpd *vpd;
 };
@@ -344,7 +347,7 @@ struct pci_bus {
 #ifdef CONFIG_PCI_MSI
 static inline bool pci_dev_msi_enabled(struct pci_dev *pci_dev)
 {
-	return pci_dev->msi_enabled || pci_dev->msix_enabled;
+	return pci_dev->irq_type != 0;
 }
 #else
 static inline bool pci_dev_msi_enabled(struct pci_dev *pci_dev) { return false; }
@@ -832,6 +835,28 @@ extern void pci_restore_msi_state(struct pci_dev *dev);
 extern int pci_msi_enabled(void);
 #endif
 
+static inline int is_msi(struct pci_dev *dev)
+{
+	return dev->irq_type == 1;
+}
+
+static inline int is_msix(struct pci_dev *dev)
+{
+	return dev->irq_type == 2;
+}
+
+static inline struct msi_ctrl *msi_ctrl(struct pci_dev *dev)
+{
+	BUG_ON(!is_msi(dev));
+	return &dev->msi;
+}
+
+static inline struct msix_ctrl *msix_ctrl(struct pci_dev *dev)
+{
+	BUG_ON(!is_msix(dev));
+	return dev->msix;
+}
+
 #ifndef CONFIG_PCIEASPM
 static inline int pcie_aspm_enabled(void)
 {
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index f63c706..fc7e2c6 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -39,7 +39,6 @@ void dynamic_irq_init(unsigned int irq)
 	desc->chip = &no_irq_chip;
 	desc->handle_irq = handle_bad_irq;
 	desc->depth = 1;
-	desc->msi_desc = NULL;
 	desc->handler_data = NULL;
 	desc->chip_data = NULL;
 	desc->action = NULL;
@@ -72,7 +71,6 @@ void dynamic_irq_cleanup(unsigned int irq)
 			irq);
 		return;
 	}
-	desc->msi_desc = NULL;
 	desc->handler_data = NULL;
 	desc->chip_data = NULL;
 	desc->handle_irq = handle_bad_irq;
@@ -168,7 +166,7 @@ EXPORT_SYMBOL(set_irq_data);
  *
  *	Set the hardware irq controller data for an irq
  */
-int set_irq_msi(unsigned int irq, struct msi_desc *entry)
+int set_irq_msi(unsigned int irq, struct msi_desc *msi_desc)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
 	unsigned long flags;
@@ -180,9 +178,7 @@ int set_irq_msi(unsigned int irq, struct msi_desc *entry)
 	}
 
 	spin_lock_irqsave(&desc->lock, flags);
-	desc->msi_desc = entry;
-	if (entry)
-		entry->irq = irq;
+	desc->msi_desc = *msi_desc;
 	spin_unlock_irqrestore(&desc->lock, flags);
 	return 0;
 }

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux