RE: [PATCH v1] PCI: pciehp: Fix presence detect change interrupt handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



 
> 
> > >
> > > > On Wed, Aug 17, 2016 at 10:37:13PM +0000, Patel, Mayurkumar wrote:
> > > > > > On Wed, Aug 17, 2016 at 10:54:12AM -0700, Rajat Jain wrote:
> > > >
> > > > > > > We need to do something about that *in addition * to the
> > > > > > > above patch to cover the
> > > > > > > whole story. However I think there still will be a room for some
> > > > > > > interrupt misses because we are
> > > > > > > collecting the interrupts in intr_loc, and theoretically we could be
> > > > > > > in a situation where in the pcie_isr, the
> > > > > > >
> > > > > > > do {
> > > > > > >     ...
> > > > > > > } while(detected)
> > > > > > >
> > > > > > > loop gets a removal->insertion->removal all while in the same
> > > > > > > invocation of pcie_isr().
> > > > > > > If this happens, the intr_loc will have recorded a single insertion
> > > > > > > and a single removal, and
> > > > > > > the final result will depend on the order in which we decide to
> > > > > > > process the events in intr_loc.
> > > > > >
> > > > > > I don't quite understand how that "do { .. } while (detected)" loop
> > > > > > works or why it's done that way.  Collecting interrupt status bits in
> > > > > > an ISR is obviously a very common task; it seems like there should be
> > > > > > a standard, idiomatic way of doing it, but I don't know it.
> > > > > >
> > > > > > > Or, may be we can make the calls to pciehp_queue_interrupt_event()
> > > > > > > before clearing the
> > > > > > > RW1C in the slot status register (in the loop)?
> > > > > >
> > > > > > Yeah, it seems like we should read PCI_EXP_SLTSTA once, queue up any
> > > > > > events related to it, then clear the relevant SLTSTA bits.
> > > > > >
> > > > >
> > > > > Do you mean to remove the do {...} while loop and just
> > > > > read PCI_EXP_SLTSTA once in ISR , queue the work and clear interrupts?
> > > >
> > > > I don't know if removing the loop is the right thing or not.  We need
> > > > to understand why the loop is there in the first place and make sure
> > > > removing it wouldn't break things.
> > > >
> > >
> > >
> > > The pcie base spec. 3.0 says from chapter 6.7.3.1. Slot Events:
> > >
> > >         A Downstream Port with hot-plug capabilities monitors the slot it controls for the slot events listed
> > >         above. When one of these slot events is detected, the Port indicates that the event has occurred by
> > >         setting the status field associated with the event. At that point, the event is pending until software
> > >         clears the status field.
> > >         Once a slot event is pending on a particular slot, all subsequent events of that type are ignored on
> > >         that slot until the event is cleared. The Port must continue to monitor the slot for all other slot
> > >         event types and report them as they occur.
> > >
> > >
> > > So does it mean that the port would continue providing MSI if there has been
> > > any other events occurred apart from the event which is not cleared? If that
> > > is the case then it's not sure why the loop is still needed.
> >
> > I'd think that the loop is needed because we don't want to handle just
> > one event on one interrupt. We want to handle as many events as we can
> > (to keep the interrupt latency overhead low), that have happened
> > during the ISR invocation. And hence the loop.
> >
> 
> Hmm, Right. I created a proposal patch
> (=>[v1,2/2] PCI: pciehp: Rework hotplug interrupt routine)
> to remove the do {...} while loop which
> I guess then should be rejected, right? As we decide to keep this loop?
> 
> > >
> > > Also with having a loop, pcie_isr() is reading the PCI_EXP_SLTSTA again and taking action just based on
> > > latest event happens. That may mean removal->insertion happens fast enough then
> > > removal could be overwritten by insertion and pciehp_get_adapter_status ()
> > > will return insertion only and we may miss the removal event to be processed for the loop.
> >
> > Current. This is the problem that Bjorn highlighted above. I think the
> > current code works well when events of different kinds have happened
> > (due to the loop).
> >
> > In case of similar kind of events (insertion / removal), I think there
> > are 2 problems: (1) we use the transient values of slot status which
> > may not be valid at that time. (2) because we ack the event before we
> > queue it up, it leaves chance for error. I think we'd need more
> > careful examination in order to fix these issues.
> >
> 
> I also see 3rd problem too with this Loop.
> For example, if we are in the loop,
> If only slot removal comes -> detect = PCI_EXP_SLTSTA_PDC -> intr_lock |= detected
> -> detected==true so clear PDC interrupt bit => So we have a chance to detect
> another insertion event.
> 
> We are in the same loop & if insertion comes again -> so detected = PCI_EXP_SLTSTA_PDC
> -> then detected &= ~intr_loc -> detected == false => This means
> PCI_EXP_SLTSTA would not be cleared for insertion event happened
> and there we could get a problem that No further PDC events can get triggered by HW.
> 
> So if we want to keep the loop, I think clearing PCI_EXP_SLTSTA  should be shifted
> outside of the loop to clear Slot events(but not based on detected) to ensure that loop
> just looks for the events Which has not occurred, also before clearing SLTSTA outside
> the loop, we can update link and slot_status to trigger the right events for DLLSC and PDC types.
> 
> What do you think about following proposal?
> 
> @@ -542,7 +542,7 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
>  	struct pci_bus *subordinate = pdev->subordinate;
>  	struct pci_dev *dev;
>  	struct slot *slot = ctrl->slot;
> -	u16 detected, intr_loc;
> +	u16 detected, intr_loc, slot_status;
>  	u8 present;
>  	bool link;
> 
> @@ -553,26 +553,30 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
>  	 */
>  	intr_loc = 0;
>  	do {
> -		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &detected);
> -		if (detected == (u16) ~0) {
> +		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> +		if (slot_status == (u16) ~0) {
>  			ctrl_info(ctrl, "%s: no response from device\n",
>  				  __func__);
>  			return IRQ_HANDLED;
>  		}
> 
> -		detected &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
> +		detected = slot_status & (PCI_EXP_SLTSTA_ABP |
> +			     PCI_EXP_SLTSTA_PFD |
>  			     PCI_EXP_SLTSTA_PDC |
>  			     PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>  		detected &= ~intr_loc;
>  		intr_loc |= detected;
>  		if (!intr_loc)
>  			return IRQ_NONE;
> -		if (detected)
> -			pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> -						   intr_loc);
>  	} while (detected);
> 
>  	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", intr_loc);
> +	if (intr_loc & PCI_EXP_SLTSTA_PDC)
> +		present = !!(slot_status & PCI_EXP_SLTSTA_PDS);
> +	if (intr_loc & PCI_EXP_SLTSTA_DLLSC)
> +		link = pciehp_check_link_active(ctrl);
> +
> +	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, intr_loc);
> 
>  	/* Check Command Complete Interrupt Pending */
>  	if (intr_loc & PCI_EXP_SLTSTA_CC) {
> @@ -603,7 +607,6 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
> 
>  	/* Check Presence Detect Changed */
>  	if (intr_loc & PCI_EXP_SLTSTA_PDC) {
> -		pciehp_get_adapter_status(slot, &present);
>  		ctrl_info(ctrl, "Card %spresent on Slot(%s)\n",
>  			  present ? "" : "not ", slot_name(slot));
>  		pciehp_queue_interrupt_event(slot, present ? INT_PRESENCE_ON :
> @@ -618,7 +621,6 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
>  	}
> 
>  	if (intr_loc & PCI_EXP_SLTSTA_DLLSC) {
> -		link = pciehp_check_link_active(ctrl);
>  		ctrl_info(ctrl, "slot(%s): Link %s event\n",
>  			  slot_name(slot), link ? "Up" : "Down");
>  		pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :
> 
> 

Hi Rajat/Bjorn
Could you please comment, if the proposal can be worked out or we should consider
removing do {..} while at all as Keith, Busch commented as that's not really performance
path or is there any other way/suggestion you think of?


> > >
> > > If we don't clear the Slot event quick enough then according to spec. statement above
> > > that event could get ignored and SW may never get notified.
> > >
> > >
> > >
> > > > But I do think that in the resulting code, the connection between
> > > >
> > > >   (1) the events we learn from the interrupt and
> > > >   (2) the queued work items
> > > >
> > > > needs to be crystal clear.  Right now it's a bit muddy because of
> > > > things like the case you fixed: a work item that goes back and looks
> > > > at PCI_EXP_SLTSTA after it's been cleared and the hardware may have
> > > > already set bits for new events.
> > > >
> > >
> > > I have made a prototype patch in my follow up reply and tested ok on my existing setup
> > > on which I caught the previous issue although I cant say it will work on any HW.
> > > Clearing events after queuing events gave some problems and did not work properly on my
> > > existing HW where I test very fast insertion and removal events, in which
> > > case only "removal" event comes and "insertion" does not occur even if HW gets powered.
> > >
> 
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
��.n��������+%������w��{.n�����{���"�)��jg��������ݢj����G�������j:+v���w�m������w�������h�����٥




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux