Re: [PATCH v4] pciehp: only wait command complete for real hotplug control

Bjorn Helgaas <bhelgaas@xxxxxxxxxx> · Tue, 20 May 2014 14:53:40 -0600

On Thu, May 01, 2014 at 10:40:55AM -0700, Yinghai Lu wrote:
> On system with 16 PCI express hotplug slots, customer complain every slot
> will report "Command not completed in 1000 msec" during initialization.
> 
> Intel says that we should only wait command complete only for
>            Electromechanical Interlock Control
>            Power Controller Control
>            Power Indicator Control
>            Attention Indicator Control
> 
> For detail please check CF118 on
>   http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
> 
> We saw same problems on system with AMD/Nvidia chipsets.
> 
> Two ways to address the problem:
> 1. change to only wait when (EIC, PCC, PIC, AIC) bits get changed.
> 2. or check CMD_COMPLETE and wait before write command. Only wait
>    when CC is set and cmd_busy is true. For chipset from intel
>    will only have CC set for real hotplug control, so we could
>    skip the wait for others.
> 
> This patch is using second way.
> 
> With that we could save 16 seconds during booting, later with 32 sockets system
> with 64 pcie hotplug slots we could save 64 seconds.

Doesn't this just move a timeout out of the boot path and into a future
hotplug event?

These controllers do not set the "No Command Completed Support" bit, so
pciehp expects that they will generate Command Completed events after they
process commands.

During initialization, pcie_enable_notification() writes a command to set
the DLLSCE, ABPE, PDCE, MRLSCE, HPIE, CCIE bits.  The (EIC, PCC, PIC, AIC)
bits aren't changed, so per CF118, the controller will not set the Command
Completed bit.

Later, when we turn on power to the slot, ctrl->cmd_busy will still be set,
so won't we timeout waiting for the Command Completed bit?

(I think there's a bug in your patch (see below) that would explain why you
wouldn't see this timeout in testing.)

The CF118 erratum only affects commands that change DLLSCE, ABPE, PDCE,
etc., without also changing (EIC, PCC, PIC, AIC).  Any command *after* one
of these should time out.

But we only issue commands like that at boot/suspend/resume/reset time.
Maybe it's acceptable for the first "real" hotplug command to timeout.
Subsequent commands shouldn't have any problems.

Or, since the actual hotplug events where we do change (EIC, PCC, PIC, AIC)
normally occur long after boot, maybe we could avoid the timeout by noting
the time when the initialization command (to set DLLSCE, ABPE, etc.) was
issued.  It's very likely that the next command will be more than 1000 msec
later, so there's no point in waiting for another timeout period.

> This patch address spurious "cmd completed" event problem that that
> Rajat met.
> 
> -v2: use second way that is suggested by Bjorn.
> -v3: rebase after
> 	PCI: pciehp: Acknowledge spurious "cmd completed" event
> -v4: add comments according to Bjorn.
>      also add URL for Intel doc about the CC set on intel chipset.
> 
> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
> Tested-by: Rajat Jain <rajatxjain@xxxxxxxxx>
> 
> ---
>  drivers/pci/hotplug/pciehp_hpc.c |   48 ++++++++++++++++++++++++---------------
>  1 file changed, 30 insertions(+), 18 deletions(-)
> 
> Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
> @@ -155,20 +155,30 @@ static void pcie_write_cmd(struct contro
>  	u16 slot_status;
>  	u16 slot_ctrl;
>  
> +	/*
> +	 * We basically have two strategies for dealing with Command Complete.
> +	 * Both are legal according to the PCIe spec:
> +	 * A: Write command at first and wait for Command Complete is set.
> +	 * B: Wait at first iff Command Complete and cmd_busy are set,
> +	 *    that means previous command is not complete yet.
> +	 *    Then write command.
> +	 *
> +	 * Second way also cover "Command not completed in 1000 msec"
> +	 * Problem on system with chipsets that only set Command Complete
> +	 * for some "real" hotplug events like:
> +	 *	Electromechanical Interlock Control
> +	 *	Power Controller Control
> +	 *	Power Indicator Control
> +	 *	Attention Indicator Control
> +	 * like CF118 in
> +	 * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
> +	 */
> +
>  	mutex_lock(&ctrl->ctrl_lock);
>  
>  	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> -	if (slot_status & PCI_EXP_SLTSTA_CC) {
> -		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> -					   PCI_EXP_SLTSTA_CC);
> -		if (!ctrl->no_cmd_complete) {
> -			/*
> -			 * After 1 sec and CMD_COMPLETED still not set, just
> -			 * proceed forward to issue the next command according
> -			 * to spec. Just print out the error message.
> -			 */
> -			ctrl_dbg(ctrl, "CMD_COMPLETED not clear after 1 sec\n");
> -		} else if (!NO_CMD_CMPL(ctrl)) {
> +	if (ctrl->no_cmd_complete && slot_status & PCI_EXP_SLTSTA_CC) {
> +		if (!NO_CMD_CMPL(ctrl)) {
>  			/*
>  			 * This controller seems to notify of command completed
>  			 * event even though it supports none of power
> @@ -184,16 +194,11 @@ static void pcie_write_cmd(struct contro
>  	}
>  
>  	pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
> -	slot_ctrl &= ~mask;
> -	slot_ctrl |= (cmd & mask);
> -	ctrl->cmd_busy = 1;
> -	smp_mb();
> -	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
> -
>  	/*
>  	 * Wait for command completion.
>  	 */
> -	if (!ctrl->no_cmd_complete) {
> +	if (!ctrl->no_cmd_complete && (slot_status & PCI_EXP_SLTSTA_CC) &&
> +	    ctrl->cmd_busy) {

I don't understand this change.  Why do you test "slot_status &
PCI_EXP_SLTSTA_CC" here?  Did you mean to write this:

  if (!ctrl->no_cmd_complete && !(slot_status & PCI_EXP_SLTSTA_CC) && ...

instead?  I think we want to wait if (1) the controller generates Command
Complete events and (2) Slot Status indicates that a command is still in
progress.

>  		int poll = 0;
>  		/*
>  		 * if hotplug interrupt is not enabled or command
> @@ -205,6 +210,13 @@ static void pcie_write_cmd(struct contro
>  			poll = 1;
>                  pcie_wait_cmd(ctrl, poll);
>  	}
> +
> +	slot_ctrl &= ~mask;
> +	slot_ctrl |= (cmd & mask);
> +	ctrl->cmd_busy = 1;
> +	smp_mb();
> +	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
> +
>  	mutex_unlock(&ctrl->ctrl_lock);
>  }
>  
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html