On Tuesday 08 December 2009, James Bottomley wrote:
> On Tue, 2009-12-01 at 14:10 -0800, Jesse Barnes wrote:
> > On Tue, 1 Dec 2009 22:40:48 +0100
> > "Rafael J. Wysocki" <rjw@xxxxxxx> wrote:
> >
> > > On Tuesday 01 December 2009, Jesse Barnes wrote:
> > > > On Tue, 01 Dec 2009 12:13:47 -0500
> > > > James Bottomley <James.Bottomley@xxxxxxx> wrote:
> > > >
> > > > > On Tue, 2009-12-01 at 14:54 -0200, Kleber Sacilotto de Souza
> > > > > wrote:
> > > > > > Can you please add the patch from "[PATCH] ipr: fix EEH
> > > > > > recovery" sent to this list?
> > > > >
> > > > > Adding linux-pci because this hack actually tampers with internal
> > > > > PCI device state, which looks wrong.
> > > > >
> > > > > The thread is here:
> > > > >
> > > > > http://marc.info/?l=linux-scsi&m=125918723218627
> > > > >
> > > > > and the proposed full patch and explanation below.
> > > > >
> > > > > PCI people, is this correct, or is there a better way to do it?
> > > > >
> > > > > James
> > > > >
> > > > > ---
> > > > >
> > > > > Hi,
> > > > >
> > > > > After commits c82f63e411f1b58427c103bd95af2863b1c96dd1 (PCI: check
> > > > > saved state before restore) and
> > > > > 4b77b0a2ba27d64f58f16d8d4d48d8319dda36ff (PCI: Clear saved_state
> > > > > after the state has been restored) PCI drivers are prevented from
> > > > > restoring the device standard configuration registers twice in a
> > > > > row. These changes introduced a regression on ipr EEH recovery.
> > > > >
> > > > > The ipr device driver saves the PCI state only during the device
> > > > > probe and restores it on ipr_reset_restore_cfg_space() during IOA
> > > > > resets. This behavior is causing the EEH recovery to fail after
> > > > > the second error detected, since the registers are not being
> > > > > restored.
> > > > >
> > > > > One possible solution would be saving the registers after
> > > > > restoring them.
> > > > > The problem with this approach is that while
> > > > > recovering from an EEH error if pci_save_state() results in an
> > > > > EEH error, the adapter/slot will be reset, and end up back in
> > > > > ipr_reset_restore_cfg_space(), but it won't have a valid saved
> > > > > state to restore, so pci_restore_state() will fail.
> > > > >
> > > > > The following patch introduces a workaround for this problem,
> > > > > hacking around the PCI API by setting pdev->state_saved = true
> > > > > before we do the restore. It fixes the EEH regression and
> > > > > prevents that we hit another EEH error during EEH recovery.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Kleber
> > > > >
> > > > >
> > > > >
> > > > > Signed-off-by: Kleber Sacilotto de Souza
> > > > > <klebers@xxxxxxxxxxxxxxxxxx> ---
> > > > > drivers/scsi/ipr.c | 1 +
> > > > > 1 files changed, 1 insertions(+), 0 deletions(-)
> > > > >
> > > > > diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
> > > > > index 76d294f..c3ff9a6 100644
> > > > > --- a/drivers/scsi/ipr.c
> > > > > +++ b/drivers/scsi/ipr.c
> > > > > @@ -6516,6 +6516,7 @@ static int
> > > > > ipr_reset_restore_cfg_space(struct ipr_cmnd *ipr_cmd)
> > > > > int rc;
> > > > >
> > > > > ENTER;
> > > > > + ioa_cfg->pdev->state_saved = true;
> > > > > rc = pci_restore_state(ioa_cfg->pdev);
> > > > >
> > > > > if (rc != PCIBIOS_SUCCESSFUL) {
> > > >
> > > > Rafael may have input here, but it seems like we need a low level
> > > > save/restore routine that ignores the flag (which is generally used
> > > > for suspend/resume I think?).
> > >
> > > There are some other users, but they are only a few.
> > >
> > > > Maybe adding low level _pci_save_state/_pci_restore_state that don't
> > > > check/set the flags would help?
> > >
> > > That might work, but how do we know that the state we're going to
> > > restore is actually valid at this particular point?
> > >
> > > Perhaps we need a version using a separate storage area?
> >
> > Yeah, that would probably be best. Let the caller allocate the space
> > and save/restore it all it wants for special cases like error handling.
>
> OK, so could I have a resolution for this, please, guys?
>
> Do I just apply the patch to fiddle with the internal state which looks
> ugly but will fix the bug, or are you going to provide us with the
> correct interface to use?

I guess at the moment it's better to apply the workaround first and then
remove it when the correct interface is ready.

I really wouldn't like to hurry with reworking pci_save_state(), because that
has a potential of introducing some new nasty bugs if not done with care.

Jesse, what's your opinion?

Rafael
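
To make the two options discussed in this thread concrete, here is a minimal
userspace sketch in C. It is only a model under stated assumptions, not kernel
code: struct model_dev, struct model_cfg_snapshot and every model_* function
are hypothetical names invented for illustration, and the restore function
returns a bool purely so the example can count successful restores. The first
pair mimics a save area guarded by a state_saved flag that is cleared on
restore; the second pair mimics the "separate storage area" idea, where the
caller owns the snapshot and may restore from it repeatedly.

```c
/* Userspace model only: none of these names are real kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_WORDS 16 /* 64 bytes of standard config space as 16 dwords */

/* Hypothetical stand-in for the relevant parts of struct pci_dev. */
struct model_dev {
    uint32_t cfg[CFG_WORDS];   /* live "registers" */
    uint32_t saved[CFG_WORDS]; /* internal save area */
    bool state_saved;          /* checked and cleared by restore */
};

/* Hypothetical caller-owned snapshot (the "separate storage area"). */
struct model_cfg_snapshot {
    uint32_t cfg[CFG_WORDS];
    bool valid;
};

static void model_probe(struct model_dev *dev)
{
    memset(dev, 0, sizeof(*dev));
    for (int i = 0; i < CFG_WORDS; i++)
        dev->cfg[i] = 0x1000u + (uint32_t)i; /* arbitrary test pattern */
}

/* Simulates a slot reset that wipes the configuration registers. */
static void model_reset(struct model_dev *dev)
{
    memset(dev->cfg, 0, sizeof(dev->cfg));
}

/* Flag-based pair: restore refuses to run twice without a new save. */
static void model_save_state(struct model_dev *dev)
{
    memcpy(dev->saved, dev->cfg, sizeof(dev->cfg));
    dev->state_saved = true;
}

static bool model_restore_state(struct model_dev *dev)
{
    if (!dev->state_saved)
        return false; /* registers are left untouched */
    memcpy(dev->cfg, dev->saved, sizeof(dev->cfg));
    dev->state_saved = false;
    return true;
}

/* Snapshot-based pair: no flag on the device is checked or modified. */
static void model_save_to(const struct model_dev *dev,
                          struct model_cfg_snapshot *snap)
{
    memcpy(snap->cfg, dev->cfg, sizeof(snap->cfg));
    snap->valid = true;
}

static bool model_restore_from(struct model_dev *dev,
                               const struct model_cfg_snapshot *snap)
{
    if (!snap->valid)
        return false;
    memcpy(dev->cfg, snap->cfg, sizeof(dev->cfg));
    return true;
}

/* Save once at probe, then count how many resets are recovered. */
int recover_with_flag(int resets)
{
    struct model_dev dev;
    int recovered = 0;

    model_probe(&dev);
    model_save_state(&dev);
    for (int i = 0; i < resets; i++) {
        model_reset(&dev);
        if (model_restore_state(&dev) && dev.cfg[0] == 0x1000u)
            recovered++;
    }
    return recovered;
}

int recover_with_snapshot(int resets)
{
    struct model_dev dev;
    struct model_cfg_snapshot snap = { .valid = false };
    int recovered = 0;

    model_probe(&dev);
    model_save_to(&dev, &snap);
    for (int i = 0; i < resets; i++) {
        model_reset(&dev);
        if (model_restore_from(&dev, &snap) && dev.cfg[0] == 0x1000u)
            recovered++;
    }
    return recovered;
}
```

In this model, recover_with_flag(3) recovers only the first reset, which
mirrors the regression described above, while recover_with_snapshot(3)
recovers all three. The sketch does not address Rafael's question of how the
caller knows the snapshot is still valid; the valid field here only records
that a save happened at some point.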