Patch "ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()" has been added to the 6.1-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     ata-libata-eh-do-not-clear-ata_pflag_eh_pending-in-a.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 8fb108a05ccf67229e3846086b32a769347211ca
Author: Niklas Cassel <niklas.cassel@xxxxxxx>
Date:   Thu Sep 14 00:19:16 2023 +0200

    ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
    
    [ Upstream commit 80cc944eca4f0baa9c381d0706f3160e491437f2 ]
    
    ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
    before calling ap->ops->error_handler() (without holding the ap->lock).
    
    If an error IRQ is received while ap->ops->error_handler() is running,
    the irq handler will set ATA_PFLAG_EH_PENDING.
    
    Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
    checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
    of ATA EH is performed.
    
    The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
    ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().
    
    ata_eh_reset() is called by ap->ops->error_handler(). This additional
    clearing done by ata_eh_reset() breaks the whole retry logic in
    ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
    ap->ops->error_handler() is running, the port will currently remain
    frozen and will never get re-enabled.
    
    The additional clearing in ata_eh_reset() was introduced in commit
    1e641060c4b5 ("libata: clear eh_info on reset completion").
    
    Looking at the original error report:
    https://marc.info/?l=linux-ide&m=124765325828495&w=2
    
    We can see the following happening:
    [    1.074659] ata3: XXX port freeze
    [    1.074700] ata3: XXX hardresetting link, stopping engine
    [    1.074746] ata3: XXX flipping SControl
    
    [    1.411471] ata3: XXX irq_stat=400040 CONN|PHY
    [    1.411475] ata3: XXX port freeze
    
    [    1.420049] ata3: XXX starting engine
    [    1.420096] ata3: XXX rc=0, class=1
    [    1.420142] ata3: XXX clearing IRQs for thawing
    [    1.420188] ata3: XXX port thawed
    [    1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    
    We are not supposed to be able to receive an error IRQ while the port is
    frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).
    
    AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
    "Each bit location can be thought of as reporting a '1' if the virtual
    "interrupt line" for that port is indicating it wishes to generate an
    interrupt. That is, if a port has one or more interrupt status bit set,
    and the enables for those status bits are set, then this bit shall be set."
    
    Additionally, AHCI state P:ComInit clearly shows that the state machine
    will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
    is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.
    
    So IS.IPS(x) only gets set if PxIS and PxIE is set.
    
    AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
    "The bits in this register are read/write clear. It is set by the level of
    the virtual interrupt line being a set, and cleared by a write of '1' from
    the software."
    
    So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
    IS.IPS(x) for that port.
    
    Since PxIE is cleared, the only way to get an interrupt while the port is
    frozen, is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set when
    the port is frozen, is if it was set before the port was frozen.
    
    However, since commit 737dd811a3db ("ata: libahci: clear pending interrupt
    status"), we clear both PxIS and IS.IPS(x) after freezing the port, but
    before the COMRESET, so the problem that commit 1e641060c4b5 ("libata:
    clear eh_info on reset completion") fixed can no longer happen.
    
    Thus, revert commit 1e641060c4b5 ("libata: clear eh_info on reset
    completion"), so that the retry logic in ata_scsi_port_error_handler()
    works once again. (The retry logic is still needed, since we can still
    get an error IRQ _after_ the port has been thawed, but before
    ata_scsi_port_error_handler() takes the ap->lock in order to check
    if ATA_PFLAG_EH_PENDING is set.)
    
    Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
    Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index a3ae5fc2a42fc..6d4c80b6daaef 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2704,18 +2704,11 @@ int ata_eh_reset(struct ata_link *link, int classify,
 		}
 	}
 
-	/*
-	 * Some controllers can't be frozen very well and may set spurious
-	 * error conditions during reset.  Clear accumulated error
-	 * information and re-thaw the port if frozen.  As reset is the
-	 * final recovery action and we cross check link onlineness against
-	 * device classification later, no hotplug event is lost by this.
-	 */
+	/* clear cached SError */
 	spin_lock_irqsave(link->ap->lock, flags);
-	memset(&link->eh_info, 0, sizeof(link->eh_info));
+	link->eh_info.serror = 0;
 	if (slave)
-		memset(&slave->eh_info, 0, sizeof(link->eh_info));
-	ap->pflags &= ~ATA_PFLAG_EH_PENDING;
+		slave->eh_info.serror = 0;
 	spin_unlock_irqrestore(link->ap->lock, flags);
 
 	if (ap->pflags & ATA_PFLAG_FROZEN)



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux