Re: kexec -e not working: root disk not able to detect

[+cc Jens, ahci.c maintainer]

On Mon, Jan 06, 2020 at 05:24:44PM +0530, Prabhakar Kushwaha wrote:
> Hi All,
> 
> I am trying kexec -e with the latest kernel, i.e. Linux-5.5.0-rc4.  Here
> the second kernel is not able to detect/mount the hard disk holding the
> root file system (INTEL SSDSC2BB240G7).
> 
> [  279.690575] ata1: softreset failed (1st FIS failed)
> [  279.695446] ata1: limiting SATA link speed to 3.0 Gbps
> [  280.910020] ata1: SATA link down (SStatus 0 SControl 320)
> [  282.626018] ata1: SATA link down (SStatus 0 SControl 300)
> [  282.631409] ata1: link online but 1 devices misclassified, retrying
> [  282.637665] ata1: reset failed (errno=-11), retrying in 9 secs
> [  298.294546] ata1: failed to reset engine (errno=-5)
> [  302.042967] ata1: softreset failed (1st FIS failed)
> [  308.798609] ata1: failed to reset engine (errno=-5)
> [  337.546605] ata1: softreset failed (1st FIS failed)
> [  337.551475] ata1: limiting SATA link speed to 3.0 Gbps
> [  338.766022] ata1: SATA link down (SStatus 0 SControl 320)
> [  339.270943] ata1: EH pending after 5 tries, giving up
> 
> I found the following two workarounds for this issue.
> A) Define ".shutdown" in drivers/ata/ahci.c.
> 
> reboot --> kernel_kexec() --> kernel_restart_prepare() -->
> device_shutdown() --> pci_device_shutdown() --> ahci_shutdown_one()
> --> new function
> 
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 4bfd1b14b390..50a101002885 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -81,6 +81,7 @@ enum board_ids {
> 
>  static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent);
>  static void ahci_remove_one(struct pci_dev *dev);
> +static void ahci_shutdown_one(struct pci_dev *dev);
>  static int ahci_vt8251_hardreset(struct ata_link *link, unsigned int *class,
>                                   unsigned long deadline);
>  static int ahci_avn_hardreset(struct ata_link *link, unsigned int *class,
> @@ -606,6 +607,7 @@ static struct pci_driver ahci_pci_driver = {
>          .id_table               = ahci_pci_tbl,
>          .probe                  = ahci_init_one,
>          .remove                 = ahci_remove_one,
> +        .shutdown               = ahci_shutdown_one,
>          .driver = {
>                  .pm             = &ahci_pci_pm_ops,
>          },
> 
> +static void ahci_shutdown_one(struct pci_dev *pdev)
> +{
> +        pm_runtime_get_noresume(&pdev->dev);
> +        ata_pci_remove_one(pdev);
> +}
> +
> Note: After defining shutdown, errors related to file-system writes are
> seen. It looks like file-system transactions still go to the disk even
> after device_shutdown().
> 
> B) Commenting out pci_clear_master() in pci_device_shutdown()
> reboot --> kernel_kexec() --> kernel_restart_prepare() -->
> device_shutdown() --> pci_device_shutdown()
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 0454ca0e4e3f..ddffaa9321bb 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -481,8 +481,10 @@ static void pci_device_shutdown(struct device *dev)
>         /*
>          * If this is a kexec reboot, turn off Bus Master bit on the
> @@ -491,8 +493,16 @@ static void pci_device_shutdown(struct device *dev)
>          * If it is not a kexec reboot, firmware will hit the PCI
>          * devices with big hammer and stop their DMA any way.
>          */
> 
>  - if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
>  -                pci_clear_master(pci_dev);

I doubt we would remove this without a much clearer justification.
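
For reference, the condition guarding that call can be modelled in
userspace roughly as below.  This is only a sketch: the enum and the
helper would_clear_bus_master() are made-up names for illustration, not
kernel API.  The numeric values match the kernel's PCI_D0..PCI_D3cold
power states.

```c
#include <stdbool.h>

/* Numeric values mirror the kernel's PCI_D0 .. PCI_D3cold power states. */
enum pci_power_state { D0 = 0, D1 = 1, D2 = 2, D3HOT = 3, D3COLD = 4 };

/*
 * Hypothetical userspace model of the check in pci_device_shutdown():
 * bus mastering is cleared only on a kexec reboot, and only while the
 * device is in a state where config space is still accessible
 * (D0 through D3hot).
 */
static bool would_clear_bus_master(bool kexec_in_progress,
				   enum pci_power_state state)
{
	return kexec_in_progress && state <= D3HOT;
}
```

With the device in D0, as reported further down, the check passes on a
kexec reboot, so pci_clear_master() does run for this controller.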

> Here pci_dev->current_state is "0", i.e. D0.
> 
> From A and B, it looks like even after pci_clear_master(), some DMA
> transactions are still going on on the PCIe device, leaving the device
> in an unstable state. I am not sure whether this is the reason, or how
> to solve this problem.

Is it possible the ahci driver depends on receiving the device with
bus mastering already enabled?  I would guess that would be the common
case in a non-kexec boot -- the BIOS probably hands off the device
with bus mastering enabled.

ahci_init_one() does turn on bus mastering itself (it calls
pci_set_master()), but it's near the end, so if anything before that
depends on DMA, it wouldn't work.
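
To make the bit in question concrete: bus mastering is the Bus Master
Enable bit (bit 2) of the 16-bit PCI command register at config offset
0x04, and pci_set_master()/pci_clear_master() set and clear that bit.
A rough userspace sketch of the bit manipulation follows.  The helper
names are made up for illustration and operate on a plain value rather
than on real config space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the PCI specification (same as the kernel's pci_regs.h). */
#define PCI_COMMAND        0x04  /* config-space offset of the command register */
#define PCI_COMMAND_MASTER 0x4   /* bit 2: Bus Master Enable */

/* Hypothetical helper: is bus mastering enabled in this command value? */
static bool bus_master_enabled(uint16_t cmd)
{
	return (cmd & PCI_COMMAND_MASTER) != 0;
}

/* Hypothetical helper: the command value pci_clear_master() would write. */
static uint16_t clear_bus_master(uint16_t cmd)
{
	return cmd & (uint16_t)~PCI_COMMAND_MASTER;
}

/* Hypothetical helper: the command value pci_set_master() would write. */
static uint16_t set_bus_master(uint16_t cmd)
{
	return cmd | PCI_COMMAND_MASTER;
}
```

For example, a command value of 0x0007 (I/O, memory and bus master all
enabled) becomes 0x0003 after clear_bus_master(), and the device can no
longer initiate DMA until the bit is set again.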

And I don't know how adding a shutdown method would also be a
workaround.

Bjorn


