Re: Clocksource boot issues 4.9.13

On Sun, Apr 2, 2017 at 9:22 PM, Sarah Newman <srn@xxxxxxxxx> wrote:
> On 04/02/2017 02:49 AM, Chris Elliott wrote:
>> Hi all
>>
>> I’ve got a few Intel Z87 chipset machines with Adaptec 5405 RAID cards (latest firmware). They work fine on 3.18, but during Dom0 boot with kernel
>> 4.9.13 the boot hangs at “Using clocksource tsc” and the aacraid driver keeps trying to reset.
>>
>> Has anyone seen anything like this?
>>
>> I’ve tried specifying clocksource=xen in grub instead of the default of tsc, and that has the same issue. HPET is enabled and Xen is seeing it:
>>
>> (XEN) ACPI: HPET D9649CB0, 0038 (r1 ALASKA    A M I  1072009 AMI.        5)
>> (XEN) Platform timer is 14.318MHz HPET
>
> I saw a hang at a similar place in the boot process when trying to boot xen-on-xen for our test system. On a hunch I was going to try recompiling
> without the PVHVM PCI related driver (pci-platform ? platform-pci ? ) before saying anything about it.
>
> Since you tried changing the clock source I'm wondering if the boot issue is unrelated to the clock source, in which case you may get a better
> idea of what's hanging by comparing the boot logs from 3.18 to 4.9 and seeing what's present in 3.18 but not in 4.9. Presumably the messages in the
> 3.18 but not 4.9 logs are either removed from the kernel source or happen after whatever is hanging.
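
The log comparison suggested above can be sketched in C (the language of the patch later in this message). This is only an illustration under stated assumptions: the helper names (skip_timestamp, log_contains, print_missing) are hypothetical, and it assumes each kernel message starts with a printk-style "[    1.234567] " timestamp, which has to be stripped because the timings differ between the two boots. In practice sed and diff would do the same job.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Skip a leading "[    1.234567] " timestamp, if the line has one. */
static const char *skip_timestamp(const char *line)
{
	const char *p = line;

	if (*p != '[')
		return line;
	p++;
	while (*p == ' ')
		p++;
	if (!isdigit((unsigned char)*p))
		return line;	/* e.g. "[drm] ..." is not a timestamp */
	while (isdigit((unsigned char)*p) || *p == '.')
		p++;
	if (*p != ']')
		return line;
	p++;
	while (*p == ' ')
		p++;
	return p;
}

/* Return 1 if msg (already stripped) occurs in the log at path.
 * An unreadable log is treated as containing nothing. */
static int log_contains(const char *path, const char *msg)
{
	char buf[1024];
	FILE *f = fopen(path, "r");
	int found = 0;

	if (!f)
		return 0;
	while (!found && fgets(buf, sizeof(buf), f))
		found = strcmp(skip_timestamp(buf), msg) == 0;
	fclose(f);
	return found;
}

/* Print each message from old_path that never appears in new_path.
 * Returns the number of such messages, or -1 if old_path is unreadable. */
static int print_missing(const char *old_path, const char *new_path)
{
	char buf[1024];
	FILE *old = fopen(old_path, "r");
	int missing = 0;

	if (!old)
		return -1;
	while (fgets(buf, sizeof(buf), old)) {
		const char *msg = skip_timestamp(buf);

		if (*msg != '\n' && *msg != '\0' &&
		    !log_contains(new_path, msg)) {
			fputs(msg, stdout);
			missing++;
		}
	}
	fclose(old);
	return missing;
}
```

Calling print_missing() from a small main() with the saved 3.18 log as the first argument and the 4.9 log as the second lists the messages that only the working boot produced. Messages that embed version numbers or addresses will show up as differences even when nothing is wrong, so the output still needs reading by hand.
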

The Xen-on-Xen hang is a problem specific to nested Xen; I asked on
xen-devel and was pointed to the commit below.

Unfortunately it's pretty unlikely this one will help Chris.

But perhaps, Chris, if you follow my example and post a bug report to
xen-devel (with serial output from Xen and the guest kernel), someone
may be able to find a patch which fixes the problem.

 -George

commit eb857975e4eb182a145764d2d06acff6ef696494
Author: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Commit: George Dunlap <george.dunlap@xxxxxxxxxx>

    partially revert "xen: Remove event channel notification through Xen PCI platform device"
    
    Commit 72a9b186292d ("xen: Remove event channel notification through Xen
    PCI platform device") broke Linux when booting as Dom0 on Xen in a
    nested Xen environment (Xen installed inside a Xen VM). In this
    scenario, Linux is a PV guest, but at the same time it uses the
    platform-pci driver to receive notifications from L0 Xen. vector
    callbacks are not available because L1 Xen doesn't allow them.
    
    Partially revert the offending commit, by restoring IRQ based
    notifications for PV guests only. I restored only the code which is
    strictly needed and replaced the xen_have_vector_callback checks within
    it with xen_pv_domain() checks.
    
    Signed-off-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>

diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index b59c9455..549c618 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -42,6 +42,7 @@
 static unsigned long platform_mmio;
 static unsigned long platform_mmio_alloc;
 static unsigned long platform_mmiolen;
+static uint64_t callback_via;
 
 static unsigned long alloc_xen_mmio(unsigned long len)
 {
@@ -54,6 +55,51 @@ static unsigned long alloc_xen_mmio(unsigned long len)
 	return addr;
 }
 
+static uint64_t get_callback_via(struct pci_dev *pdev)
+{
+	u8 pin;
+	int irq;
+
+	irq = pdev->irq;
+	if (irq < 16)
+		return irq; /* ISA IRQ */
+
+	pin = pdev->pin;
+
+	/* We don't know the GSI. Specify the PCI INTx line instead. */
+	return ((uint64_t)0x01 << HVM_CALLBACK_VIA_TYPE_SHIFT) | /* PCI INTx identifier */
+		((uint64_t)pci_domain_nr(pdev->bus) << 32) |
+		((uint64_t)pdev->bus->number << 16) |
+		((uint64_t)(pdev->devfn & 0xff) << 8) |
+		((uint64_t)(pin - 1) & 3);
+}
+
+static irqreturn_t do_hvm_evtchn_intr(int irq, void *dev_id)
+{
+	xen_hvm_evtchn_do_upcall();
+	return IRQ_HANDLED;
+}
+
+static int xen_allocate_irq(struct pci_dev *pdev)
+{
+	return request_irq(pdev->irq, do_hvm_evtchn_intr,
+			IRQF_NOBALANCING | IRQF_TRIGGER_RISING,
+			"xen-platform-pci", pdev);
+}
+
+static int platform_pci_resume(struct pci_dev *pdev)
+{
+	int err;
+	if (!xen_pv_domain())
+		return 0;
+	err = xen_set_callback_via(callback_via);
+	if (err) {
+		dev_err(&pdev->dev, "platform_pci_resume failure!\n");
+		return err;
+	}
+	return 0;
+}
+
 static int platform_pci_probe(struct pci_dev *pdev,
 			      const struct pci_device_id *ent)
 {
@@ -92,6 +138,28 @@ static int platform_pci_probe(struct pci_dev *pdev,
 	platform_mmio = mmio_addr;
 	platform_mmiolen = mmio_len;
 
+	/* 
+	 * Xen HVM guests always use the vector callback mechanism.
+	 * L1 Dom0 in a nested Xen environment is a PV guest inside in an
+	 * HVM environment. It needs the platform-pci driver to get
+	 * notifications from L0 Xen, but it cannot use the vector callback
+	 * as it is not exported by L1 Xen.
+	 */
+	if (xen_pv_domain()) {
+		ret = xen_allocate_irq(pdev);
+		if (ret) {
+			dev_warn(&pdev->dev, "request_irq failed err=%d\n", ret);
+			goto out;
+		}
+		callback_via = get_callback_via(pdev);
+		ret = xen_set_callback_via(callback_via);
+		if (ret) {
+			dev_warn(&pdev->dev, "Unable to set the evtchn callback "
+					 "err=%d\n", ret);
+			goto out;
+		}
+	}
+
 	max_nr_gframes = gnttab_max_grant_frames();
 	grant_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
 	ret = gnttab_setup_auto_xlat_frames(grant_frames);
@@ -123,6 +191,9 @@ static struct pci_driver platform_driver = {
 	.name =           DRV_NAME,
 	.probe =          platform_pci_probe,
 	.id_table =       platform_pci_tbl,
+#ifdef CONFIG_PM
+	.resume_early =   platform_pci_resume,
+#endif
 };
 
 static int __init platform_pci_init(void)
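
As a reading aid for get_callback_via() in the patch above, here is a standalone sketch of the same bit packing, usable outside the kernel. The function name encode_callback_via and the two macros are hypothetical; the shift value of 56 is an assumption about what HVM_CALLBACK_VIA_TYPE_SHIFT is defined to in the Xen interface headers and should be checked against the tree you build.

```c
#include <stdint.h>

/* Assumption: same value as HVM_CALLBACK_VIA_TYPE_SHIFT in the Xen headers. */
#define CALLBACK_VIA_TYPE_SHIFT    56
/* The 0x01 type used by the patch to mean "PCI INTx". */
#define CALLBACK_VIA_TYPE_PCI_INTX 0x01

/*
 * Mirror of get_callback_via(): an IRQ below 16 is passed through as an
 * ISA IRQ; otherwise the PCI domain, bus, devfn and INTx pin are packed
 * into a single 64-bit value.
 */
static uint64_t encode_callback_via(int irq, uint16_t domain, uint8_t bus,
				    uint8_t devfn, uint8_t pin)
{
	if (irq < 16)
		return (uint64_t)irq;	/* ISA IRQ */

	return ((uint64_t)CALLBACK_VIA_TYPE_PCI_INTX << CALLBACK_VIA_TYPE_SHIFT) |
	       ((uint64_t)domain << 32) |
	       ((uint64_t)bus << 16) |
	       ((uint64_t)devfn << 8) |
	       ((uint64_t)(pin - 1) & 3);	/* INTA..INTD -> 0..3 */
}
```

For example, a device at 0000:00:03.0 (devfn 0x18) using INTA with an IRQ of 16 or higher encodes to 0x0100000000001800 under the assumed shift, while the same device with IRQ 5 simply yields 5.
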