Re: [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 5/5/24 3:07 PM, Weifeng Liu wrote:
> v3 changes:
> * better formatting, nothing special
> 
> v2 changes:
> * resolves Andy's comments
> 
> Original letter:
> 
> Greetings,
> 
> This series is intended to remedy a race condition where surface
> aggregator module (SAM) which is a serdev driver could fail to probe if
> the underlying UART port is not ready to open.  In such circumstance,
> invoking serdev_device_open() gets errno -ENXIO, leading to failure in
> probing of SAM.  However, if the probe is retried in a short delay,
> serdev_device_open() would work as expected and everything just goes
> fine.  As a workaround, adding the serial driver (8250_dw) into
> initramfs or building it into the kernel image significantly mitigates
> the likelihood of encountering this race condition, as in this way the
> serial device would be initialized much earlier than probing of SAM.
> 
> However, IMO we should reliably avoid this sort of race condition.  A
> good way is returning -EPROBE_DEFER when serdev_device_open returns
> -ENXIO so that the kernel will be able to retry the probing later.  This
> is what the first patch tries to do.
> 
> Though this solution might be a good enough solution for this specific
> issue, I am wondering why this kind of race condition could ever happen,
> i.e., why a serdes device could be not ready yet at the moment the
> serdev driver gets called and tries to bind it.  And even if this is an
> expected behavior how serdev driver works, I do feel it a little bit
> weird that we need to identify serdev_device_open() returning -ENXIO as
> non-fatal error and thus return -EPROBE_DEFER manually in such case, as
> I don't see this sort of behavior in other serdev driver.  Maximilian
> and Hans suggested discussing the root cause of the race condition here.
> I will be grateful if you could provide some reasoning and insights on
> this.
> 
> Following is the code path when the issue occurs:
> 
> 	ssam_serial_hub_probe()
> 	serdev_device_open()
> 	ctrl->ops->open()	/* this callback being ttyport_open() */
> 	tty->ops->open()	/* this callback being uart_open() */
> 	tty_port_open()
> 	port->ops->activate()	/* this callback being uart_port_activate() */
> 	Find bit UPF_DEAD is set in uport->flags and fail with errno -ENXIO.
> 
> I notice that flag UPF_DEAD would be set in serial_core_register_port()
> during calling serial_core_add_one_port() but don't have much idea
> what's going on under the hood.

Thank you this explanation + Andy's questions about this were quite
useful. I think I have found the root cause of this problem and I have
attached a patch which should fix this.

After dropping your own fix from your local kernel you should be able
to reproduce this 100% of the time by making the surface_aggregator
module builtin (CONFIG_SURFACE_AGGREGATOR=y) while making 8250_dw
a module (CONFIG_SERIAL_8250_DW=m). Although I'm not sure if the uart
will then not simply be initialzed as something generic. You could also
try building both into the kernel and see if the issue reproduces then.

Once you can reproduce this reliably, give the attached patch a try
to fix this please.

Regards,

Hans



> 
> Additionally, add logs to the probe procedure of SAM in the second
> patch, hoping it could help provide context to user when something
> miserable happens.
> 
> Context of this series is available in [1].
> 
> Best regards,
> Weifeng
> 
> Weifeng Liu (2):
>   platform/surface: aggregator: Defer probing when serdev is not ready
>   platform/surface: aggregator: Log critical errors during SAM probing
> 
>  drivers/platform/surface/aggregator/core.c | 53 ++++++++++++++++------
>  1 file changed, 39 insertions(+), 14 deletions(-)
> 
From 256e1b83d4a2015b16b036f2aaa0c3c6f6c63e4a Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@xxxxxxxxxx>
Date: Mon, 6 May 2024 16:20:45 +0200
Subject: [PATCH] serial: Clear UPF_DEAD before calling
 tty_port_register_device_attr_serdev()

If a serdev_device_driver is already loaded for a serdev_tty_port when it
gets registered by tty_port_register_device_attr_serdev() then that
driver's probe() method will be called immediately.

The serdev_device_driver's probe() method should then be able to call
serdev_device_open() successfully, but because UPF_DEAD is still dead
serdev_device_open() will fail with -ENXIO in this scenario:

  serdev_device_open()
  ctrl->ops->open()	/* this callback being ttyport_open() */
  tty->ops->open()	/* this callback being uart_open() */
  tty_port_open()
  port->ops->activate()	/* this callback being uart_port_activate() */
  Find bit UPF_DEAD is set in uport->flags and fail with errno -ENXIO.

Fix this be clearing UPF_DEAD before tty_port_register_device_attr_serdev()
note this only moves up the UPD_DEAD clearing a small bit, before:

  tty_port_register_device_attr_serdev();
  mutex_unlock(&tty_port.mutex);
  uart_port.flags &= ~UPF_DEAD;
  mutex_unlock(&port_mutex);

after:

  uart_port.flags &= ~UPF_DEAD;
  tty_port_register_device_attr_serdev();
  mutex_unlock(&tty_port.mutex);
  mutex_unlock(&port_mutex);

Reported-by: Weifeng Liu <weifeng.liu.z@xxxxxxxxx>
Closes: https://lore.kernel.org/platform-driver-x86/20240505130800.2546640-1-weifeng.liu.z@xxxxxxxxx/
Cc: Maximilian Luz <luzmaximilian@xxxxxxxxx>
Cc: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
Signed-off-by: Hans de Goede <hdegoede@xxxxxxxxxx>
---
 drivers/tty/serial/serial_core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index ff85ebd3a007..d9424fe6513b 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -3196,6 +3196,9 @@ static int serial_core_add_one_port(struct uart_driver *drv, struct uart_port *u
 	if (uport->attr_group)
 		uport->tty_groups[1] = uport->attr_group;
 
+	/* Ensure serdev drivers can call serdev_device_open() right away */
+	uport->flags &= ~UPF_DEAD;
+
 	/*
 	 * Register the port whether it's detected or not.  This allows
 	 * setserial to be used to alter this port's parameters.
@@ -3206,6 +3209,7 @@ static int serial_core_add_one_port(struct uart_driver *drv, struct uart_port *u
 	if (!IS_ERR(tty_dev)) {
 		device_set_wakeup_capable(tty_dev, 1);
 	} else {
+		uport->flags |= UPF_DEAD;
 		dev_err(uport->dev, "Cannot register tty device on line %d\n",
 		       uport->line);
 	}
@@ -3411,8 +3415,6 @@ int serial_core_register_port(struct uart_driver *drv, struct uart_port *port)
 	if (ret)
 		goto err_unregister_port_dev;
 
-	port->flags &= ~UPF_DEAD;
-
 	mutex_unlock(&port_mutex);
 
 	return 0;
-- 
2.44.0


[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux PPP]     [Linux FS]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Linmodem]     [Device Mapper]     [Linux Kernel for ARM]

  Powered by Linux