Re: [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Mon, 2024-05-06 at 16:41 +0200, Hans de Goede wrote:
> Hi,
> 
> On 5/5/24 3:07 PM, Weifeng Liu wrote:
> > v3 changes:
> > * better formatting, nothing special
> > 
> > v2 changes:
> > * resolves Andy's comments
> > 
> > Original letter:
> > 
> > Greetings,
> > 
> > This series is intended to remedy a race condition where surface
> > aggregator module (SAM) which is a serdev driver could fail to probe if
> > the underlying UART port is not ready to open.  In such circumstance,
> > invoking serdev_device_open() gets errno -ENXIO, leading to failure in
> > probing of SAM.  However, if the probe is retried in a short delay,
> > serdev_device_open() would work as expected and everything just goes
> > fine.  As a workaround, adding the serial driver (8250_dw) into
> > initramfs or building it into the kernel image significantly mitigates
> > the likelihood of encountering this race condition, as in this way the
> > serial device would be initialized much earlier than probing of SAM.
> > 
> > However, IMO we should reliably avoid this sort of race condition.  A
> > good way is returning -EPROBE_DEFER when serdev_device_open returns
> > -ENXIO so that the kernel will be able to retry the probing later.  This
> > is what the first patch tries to do.
> > 
> > Though this solution might be a good enough solution for this specific
> > issue, I am wondering why this kind of race condition could ever happen,
> > i.e., why a serdes device could be not ready yet at the moment the
> > serdev driver gets called and tries to bind it.  And even if this is an
> > expected behavior how serdev driver works, I do feel it a little bit
> > weird that we need to identify serdev_device_open() returning -ENXIO as
> > non-fatal error and thus return -EPROBE_DEFER manually in such case, as
> > I don't see this sort of behavior in other serdev driver.  Maximilian
> > and Hans suggested discussing the root cause of the race condition here.
> > I will be grateful if you could provide some reasoning and insights on
> > this.
> > 
> > Following is the code path when the issue occurs:
> > 
> > 	ssam_serial_hub_probe()
> > 	serdev_device_open()
> > 	ctrl->ops->open()	/* this callback being ttyport_open() */
> > 	tty->ops->open()	/* this callback being uart_open() */
> > 	tty_port_open()
> > 	port->ops->activate()	/* this callback being uart_port_activate() */
> > 	Find bit UPF_DEAD is set in uport->flags and fail with errno -ENXIO.
> > 
> > I notice that flag UPF_DEAD would be set in serial_core_register_port()
> > during calling serial_core_add_one_port() but don't have much idea
> > what's going on under the hood.
> 
> Thank you this explanation + Andy's questions about this were quite
> useful. I think I have found the root cause of this problem and I have
> attached a patch which should fix this.
> 
> After dropping your own fix from your local kernel you should be able
> to reproduce this 100% of the time by making the surface_aggregator
> module builtin (CONFIG_SURFACE_AGGREGATOR=y) while making 8250_dw
> a module (CONFIG_SERIAL_8250_DW=m). Although I'm not sure if the uart
> will then not simply be initialzed as something generic. You could also
> try building both into the kernel and see if the issue reproduces then.
> 

As per your instructions, I

* removed my fixes,
* built surface_aggregator into the kernal,
* built 8250_dw as a module and
* removed 8250_dw from initramfs as well,

and found the occurrence rate of the issue was around 30%.  With your
patch applied, the issue disappeared completely.

Tested-by: Weifeng Liu <weifeng.liu.z@xxxxxxxxx>

Really glad to see the final solution in serial core for this issue. 
This might also help other serdev drivers.  Thank you all.

Best regards,
Weifeng

> Once you can reproduce this reliably, give the attached patch a try
> to fix this please.
> 
> Regards,
> 
> Hans
> 
> 
> 
> > 
> > Additionally, add logs to the probe procedure of SAM in the second
> > patch, hoping it could help provide context to user when something
> > miserable happens.
> > 
> > Context of this series is available in [1].
> > 
> > Best regards,
> > Weifeng
> > 
> > Weifeng Liu (2):
> >   platform/surface: aggregator: Defer probing when serdev is not ready
> >   platform/surface: aggregator: Log critical errors during SAM probing
> > 
> >  drivers/platform/surface/aggregator/core.c | 53 ++++++++++++++++------
> >  1 file changed, 39 insertions(+), 14 deletions(-)






[Index of Archives]     [Linux Kernel Development]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux