Hi,

On 3/28/24 8:10 AM, Hans de Goede wrote:
> Hi,
>
> On 3/18/24 7:52 PM, Peter Collingbourne wrote:
>> On Mon, Mar 18, 2024 at 3:36 AM Andy Shevchenko
>> <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On Sun, Mar 17, 2024 at 10:41:23PM +0100, Hans de Goede wrote:
>>>> Commit e5d6bd25f93d ("serial: 8250_dw: Do not reclock if already at
>>>> correct rate") breaks the dw UARTs on Intel Bay Trail (BYT) and
>>>> Cherry Trail (CHT) SoCs.
>>>>
>>>> Before this change the RTL8732BS Bluetooth HCI which is found
>>>> connected over the dw UART on both BYT and CHT boards works properly:
>>>>
>>>> Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
>>>> Bluetooth: hci0: RTL: rom_version status=0 version=1
>>>> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_fw.bin
>>>> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_config-OBDA8723.bin
>>>> Bluetooth: hci0: RTL: cfg_sz 64, total sz 24508
>>>> Bluetooth: hci0: RTL: fw version 0x365d462e
>>>>
>>>> where as after this change probing it fails:
>>>>
>>>> Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
>>>> Bluetooth: hci0: RTL: rom_version status=0 version=1
>>>> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_fw.bin
>>>> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_config-OBDA8723.bin
>>>> Bluetooth: hci0: RTL: cfg_sz 64, total sz 24508
>>>> Bluetooth: hci0: command 0xfc20 tx timeout
>>>> Bluetooth: hci0: RTL: download fw command failed (-110)
>>>>
>>>> Revert the changes to fix this regression.
>>>
>>> Reviewed-by: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
>>>
>>>> Note it is not entirely clear to me why this commit is causing
>>>> this issue. Maybe probe() needs to explicitly set the clk rate
>>>> which it just got (that feels like a clk driver issue) or maybe
>>>> the issue is that unless setup before hand by firmware /
>>>> the bootloader serial8250_update_uartclk() needs to be called
>>>> at least once to setup things ? Note that probe() does not call
>>>> serial8250_update_uartclk(), this is only called from the
>>>> dw8250_clk_notifier_cb()
>>>>
>>>> This requires more debugging which is why I'm proposing
>>>> a straight revert to fix the regression ASAP and then this
>>>> can be investigated further.
>>>
>>> Yep. When I reviewed the original submission I was got puzzled with
>>> the CLK APIs. Now I might remember that ->set_rate() can't be called
>>> on prepared/enabled clocks and it's possible the same limitation
>>> is applied to ->round_rate().
>>>
>>> I also tried to find documentation about the requirements for those
>>> APIs, but failed (maybe was not pursuing enough, dunno). If you happen
>>> to know the one, can you point on it?
>>
>> To me it seems to be unlikely to be related to round_rate(). It seems
>> more likely that my patch causes us to never actually set the clock
>> rate (e.g. because uartclk was initialized to the intended clock rate
>> instead of the current actual clock rate).
>
> I agree that the likely cause is that we never set the clk-rate. I'm not
> sure if the issue is us never actually calling clk_set_rate() or if
> the issue is that by never calling clk_set_rate() dw8250_clk_notifier_cb()
> never gets called and thus we never call serial8250_update_uartclk()
>
>> It should be possible to
>> confirm by checking the behavior with my patch with `&& p->uartclk !=
>> rate` removed, which I would expect to unbreak Hans's scenario. If my
>> hypothesis is correct, the fix might involve querying the clock with
>> clk_get_rate() in the if instead of reading from uartclk.
>
> Querying the clk with clk_get_rate() instead of reading it from
> uartclk will not help as uartclk gets initialized with clk_get_rate()
> in dw8250_probe(). So I believe that in my scenario clk_get_rate()
> already returns the desired rate causing us to never call clk_set_rate()
> at all which leaves 2 possible root causes for the regressions:
>
> 1. The clk generator has non readable registers and the returned
> rate from clk_get_rate() is a default rate and the actual hw is
> programmed differently, iow we need to call clk_set_rate() at
> least once on this hw to ensure that the clk generator is prggrammed
> properly.
>
> 2. The 8250 code is not working as it should because
> serial8250_update_uartclk() has never been called.

Ok, so it looks like this actually is an issue with how clk_round_rate()
works on this hw (atm, maybe the clk driver needs fixing).

I have added the following to debug this:

diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c
index a3acbf0f5da1..3152872e50b2 100644
--- a/drivers/tty/serial/8250/8250_dw.c
+++ b/drivers/tty/serial/8250/8250_dw.c
@@ -306,6 +306,8 @@ static void dw8250_clk_work_cb(struct work_struct *work)
 	if (rate <= 0)
 		return;
 
+	pr_info("uartclk work_cb clk_get_rate() returns: %ld\n", rate);
+
 	up = serial8250_get_port(d->data.line);
 
 	serial8250_update_uartclk(&up->port, rate);
@@ -353,11 +355,15 @@ static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios,
 {
 	unsigned long newrate = tty_termios_baud_rate(termios) * 16;
 	struct dw8250_data *d = to_dw8250_data(p->private_data);
+	unsigned long currentrate = clk_get_rate(d->clk);
 	long rate;
 	int ret;
 
+
 	rate = clk_round_rate(d->clk, newrate);
-	if (rate > 0 && p->uartclk != rate) {
+	pr_info("uartclk set_termios new: %ld new-rounded: %ld current %ld cached %d\n",
+		newrate, rate, currentrate, p->uartclk);
+	if (rate > 0) {
 		clk_disable_unprepare(d->clk);
 		/*
 		 * Note that any clock-notifer worker will block in
@@ -593,6 +599,8 @@ static int dw8250_probe(struct platform_device *pdev)
 	if (!p->uartclk)
 		return dev_err_probe(dev, -EINVAL, "clock rate not defined\n");
 
+	pr_info("uartclk initial cached %d\n", p->uartclk);
+
 	data->pclk = devm_clk_get_optional_enabled(dev, "apb_pclk");
 	if (IS_ERR(data->pclk))
 		return PTR_ERR(data->pclk);

And then I get the following output:

[    3.119182] uartclk initial cached 44236800
[    3.139923] uartclk work_cb clk_get_rate() returns: 44236800
[    3.152469] uartclk initial cached 44236800
[    3.172165] uartclk work_cb clk_get_rate() returns: 44236800
[   34.128257] uartclk set_termios new: 153600 new-rounded: 44236800 current 44236800 cached 44236800
[   34.130039] uartclk work_cb clk_get_rate() returns: 153600
[   34.131975] uartclk set_termios new: 153600 new-rounded: 153600 current 153600 cached 153600
[   34.132091] uartclk set_termios new: 153600 new-rounded: 153600 current 153600 cached 153600
[   34.132140] uartclk set_termios new: 153600 new-rounded: 153600 current 153600 cached 153600
[   34.132187] uartclk set_termios new: 1843200 new-rounded: 153600 current 153600 cached 153600
[   34.133536] uartclk work_cb clk_get_rate() returns: 1843200

Notice how the new-rounded just returns the current rate of the clk,
rather than a rounded value of new.

I'm not familiar enough with the clk framework to debug this further.

Peter, IMHO we really must revert your commit since it is completely
breaking UARTs on many different Intel boards. Can you please give your
ack for reverting this for now?

Regards,

Hans


p.s.

For anyone who wants to dive into the clk_round_rate() issue deeper,
the code registering the involved clks is here:

drivers/acpi/acpi_lpss.c: register_device_clock()

And for the clocks in question fixed_clk_rate is 0 and both
the LPSS_CLK_GATE and LPSS_CLK_DIVIDER flags are set, so
for a single UART I get:

[root@fedora ~]# ls -d /sys/kernel/debug/clk/80860F0A:01*
/sys/kernel/debug/clk/80860F0A:01
/sys/kernel/debug/clk/80860F0A:01-update
/sys/kernel/debug/clk/80860F0A:01-div

With the 80860F0A:01-update clk being the clk which is actually
used / controlled by the 8250_dw.c code.
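
The interaction visible in the log output above can be replayed in a small userspace model. This is only a sketch, not kernel code: struct mock_clk, mock_round_rate(), mock_set_rate(), model_set_termios() and run_model() are hypothetical names made up for illustration. It assumes that rounding always reports the clk's current rate (as the "new-rounded" values in the log suggest), that setting a rate always succeeds, and that the cached uartclk follows the clk after a successful set.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the UART clk; not the kernel's struct clk. */
struct mock_clk {
	unsigned long rate;	/* rate the modelled hardware runs at */
	int set_rate_calls;	/* how often the model reprogrammed it */
};

/*
 * Assumption taken from the log: rounding ignores the requested rate
 * and simply reports the current rate.
 */
static long mock_round_rate(const struct mock_clk *clk, unsigned long req)
{
	(void)req;
	return (long)clk->rate;
}

/* Assumption: setting a rate always succeeds and is applied exactly. */
static void mock_set_rate(struct mock_clk *clk, unsigned long req)
{
	clk->rate = req;
	clk->set_rate_calls++;
}

/*
 * Model of one set_termios() call. With skip_if_same != 0 the check
 * mirrors "rate > 0 && p->uartclk != rate"; with 0 it mirrors the
 * debug patch's "rate > 0". Returns 1 if the clk was reprogrammed.
 */
static int model_set_termios(struct mock_clk *clk, unsigned int *uartclk,
			     unsigned long baud, int skip_if_same)
{
	unsigned long newrate = baud * 16;
	long rate;

	assert(clk && uartclk);
	rate = mock_round_rate(clk, newrate);
	if (rate <= 0 || (skip_if_same && *uartclk == (unsigned long)rate))
		return 0;

	mock_set_rate(clk, newrate);
	*uartclk = (unsigned int)clk->rate;	/* cached rate follows the clk */
	return 1;
}

/* Replays the two requests seen in the log; returns the final clk rate. */
static unsigned long run_model(int skip_if_same)
{
	struct mock_clk clk = { 44236800UL, 0 };
	unsigned int uartclk = 44236800U;

	model_set_termios(&clk, &uartclk, 9600, skip_if_same);	 /* 153600 / 16 */
	model_set_termios(&clk, &uartclk, 115200, skip_if_same); /* 1843200 / 16 */
	printf("skip_if_same=%d: clk=%lu cached=%u set_rate calls=%d\n",
	       skip_if_same, clk.rate, uartclk, clk.set_rate_calls);
	return clk.rate;
}
```

Under these assumptions run_model(1) never reprograms the clk, because the rounded value always equals the cached one, while run_model(0) ends at 1843200 like the debug run. Whether the real clk driver behaves this way for every request is exactly what still needs investigating.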