(I like your analysis and a workaround, however there are quite a few style minors, see below) On Fri, Dec 1, 2023 at 5:56 PM Jan Kundrát <jan.kundrat@xxxxxxxxx> wrote: > > The TL;DR summary is that the regmap_noinc_write spills over the data regmap_noinc_write() (we refer to function as func() in the text and code comments) > that are correctly written to the HW also to the following registers in > the regcache. As a result, regcache then contains user-controlled > garbage which will be used later for bit updates on unrelated registers. > > This patch is a "wrong" fix; a real fix would involve fixing regmap > and/or regcache, but that code has too many indirections for my little > mind. > > I was investigating a regression that happened somewhere between 5.12.4 v5.12.4 (you even used v later) > (plus 14 of our patches) and v6.5.9 (plus 7 of our patches). Our > MAX14830 UART would work fine the first time, but when our application > opens the UART the second time it just wouldn't send anything over the > physical TX pin. With the help of a logical analyzer, I found out that > the kernel was sending value 0xcd to the MODE1 register, which on this > chip is a request to set the UART's TX pin to the Hi-Z mode and to > switch off RX completely. That's certainly not the intention of the > code, but that's what I was seeing on the physical SPI bus, and also in > the log when I instrumented the regmap layer. > > It turned out that one of the *data* bytes which were sent over the UART > was 0xdd, and that this *data byte* somehow ended up in the regcache's > idea about the value within the MODE1 register. When the UART is opened > up the next time and max310x_startup updates a single unrelated bit in > MODE1, that code consults the regcache, notices the 0xdd data byte in > there, and ends up sending 0xcd over SPI. > > Here's what dump_stack() shows: According to the Submitting Patches, this can be reduced to show only important lines. > max310x spi1.2: regcache_write: reg 0x9 value 0xdd > max310x spi1.2: PWNED > CPU: 1 PID: 26 Comm: kworker/1:1 Not tainted 6.5.9-7-g9e090fe75fd8 #7 This is not important... > Hardware name: Marvell Armada 380/385 (Device Tree) > Workqueue: events max310x_tx_proc > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x40/0x4c > dump_stack_lvl from regcache_write+0xc0/0xc4 ...nor these... > regcache_write from _regmap_raw_write_impl+0x178/0x828 > _regmap_raw_write_impl from _regmap_raw_write+0xb8/0x134 > _regmap_raw_write from regmap_noinc_write+0x130/0x178 > regmap_noinc_write from max310x_tx_proc+0xd4/0x1a4 > max310x_tx_proc from process_one_work+0x21c/0x4e4 > process_one_work from worker_thread+0x50/0x54c > worker_thread from kthread+0xe0/0xfc > kthread from ret_from_fork+0x14/0x28 ...neither of these. > Clearly, regmap_noinc_write of a register 0x00 (that's the TX FIFO on () > this chip) has no business updating register 0x09, but that's what was > happening here. The regmap_config is already set up in a way that > register 0x00 is marked precious and volatile, so it has no business > going through the cache at all. Also, the documentation for > regmap_noinc_write suggests that this driver was using the regmap () > infrastructure correctly, and that the real bug is somewhere in > regmap/regcache where a call to regmap_noinc_write end up updating an () ends > unrelated register in regcache. > > Until regmap/regcache is fixed, let's just use an adapted version of the > old code that bypasses regmap altogether, and just sends out an SPI > transaction. > > This is related to my commit 3f42b142ea1171967e40e10e4b0241c0d6d28d41 > ("serial: max310x: fix IO data corruption in batched operations") which 12 characters of the SHA-1 is enough (for now). > introduced usage of regmap_noinc_write() to this driver. That commit is > a fixup of commit 285e76fc049c4d32c772eea9460a7ef28a193802 ("serial: > max310x: use regmap methods for SPI batch operations") which started Ditto. > using regmap_raw_write(), which was however also a wrong function. Oh, cool! Here you used () :-) > Fixes: 3f42b142ea11 ("serial: max310x: fix IO data corruption in batched operations") > Fixes: 285e76fc049c ("serial: max310x: use regmap methods for SPI batch operations") > Signed-off-by: Jan Kundrát <jan.kundrat@xxxxxxxxx> > To: Mark Brown <broonie@xxxxxxxxxx> > To: Cosmin Tanislav <cosmin.tanislav@xxxxxxxxxx> > To: linux-serial@xxxxxxxxxxxxxxx > Cc: Andy Shevchenko <andy.shevchenko@xxxxxxxxx> > Cc: linux-kernel@xxxxxxxxxxxxxxx "To:" lines are so unusual. I have kinda "smart" script [1] that has a heuristics that works in 99.9+% cases well. Consider using it or taking an idea from it. [1]: https://github.com/andy-shev/home-bin-tools/blob/master/ge2maintainer.sh ... > static void max310x_batch_write(struct uart_port *port, u8 *txbuf, unsigned int len) > { > - struct max310x_one *one = to_max310x_port(port); > - > - regmap_noinc_write(one->regmap, MAX310X_THR_REG, txbuf, len); > + const u8 header = (port->iobase * 0x20 + MAX310X_THR_REG) | MAX310X_WRITE_BIT; > + struct spi_transfer xfer[] = { > + { > + .tx_buf = &header, > + .len = 1, > + }, > + { > + .tx_buf = txbuf, > + .len = len, > + }, > + }; > + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer)); Can you add a comment like FIXME explaining why regmap is not used here, otherwise I will expect some clever guy to "fix" this again. > } > > static void max310x_batch_read(struct uart_port *port, u8 *rxbuf, unsigned int len) > { > - struct max310x_one *one = to_max310x_port(port); > - > - regmap_noinc_read(one->regmap, MAX310X_RHR_REG, rxbuf, len); > + const u8 header = port->iobase * 0x20 + MAX310X_RHR_REG; > + struct spi_transfer xfer[] = { > + { > + .tx_buf = &header, > + .len = 1, > + }, > + { > + .rx_buf = rxbuf, > + .len = len, > + }, > + }; > + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer)); Ditto. > } -- With Best Regards, Andy Shevchenko