Hi Dirk, On Fri, Mar 29, 2019 at 1:13 PM Dirk Behme <dirk.behme@xxxxxxxxxxxx> wrote: > On 29.03.2019 10:46, Geert Uytterhoeven wrote: > > On Fri, Mar 29, 2019 at 8:05 AM Dirk Behme <dirk.behme@xxxxxxxxxxxx> wrote: > >> On 28.03.2019 12:30, Dirk Behme wrote: > >>> On 28.03.2019 11:16, Dirk Behme wrote: > >>>> On 28.03.2019 10:24, Geert Uytterhoeven wrote: > >>>>> On Wed, Mar 27, 2019 at 7:36 PM Eugeniu Rosca <erosca@xxxxxxxxxxxxxx> > >>>>> wrote: > >>>>>> We've recently switched from rcar-3.7.x to rcar-3.9.x [1] kernel and > >>>>>> the > >>>>>> latter contains this patch [2] by virtue of rcar-3.9.0 commit [3], > >>>>>> which > >>>>>> mirrors v4.18-rc1 commit [4] in mainline. > >>>>>> > >>>>>> JFYI, quite far away in the delivery chain, we've received below > >>>>>> report: > >>>>>> > >>>>>>> With this patch [2-4] there are reports about broken data > >>>>>>> communication with 115200 baud with SXM module. Reverting > >>>>>>> this patch results in successful communication, again. > >>>>>> > >>>>>> While this scarce information barely helps anybody, I thought that > >>>>>> sharing it with you might be beneficial in case you collect several > >>>>>> reports linked to this specific commit in future, meaning it > >>>>>> potentially > >>>>>> adds a regression. > >>>>>> > >>>>>> Also, if you are aware of any userland changes that might be > >>>>>> required/assumed by this patch or in case you have any alternative > >>>>>> ideas how to avoid reverting this patch, your feedback would be very > >>>>>> appreciated. > >>>>> > >>>>> Thanks for your report! > >>>>> > >>>>> [TLDR: skip to the suggested fix below; I only noticed the bug after > >>>>> writing the below paragraphs, which are still useful > >>>>> questions to > >>>>> let us reproduce the issue] > >>>>> > >>>>> Which SoC are you using? > >>>>> I assume this is on a custom board, as Salvator-X(S) and ULCB have > >>>>> external SCIF clock crystals, which allow to use a perfect 115200 bps, > >>>>> hence the affected code path is not exercised: > >>>>> > >>>>> sh-sci e6550000.serial: BRG: 115200+0 bps using DL 4 SR 32 > >>>>> sh-sci e6550000.serial: Using clk scif for 115200+0 bps > >>>>> > >>>>> Does your board have an external SCIF clock? Which frequency? > >>>>> Can you check the clock values and deviation for your configuration, by > >>>>> changing the calls to print the above information from dev_dbg() to > >>>>> dev_info()? > >>>>> > >>>>> Does adding the DIV_ROUND_CLOSEST(), as suggested in my review > >>>>> of the posted patch, help? > >>>>> Perhaps the sampling point shift is inverted? Does -shift work better? > >>>>> > >>>>> [possible solution] > >>>>> > >>>>>> + int shift = min(-8, max(7, deviation > >>>>>> / 2)); > >>>>> > >>>>> Oops, min and max are exchanged! > >>>>> > >>>>> I guess using > >>>>> > >>>>> int shift = clamp(deviation / 2, -8, 7) > >>>>> > >>>>> instead fixes the issue? > >>>> > >>>> > >>>> Uh, that was fast :) Many thanks! > >>>> > >>>> We will test this as fast as possible! But due to the long delivery > >>>> chain Eugeniu mentioned this will take some time. I'll try my best to > >>>> come back to you as fast as possible. > >>> > >>> > >>> Just for the archives: We are testing the attached patch. > >> > >> > >> * Testing the patch [5] > >> > >> - int shift = min(-8, max(7, deviation / 2)); > >> + int shift = clamp(deviation / 2, -8, 7); > >> > >> does *not* fix our issue. Or in other words: Testing was *not* successful. > > > > I'm sorry to hear that. > > > >> * However, from review point of view we think that it fixes a serious > >> bug. So maybe it should be applied, anyhow? > > > > Yes, submitted. > > > >> * Using strace we managed to get some more information about the usage > >> of the serial port [6]. With this, we are talking about 57600 and not 115200 > >> > >> * Switching to dev_info() [7] as requested above we get > >> > >> [ 0.553256] e6560000.serial: ttySC3 at MMIO 0xe6560000 (irq = 41, > >> base_baud = 0) is a hscif > >> [ 161.418527] sh-sci e6560000.serial: BRG: 9600+0 bps using DL 1462 SR 19 > >> [ 161.418543] sh-sci e6560000.serial: Using clk s3d1 for 9600+0 bps > >> [ 161.418813] sh-sci e6560000.serial: BRG: 57600-5 bps using DL 463 SR 10 > >> [ 161.418824] sh-sci e6560000.serial: Using clk s3d1 for 57600-5 bps > Do you have any idea what might be the difference between reverting > "serial: sh-sci: Support for HSCIF RX sampling point adjustment" (works) > and not reverting that (doesn't work for us), then? Before that commit, the RX sampling point was not shifted. After that commit, it was incorrectly shifted by -8[*]. With my fix, it is shifted by 7[*], to compensate for a clock rate that is slightly off. [*] In units of cycles of the sampling clock, which runs at SR * 57595 = 575950 Hz. However, doing the above calculation shows that's something wrong with the formula used by the driver: with SR = 10, the default sampling point at the center is at SR / 2 = 5, so the shift must be within [-4, +4], which is exceeded by using a value of 7. deviation = min_err * srr * last_stop / 2 / baud; With: min_err = -5 srr = 9 last_stop = 19 baud = 57600 Note that srr and baud are unsigned. Hence the multiplication and divisions are done in unsigned arithmetic, and we get deviation = 37282 instead of 0. Oops... Fixed by: - int deviation = min_err * srr * last_stop / 2 / baud; + int deviation = (int)(min_err * srr * last_stop) / 2 / + (int)baud; Before I sent a patch: Uli, shouldn't the formula use "(srr + 1)" instead of "srr", as the actual sampling rate factor is one more than the value programmed in HSSRR.SRCYC? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds