Sat, Jun 26, 2021 at 06:11:32AM +0200, Maciej W. Rozycki kirjoitti: > Oxford Semiconductor PCIe (Tornado) serial port devices are driven by a > fixed 62.5MHz clock input derived from the 100MHz PCI Express clock. > > We currently drive the device using its default oversampling rate of 16 > and the clock prescaler disabled, consequently yielding the baud base of > 3906250. This base is inadequate for some of the high-speed baud rates > such as 460800bps, for which the closest rate possible can be obtained > by dividing the baud base by 8, yielding the baud rate of 488281.25bps, > which is off by 5.9638%. This is enough for data communication to break > with the remote end talking actual 460800bps where missed stop bits have > been observed. > > We can do better however, by taking advantage of a reduced oversampling > rate, which can be set to any integer value from 4 to 16 inclusive by > programming the TCR register, and by using the clock prescaler, which > can be set to any value from 1 to 63.875 in increments of 0.125 in the > CPR/CPR2 register pair[1][2][3][4]. The prescaler has to be explicitly > enabled though by setting bit 7 in the MCR or otherwise it is bypassed > as if the value of 1 was used. > > By using these parameters rates from 15625000bps down to 1bps can be > obtained, with either exact or highly-accurate actual bit rates for > standard and many non-standard rates. > > Make use of these features then as follows: > > - Set the baud base to 15625000, reflecting the minimum oversampling > rate of 4 with the clock prescaler and divisor both set to 1. > > - Set the MCR mask and force bits in the UART template so as to have > MCR[7] always set and then have 8250 core propagate those settings, if > supplied as non-zero, overriding the ALPHA_KLUDGE_MCR default. > > - Override the `get_divisor' handler and determine a good combination of > parameters by using a lookup table with predetermined value pairs of > the oversampling rate and the clock prescaler and finding a pair that > divides the input clock such that the quotient, when rounded to the > nearest integer, deviates the least from the exact result. Calculate > the clock divisor accordingly. > > Scale the resulting oversampling rate (only by powers of two) if > possible so as to maximise it, reducing the divisor accordingly, and > avoid a divisor overflow for very low baud rates by scaling the > oversampling rate and/or the prescaler even if that causes some > accuracy loss. > Also handle the historic spd_cust feature so as to allow one to set > all the three parameters manually to arbitrary values, by keeping the > low 16 bits for the divisor and then putting TCR in bits 19:16 and > CPR/CPR2 in bits 28:20, sanitising the bit pattern supplied such as > to clamp CPR/CPR2 values between 0.000 and 0.875 inclusive to 1.000. > This preserves compatibility with any existing setups, that is where > requesting a custom divisor that only has any bits set among the low > 16 the oversampling rate of 16 and the clock prescaler of 1 will be > used. Please no. We really would like to get rid of that ugly hack. The BOTHER exists for ages. > Finally abuse the `frac' argument to store the determined bit patterns > for the TCR, CPR and CPR2 registers. > > - Override the `set_divisor' handler so as to set the TCR, CPR and CPR2 > registers from the `frac' value supplied. Set the divisor as usually. > > With the baud base set to 15625000 and the unsigned 16-bit UART_DIV_MAX > limitation imposed by `serial8250_get_baud_rate' standard baud rates > below 300bps become unavailable in the regular way, e.g. the rate of > 200bps requires the baud base to be divided by 78125 and that is beyond > the unsigned 16-bit range. The historic spd_cust feature can still be > used to obtain such rates if so required. > > Here are the figures for the standard and some non-standard baud rates > (including those quoted in Oxford Semiconductor documentation), giving > the requested rate (r), the actual rate yielded (a) and its deviation > from the requested rate (d), and the values of the oversampling rate > (tcr), the clock prescaler (cpr) and the divisor (div) produced by the > new `get_divisor' handler: > > r: 15625000, a: 15625000.00, d: 0.0000%, tcr: 4, cpr: 1.000, div: 1 > r: 12500000, a: 12500000.00, d: 0.0000%, tcr: 5, cpr: 1.000, div: 1 > r: 10416666, a: 10416666.67, d: 0.0000%, tcr: 6, cpr: 1.000, div: 1 > r: 8928571, a: 8928571.43, d: 0.0000%, tcr: 7, cpr: 1.000, div: 1 > r: 7812500, a: 7812500.00, d: 0.0000%, tcr: 8, cpr: 1.000, div: 1 > r: 4000000, a: 4000000.00, d: 0.0000%, tcr: 5, cpr: 3.125, div: 1 > r: 3686400, a: 3676470.59, d: -0.2694%, tcr: 8, cpr: 2.125, div: 1 > r: 3500000, a: 3496503.50, d: -0.0999%, tcr: 13, cpr: 1.375, div: 1 > r: 3000000, a: 2976190.48, d: -0.7937%, tcr: 14, cpr: 1.500, div: 1 > r: 2500000, a: 2500000.00, d: 0.0000%, tcr: 10, cpr: 2.500, div: 1 > r: 2000000, a: 2000000.00, d: 0.0000%, tcr: 10, cpr: 3.125, div: 1 > r: 1843200, a: 1838235.29, d: -0.2694%, tcr: 16, cpr: 2.125, div: 1 > r: 1500000, a: 1492537.31, d: -0.4975%, tcr: 5, cpr: 8.375, div: 1 > r: 1152000, a: 1152073.73, d: 0.0064%, tcr: 14, cpr: 3.875, div: 1 > r: 921600, a: 919117.65, d: -0.2694%, tcr: 16, cpr: 2.125, div: 2 > r: 576000, a: 576036.87, d: 0.0064%, tcr: 14, cpr: 3.875, div: 2 > r: 460800, a: 460829.49, d: 0.0064%, tcr: 7, cpr: 3.875, div: 5 > r: 230400, a: 230414.75, d: 0.0064%, tcr: 14, cpr: 3.875, div: 5 > r: 115200, a: 115207.37, d: 0.0064%, tcr: 14, cpr: 1.250, div: 31 > r: 57600, a: 57603.69, d: 0.0064%, tcr: 8, cpr: 3.875, div: 35 > r: 38400, a: 38402.46, d: 0.0064%, tcr: 14, cpr: 3.875, div: 30 > r: 19200, a: 19201.23, d: 0.0064%, tcr: 8, cpr: 3.875, div: 105 > r: 9600, a: 9600.06, d: 0.0006%, tcr: 9, cpr: 1.125, div: 643 > r: 4800, a: 4799.98, d: -0.0004%, tcr: 7, cpr: 2.875, div: 647 > r: 2400, a: 2400.02, d: 0.0008%, tcr: 9, cpr: 2.250, div: 1286 > r: 1200, a: 1200.00, d: 0.0000%, tcr: 14, cpr: 2.875, div: 1294 > r: 300, a: 300.00, d: 0.0000%, tcr: 11, cpr: 2.625, div: 7215 > r: 200, a: 200.00, d: 0.0000%, tcr: 16, cpr: 1.250, div: 15625 > r: 150, a: 150.00, d: 0.0000%, tcr: 13, cpr: 2.250, div: 14245 > r: 134, a: 134.00, d: 0.0000%, tcr: 11, cpr: 2.625, div: 16153 > r: 110, a: 110.00, d: 0.0000%, tcr: 12, cpr: 1.000, div: 47348 > r: 75, a: 75.00, d: 0.0000%, tcr: 4, cpr: 5.875, div: 35461 > r: 50, a: 50.00, d: 0.0000%, tcr: 16, cpr: 1.250, div: 62500 > r: 25, a: 25.00, d: 0.0000%, tcr: 16, cpr: 2.500, div: 62500 > r: 4, a: 4.00, d: 0.0000%, tcr: 16, cpr: 20.000, div: 48828 > r: 2, a: 2.00, d: 0.0000%, tcr: 16, cpr: 40.000, div: 48828 > r: 1, a: 1.00, d: 0.0000%, tcr: 16, cpr: 63.875, div: 61154 > > References: > > [1] "OXPCIe200 PCI Express Multi-Port Bridge", Oxford Semiconductor, > Inc., DS-0045, 10 Nov 2008, Section "950 Mode", pp. 64-65 > > [2] "OXPCIe952 PCI Express Bridge to Dual Serial & Parallel Port", > Oxford Semiconductor, Inc., DS-0046, Mar 06 08, Section "950 Mode", > p. 20 > > [3] "OXPCIe954 PCI Express Bridge to Quad Serial Port", Oxford > Semiconductor, Inc., DS-0047, Feb 08, Section "950 Mode", p. 20 > > [4] "OXPCIe958 PCI Express Bridge to Octal Serial Port", Oxford > Semiconductor, Inc., DS-0048, Feb 08, Section "950 Mode", p. 20 Is it possible to reduce a commit message by shifting some stuff to the dedicated documentation? ... > drivers/tty/serial/8250/8250_pci.c | 331 ++++++++++++++++++++++++++++-------- Can we, please, split the quirk driver first as it's done in a lot of examples (_exar, _mid, _lpss, _...) and then modify it? ... > +/* Tornado-specific constants for the TCR and CPR registers; see below. */ > +#define OXSEMI_TORNADO_TCR_MASK 0xf > +#define OXSEMI_TORNADO_CPR_MASK 0x1ff > +#define OXSEMI_TORNADO_CPR_MIN 8 > + > +/* > + * Determine the oversampling rate, the clock prescaler, and the clock > + * divisor for the requested baud rate. The clock rate is 62.5 MHz, > + * which is four times the baud base, and the prescaler increments in > + * steps of 1/8. Therefore to make calculations on integers we need > + * to use a scaled clock rate, which is the baud base multiplied by 32 > + * (or our assumed UART clock rate multiplied by 2). > + * > + * The allowed oversampling rates are from 4 up to 16 inclusive (values > + * from 0 to 3 inclusive map to 16). Likewise the clock prescaler allows > + * values between 1.000 and 63.875 inclusive (operation for values from > + * 0.000 to 0.875 has not been specified). The clock divisor is the usual > + * unsigned 16-bit integer. > + * > + * For the most accurate baud rate we use a table of predetermined > + * oversampling rates and clock prescalers that records all possible > + * products of the two parameters in the range from 4 up to 255 inclusive, > + * and additionally 335 for the 1500000bps rate, with the prescaler scaled > + * by 8. The table is sorted by the decreasing value of the oversampling > + * rate and ties are resolved by sorting by the decreasing value of the > + * product. This way preference is given to higher oversampling rates. > + * > + * We iterate over the table and choose the product of an oversampling > + * rate and a clock prescaler that gives the lowest integer division > + * result deviation, or if an exact integer divider is found we stop > + * looking for right away. We do some fixup if the resulting clock > + * divisor required would be out of its unsigned 16-bit integer range. > + * > + * Finally we abuse the supposed fractional part returned to encode the > + * 4-bit value of the oversampling rate and the 9-bit value of the clock > + * prescaler which will end up in the TCR and CPR/CPR2 registers. > + */ > +static unsigned int pci_oxsemi_tornado_get_divisor(struct uart_port *port, > + unsigned int baud, > + unsigned int *frac) > +{ > + static u8 p[][2] = { > + { 16, 14, }, { 16, 13, }, { 16, 12, }, { 16, 11, }, > + { 16, 10, }, { 16, 9, }, { 16, 8, }, { 15, 17, }, > + { 15, 16, }, { 15, 15, }, { 15, 14, }, { 15, 13, }, > + { 15, 12, }, { 15, 11, }, { 15, 10, }, { 15, 9, }, > + { 15, 8, }, { 14, 18, }, { 14, 17, }, { 14, 14, }, > + { 14, 13, }, { 14, 12, }, { 14, 11, }, { 14, 10, }, > + { 14, 9, }, { 14, 8, }, { 13, 19, }, { 13, 18, }, > + { 13, 17, }, { 13, 13, }, { 13, 12, }, { 13, 11, }, > + { 13, 10, }, { 13, 9, }, { 13, 8, }, { 12, 19, }, > + { 12, 18, }, { 12, 17, }, { 12, 11, }, { 12, 9, }, > + { 12, 8, }, { 11, 23, }, { 11, 22, }, { 11, 21, }, > + { 11, 20, }, { 11, 19, }, { 11, 18, }, { 11, 17, }, > + { 11, 11, }, { 11, 10, }, { 11, 9, }, { 11, 8, }, > + { 10, 25, }, { 10, 23, }, { 10, 20, }, { 10, 19, }, > + { 10, 17, }, { 10, 10, }, { 10, 9, }, { 10, 8, }, > + { 9, 27, }, { 9, 23, }, { 9, 21, }, { 9, 19, }, > + { 9, 18, }, { 9, 17, }, { 9, 9, }, { 9, 8, }, > + { 8, 31, }, { 8, 29, }, { 8, 23, }, { 8, 19, }, > + { 8, 17, }, { 8, 8, }, { 7, 35, }, { 7, 31, }, > + { 7, 29, }, { 7, 25, }, { 7, 23, }, { 7, 21, }, > + { 7, 19, }, { 7, 17, }, { 7, 15, }, { 7, 14, }, > + { 7, 13, }, { 7, 12, }, { 7, 11, }, { 7, 10, }, > + { 7, 9, }, { 7, 8, }, { 6, 41, }, { 6, 37, }, > + { 6, 31, }, { 6, 29, }, { 6, 23, }, { 6, 19, }, > + { 6, 17, }, { 6, 13, }, { 6, 11, }, { 6, 10, }, > + { 6, 9, }, { 6, 8, }, { 5, 67, }, { 5, 47, }, > + { 5, 43, }, { 5, 41, }, { 5, 37, }, { 5, 31, }, > + { 5, 29, }, { 5, 25, }, { 5, 23, }, { 5, 19, }, > + { 5, 17, }, { 5, 15, }, { 5, 13, }, { 5, 11, }, > + { 5, 10, }, { 5, 9, }, { 5, 8, }, { 4, 61, }, > + { 4, 59, }, { 4, 53, }, { 4, 47, }, { 4, 43, }, > + { 4, 41, }, { 4, 37, }, { 4, 31, }, { 4, 29, }, > + { 4, 23, }, { 4, 19, }, { 4, 17, }, { 4, 13, }, > + { 4, 9, }, { 4, 8, }, > + }; Oh là là! Please, use rational best approximation algorithm instead (check CONFIG_RATIONAL). -- With Best Regards, Andy Shevchenko