Hi, 2016-01-17 19:51 GMT+01:00 Maxime Ripard <maxime.ripard@xxxxxxxxxxxxxxxxxx>: > On Sun, Jan 10, 2016 at 10:11:11PM +0100, Marcus Weseloh wrote: >> >> >> - /* Ensure that we have a parent clock fast enough */ >> >> >> + /* >> >> >> + * Ensure that the parent clock is set to twice the max speed >> >> >> + * of the spi device (possibly rounded up by the clk driver) >> >> >> + */ >> >> >> mclk_rate = clk_get_rate(sspi->mclk); >> >> >> - if (mclk_rate < (2 * tfr->speed_hz)) { >> >> >> - clk_set_rate(sspi->mclk, 2 * tfr->speed_hz); >> >> >> + if (spi->max_speed_hz != sspi->cur_max_speed || >> >> >> + mclk_rate != sspi->cur_mclk) { >> >> > >> >> > Do you need to cache the values? As far as I'm aware, you end up doing >> >> > the computation all the time anyway. >> >> >> >> By caching the values we optimize the case when a single SPI slave >> >> device (or even multiple slave devices with the same max_speed) are >> >> used multiple times in a row. In that case, we're saving two calls: >> >> clk_set_rate and clk_get_rate. I was unsure about how expensive the >> >> clk_* calls were, so I thought it would be safer use caching. But >> >> maybe it's not worth the extra code? >> > >> > Unless you can point that there's a significant performance >> > difference, I'm not sure it's worth it. >> >> I did actually notice a significant transfer latency when a new mod0 >> clock frequency is set, probably due to the __delay in >> drivers/clk/sunxi/clk-factors.c (line 147). So my feeling is that the >> caching is worth it... at least for the case when there are two slave >> devices with different transfer speeds accessing the same SPI module. > > I'm not sure we even need that delay in the first place, at least not > for all the clocks using factor. > > AFAIK, the only case where it's useful is for PLL, and all of them > have a bit you can busy-wait on that will let you know when the clock > has stabilized. OK, I didn't know that. Does that mean you would like me to drop the caching from this patch and prepare a patch to remove the __delay from clk-factors? >> >> [...] >> >> >> - div = mclk_rate / (2 * tfr->speed_hz); >> >> >> - if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) { >> >> >> - if (div > 0) >> >> >> - div--; >> >> >> - >> >> >> + div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1; >> >> > >> >> > Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ? >> >> >> >> It is quite often, but not in all cases. The plain division rounds to >> >> the nearest integer, so it rounds down sometimes. Consider the >> >> following case: we have a slow SPI device with a spi-max-frequency of >> >> 50kHz. Our clock driver can't find a clock as slow as 100kHz, so it >> >> sets mclk to 214,285Hz. >> >> >> >> Using the old calculation we get: 214,285 / (2 * 50,000) = 2, so div = >> >> 1 as the old code subtracts 1 two lines further down >> >> The new calculation results in: DIV_ROUND_UP(214,285, 2 * 50,000) = >> >> 3, so div = 2 if we add the -1 >> > >> > Except that you have that extra - 1 after your DIV_ROUND_UP >> > calculation in the line you add. so you have to remove 1 from that >> > line above, and then 1 again when we set the register, which ends up >> > being the exact same thing, or am I missing something? >> >> The -1 after the DIV_ROUND_UP is already the -1 that is needed to set >> the register. There's no need for another -1 after that (and there >> isn't one in the code). > > I was probably hallucinating :) My bad. > > That being said, I still have a hard time figuring out what would not > be solved simply by removing the div--, which seems to match what your > commit log says. I'm probably not doing a good job explaining the change. Let me try it with a few examples. Consider the following three approaches: A (original, unpatched version): ======================== div = mclk_rate / (2 * tfr->speed_hz); if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) { if (div > 0) div--; } else { ... } B (original version, but with div-- removed): ================================= div = mclk_rate / (2 * tfr->speed_hz); if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) { ... } else { ... } C (version after this patch): ===================== div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1; if (div <= SUN4I_CLK_CTL_CDR2_MASK) { ... } else { ... } And now the following values for mclk, tfr->speed and the resulting div and SPI_CLK tfr->speed_hz = 50,000 mclk = 214,285 A: div = 1, SPI_CLK = 53,571(!) B: div = 2, SPI_CLK = 35,714 C: div = 2, SPI_CLK = 35,714 tfr->speed_hz = 50,000 mclk = 200,000 A: div = 1, SPI_CLK = 50,000 B: div = 2, SPI_CLK = 33,333(!) C: div = 1, SPI_CLK = 50,000 tfr->speed_hz = 650,000 mclk = 1,6000,000 A: div = 11, SPI_CLK = 666,667(!) B: div = 12, SPI_CLK = 615,385 C: div = 12, SPI_CLK = 615,385 tfr->speed_hz = 1,000,000 mclk = 2,000,000 A: div = 0, SPI_CLK = 1,000,000 B: div = 1, SPI_CLK = 500,000(!) C: div = 0, SPI_CLK = 1,000,000 I hope that makes it a little bit easier to understand the change. Of course, the change only makes sense if you agree that the acutal SPI transfer speed should never exceed the requested speed. Which I think is the right approach... but maybe you think otherwise? Thanks for taking the time to look at this so carefully! Marcus -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html