On Fri, 2023-12-01 at 09:53 +0100, AngeloGioacchino Del Regno wrote: > Il 29/11/23 04:16, Axe Yang (杨磊) ha scritto: > > On Tue, 2023-11-28 at 11:20 +0100, AngeloGioacchino Del Regno > > wrote: > > > Il 28/11/23 10:38, Axe Yang (杨磊) ha scritto: > > > > > On Tue, 2023-11-28 at 09:53 +0100, AngeloGioacchino Del Regno > > > > > wrote: > > > > > Il 28/11/23 08:01, Axe Yang ha scritto: > > > > > > Previously, during the MSDC calibration process, a full > > > > > > clock > > > > > > cycle > > > > > > actually not be covered, which in some cases didn't yield > > > > > > the > > > > > > best > > > > > > results and could cause CRC errors. This problem is > > > > > > particularly > > > > > > evident when MSDC is used as an SDIO host. In fact, MSDC > > > > > > support > > > > > > tuning up to a maximum of 64 steps, but by default, the > > > > > > step > > > > > > number > > > > > > is 32. By increase the tuning step, we are more likely to > > > > > > cover > > > > > > more > > > > > > parts of a clock cycle, and get better calibration result. > > > > > > > > > > > > To illustrate, when tuning 32 steps, if the obtained window > > > > > > has > > > > > > a > > > > > > hole > > > > > > near the middle, like this: 0xffc07ff (hex), then the > > > > > > selected > > > > > > delay > > > > > > will be the 6 (counting from right to left). > > > > > > > > > > > > (32 <- 1) > > > > > > 1111 1111 1100 0000 0000 0111 11(1)1 1111 > > > > > > > > > > > > However, if we tune 64 steps, the window obtained may look > > > > > > like > > > > > > this: > > > > > > 0xfffffffffffc07ff. The final selected delay will be 44, > > > > > > which > > > > > > is > > > > > > safer as it is further away from the hole: > > > > > > > > > > > > (64 <- 1) > > > > > > 1111 ... (1)111 1111 1111 1111 1111 1100 0000 0000 0111 > > > > > > 1111 > > > > > > 1111 > > > > > > > > > > > > In this case, delay 6 selected through 32 steps tuning is > > > > > > obviously > > > > > > not optimal, and this delay is closer to the hole, using it > > > > > > would > > > > > > easily cause CRC problems. > > > > > > > > > > > > You will need to configure property "mediatek,tuning-step" > > > > > > in > > > > > > MSDC > > > > > > dts node to 64 to extend the steps. > > > > > > > > > > > > > > > > If we can run 64 tuning steps, why should we run 32? > > > > > > > > > > Why isn't it just better to *always* run 64 tuning steps, on > > > > > SoCs > > > > > supporting that? > > > > > > > > > > Thanks, > > > > > Angelo > > > > > > > > Hi Angelo, > > > > > > > > That is a good question. The benefit of preserving 32 steps > > > > tuning > > > > is > > > > that it can save time in certain scenarios. > > > > > > > > On some platforms, when the delay selected through 64 steps > > > > tuning > > > > is > > > > very close to that chosen through 32 steps, we can reduce the > > > > tuning > > > > step from 64 to 32. This can save time sending the tuning block > > > > commands. > > > > > > > > Thus using 32 steps tuning can save kernel boot up time. > > > > > > > > Another case where time can be saved is when accessing the RPMB > > > > partition of eMMC. Each time switch to RPMB partition, there is > > > > a > > > > retune action, causing a certain drop in performance. If we are > > > > certain > > > > that the results of 32 steps tuning are usable and we use it, > > > > this > > > > can > > > > in a sense also guarantee performance when accessing the RPMB > > > > partition. > > > > > > > > > > Thanks for this explanation! Though, I have some more > > > questions... > > > > > > ...regarding boot up time, how much time are we talking about? > > > > Luckily, I have a platform at hand that can be used for > > experiments/ > > Below are the results from testing on this platform: > > > > [ 2.431993][T1200180] kworker/2:21: mtk-msdc bootdevice: > > [name:mtk_sd&]Start tuning > > [ 2.434950][T1200180] kworker/2:21: mtk-msdc bootdevice: > > [name:mtk_sd&]Tuning finished > > [ 2.435957][T1200180] kworker/2:21: mtk-msdc bootdevice: > > [name:mtk_sd&]phase: [map:00000000ffffffc0] [maxlen:26] [final:19] > > [ 2.462375][T1200180] kworker/2:21: [name:mmc_core&]mmc0: new > > HS400 > > MMC card at address 0001 > > ... > > [ 2.519863][T1300069] kworker/3:1: mtk-msdc 11250000.mmc: > > [name:mtk_sd&]Start tuning > > [ 2.526271][T1300069] kworker/3:1: mtk-msdc 11250000.mmc: > > [name:mtk_sd&]Tuning finished > > [ 2.527288][T1300069] kworker/3:1: mtk-msdc 11250000.mmc: > > [name:mtk_sd&]phase: [map:ffffffffffff003f] [maxlen:48] [final:40] > > [ 2.532269][T1300069] kworker/3:1: [name:mmc_core&]mmc2: new > > ultra > > high speed SDR104 SDIO card at address 0001 > > > > As the kernel log indicates, it took 3 ms for eMMC to tune 32 > > steps, > > while it took about 7 ms for SDIO to tune 64 steps. I have to > > admit, > > when it comes to saving boot up time, the benefits of reducing step > > form 64 to 32 are quite minimal. Just as you said, especially when > > async probe is enabled. > > > > That's great to know, and it's *truly* nice information that you can > put into the commit message, as this completes the analysis of this > commit. > > Can you mention that in the commit message for v4 of this commit? > > "As per measurements taken on MT(xxxx), the tuning phase will take: > eMMC - 32 steps: ~3ms > 64 steps: xxxx > SDIO - 32 steps: xxxx > 64 steps: ~7ms > Okay, will update the commit message in v4. > ...but while this won't prolong boot times by any meaningful amount > of time, for eMMC, it should still be preferred to use 32 steps > tuning > because otherwise we lose performance for RPMB I//O, which requires > re-tuning for each access. > " OK, I will make SD/SDIO default to use 64 steps tuning . > > > > > > I'm asking because while now I see - and agree - on using 32- > > > steps > > > tuning > > > on eMMC to guarantee performance during RPMB access, as far as I > > > know, > > > there is no RPMB partition on SD/MicroSD cards (and, of course, > > > SDIO > > > devices). > > > > > > If the boot performance impact isn't big, as in, up to ~100 > > > milliseconds is > > > not big at all (especially with async probe!), we can definitely > > > avoid the > > > addition of a devicetree property for 32-steps tuning, hence use > > > a > > > dynamic > > > selection strategy such that: > > > - On eMMC devices, always perform 32-steps tuning (hence no > > > boot > > > delay) > > > - On SD cards and SDIO, always perform 64-steps tuning > > > > eMMC could also potentially have CRC issue if only tune 32-steps, > > albeit with a lower likelihood. The precondition for using 32-steps > > tuning is that it could provide roughly the same valid results as > > using > > 64-steps tuning. So taking everything into account, controlling the > > tuning step as needed through the use of dts property seems to be a > > more flixible approach. > > > > Yes but since the only performance concern is about eMMC RPMB access, > we could at least make this 64 steps as *default* for SD/SDIO, and > 32 steps as default for eMMC. > Device tree would be an override of those default values. > > Can we set 64 as default for SD/SDIO, 32 as default for eMMC, and > then use > the device tree to override those defaults? > Will update in v4. > Angelo