On 2024-08-23 08:16:32 +0300, Moshe Shemesh wrote: > > > On 8/23/2024 7:08 AM, Przemek Kitszel wrote: > > > > On 8/22/24 19:08, Mohamed Khalfella wrote: > >> On 2024-08-22 09:40:21 +0300, Moshe Shemesh wrote: > >>> > >>> > >>> On 8/21/2024 1:27 AM, Mohamed Khalfella wrote: > >>>> > >>>> On 2024-08-20 12:09:37 +0200, Przemek Kitszel wrote: > >>>>> On 8/19/24 23:42, Mohamed Khalfella wrote: > > > > > >>>> > >>>> Putting a cond_resched() every 16 register reads, similar to > >>>> mlx5_vsc_wait_on_flag(), should be okay. With the numbers above, this > >>>> will result in cond_resched() every ~0.56ms, which is okay IMO. > >>> > >>> Sorry for the late response, I just got back from vacation. > >>> All your measures looks right. > >>> crdump is the devlink health dump of mlx5 FW fatal health reporter. > >>> In the common case since auto-dump and auto-recover are default for this > >>> health reporter, the crdump will be collected on fatal error of the mlx5 > >>> device and the recovery flow waits for it and run right after crdump > >>> finished. > >>> I agree with adding cond_resched(), but I would reduce the frequency, > >>> like once in 1024 iterations of register read. > >>> mlx5_vsc_wait_on_flag() is a bit different case as the usleep there is > >>> after 16 retries waiting for the value to change. > >>> Thanks. > >> > >> Thanks for taking a look. Once in every 1024 iterations approximately > >> translates to 35284.4ns * 1024 ~= 36.1ms, which is relatively long time > >> IMO. How about any power-of-two <= 128 (~4.51ms)? > > OK > > > > Such tune-up would matter for interactive use of the machine with very > > little cores, is that the case? Otherwise I see no point [to make it > > overall a little slower, as that is the tradeoff]. > > Yes, as I see it, the point here is host with very few cores. It should make a difference for systems with few cores. Add to that the numbers above is what I was able to get from the lab. It has been seen in the field that collecting crdump takes more than 5 seconds causing issues. If this makes sense I will submit v2 with the updated commit message and cond_resched() every 128 iterations of register read.