Re: [PATCH] net/mlx5: Added cond_resched() to crdump collection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 8/23/2024 8:41 PM, Mohamed Khalfella wrote:

On 2024-08-23 08:16:32 +0300, Moshe Shemesh wrote:


On 8/23/2024 7:08 AM, Przemek Kitszel wrote:

On 8/22/24 19:08, Mohamed Khalfella wrote:
On 2024-08-22 09:40:21 +0300, Moshe Shemesh wrote:


On 8/21/2024 1:27 AM, Mohamed Khalfella wrote:

On 2024-08-20 12:09:37 +0200, Przemek Kitszel wrote:
On 8/19/24 23:42, Mohamed Khalfella wrote:



Putting a cond_resched() every 16 register reads, similar to
mlx5_vsc_wait_on_flag(), should be okay. With the numbers above, this
will result in cond_resched() every ~0.56ms, which is okay IMO.

Sorry for the late response, I just got back from vacation.
All your measures looks right.
crdump is the devlink health dump of mlx5 FW fatal health reporter.
In the common case since auto-dump and auto-recover are default for this
health reporter, the crdump will be collected on fatal error of the mlx5
device and the recovery flow waits for it and run right after crdump
finished.
I agree with adding cond_resched(), but I would reduce the frequency,
like once in 1024 iterations of register read.
mlx5_vsc_wait_on_flag() is a bit different case as the usleep there is
after 16 retries waiting for the value to change.
Thanks.

Thanks for taking a look. Once in every 1024 iterations approximately
translates to 35284.4ns * 1024 ~= 36.1ms, which is relatively long time
IMO. How about any power-of-two <= 128 (~4.51ms)?

OK

Such tune-up would matter for interactive use of the machine with very
little cores, is that the case? Otherwise I see no point [to make it
overall a little slower, as that is the tradeoff].

Yes, as I see it, the point here is host with very few cores.

It should make a difference for systems with few cores. Add to that the
numbers above is what I was able to get from the lab. It has been seen
in the field that collecting crdump takes more than 5 seconds causing
issues. If this makes sense I will submit v2 with the updated commit
message and cond_resched() every 128 iterations of register read.

Fine with me.
Thanks.





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux