Re: [PATCH] net/mlx5: Added cond_resched() to crdump collection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024-08-23 08:16:32 +0300, Moshe Shemesh wrote:
> 
> 
> On 8/23/2024 7:08 AM, Przemek Kitszel wrote:
> > 
> > On 8/22/24 19:08, Mohamed Khalfella wrote:
> >> On 2024-08-22 09:40:21 +0300, Moshe Shemesh wrote:
> >>>
> >>>
> >>> On 8/21/2024 1:27 AM, Mohamed Khalfella wrote:
> >>>>
> >>>> On 2024-08-20 12:09:37 +0200, Przemek Kitszel wrote:
> >>>>> On 8/19/24 23:42, Mohamed Khalfella wrote:
> > 
> > 
> >>>>
> >>>> Putting a cond_resched() every 16 register reads, similar to
> >>>> mlx5_vsc_wait_on_flag(), should be okay. With the numbers above, this
> >>>> will result in cond_resched() every ~0.56ms, which is okay IMO.
> >>>
> >>> Sorry for the late response, I just got back from vacation.
> >>> All your measures looks right.
> >>> crdump is the devlink health dump of mlx5 FW fatal health reporter.
> >>> In the common case since auto-dump and auto-recover are default for this
> >>> health reporter, the crdump will be collected on fatal error of the mlx5
> >>> device and the recovery flow waits for it and run right after crdump
> >>> finished.
> >>> I agree with adding cond_resched(), but I would reduce the frequency,
> >>> like once in 1024 iterations of register read.
> >>> mlx5_vsc_wait_on_flag() is a bit different case as the usleep there is
> >>> after 16 retries waiting for the value to change.
> >>> Thanks.
> >>
> >> Thanks for taking a look. Once in every 1024 iterations approximately
> >> translates to 35284.4ns * 1024 ~= 36.1ms, which is relatively long time
> >> IMO. How about any power-of-two <= 128 (~4.51ms)?
> 
> OK
> > 
> > Such tune-up would matter for interactive use of the machine with very
> > little cores, is that the case? Otherwise I see no point [to make it
> > overall a little slower, as that is the tradeoff].
> 
> Yes, as I see it, the point here is host with very few cores.

It should make a difference for systems with few cores. Add to that the
numbers above is what I was able to get from the lab. It has been seen
in the field that collecting crdump takes more than 5 seconds causing
issues. If this makes sense I will submit v2 with the updated commit
message and cond_resched() every 128 iterations of register read.




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux