From: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx> Date: Thu, 29 Aug 2024 15:38:56 -0600 > Collecting crdump involves reading vsc registers from pci config space > of mlx device, which can take long time to complete. This might result > in starving other threads waiting to run on the cpu. > > Numbers I got from testing ConnectX-5 Ex MCX516A-CDAT in the lab: > > - mlx5_vsc_gw_read_block_fast() was called with length = 1310716. > - mlx5_vsc_gw_read_fast() reads 4 bytes at a time. It was not used to > read the entire 1310716 bytes. It was called 53813 times because > there are jumps in read_addr. > - On average mlx5_vsc_gw_read_fast() took 35284.4ns. > - In total mlx5_vsc_wait_on_flag() called vsc_read() 54707 times. > The average time for each call was 17548.3ns. In some instances > vsc_read() was called more than one time when the flag was not set. > As expected the thread released the cpu after 16 iterations in > mlx5_vsc_wait_on_flag(). > - Total time to read crdump was 35284.4ns * 53813 ~= 1.898s. > > It was seen in the field that crdump can take more than 5 seconds to > complete. During that time mlx5_vsc_wait_on_flag() did not release the > cpu because it did not complete 16 iterations. It is believed that pci > config reads were slow. This change adds conditional reschedule call > every 128 register read to release the cpu if needed. > > Reviewed-by: Yuanyuan Zhong <yzhong@xxxxxxxxxxxxxxx> > Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx> > --- > drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > index 6b774e0c2766..bc6c38a68702 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c > @@ -269,6 +269,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, > { > unsigned int next_read_addr = 0; > unsigned int read_addr = 0; > + unsigned int count = 0; > > while (read_addr < length) { > if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr, > @@ -276,6 +277,9 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data, > return read_addr; > > read_addr = next_read_addr; > + /* Yield the cpu every 128 register read */ > + if ((++count & 0x7f) == 0) > + cond_resched(); Why & 0x7f, could it be written more clearly? if (++count == 128) { cond_resched(); count = 0; } Also, I'd make this open-coded value a #define somewhere at the beginning of the file with a comment with a short explanation. BTW, why 128? Not 64, not 256 etc? You just picked it, I don't see any explanation in the commitmsg or here in the code why exactly 128. Have you tried different values? > } > return length; > } Thanks, Olek