This series improves the time of .set_map() operations by parallelizing
the MKEY creation and deletion for direct MKEYs. Looking at the top
level MKEY creation/deletion functions, the following improvement can be
seen:

|-------------------+-------------|
| operation         | improvement |
|-------------------+-------------|
| create_user_mr()  | 3-5x        |
| destroy_user_mr() | 8x          |
|-------------------+-------------|

The last part of the series introduces lazy MKEY deletion which
postpones the MKEY deletion to a later point in a workqueue.

As this series and the previous ones were targeting live migration,
we can also observe improvements on this front:

|-------------------+------------------+------------------|
| Stage             | Downtime #1 (ms) | Downtime #2 (ms) |
|-------------------+------------------+------------------|
| Baseline          | 3140             | 3630             |
| Parallel MKEY ops | 1200             | 2000             |
| Deferred deletion | 1014             | 1253             |
|-------------------+------------------+------------------|

Test configuration: 256 GB VM, 32 CPUs x 2 threads per core, 4 x mlx5
vDPA devices x 32 VQs (16 VQPs)

This series must be applied on top of the parallel VQ suspend/resume
series [0].

[0] https://lore.kernel.org/all/20240816090159.1967650-1-dtatulea@xxxxxxxxxx/

Dragos Tatulea (7):
  vdpa/mlx5: Create direct MKEYs in parallel
  vdpa/mlx5: Delete direct MKEYs in parallel
  vdpa/mlx5: Rename function
  vdpa/mlx5: Extract mr members in own resource struct
  vdpa/mlx5: Rename mr_mtx -> lock
  vdpa/mlx5: Introduce init/destroy for MR resources
  vdpa/mlx5: Postpone MR deletion

 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  25 ++-
 drivers/vdpa/mlx5/core/mr.c        | 284 ++++++++++++++++++++++++-----
 drivers/vdpa/mlx5/core/resources.c |   3 -
 drivers/vdpa/mlx5/net/mlx5_vnet.c  |  53 +++---
 4 files changed, 293 insertions(+), 72 deletions(-)

--
2.45.1