On 03/12/2024 17:07, Cédric Le Goater wrote:
Hello Yishai,
On 11/25/24 12:32, Yishai Hadas wrote:
Align the page tracking maximum message size with the device's
capability instead of relying on PAGE_SIZE.
This adjustment resolves a mismatch on systems where PAGE_SIZE is 64K,
but the firmware only supports a maximum message size of 4K.
Now that we rely on the device's capability for max_message_size, we
must account for potential future increases in its value.
Key considerations include:
- Supporting message sizes that exceed a single system page (e.g., an 8K
message on a 4K system).
- Ensuring the RQ size is adjusted to accommodate at least 4
WQEs/messages, in line with the device specification.
Perhaps theses changes deserve two separate patches ?
I had considered that as well at some point.
However, once we transitioned to use the firmware maximum message size
capability, the code needed to be prepared for any future change to that
value. Failing to do so, could result in a broken patch.
Since this is a Fixes patch, I believe it’s better to provide a single
functional patch rather than splitting it into two.
The above has been addressed as part of the patch.
Fixes: 79c3cf279926 ("vfio/mlx5: Init QP based resources for dirty
tracking")
Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxx>
---
drivers/vfio/pci/mlx5/cmd.c | 47 +++++++++++++++++++++++++++----------
1 file changed, 35 insertions(+), 12 deletions(-)
diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 7527e277c898..a61d303d9b6a 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -1517,7 +1517,8 @@ int mlx5vf_start_page_tracker(struct vfio_device
*vdev,
struct mlx5_vhca_qp *host_qp;
struct mlx5_vhca_qp *fw_qp;
struct mlx5_core_dev *mdev;
- u32 max_msg_size = PAGE_SIZE;
+ u32 log_max_msg_size;
+ u32 max_msg_size;
u64 rq_size = SZ_2M;
u32 max_recv_wr;
int err;
@@ -1534,6 +1535,12 @@ int mlx5vf_start_page_tracker(struct
vfio_device *vdev,
}
mdev = mvdev->mdev;
+ log_max_msg_size = MLX5_CAP_ADV_VIRTUALIZATION(mdev,
pg_track_log_max_msg_size);
+ max_msg_size = (1ULL << log_max_msg_size);
+ /* The RQ must hold at least 4 WQEs/messages for successful QP
creation */
+ if (rq_size < 4 * max_msg_size)
+ rq_size = 4 * max_msg_size;
+
memset(tracker, 0, sizeof(*tracker));
tracker->uar = mlx5_get_uars_page(mdev);
if (IS_ERR(tracker->uar)) {
@@ -1623,25 +1630,41 @@ set_report_output(u32 size, int index, struct
mlx5_vhca_qp *qp,
{
u32 entry_size = MLX5_ST_SZ_BYTES(page_track_report_entry);
u32 nent = size / entry_size;
+ u32 nent_in_page;
+ u32 nent_to_set;
struct page *page;
+ void *page_start;
A variable name of 'kaddr' would reflect better that this is a mapping.
OK, I'll rename as part of V1.
+ u32 page_offset;
+ u32 page_index;
+ u32 buf_offset;
I would move these declarations below under the 'do {} while' loop
We could consider moving most of the variables inside the do { } while
block. However, since the majority of the function's body is already
within the do { } while, it seems reasonable and cleaner to declare all
the variables together at the start of the function.
A part from that, it looks good.
Thanks for your review.
I haven't seen any issues on x86 and I have asked QE to test with
a 64k kernel on ARM
Could you please update once the test is completed successfully on the
64K system ?
After that, I'll send out V1 and include the tested-by clause with the
name you'll provide.
Thanks,
Yishai
Thanks,
C.
u64 addr;
u64 *buf;
int i;
- if (WARN_ON(index >= qp->recv_buf.npages ||
+ buf_offset = index * qp->max_msg_size;
+ if (WARN_ON(buf_offset + size >= qp->recv_buf.npages * PAGE_SIZE ||
(nent > qp->max_msg_size / entry_size)))
return;
- page = qp->recv_buf.page_list[index];
- buf = kmap_local_page(page);
- for (i = 0; i < nent; i++) {
- addr = MLX5_GET(page_track_report_entry, buf + i,
- dirty_address_low);
- addr |= (u64)MLX5_GET(page_track_report_entry, buf + i,
- dirty_address_high) << 32;
- iova_bitmap_set(dirty, addr, qp->tracked_page_size);
- }
- kunmap_local(buf);
+ do {
+ page_index = buf_offset / PAGE_SIZE;
+ page_offset = buf_offset % PAGE_SIZE;
+ nent_in_page = (PAGE_SIZE - page_offset) / entry_size;
+ page = qp->recv_buf.page_list[page_index];
+ page_start = kmap_local_page(page);
+ buf = page_start + page_offset;
+ nent_to_set = min(nent, nent_in_page);
+ for (i = 0; i < nent_to_set; i++) {
+ addr = MLX5_GET(page_track_report_entry, buf + i,
+ dirty_address_low);
+ addr |= (u64)MLX5_GET(page_track_report_entry, buf + i,
+ dirty_address_high) << 32;
+ iova_bitmap_set(dirty, addr, qp->tracked_page_size);
+ }
+ kunmap_local(page_start);
+ buf_offset += (nent_to_set * entry_size);
+ nent -= nent_to_set;
+ } while (nent);
}
static void