On 9/6/23 13:51, Jason Gunthorpe wrote:
On Wed, Sep 06, 2023 at 10:55:26AM +0200, Cédric Le Goater wrote:
+	WARN_ON(node);
+	log_addr_space_size = ilog2(total_ranges_len);
+	if (log_addr_space_size <
+	    (MLX5_CAP_ADV_VIRTUALIZATION(mdev, pg_track_log_min_addr_space)) ||
+	    log_addr_space_size >
+	    (MLX5_CAP_ADV_VIRTUALIZATION(mdev, pg_track_log_max_addr_space))) {
+		err = -EOPNOTSUPP;
+		goto out;
+	}
We are seeing an issue with dirty page tracking when migrating an OVMF
guest VM. The vfio-pci variant driver for the MLX5 VF device complains
when dirty page tracking is initialized from QEMU:
qemu-kvm: 0000:b1:00.2: Failed to start DMA logging, err -95 (Operation not supported)
The computed 64-bit range is:
vfio_device_dirty_tracking_start nr_ranges 2 32:[0x0 - 0x807fffff], 64:[0x100000000 - 0x3838000fffff]
which seems to be too large for the HW. AFAICT, the MLX5 HW has a
42-bit address space limit for dirty tracking (the minimum is 12 bits).
Is this a FW tunable or a strict limitation?
It would be good to explain where this is coming from; all devices
need to make some decision on what address space ranges to track, and
I would say 2^42 is already a pretty generous limit..
QEMU computes the DMA logging ranges as two predefined ranges: a 32-bit
one and a 64-bit one. In the OVMF case, QEMU includes in the 64-bit
range both RAM (at the lower part) and the device RAM regions (at the
top of the address space). The size of that range can be bigger than
the 2^42 limit of the MLX5 HW for dirty tracking. QEMU is not making
much effort to be smart here; there is room for improvement.
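To make the numbers concrete, here is a back-of-the-envelope check of
the trace above, mirroring the driver snippet quoted at the top. Two
assumptions on my side: total_ranges_len ends up being the sum of the
individual range lengths, and pg_track_log_max_addr_space is 42 on
this HW.

    #include <stdint.h>
    #include <stdio.h>

    /* floor(log2(v)), same result as the kernel's ilog2() for v > 0 */
    static unsigned int ilog2_u64(uint64_t v)
    {
            unsigned int r = 0;

            while (v >>= 1)
                    r++;
            return r;
    }

    int main(void)
    {
            /* ranges from the vfio_device_dirty_tracking_start trace above */
            uint64_t len32 = 0x807fffffULL + 1;                       /* 32:[0x0 - 0x807fffff] */
            uint64_t len64 = 0x3838000fffffULL - 0x100000000ULL + 1;  /* 64:[0x100000000 - 0x3838000fffff] */

            /* prints 45, above the assumed 42-bit cap, hence -EOPNOTSUPP (-95) */
            printf("log_addr_space_size = %u\n", ilog2_u64(len32 + len64));
            return 0;
    }

The dominant term is the 64-bit range, which on its own is roughly
2^45.8 bytes because it stretches from the start of RAM above 4G all
the way up to the device regions near the top of the address space.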
Can we go the other direction and reduce the ranges qemu is interested
in?
We cannot exclude the device RAM regions since we don't know what the
HW could do with them. Correct me if I am wrong. But we can certainly
avoid the huge gap in the 64-bit range by adding support for more than
2 ranges in QEMU.
Then, if the overall size still exceeds what the HW supports, the
vfio-pci variant driver will report an error, as it does today.
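As a rough sketch of what more than 2 ranges would buy us: the top of
64-bit RAM below (0x880000000) is a placeholder I made up; only the
32-bit range and the device window come from the trace above.

    #include <stdint.h>
    #include <stdio.h>

    struct range { uint64_t start, last; };

    /* floor(log2(v)), same result as the kernel's ilog2() for v > 0 */
    static unsigned int ilog2_u64(uint64_t v)
    {
            unsigned int r = 0;

            while (v >>= 1)
                    r++;
            return r;
    }

    int main(void)
    {
            /* hypothetical split: RAM and device RAM regions as disjoint ranges */
            struct range ranges[] = {
                    { 0x0,            0x807fffff },      /* 32-bit RAM */
                    { 0x100000000,    0x87fffffff },     /* 64-bit RAM, placeholder top */
                    { 0x383800000000, 0x3838000fffff },  /* device RAM regions */
            };
            uint64_t total_ranges_len = 0;
            unsigned int i;

            for (i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
                    total_ranges_len += ranges[i].last - ranges[i].start + 1;

            /* prints 35 for this made-up layout, well below a 2^42 cap,
               even though the min..max span is still ~2^46 */
            printf("log_addr_space_size = %u\n", ilog2_u64(total_ranges_len));
            return 0;
    }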
Thanks,
C.