On Wed, Feb 02, 2022 at 10:30:41AM -0700, Alex Williamson wrote:
> On Wed, 2 Feb 2022 14:34:52 +0000
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx> wrote:
>
> > > From: Jason Gunthorpe [mailto:jgg@xxxxxxxxxx]
> > >
> > > I see pf_qm_state_pre_save() but didn't understand why it wanted to
> > > send the first 32 bytes in the PRECOPY mode? It is fine, but it
> > > will add some complexity to continue to do this.
> >
> > That was mainly to do a quick verification between src and dst compatibility
> > before we start saving the state. I think probably we can delay that check
> > for later.
>
> In the v1 migration scheme, this was considered good practice. It
> shouldn't be limited to PRECOPY, as there's no requirement to use
> PRECOPY, but the earlier in the migration process that we can trigger a
> device or data stream compatibility fault, the better. TBH, even in
> the case where a device doesn't support live dirty tracking for a
> PRECOPY phase, using it for compatibility testing continues to seem
> like good practice.

At least with our thinking here, we'd rather the device expose explicit
compatibility data via get/test system calls so we can build proper
infrastructure around this. Every device will have compatibility
requirements, and we can build more shared common code this way.

i.e. qemu can ideally fetch the data before migration starts and do an
exchange with the live migration target to see if it is OK.
Orchestration can inventory the systems, and automation can select live
migration targets that can actually work.

If it is hidden inside the migration stream it is too invisible to be
fully useful.

This is something we've been talking about here, but we don't have much
concrete to say for mlx5 yet.

The device still has to protect itself against a corrupted migration
stream impacting integrity, of course.

IIRC qemu has a nice spot to put this in the existing protocol.
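
To make the get/test idea above a bit more concrete, here is a rough
userspace-level sketch in Python. Nothing in it is an existing kernel or
qemu interface: CompatData, Device, get_compat(), test_compat() and
select_targets() are all hypothetical names, the IDs are made up, and the
matching rule (same vendor/device ID, stream version no newer than the
target accepts, required features present) is only an assumption about
what a device might want to check.

```python
from dataclasses import dataclass


# Hypothetical compatibility record. The fields are illustrative only and
# are not taken from any real driver or from the VFIO uAPI.
@dataclass(frozen=True)
class CompatData:
    vendor_id: int
    device_id: int
    stream_version: int    # version of the migration stream format
    features: frozenset    # features the saved state depends on


class Device:
    """Stand-in for one migratable device in the inventory."""

    def __init__(self, name, vendor_id, device_id, stream_version, features):
        self.name = name
        self.vendor_id = vendor_id
        self.device_id = device_id
        self.stream_version = stream_version
        self.features = frozenset(features)

    def get_compat(self):
        """'get' half: describe the stream this device would produce."""
        return CompatData(self.vendor_id, self.device_id,
                          self.stream_version, self.features)

    def test_compat(self, compat):
        """'test' half: could this device accept such a stream?"""
        return (compat.vendor_id == self.vendor_id
                and compat.device_id == self.device_id
                and compat.stream_version <= self.stream_version
                and compat.features <= self.features)


def select_targets(source, inventory):
    """Return the names of devices that accept the source's compat data."""
    compat = source.get_compat()
    return [dev.name for dev in inventory
            if dev is not source and dev.test_compat(compat)]


# Example inventory: one source and three candidate targets.
src = Device("host-a", 0x1234, 0x0001, 2, {"feat-x"})
inventory = [
    src,
    Device("host-b", 0x1234, 0x0001, 3, {"feat-x", "feat-y"}),
    Device("host-c", 0x1234, 0x0001, 1, {"feat-x"}),   # stream format too old
    Device("host-d", 0x1234, 0x0002, 3, {"feat-x"}),   # different device ID
]
print(select_targets(src, inventory))
```

The point of the sketch is only that the check runs before any migration
state is saved, and that the same data can be consumed by orchestration
without opening a migration stream at all.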

Just overall, now that PRECOPY is optional, we should avoid using it
without a good reason. The driver implementation does have a cost.

Thanks,
Jason