On Mon, Dec 06 2021, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Mon, Dec 06, 2021 at 05:03:00PM +0100, Cornelia Huck wrote:
>
>> > If we're writing a specification, that's really a MAY statement,
>> > userspace MAY issue a reset to abort the RESUMING process and return
>> > the device to RUNNING. They MAY also write the device_state directly,
>> > which MAY return an error depending on various factors such as whether
>> > data has been written to the migration state and whether that data is
>> > complete. If a failed transitions results in an ERROR device_state,
>> > the user MUST issue a reset in order to return it to a RUNNING state
>> > without closing the interface.
>>
>> Are we actually writing a specification? If yes, we need to be more
>> clear on what is mandatory (MUST), advised (SHOULD), or allowed
>> (MAY). If I look at the current proposal, I'm not sure into which
>> category some of the statements fall.
>
> I deliberately didn't use such formal language because this is far
> from what I'd consider an acceptable spec. It is more words about how
> things work and some kind of basis for agreement between user and
> kernel.

We don't really need formal language, but there are too many unclear
statements, as the discussion above showed. Therefore my question: What
are we actually writing? Even if it is not a formal specification, it
still needs to be clear.

>
> Under Linus's "don't break userspace" guideline whatever userspace
> ends up doing becomes the spec the kernel is wedded to, regardless of
> what we write down here.

All the more important that we actually agree before this is merged! I
don't want choices hidden deep inside the mlx5 driver dictating what
other drivers should do, it must be reasonably easy to figure out
(including what is mandatory, and what is flexible.)

> Which basically means whatever mlx5 and qemu does after we go forward
> is the definitive spec and we cannot change qemu in a way that is
> incompatible with mlx5 or introduce a new driver that is incompatible
> with qemu.

TBH, I'm not too happy with the current QEMU state, either. We need to
take a long, hard look first and figure out what we need to do to make
the QEMU support non-experimental.

We're discussing a complex topic here, and we really don't want to
perpetuate an unclear uAPI. This is where my push for more precise
statements is coming from.