On Mon, 25 Oct 2021 09:29:38 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Thu, Oct 21, 2021 at 03:47:29PM -0600, Alex Williamson wrote:
>
> > I recall that we previously suggested a very strict interpretation of
> > clearing the _RUNNING bit, but again I'm questioning if that's a real
> > requirement or simply a nice-to-have feature for some undefined
> > debugging capability.  In raising the p2p DMA issue, we can see that a
> > hard stop independent of other devices is not really practical but I
> > also don't see that introducing a new state bit solves this problem any
> > more elegantly than proposed here.  Thanks,
>
> I still disagree with this - the level of 'frozenness' of a device is
> something that belongs in the defined state exposed to userspace, not
> as a hidden internal state that userspace can't see.
>
> It makes the state transitions asymmetric between suspend/resume as
> resume does have a defined uAPI state for each level of frozenness and
> suspend does not.
>
> With the extra bit resume does:
>
>   0000, 0100, 1000, 0001
>
> And suspend does:
>
>   0001, 1001, 0010, 0000
>
> However, without the extra bit suspend is only
>
>   001, 010, 000
>
> With hidden state inside the 010

And what is the device supposed to do if it receives a DMA while in
this strictly defined stopped state?  If it generates an unsupported
request, that can trigger a fatal platform error.  If it silently drops
the DMA, then we have data loss.  We're defining a catch-22 scenario
for drivers versus placing the onus on the user to quiesce the set of
devices in order to consider the migration status as valid.  Thanks,

Alex
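
For readers following the bit patterns quoted above, here is a minimal
sketch that decodes them into named states.  It assumes the low three
bits are _RUNNING, _SAVING and _RESUMING in that order, as in the v1
migration uAPI under discussion.  EXTRA is only a placeholder name for
the proposed fourth bit, which this message does not name, and decode()
is a hypothetical helper, not anything from the kernel.

```python
# Assumed bit layout: the low three bits follow the v1 VFIO migration uAPI.
RUNNING = 1 << 0
SAVING = 1 << 1
RESUMING = 1 << 2
EXTRA = 1 << 3  # placeholder name for the proposed extra bit (assumption)

# Ordered from most to least significant bit so names read left to right.
NAMES = [
    (EXTRA, "EXTRA"),
    (RESUMING, "RESUMING"),
    (SAVING, "SAVING"),
    (RUNNING, "RUNNING"),
]


def decode(bits: str) -> str:
    """Turn a binary string such as '1001' into 'EXTRA|RUNNING'."""
    value = int(bits, 2)
    set_names = [name for mask, name in NAMES if value & mask]
    # No bits set corresponds to the stopped state.
    return "|".join(set_names) if set_names else "STOP"


def decode_sequence(seq):
    """Decode every state in a transition sequence."""
    return [decode(bits) for bits in seq]


# The three sequences quoted in the thread.
resume_with_bit = ["0000", "0100", "1000", "0001"]
suspend_with_bit = ["0001", "1001", "0010", "0000"]
suspend_without_bit = ["001", "010", "000"]

if __name__ == "__main__":
    for label, seq in [
        ("resume, extra bit", resume_with_bit),
        ("suspend, extra bit", suspend_with_bit),
        ("suspend, no extra bit", suspend_without_bit),
    ]:
        print(f"{label}: {' -> '.join(decode_sequence(seq))}")
```

Under these assumptions the suspend path with the extra bit passes
through a distinct EXTRA|RUNNING step before SAVING, while the three-bit
sequence goes from RUNNING directly to SAVING.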