On Tue, 7 Jan 2020 11:56:02 -0700 Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > On Tue, 7 Jan 2020 23:23:17 +0530 > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: > > There are 3 invalid states: > > * 101b => Invalid state > > * 110b => Invalid state > > * 111b => Invalid state > > > > why only 110b should be used to report error from vendor driver to > > report error? Aren't we adding more confusions in the interface? > > I think the only chance of confusion is poor documentation. If we > define all of the above as invalid and then say any invalid state > indicates an error condition, then the burden is on the user to > enumerate all the invalid states. That's not a good idea. Instead we > could say 101b (_RESUMING|_RUNNING) is reserved, it's not currently > used but it might be useful some day. Therefore there are no valid > transitions into or out of this state. A vendor driver should fail a > write(2) attempting to enter this state. > > That leaves 11Xb, where we consider _RESUMING and _SAVING as mutually > exclusive, so neither are likely to ever be valid states. Logically, > if the device is in a failed state such that it needs to be reset to be > recovered, I would hope the device is not running, so !_RUNNING (110b) > seems appropriate. I'm not sure we need that level of detail yet > though, so I was actually just assuming both 11Xb states would indicate > an error state and the undefined _RUNNING bit might differentiate > something in the future. > > Therefore, I think we'd have: > > * 101b => Reserved > * 11Xb => Error > > Where the device can only self transition into the Error state on a > failed device_state transition and the only exit from the Error state > is via the reset ioctl. The Reserved state is unreachable. The vendor > driver must error on device_state writes to enter or exit the Error > state and must error on writes to enter Reserved states. Is that still > confusing? I think one thing we could do is start to tie the meaning more to the actual state (bit combination) and less to the individual bits. I.e. - bit 0 indicates 'running', - bit 1 indicates 'saving', - bit 2 indicates 'resuming', - bits 3-31 are reserved. [Aside: reserved-and-ignored or reserved-and-must-be-zero?] [Note that I don't specify what happens when a bit is set or unset.] States are then defined as: 000b => stopped state (not saving or resuming) 001b => running state (not saving or resuming) 010b => stop-and-copy state 011b => pre-copy state 100b => resuming state [Transitions between these states defined, as before.] 101b => reserved [for post-copy; no transitions defined] 111b => reserved [state does not make sense; no transitions defined] 110b => error state [state does not make sense per se, but it does not indicate running; transitions into this state *are* possible] To a 'reserved' state, we can later assign a different meaning (we could even re-use 111b for a different error state, if needed); while the error state must always stay the error state. We should probably use some kind of feature indication to signify whether a 'reserved' state actually has a meaning. Also, maybe we also should designate the states > 111b as 'reserved'. Does that make sense?