On Fri, Mar 12, 2021 at 09:00:17AM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 11, 2021 at 06:53:16PM -0800, Alexander Duyck wrote:
> > On Thu, Mar 11, 2021 at 3:21 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Mar 11, 2021 at 01:49:24PM -0800, Alexander Duyck wrote:
> > > > > We don't need to invent new locks and new complexity for something
> > > > > that is trivially solved already.
> > > >
> > > > I am not wanting a new lock. What I am wanting is a way to mark the VF
> > > > as being stale/offline while we are performing the update. With that
> > > > we would be able to apply similar logic to any changes in the future.
> > >
> > > I think we should hold off doing this until someone comes up with HW
> > > that needs it. The response time here is microseconds, it is not worth
> > > any complexity
> >
> > I disagree. Take a look at section 8.5.3 in the NVMe document that was
> > linked to earlier:
> > https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4a-2020.03.09-Ratified.pdf
> >
> > This is exactly what they are doing and I think it makes a ton of
> > sense. Basically the VF has to be taken "offline" before you are
>
> AFAIK this is internal to the NVMe command protocol, not something we
> can expose generically to the OS. mlx5 has no protocol to "offline" an
> already running VF, for instance.
>
> The way Leon has it arranged that online/offline scheme has no
> relevance because there is no driver or guest attached to the VF to
> see the online/offline transition.
>
> I wonder if people actually do offline a NVMe VF from a hypervisor?
> Seems pretty weird.

I agree, that would be weird. I'm pretty sure you can't modify these
resources once you attach the nvme VF to a guest. The resource allocation
needs to happen prior to that.

> > Another way to think of this is that we are essentially pulling a
> > device back after we have already allocated the VFs and we are
> > reconfiguring it before pushing it back out for usage. Having a flag
> > that we could set on the VF device to say it is "under
> > construction"/modification/"not ready for use" would be quite useful I
> > would think.
>
> Well, yes, the whole SRIOV VF lifecycle is a pretty bad fit for the
> modern world.
>
> I'd rather not see a half-job on a lifecycle model by hacking in
> random flags. It needs a proper comprehensive design.
>
> Jason