On Thu, Mar 11, 2021 at 3:21 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Thu, Mar 11, 2021 at 01:49:24PM -0800, Alexander Duyck wrote:
> > > We don't need to invent new locks and new complexity for something
> > > that is trivially solved already.
> >
> > I am not wanting a new lock. What I am wanting is a way to mark the VF
> > as being stale/offline while we are performing the update. With that
> > we would be able to apply similar logic to any changes in the future.
>
> I think we should hold off doing this until someone comes up with HW
> that needs it. The response time here is microseconds, it is not worth
> any complexity

I disagree. Take a look at section 8.5.3 in the NVMe document that was
linked to earlier:
https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4a-2020.03.09-Ratified.pdf

This is exactly what they are doing and I think it makes a ton of
sense. Basically the VF has to be taken "offline" before you are
allowed to start changing resources on it. It would basically consist
of one extra sysfs file and has additional uses beyond just the
configuration of MSI-X vectors.

We would just have to add one additional sysfs file, maybe modify the
"dead" device flag to be "offline", and we could make this work with
minimal changes to the patch set you already have. We could probably
toggle to "offline" while holding just the VF lock. To toggle the VF
back to being "online" we might need to take the PF device lock since
it is ultimately responsible for guaranteeing we have the resources.

Another way to think of this is that we are essentially pulling a
device back after we have already allocated the VFs and we are
reconfiguring it before pushing it back out for usage. Having a flag
that we could set on the VF device to say it is "under
construction"/modification/"not ready for use" would be quite useful I
would think.
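
To make the locking idea above a bit more concrete, here is a rough
userspace sketch, not kernel code and not taken from the patch set. It
uses pthread mutexes as stand-ins for the VF and PF device locks, and
every name in it (pf_model, vf_model, vf_set_offline and so on) is
hypothetical. The assumption is that going "offline" and staging a new
vector count only need the VF lock, while going back "online" takes the
PF lock first, since the PF is what has to guarantee the resources.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct pf_model {
	pthread_mutex_t lock;       /* stands in for the PF device lock */
	unsigned int free_vectors;  /* MSI-X vectors not assigned to any VF */
};

struct vf_model {
	pthread_mutex_t lock;       /* stands in for the VF device lock */
	struct pf_model *pf;
	bool offline;               /* "under construction" flag */
	unsigned int want_vectors;  /* count staged while offline */
	unsigned int have_vectors;  /* count currently backed by the PF */
};

void pf_init(struct pf_model *pf, unsigned int free_vectors)
{
	pthread_mutex_init(&pf->lock, NULL);
	pf->free_vectors = free_vectors;
}

void vf_init(struct vf_model *vf, struct pf_model *pf, unsigned int vectors)
{
	pthread_mutex_init(&vf->lock, NULL);
	vf->pf = pf;
	vf->offline = false;
	vf->want_vectors = vectors;
	vf->have_vectors = vectors;
}

/* Taking the VF offline only needs the VF lock. */
int vf_set_offline(struct vf_model *vf)
{
	pthread_mutex_lock(&vf->lock);
	vf->offline = true;
	pthread_mutex_unlock(&vf->lock);
	return 0;
}

/* Resource changes are refused unless the VF is offline. */
int vf_set_vectors(struct vf_model *vf, unsigned int count)
{
	int ret = -EBUSY;

	pthread_mutex_lock(&vf->lock);
	if (vf->offline) {
		vf->want_vectors = count;  /* staged, applied when going online */
		ret = 0;
	}
	pthread_mutex_unlock(&vf->lock);
	return ret;
}

/*
 * Going online takes the PF lock first, then the VF lock, because the
 * PF has to confirm the staged request fits in what it has available.
 */
int vf_set_online(struct vf_model *vf)
{
	struct pf_model *pf = vf->pf;
	unsigned int avail;
	int ret = 0;

	pthread_mutex_lock(&pf->lock);
	pthread_mutex_lock(&vf->lock);
	avail = pf->free_vectors + vf->have_vectors;
	if (vf->want_vectors > avail) {
		ret = -ENOSPC;             /* VF stays offline */
	} else {
		pf->free_vectors = avail - vf->want_vectors;
		vf->have_vectors = vf->want_vectors;
		vf->offline = false;
	}
	pthread_mutex_unlock(&vf->lock);
	pthread_mutex_unlock(&pf->lock);
	return ret;
}
```

In this model a write to the vector count while the VF is online just
fails with -EBUSY, and a request the PF cannot satisfy leaves the VF
offline rather than half configured. How that would map onto the real
"dead" flag and the device locks is left open here.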