> -----Original Message-----
> From: Christoph Hellwig <hch@xxxxxx>
> Sent: Tuesday, December 6, 2022 7:36 AM
> To: Jason Gunthorpe <jgg@xxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>; Rao, Lei <Lei.Rao@xxxxxxxxx>;
> kbusch@xxxxxxxxxx; axboe@xxxxxx; kch@xxxxxxxxxx; sagi@xxxxxxxxxxx;
> alex.williamson@xxxxxxxxxx; cohuck@xxxxxxxxxx; yishaih@xxxxxxxxxx;
> shameerali.kolothum.thodi@xxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>;
> mjrosato@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-nvme@xxxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Dong, Eddie
> <eddie.dong@xxxxxxxxx>; Li, Yadong <yadong.li@xxxxxxxxx>; Liu, Yi L
> <yi.l.liu@xxxxxxxxx>; Wilk, Konrad <konrad.wilk@xxxxxxxxxx>;
> stephen@xxxxxxxxxxxxx; Yuan, Hang <hang.yuan@xxxxxxxxx>
> Subject: Re: [RFC PATCH 5/5] nvme-vfio: Add a document for the NVMe device
>
> On Tue, Dec 06, 2022 at 11:28:12AM -0400, Jason Gunthorpe wrote:
> > I'm interested as well, my mental model goes as far as mlx5 and
> > hisilicon, so if nvme prevents the VFs from being contained units, it
> > is a really big deviation from VFIO's migration design..
>
> In NVMe the controller (which maps to a PCIe physical or virtual
> function) is unfortunately not very self contained. A lot of state is
> subsystem-wide, where the subsystem is, roughly speaking, the container for
> all controllers that share storage. That is the right thing to do for say dual
> ported SSDs that are used for clustering or multi-pathing, for tenant isolation
> it is about as wrong as it gets.

The NVMe spec is general, but implementation details (such as internal state) may be vendor-specific. If the migration happens between two identical NVMe devices (same vendor and device, with the same firmware version), migration of the subsystem-wide state can be covered naturally, right?
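
To make the "identical devices" condition concrete, here is a rough sketch of the kind of check a migration tool might run before attempting a transfer. Everything in it is an assumption for illustration: the `NvmeIdentity` type, the `identity_mismatches` helper, and the ID values are hypothetical and are not part of this patch series or of any existing VFIO or NVMe interface.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NvmeIdentity:
    """Hypothetical identity tuple for one NVMe device (illustration only)."""
    vendor_id: int      # PCI vendor ID
    device_id: int      # PCI device ID
    firmware_rev: str   # firmware revision string reported by the device


def identity_mismatches(src: NvmeIdentity, dst: NvmeIdentity) -> list[str]:
    """Return the names of the fields that differ between source and destination.

    An empty list means the two devices look identical under this
    (assumed) definition, so vendor-specific state could be handed over as-is.
    """
    return [
        f.name
        for f in fields(src)
        if getattr(src, f.name) != getattr(dst, f.name)
    ]


# Placeholder values, not real vendor or device IDs.
source = NvmeIdentity(vendor_id=0x1234, device_id=0xABCD, firmware_rev="1.0.0")
target = NvmeIdentity(vendor_id=0x1234, device_id=0xABCD, firmware_rev="1.0.1")

if identity_mismatches(source, target):
    print("refuse migration, mismatched fields:", identity_mismatches(source, target))
else:
    print("devices match, migration can proceed")
```

With a check like this, a firmware difference alone is enough to refuse the migration, which is the restrictive case I am describing.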

> There is nothing in the NVMe spec that prohibits you from implementing
> multiple subsystems for multiple functions of a PCIe device, but if you do that
> there is absolutely no support in the spec to manage shared resources or any
> other interaction between them.

In the IPU/DPU area, multiple VFs with SR-IOV seem to be widely adopted. For VFs, the use of shared resources can be viewed as implementation-specific, and saving and loading the state of a VF can rely on the hardware/firmware itself. Migrating NVMe devices across vendors or device models is another story: it may be useful, but it brings additional challenges.