On Mon, Aug 08, 2022 at 05:31:34PM +1000, Dave Chinner wrote:
> On Mon, Aug 08, 2022 at 03:49:09AM +0100, Matthew Wilcox wrote:
>
> > So you hot-unplug the failed
> > device, plug in a new NVMe drive and add it to the RAID. The pages now
> > need to be DMA mapped to that new PCI device.
>
> yup, and now the dma tags for the mappings to that sub-device return
> errors, which then tell the application that it needs to remap the
> dma buffers it is using.
>
> That's just bog standard error handling - if a bdev goes away,
> access to the dma tags have to return IO errors, and it is up to the
> application level (i.e. the io_uring code) to handle that sanely.

I didn't think anyone should see IO errors in these scenarios. This
feature is more of an optional optimization, and everything should
work as it does today if a tag becomes invalid.

For an md raid or a multi-device filesystem, I imagined this would
return a dma tag that demuxes to the dma tags of the member devices.
If any particular member device doesn't have a dma tag for whatever
reason, the filesystem or md would transparently fall back to the
registered bvec that it currently uses when it needs to do IO to that
device. A rough sketch of what I mean is at the end of this mail.

If you do a RAID hot-swap, md could request a new dma tag for the new
device without io_uring knowing about the event. md can continue
servicing new IO that references its own dma tag, and start using the
new device's tag only once the setup is complete.

I'm not familiar enough with the networking side, but I thought the
file-level abstraction would allow similar handling without io_uring's
knowledge.
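To make the demux and hot-swap ideas above concrete, here is a rough
sketch in plain C. All of the names (struct md_dma_tag, member_tags,
md_tag_for_member, md_install_member_tag) are made up for illustration;
this isn't the API from the posted series, just the shape of the idea.

  #include <stddef.h>

  struct dma_tag;			/* opaque per-device mapping handle */

  struct md_dma_tag {
  	void		*registered_bvec;	/* fallback: plain registered buffer */
  	unsigned int	 nr_members;
  	struct dma_tag	*member_tags[];		/* NULL if member has no mapping */
  };

  /*
   * Pick the mapping for an IO routed to member 'idx'.  If the member
   * has no valid tag (never mapped, or invalidated by a hot-swap),
   * return NULL and let the caller fall back to the registered bvec,
   * so no IO error is surfaced to io_uring.
   */
  static struct dma_tag *md_tag_for_member(struct md_dma_tag *mtag,
  					 unsigned int idx)
  {
  	if (idx >= mtag->nr_members)
  		return NULL;
  	return mtag->member_tags[idx];		/* may be NULL: use bvec path */
  }

  /*
   * Hot-swap: md installs the new member's tag only once the new device
   * is fully set up; until then the NULL slot keeps IO on the bvec
   * fallback, and io_uring never sees the transition.
   */
  static void md_install_member_tag(struct md_dma_tag *mtag,
  				  unsigned int idx, struct dma_tag *tag)
  {
  	if (idx < mtag->nr_members)
  		mtag->member_tags[idx] = tag;
  }

The point of the sketch is only that the demux and the fallback live
entirely inside md (or the filesystem), so io_uring holds the top-level
tag and never has to react to member device changes.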