On Mon, Aug 08, 2022 at 05:31:34PM +1000, Dave Chinner wrote:
> On Mon, Aug 08, 2022 at 03:49:09AM +0100, Matthew Wilcox wrote:
>
> > So you hot-unplug the failed
> > device, plug in a new NVMe drive and add it to the RAID. The pages now
> > need to be DMA mapped to that new PCI device.
>
> yup, and now the dma tags for the mappings to that sub-device return
> errors, which then tell the application that it needs to remap the
> dma buffers it is using.
>
> That's just bog standard error handling - if a bdev goes away,
> access to the dma tags have to return IO errors, and it is up to the
> application level (i.e. the io_uring code) to handle that sanely.

I didn't think anyone should see IO errors in these scenarios. This
feature is more of an optional optimization, and everything should
work as it does today if a tag becomes invalid.

For an md raid or a multi-device filesystem, I imagined this would
return a dma tag that demuxes to the dma tags of the member devices.
If any particular member device doesn't have a dma tag for whatever
reason, the filesystem or md would transparently fall back to the
registered bvec that it currently uses when it needs to do IO to that
device. A rough sketch of what I mean is at the end of this mail.

If you do a RAID hot-swap, md could request a new dma tag for the new
device without io_uring knowing about the event. md can continue
servicing new IO that references its own dma tag, and start using the
new device's tag only once the setup is complete.

I'm not familiar enough with the networking side, but I thought the
file-level abstraction would allow similar handling without io_uring's
knowledge.
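To make the demux and hot-swap ideas above concrete, here is a rough
sketch in plain C. All of the names (struct md_dma_tag, member_tags,
md_tag_for_member, md_install_member_tag) are made up for illustration;
this isn't the API from the posted series, just the shape of the idea.

  #include <stddef.h>

  struct dma_tag;			/* opaque per-device mapping handle */

  struct md_dma_tag {
  	void		*registered_bvec;	/* fallback: plain registered buffer */
  	unsigned int	 nr_members;
  	struct dma_tag	*member_tags[];		/* NULL if member has no mapping */
  };

  /*
   * Pick the mapping for an IO routed to member 'idx'.  If the member
   * has no valid tag (never mapped, or invalidated by a hot-swap),
   * return NULL and let the caller fall back to the registered bvec,
   * so no IO error is surfaced to io_uring.
   */
  static struct dma_tag *md_tag_for_member(struct md_dma_tag *mtag,
  					 unsigned int idx)
  {
  	if (idx >= mtag->nr_members)
  		return NULL;
  	return mtag->member_tags[idx];		/* may be NULL: use bvec path */
  }

  /*
   * Hot-swap: md installs the new member's tag only once the new device
   * is fully set up; until then the NULL slot keeps IO on the bvec
   * fallback, and io_uring never sees the transition.
   */
  static void md_install_member_tag(struct md_dma_tag *mtag,
  				  unsigned int idx, struct dma_tag *tag)
  {
  	if (idx < mtag->nr_members)
  		mtag->member_tags[idx] = tag;
  }

The point of the sketch is only that the demux and the fallback live
entirely inside md (or the filesystem), so io_uring holds the top-level
tag and never has to react to member device changes.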