On Fri, Sep 06, 2019 at 11:30:57AM -0700, Sagi Grimberg wrote:
>
> >
> > Ok, so the real problem is per-cpu bounded tasks.
> >
> > I share Thomas opinion about a NAPI like approach.
>
> We already have that, its irq_poll, but it seems that for this
> use-case, we get lower performance for some reason. I'm not
> entirely sure why that is, maybe its because we need to mask interrupts
> because we don't have an "arm" register in nvme like network devices
> have?

For MSI, that's the INTMS/INTMC NVMe registers. MSI-X, though, has to
disarm it in its table entry, and the Linux implementation will do a
posted read in that path, which is a bit too expensive.
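
To make the difference between the two masking paths concrete, here is a
rough userspace sketch. It is not the kernel or driver code: struct fake_dev
and all helper names are hypothetical, and the device is a plain in-memory
model that only counts MMIO accesses. The register offsets (INTMS at 0x0c,
INTMC at 0x10) and the MSI-X table layout (16-byte entries, vector control
at byte 12, mask in bit 0) follow the NVMe and PCI specifications. The
read-back after the MSI-X write is an assumption standing in for the flush
the reply refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets per the NVMe and PCI specs. */
#define NVME_REG_INTMS          0x0c  /* Interrupt Mask Set: write 1 to mask */
#define NVME_REG_INTMC          0x10  /* Interrupt Mask Clear: write 1 to unmask */
#define MSIX_ENTRY_SIZE         16    /* bytes per MSI-X table entry */
#define MSIX_ENTRY_VECTOR_CTRL  12    /* byte offset of vector control dword */
#define MSIX_CTRL_MASKBIT       0x1u

#define FAKE_NR_VECTORS         4

/* Hypothetical in-memory device model that counts MMIO accesses. */
struct fake_dev {
	uint32_t int_mask;                          /* state behind INTMS/INTMC */
	uint32_t msix_table[FAKE_NR_VECTORS * 4];   /* 4 dwords per entry */
	unsigned int reads;
	unsigned int writes;
};

static void bar_write(struct fake_dev *d, uint32_t off, uint32_t val)
{
	d->writes++;
	if (off == NVME_REG_INTMS)
		d->int_mask |= val;
	else if (off == NVME_REG_INTMC)
		d->int_mask &= ~val;
}

static void msix_write(struct fake_dev *d, uint32_t off, uint32_t val)
{
	d->writes++;
	d->msix_table[off / 4] = val;
}

static uint32_t msix_read(struct fake_dev *d, uint32_t off)
{
	d->reads++;
	return d->msix_table[off / 4];
}

/* MSI path: a single write to INTMS masks the vector, no read needed. */
void mask_msi(struct fake_dev *d, unsigned int vec)
{
	assert(vec < 32);
	bar_write(d, NVME_REG_INTMS, 1u << vec);
}

/* MSI path: a single write to INTMC unmasks the vector again. */
void unmask_msi(struct fake_dev *d, unsigned int vec)
{
	assert(vec < 32);
	bar_write(d, NVME_REG_INTMC, 1u << vec);
}

/*
 * MSI-X path: set the mask bit in the table entry, then read back.
 * The read-back models the flush (assumption); on real hardware it is
 * a full round trip to the device, which is the expensive part.
 */
void mask_msix(struct fake_dev *d, unsigned int vec)
{
	uint32_t off;

	assert(vec < FAKE_NR_VECTORS);
	off = vec * MSIX_ENTRY_SIZE + MSIX_ENTRY_VECTOR_CTRL;
	msix_write(d, off, MSIX_CTRL_MASKBIT);
	(void)msix_read(d, off);
}
```

In this model, masking through INTMS costs one write and no reads, while
masking through the MSI-X table costs one write plus one read. On real
hardware the write can be posted, but the read stalls the CPU until the
device answers.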