On Mon, Dec 04, 2023 at 09:31:21PM -0700, Keith Busch wrote:
> On Tue, Dec 05, 2023 at 12:14:22PM +0800, Ming Lei wrote:
> > On Mon, Dec 04, 2023 at 11:57:55AM -0700, Keith Busch wrote:
> > > On Mon, Dec 04, 2023 at 01:40:58PM -0500, Jeff Moyer wrote:
> > > > I added a CC: linux-security-module@vger
> > > > Keith Busch <kbusch@xxxxxxxx> writes:
> > > > > From: Keith Busch <kbusch@xxxxxxxxxx>
> > > > >
> > > > > The uring_cmd operation is often used for privileged actions, so drivers
> > > > > subscribing to this interface check capable() for each command. The
> > > > > capable() function is not fast path friendly for many kernel configs,
> > > > > and this can really harm performance. Stash the capable sys admin
> > > > > attribute in the io_uring context and set a new issue_flag for the
> > > > > uring_cmd interface.
> > > >
> > > > I have a few questions. What privileged actions are performance
> > > > sensitive? I would hope that anything requiring privileges would not
> > > > be in a fast path (but clearly that's not the case).
> > >
> > > Protocol specifics that don't have a generic equivalent. For example,
> > > NVMe FDP is reachable only through the uring_cmd and ioctl interfaces,
> > > but you use it like normal reads and writes so has to be as fast as the
> > > generic interfaces.
> >
> > But normal read/write pt command doesn't require ADMIN any more since
> > commit 855b7717f44b ("nvme: fine-granular CAP_SYS_ADMIN for nvme io commands"),
> > why do you have to pay the cost of checking capable(CAP_SYS_ADMIN)?
>
> Good question. The "capable" check had always been first so even with
> the relaxed permissions, it was still paying the price. I have changed
> that order in commit staged here (not yet upstream):
>
> http://git.infradead.org/nvme.git/commitdiff/7be866b1cf0bf1dfa74480fe8097daeceda68622

With this change, I guess you shouldn't see the following big gap, right?

> Before: 970k IOPs
> After: 1750k IOPs
>
> Note that only prevents the costly capable() check if the inexpensive
> checks could make a determination. That's still not solving the problem
> long term since we aim for forward compatibility where we have no idea
> which opcodes, admin identifications, or vendor specifics could be
> deemed "safe" for non-root users in the future, so those conditions
> would always fall back to the more expensive check that this patch was
> trying to mitigate for admin processes.

Not sure I get the idea. This is related to nvme's permission model for
user pt commands, and:

1) it should always be checked at the entry of the nvme user pt command path

2) only the following two types of commands require ADMIN, per commit
855b7717f44b ("nvme: fine-granular CAP_SYS_ADMIN for nvme io commands"):

- any admin command is not allowed
- vendor-specific and fabrics commands are not allowed

Can you provide more details on why the expensive check can't be avoided
for fast read/write user IO commands?

Thanks,
Ming
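For illustration, here is a small userspace sketch of the check ordering discussed in this thread. It assumes a simplified command model loosely following the rules listed above. The cheap per-command rules run first, and the costly capability check runs only when they cannot grant access, so the overall result stays "cheap rule allows OR caller is privileged". The names cmd_allowed(), expensive_capable_sys_admin(), slow_checks and enum cmd_kind are hypothetical stand-ins. They are not the kernel's nvme_cmd_allowed() or capable(), and the read/write rules are deliberately simplified.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for capable(CAP_SYS_ADMIN). In this sketch it only
 * counts how often the slow path is taken; is_root models the caller's
 * privilege.
 */
static unsigned long slow_checks;

static bool expensive_capable_sys_admin(bool is_root)
{
	slow_checks++;
	return is_root;
}

/* Hypothetical, simplified command classes. */
enum cmd_kind {
	CMD_IO_READ,
	CMD_IO_WRITE,
	CMD_ADMIN,
	CMD_VENDOR,
	CMD_FABRICS,
};

/*
 * Cheap checks first: ordinary I/O that the simplified rules allow is
 * decided without touching the slow path. Everything else (admin,
 * vendor-specific, fabrics, or a write on a read-only open) falls back to
 * the capability check, so a privileged caller keeps full access.
 */
static bool cmd_allowed(enum cmd_kind kind, bool open_for_write, bool is_root)
{
	switch (kind) {
	case CMD_IO_READ:
		return true;
	case CMD_IO_WRITE:
		if (open_for_write)
			return true;
		break;
	default:
		break;
	}
	return expensive_capable_sys_admin(is_root);
}
```

With this ordering, a stream of unprivileged read commands never reaches expensive_capable_sys_admin(). Admin, vendor-specific and fabrics commands still pay for it on every call, and that remaining cost is what the original patch was trying to avoid for admin processes.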