Quoting Andrew G. Morgan (morgan@xxxxxxxxxx): > On Tue, Aug 17, 2010 at 7:41 PM, Serge E. Hallyn > <serge.hallyn@xxxxxxxxxxxxx> wrote: > > Quoting Chris Mason (chris.mason@xxxxxxxxxx): > >> [ sorry, corrected cc list ] > > > > Thanks - sorry for the inconvenience. I'm also cc:ing Andrew Morgan > > for another opinion. > > > >> On Mon, Aug 16, 2010 at 03:38:12PM -0400, Chris Mason wrote: > >> > Hi everyone, > >> > > >> > I'm looking into a 2.6.35 btrfs performance regression, and perf tells > >> > me that I'm spending a lot of time hammering on xattrs inside > >> > remove_suid. This is pretty surprising because I'm running as root, and > >> > my files are not suid. Looking back to this commit: > >> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0 > >> > > >> > We've changed remove_suid's semantics from > >> > > >> > if (file_is_suid) > >> > try to remove it > > > > (but only if not capable(CAP_FSETID)) > > I disagree. I think the relevant capability test should be with > respect to CAP_SETFCAP. > > Since this is the capability that allows you to put a capability on a > file, it should be the one to retain it if the file is modified. I'm ok with that. > >> > To something that always checks to see if we have removal permissions. > > > > (not really - security_inode_need_killpriv() shoudl return <0 only if > > there was an actual error, and the write needs to be cancelled altogether. > > It returns 0 if privs don't need to be removed, and >0 if they do. > > > >> > Was this intentional? It didn't cause my 2.6.35 regression (that's all > >> > my fault) but it does look wrong to me: > > > > If I'm thinking right, I think the key change we should make is to have > > CAP_FSETID be honored for maintaining file capabilities. > > > > That would have two (good) results: > > > > 1. we should be able to re-arrange the code to check for CAP_FSETID > > before bothering to check for file capabilities, so we can save the > > getxattrs which I assume were what you were finding? Even if it > > wasn't the cause of your performance regression, it should be an > > improvement. > > > > 2. I think it can be seen as a semantic fix. We mostly try to > > respect suid behavior for file caps, so it will be more consistent > > to honor CAP_FSETID for file capabilities. > > > > Andrew, what do you think? > > > > I think the test should be with respect to CAP_SETFCAP, but I agree > with the rest of your comments. I also point out, with some shame, that on first reading Chris' email, I had to look at that code more than I should have needed to to recall the details. So I think the function could stand some cleaning up/ simplifying. I'll try to take a look at this next week. > Lots of small writes to 'any' file also tends to bang on this code. > I've been wondering if it might make sense to cache, in the inode, > that a file does *not* have any capabilities associated with it. That > way the kernel wouldn't need to look up the xattrs twice for the same > incapable file - which is, by far, the common case. That could also be shared with a new (old) optional xattr-free file-backed-filecaps mount option :) thanks, -serge -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html