On May 05, 2009 09:56 -0700, Joel Becker wrote:
> On Tue, May 05, 2009 at 02:09:36AM -0600, Andreas Dilger wrote:
> > If the reflink caller is always charged for the full space used (as if
> > it were a real copy) by virtue of the user doing the reflink() owning the
> > new inode.  Doing anything else seems broken.  If the owner of the file
> > wasn't charged for the reflink's quota then if the reflink inode was
> > chowned the new owner would be charged for the new file, but the quota
> > code would have to special case the decrement of EACH of the reflink's
> > blocks because otherwise the original owner might "release" quota that
> > it was never originally charged.
>
> 	If the caller is creating an inode in someone else's name, then
> who do you charge for the quota?

IMHO, it shouldn't be possible to create an inode in someone else's name
(CAP_* excluded), just like it isn't possible to create a new file in
someone else's name.  The caller of reflink() should be the one creating
the file, hence the owner of the file, and the owner of the quota.

> If you charge the caller, how do you know to decrement the caller's
> quota when the actual owner does truncate, given that the inode has
> no knowledge of the caller anymore.

No, if the owner of the inode (== caller) is charged the quota then when
the inode is truncated (regardless of who does the truncate) the quota
will just work correctly.

> 	You've hit the nail on the head - without backrefs for each
> refcounted hunk, you can't figure out who owns it from a quota
> perspective.  And that's just a non-starter to try and maintain.

No, I don't think my proposal is _more_ complex than the original.  It
is actually _less_ complex, because the fact that this is a reflink and
not a complete file copy is a purely internal detail of the filesystem
and is not exposed outside the filesystem.
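
To make the accounting argument concrete, here is a toy model of the
"charge the inode owner for the full size" rule.  It is only a sketch
and not kernel code: the ToyFs class, its method names, and the
block-count bookkeeping are all hypothetical, invented for
illustration.  The point it tries to show is that quota can be tracked
per inode, without any per-block backrefs, if the reflink caller owns
the new inode.

```python
from collections import defaultdict


class ToyFs:
    """Hypothetical model: each inode's full logical size is charged to
    its owner, whether or not its blocks are shared with another inode."""

    def __init__(self):
        self.quota = defaultdict(int)  # uid -> blocks currently charged
        self.inodes = {}               # name -> {"owner": uid, "blocks": n}

    def create(self, name, uid, blocks):
        # A new inode is charged to the user who creates it.
        self.inodes[name] = {"owner": uid, "blocks": blocks}
        self.quota[uid] += blocks

    def reflink(self, src, dst, caller_uid):
        # The caller owns the new inode and is charged as if this were a
        # real copy; block sharing is invisible to the quota code.
        self.create(dst, caller_uid, self.inodes[src]["blocks"])

    def truncate(self, name, new_blocks):
        # Whoever calls truncate, the charge of the inode's owner changes.
        ino = self.inodes[name]
        self.quota[ino["owner"]] += new_blocks - ino["blocks"]
        ino["blocks"] = new_blocks

    def chown(self, name, new_uid):
        # Move the whole per-inode charge from the old owner to the new one.
        ino = self.inodes[name]
        self.quota[ino["owner"]] -= ino["blocks"]
        self.quota[new_uid] += ino["blocks"]
        ino["owner"] = new_uid
```

In this model, if uid 1000 creates a 10-block file and uid 2000
reflinks it, each user is charged 10 blocks.  Truncating or chowning
the reflink only touches the charge recorded against that inode's
current owner, so the original owner can never "release" quota it was
not charged.
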
The fact that a reflink consumes less space and is faster than a real
copy is an implementation detail, not really any different than if the
file were compressed by the filesystem internally.

> > > 	Here's another fun trick.  Overwriting rsync, instead of copying
> > > blocks from the already-existing source could reflink the source to the
> > > .temporary, then only write the changed blocks.  And since you own both
> > > files, it just works.  If you're overwriting someone else's file?  The
> > > old copy behavior is fine.
> >
> > Well, "fine" as in it works, but if there are only a few changed blocks,
> > and the old copy is now part of a snapshot (so it won't be released when
> > rsync is finished) the space consumption has doubled instead of just
> > using a few extra blocks.
>
> 	No, because the last thing rsync will do is rename(.temporary,
> source).  All the references from the source will be decremented, and
> any blocks only owned by the source will be freed.  Space usage is
> identical before and after, like a copying rsync, but there is less
> space used and less I/O done during the rsync process.

What I was objecting to is "when overwriting someone else's file, the
old copy behaviour is fine".  If we are implementing a copy-on-write
API, why hamstring it to not work in the expected manner by a normal
"cp"?

> > Is there anything about changing the owner/group of the new inode during
> > reflink that makes the implementation more complex?  If the process doing
> > the reflink is the same as the file owner then the semantics are unchanged
> > from what you have proposed.
>
> 	If you define that 'reflink sets the attributes as if it was a
> new file', then you should be creating the file with a new security
> context, not with the security context from the existing inode.  And
> then you can't really snapshot.
> 	A mixed behavior, like "if you own it, I'll preserve the entire
> security context, but if not I will treat it with a new context" is
> confusing at best.

I don't find it confusing.  The security context would be inherited from
the creating process, just like creating a new file would.  If it is the
same user as the file owner then the security context will be the same.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.