Re: races in ll_splice_alias() and elsewhere (ext4, ocfs2)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 12 Mar 2016, Al Viro wrote:
> On Fri, Mar 11, 2016 at 03:25:22AM +0000, Drokin, Oleg wrote:
> 
> > It is, though it's not like we always need to talk to the server in those
> > cases.
> > Overhead is overhead no matter how small, and an extra check for the d_init
> > d_op would happen for every filesystem, not just ceph and Lustre.
> > Overhead also has this bad property of adding up.
> > Yes, it's minuscule and CPUs are still getting somewhat faster.
> > Is it worth it in the long run? May be, I do not have a strong opinion
> > here since always doing ll_d_init after d_splice_alias solves the issue
> > for us and does it in a way that does not affect anybody else. And as you
> > correctly observed, most of users don't really care.--
> 
> Maybe...  Keep in mind that d_set_d_op() called just prior to that point
> has already done a bunch of checks for methods being non-NULL, so we are
> looking at something pretty much guaranteed to be in cache.  So the cost
> of test is pretty much nil; it turns into "fetch (with cache hit), test,
> branch (expectedly) not taken" in normal case.
> 
> We probably have already spent more cycles discussing that than all processors
> will spend executing those instructions ;-)
> 
> I'm really curious about the ceph situation; fh_to_dentry turns out to
> be irrelevant (ceph refuses to export snapshots and inode it'll be looking
> for won't be in snapshot, just as its parent), but I wonder why we have
> those 3 variants of ->d_op in the first place.
> 
> ->d_release() is the same in all 3.
> ->d_prune() is the same in plain and snapshot variants and absent for
> children of .snap
> ->d_revalidate() is nominally different for all 3, but stuff inside the
> snapshots and .snap children have equivalent ones *AND* normal ->d_revalidate
> appears to cover all three cases -
>         dir = ceph_get_dentry_parent_inode(dentry);  
> 
>         /* always trust cached snapped dentries, snapdir dentry */
>         if (ceph_snap(dir) != CEPH_NOSNAP) {
>                 dout("d_revalidate %p '%pd' inode %p is SNAPPED\n", dentry,
>                      dentry, d_inode(dentry));
>                 valid = 1;
>         } else if (d_really_is_positive(dentry) &&
>                    ceph_snap(d_inode(dentry)) == CEPH_SNAPDIR) {
>                 valid = 1;
>         } else if (dentry_lease_is_valid(dentry) ||
> in there is explicitly checking the other two.
> 
> Sage, is there any reason not to fold all three dentry_operations variants
> in one and be done with d_set_d_op()?  I see that in one place you check
> ->d_op being the plain one, but AFAICS checking ceph_snap() of the parent
> would serve just as well there. 

Yeah, I don't see any reason why we can't fold them into a single 
d_ops.

> BTW, looks like ceph is too pessimistic
> about bailing out of RCU mode on ->d_revalidate(), might be worth looking
> into at some point...

Ah, that dates back to Nick Piggin's original pushdown when adding RCU 
walk back in 2.6.38.  I'll make a note, thanks!

sage
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux