Re: [PATCH] reconnect_one(): fix a missing error code

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 15, 2017 at 07:54:57AM +1000, NeilBrown wrote:
> On Wed, Jun 14 2017, J. Bruce Fields wrote:
> 
> > On Wed, Jun 14, 2017 at 12:30:02PM +0300, Dan Carpenter wrote:
> >> I found this bug by reviewing places where we do ERR_PTR(0) (which is
> >> NULL).
> >> 
> >> We used to return an error pointer if lookup_one_len() failed but we
> >> moved this code into a helper function and accidentally removed that.
> >> NULL is a valid return for this function but it's not what we intended.
> >> 
> >> Fixes: bbf7a8a3562f ("exportfs: move most of reconnect_path to helper function")
> >> Signed-off-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
> >
> > ACK.  Agreed that the current code is wrong, and that this is the
> > correct fix.
> >
> > What I don't quite understand yet is what the impact of the bug would
> > be.
> >
> 
> It is interesting that reconnect_path() handles the possibility of
> reconnect_one() returning NULL, even though it will only do that if this
> "bug" is triggered.

As Dan says, you're missing a case.

> When that happens, the target_dir (a descendent of dentry) gets its
> DCACHE_DISCONNECTED flag cleared.
> 
> The bug can presumably only be triggered by a race.
> We look through a directory to find the name for an  inode
> (exportfs_get_name), then try to look up that name and it doesn't exist.

Wouldn't lookup_one_len succesfully return a negative dentry in that
case?

I think the error cases here are more likely due to permissions or IO
errors.

So, I wonder if you can get some kind of dcache corruption with an
uncached lookup of a directory with an ancestor that we lack permission
to.

> So presumably if you lose the race, some dentry will get
> DCACHE_DISCONNECTED cleared, even though it is still disconnected.
> This breaks a contract and can cause weirdness in dcache operations.
> 
> If the lookup_one_len_unlocked() fails, we should probably retry, at
> least once.  But if we do decide to give up, we shouldn't assume it all
> worked.
> 
> So I suggest:
>  - the fix as provided by Dan, plus
>  - remove "if (!parent) break;" from reconnect_path(), plus
>  - maybe retry the get_name/lookup_one operation once if the first
>     attempt fails.

See the comments in the code--if we lose the race, then it's because of
a concurrent operation which should have done the reconnection for us.

--b.
--
To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Kernel Development]     [Kernel Announce]     [Kernel Newbies]     [Linux Networking Development]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Device Mapper]

  Powered by Linux