Re: [PATCH 1/7] sunrpc: fix race in new cache_wait code.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 22, 2010 at 12:55:06PM +1000, NeilBrown wrote:
> If we set up to wait for a cache item to be filled in, and then find
> that it is no longer pending, it could be that some other thread is
> in 'cache_revisit_request' and has moved our request to its 'pending' list.
> So when our setup_deferral calls cache_revisit_request it will find nothing to
> put on the pending list, and do nothing.
> 
> We then return from cache_wait_req, thus leaving the 'sleeper'
> on-stack structure open to being corrupted by subsequent stack usage.
> 
> However that 'sleeper' could still be on the 'pending' list that the
> other thread is looking at and so any corruption could cause it to behave badly.
> 
> To avoid this race we simply take the same path as if the
> 'wait_for_completion_interruptible_timeout' was interrupted and if the
> sleeper is no longer on the list (which it won't be) we wait on the
> completion - which will ensure that any other cache_revisit_request
> will have let go of the sleeper.

OK, but I don't think we need that first CACHE_PENDING check in
setup_deferral at all.  Best just to ignore it?:

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index d789dfc..82804b4 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -567,10 +567,9 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item, int ti
 	sleeper.completion = COMPLETION_INITIALIZER_ONSTACK(sleeper.completion);
 	dreq->revisit = cache_restart_thread;
 
-	ret = setup_deferral(dreq, item);
+	setup_deferral(dreq, item);
 
-	if (ret ||
-	    wait_for_completion_interruptible_timeout(
+	if (wait_for_completion_interruptible_timeout(
 		    &sleeper.completion, timeout) <= 0) {
 		/* The completion wasn't completed, so we need
 		 * to clean up

Then it's obvious that we always wait for completion, and that we fill
the basic requirement to avoid corruption here.

(Which is, in more detail: cache_wait_req must not return while dreq is
still reachable from anywhere else.  Since dreq is reachable from
elsewhere only as long as it is hashed, and since anyone else that might
unhash it will call our revisit (which unconditionally calls complete),
then forget about it, it suffices for cache_wait_req to either wait for
completion, or unhash dreq *itself* (before someone else does).)

Also we don't need the PENDING check to ensure we wait only when
necessary--we only wait while dreq is hashed, and as long as we're
hashed anyone clearing PENDING will also end up doing the complete().

In the deferral case it's maybe a useful optimization if it avoids
an unnecessary drop sometimes.  Here it doesn't help.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux