Re: [PATCH 1/2] sunrpc/cache: remove races with queuing an upcall.

Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx> · 04 Mar 2013 16:57:40 +0100

Hi,

sorry for my late reply. I was quite busy and needed some time
to think about that complicated stuff before writing an answer.

On 28 Feb 2013 00:24:00 +0100 NeilBrown <neilb@xxxxxxx> wrote:

> On 26 Feb 2013 15:02:01 +0100 Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
> wrote:
> 
> > On 26 Feb 2013 07:37:00 +0100 NeilBrown <neilb@xxxxxxx> wrote:
> > 
> > > We currently queue an upcall after setting CACHE_PENDING, and dequeue after clearing CACHE_PENDING.
> > > So a request should only be present when CACHE_PENDING is set.
> > > 
> > > However we don't combine the test and the enqueue/dequeue in a protected region, so it is possible (if unlikely) for a race to result in a request being queued without CACHE_PENDING set, or a request to be absent despite CACHE_PENDING.
> > > 
> > > So: include a test for CACHE_PENDING inside the regions of enqueue and dequeue where queue_lock is held, and abort the operation if the value is not as expected.
> > > 
> > > With this, it perfectly safe and correct to:
> > >  - call cache_dequeue() if and only if we have just
> > >    cleared CACHE_PENDING
> > >  - call sunrpc_cache_pipe_upcall() (via cache_make_upcall)
> > >    if and only if we have just set CACHE_PENDING.
> > > 
> > > Reported-by: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
> > > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > 
> > Sorry, I don't agree with this patch, as it fixes the first scenario of my mail
> > from 24 Feb 2013, but AFAICS changes the second one (which has been a minor
> > point that didn't need fixing necessarily) to a memory leak.
> > 
> > I'll try to expain my doubts:
> > 
> > Again, assume two threads calling cache_check() concurrently for the same cache
> > entry of a cache that has a reader.
> > Both threads get result -EAGAIN from cache_is_valid(). The second thread at that
> > moment is interrupted and suspended for a while.
> > The first thread sets CACHE_PENDING and queues an upcall request and sleeps
> > waiting for the reply.
> > The reader reads the request and replies accordingly. In sunrpc_cache_update()
> > the reader changes the entry to CACHE_VALID and calls cache_fresh_unlocked().
> > In cache_fresh_unlocked() it resets CACHE_PENDING. At this moment it is
> > interrupted and suspended.
> > Now the second thread wakes up, sets CACHE_PENDING again and queues a new upcall
> > request.
> > The reader wakes up and sees, that CACHE_PENDING is set again and does not
> > dequeue the old request. --> memory leak (?)
> 
> Yes, I think you are right.
> 
> > 
> > In my opinion, there are two possible fixes that could replace this patch:
> > 
> > 1) Don't call cache_dequeue() from cache_check(). Trying to dequeue something
> >    even if we know, that we haven't queued, looks strange to me. (And yes, I
> >    understand the reason, why you don't like it, but nevertheless I consider
> >    this the best solution.)
> 
> The reason for calling cache_dequeue() is that someone else might have queued
> something.  We are the last to leave so we turn out the lights - doesn't
> matter that we didn't turn them on.
> 
> So I think the correct fix to the leak is to remove the "return" near the end
> of cache_dequeue().
> i.e. whoever clears CACHE_PENDING must remove all entries from the queue.  If
> someone else sets CACHE_PENDING they might not succeed, but it doesn't matter
> as then someone else will come along and remove all the entries.

I think, there are two possible methods. The first - which I think was the
method chosen when the code was developed in the past - is to have a strict
handshake of enqueueing a request and dequeueing it later. But as we know, it
wasn't implemented correctly. (My patches were written to fix this.)
The second method is what you are preferring: make sure that the last one
clearing CACHE_PENDING dequeues everything. So your patches are some kind of
a redesign, I think. Without the "return" in cache_dequeue(), AFAICS your
patches should be fine.

> 
> >    This one would fix my first scenariop only, but not the second.
> > 
> > 2) I think, the starting point of all trouble is in cache_check().
> >    Currently, if a thread has set CACHE_PENDING, is works using a 
> >    possibly no longer valid state of the cache entry (rv).
> >    AFAICS, it would fix most of the problems to re-check the
> >    cache entry's state between setting CACHE_PENDING and the upcall.
> >    The upcall should be done only, if still necessary.
> >    This one could be combined with a new bit in the entry's state, that is
> >    set if a valid entry is updated (that is: replaced). Checking this
> >    bit also immediately before cache_make_upcall() is called would
> >    also fix my second scenario fully in that it avoids unnecessary
> >    upcalls.
> 
> Repeating the tests after setting CACHE_PENDING wouldn't hurt, but in almost
> all cases it wouldn't help either.  The races that could result in a second
> unnecessary up-call are extremely unlikely.  So I think the best approach is
> not trying to avoid them, but making sure that they don't cause any harm.
> This is best done with safe programming practices, like the "last one out
> turns out the lights" pattern.  

The races I've found were a consequence of two threads calling cache_is_valid()
concurrently and thus both trying to make an upcall. The first thread sets
CACHE_PENDING and tries the upcall. The second thread can do its upcall only,
after CACHE_PENDING was reset again (which can be done by the first thread
itself if the upcall failed, or by a reader if the upcall is answered).
In this case, after the second thread has set CACHE_PENDING itself, the state
of the cache entry will have changed. So, rechecking the state can avoid the
second upcall and thus also avoid the possibility of races.

> The "return" which I suggest removing is really a premature optimisation
> which should never have been included.  Without it we should be completely
> safe.

For the old method of an enqueueing / dequeueing handshake I think it was
fine.

> 
> ??

AFAICS, your newest patches that remove the "return" should be fine.
Unfortunately, I can't test it, as our setup is based on a SLES11 SP1,
which can't be changed without changing a lot of other SW also.
Thus, I'll try to backport the patches and do a test.

Bodo

> 
> NeilBrown
> 
ÿôèº{.nÇ+?·?®??+%?Ëÿ±éÝ¶¥?wÿº{.nÇ+?·¥?{±þwìþ)í?æèw*jg¬±¨¶????Ý¢jÿ¾«þG«?éÿ¢¸¢·¦j:+v?¨?wèjØm¶?ÿþø¯ù®w¥þ?àþf£¢·h??â?úÿ?Ù¥