Hi, sorry for my late reply. I was quite busy and needed some time to think about that complicated stuff before writing an answer. On 28 Feb 2013 00:24:00 +0100 NeilBrown <neilb@xxxxxxx> wrote: > On 26 Feb 2013 15:02:01 +0100 Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx> > wrote: > > > On 26 Feb 2013 07:37:00 +0100 NeilBrown <neilb@xxxxxxx> wrote: > > > > > We currently queue an upcall after setting CACHE_PENDING, and dequeue after clearing CACHE_PENDING. > > > So a request should only be present when CACHE_PENDING is set. > > > > > > However we don't combine the test and the enqueue/dequeue in a protected region, so it is possible (if unlikely) for a race to result in a request being queued without CACHE_PENDING set, or a request to be absent despite CACHE_PENDING. > > > > > > So: include a test for CACHE_PENDING inside the regions of enqueue and dequeue where queue_lock is held, and abort the operation if the value is not as expected. > > > > > > With this, it perfectly safe and correct to: > > > - call cache_dequeue() if and only if we have just > > > cleared CACHE_PENDING > > > - call sunrpc_cache_pipe_upcall() (via cache_make_upcall) > > > if and only if we have just set CACHE_PENDING. > > > > > > Reported-by: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx> > > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > > > Sorry, I don't agree with this patch, as it fixes the first scenario of my mail > > from 24 Feb 2013, but AFAICS changes the second one (which has been a minor > > point that didn't need fixing necessarily) to a memory leak. > > > > I'll try to expain my doubts: > > > > Again, assume two threads calling cache_check() concurrently for the same cache > > entry of a cache that has a reader. > > Both threads get result -EAGAIN from cache_is_valid(). The second thread at that > > moment is interrupted and suspended for a while. > > The first thread sets CACHE_PENDING and queues an upcall request and sleeps > > waiting for the reply. > > The reader reads the request and replies accordingly. In sunrpc_cache_update() > > the reader changes the entry to CACHE_VALID and calls cache_fresh_unlocked(). > > In cache_fresh_unlocked() it resets CACHE_PENDING. At this moment it is > > interrupted and suspended. > > Now the second thread wakes up, sets CACHE_PENDING again and queues a new upcall > > request. > > The reader wakes up and sees, that CACHE_PENDING is set again and does not > > dequeue the old request. --> memory leak (?) > > Yes, I think you are right. > > > > > In my opinion, there are two possible fixes that could replace this patch: > > > > 1) Don't call cache_dequeue() from cache_check(). Trying to dequeue something > > even if we know, that we haven't queued, looks strange to me. (And yes, I > > understand the reason, why you don't like it, but nevertheless I consider > > this the best solution.) > > The reason for calling cache_dequeue() is that someone else might have queued > something. We are the last to leave so we turn out the lights - doesn't > matter that we didn't turn them on. > > So I think the correct fix to the leak is to remove the "return" near the end > of cache_dequeue(). > i.e. whoever clears CACHE_PENDING must remove all entries from the queue. If > someone else sets CACHE_PENDING they might not succeed, but it doesn't matter > as then someone else will come along and remove all the entries. I think, there are two possible methods. The first - which I think was the method chosen when the code was developed in the past - is to have a strict handshake of enqueueing a request and dequeueing it later. But as we know, it wasn't implemented correctly. (My patches were written to fix this.) The second method is what you are preferring: make sure that the last one clearing CACHE_PENDING dequeues everything. So your patches are some kind of a redesign, I think. Without the "return" in cache_dequeue(), AFAICS your patches should be fine. > > > This one would fix my first scenariop only, but not the second. > > > > 2) I think, the starting point of all trouble is in cache_check(). > > Currently, if a thread has set CACHE_PENDING, is works using a > > possibly no longer valid state of the cache entry (rv). > > AFAICS, it would fix most of the problems to re-check the > > cache entry's state between setting CACHE_PENDING and the upcall. > > The upcall should be done only, if still necessary. > > This one could be combined with a new bit in the entry's state, that is > > set if a valid entry is updated (that is: replaced). Checking this > > bit also immediately before cache_make_upcall() is called would > > also fix my second scenario fully in that it avoids unnecessary > > upcalls. > > Repeating the tests after setting CACHE_PENDING wouldn't hurt, but in almost > all cases it wouldn't help either. The races that could result in a second > unnecessary up-call are extremely unlikely. So I think the best approach is > not trying to avoid them, but making sure that they don't cause any harm. > This is best done with safe programming practices, like the "last one out > turns out the lights" pattern. The races I've found were a consequence of two threads calling cache_is_valid() concurrently and thus both trying to make an upcall. The first thread sets CACHE_PENDING and tries the upcall. The second thread can do its upcall only, after CACHE_PENDING was reset again (which can be done by the first thread itself if the upcall failed, or by a reader if the upcall is answered). In this case, after the second thread has set CACHE_PENDING itself, the state of the cache entry will have changed. So, rechecking the state can avoid the second upcall and thus also avoid the possibility of races. > The "return" which I suggest removing is really a premature optimisation > which should never have been included. Without it we should be completely > safe. For the old method of an enqueueing / dequeueing handshake I think it was fine. > > ?? AFAICS, your newest patches that remove the "return" should be fine. Unfortunately, I can't test it, as our setup is based on a SLES11 SP1, which can't be changed without changing a lot of other SW also. Thus, I'll try to backport the patches and do a test. Bodo > > NeilBrown > ÿôèº{.nÇ+?·?®??+%?Ëÿ±éݶ¥?wÿº{.nÇ+?·¥?{±þwìþ)í?æèw*jg¬±¨¶????Ý¢jÿ¾«þG«?éÿ¢¸¢·¦j:+v?¨?wèjØm¶?ÿþø¯ù®w¥þ?àþf£¢·h??â?úÿ?Ù¥