On Thu, Jan 05 2017, J. Bruce Fields wrote: > I'm not against the patch, but I'm still not convinced by the > explanation: > > On Thu, Dec 22, 2016 at 12:38:06PM -0500, andros@xxxxxxxxxx wrote: >> From: Neil Brown <neilb@xxxxxxxx> >> >> The rsc cache code operates in a read_lock/write_lock environment. >> Changes to a cache entry should use the provided rsc_update >> routine which takes the write_lock. > > It looks pretty suspicious to be setting CACHE_NEGATIVE without the > cache_lock for write, but I'm not actually convinced there's a bug there > either. In any case not one that you'd be hitting reliably. > >> The current code sets the expiry_time and the CACHE_NEGATIVE flag >> without taking the write_lock as it does not call rsc_update. >> Without this patch, while cache_clean sees the entries to be >> removed, it does not remove the rsc_entries. This is because >> rsc_update updates other fields such as flush_time and last_refresh >> in the entry to trigger cache_clean to reap the entry. > > I think the root cause of the particular behavior you were seeing was > actually an oversight from Neil's c5b29f885afe "sunrpc: use seconds > since boot in expiry cache", which missed this one occurrence of > get_seconds(). So it's setting the item's entry to something decades in > the future. > > And that's probably not been a huge deal since these entries aren't so > big, and they will eventually get cleaned up by cache_purge when the > cache is destroyed. Still, I can imagine it slowing down cache lookups > on a long-lived server. > > The one-liner: > > - rsci->h.expiry_time = get_seconds(); > + rsci->h.expiry_time = seconds_since_boot(); > > would probably also do the job. Am I missing something? I was missing that get_seconds() bug - thanks. The other real bug is that setting h.expiry_time backwards should really set cd->nextcheck backwards too. I thought I had found code which did that, but I think I was confusing ->nextcheck with ->flush_time. > > But, OK, I think Neil's patch will ensure entries get cleaned up more > quickly than that would, and might also fix a rare race. Yes. The patch doesn't just fix the bug, whatever it is. It provides a proper interface for functionality that wasn't previously supported, and so had been hacked into place. Thanks, NeilBrown
Attachment:
signature.asc
Description: PGP signature