On 10/15/18 4:59 AM, Abhishek Lekshmanan wrote:
> If we read the current bucket entry point and the bucket instance again
> before marking the condition b as ok to clean up I think we'll cover all
> the bases of the race condition.
>
> What I'm thinking is a scenario like this:
> - we read the current entry point, bucket hasn't started resharding - so
> new_bucket_instance = NONE;
> - the bucket starts resharding; so new_bucket_instance is set (the
> current value we've read is therefore stale)
> - scanning yields a new bucket instance but it matches condition b:
> i.e. reshard_status = NONE (as the reshard is in progress) and entrypoint
> (which we've read previously) doesn't refer to this instance id.
>
> So for case B we re-read the bucket entry point & instance and recheck
> the status before we mark this as ok to cleanup, in case of the first
> case this is not a problem as state=DONE means that this bucket instance
> id will never be reused again.

Take a look at my reply to Casey in this thread. We can prevent another
resharding from taking place with a lock. Furthermore, with my PR we can
ensure the lock is held continuously and never lost.

I think you should hold the lock during the scan to generate a list of
what to delete. Then you can release the lock and remove the items on the
list.

> Am I thinking this right?

Eric
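A minimal sketch of the two-phase scan-then-delete approach discussed in this thread, assuming a process-local `std::mutex` as a stand-in for the bucket's reshard lock (in RGW this would be a cluster-wide lock that a reshard holds for its whole duration, not a mutex). All type and function names here (`ReshardStatus`, `BucketInstance`, `BucketEntryPoint`, `collect_stale_instances`, `remove_instances`) are hypothetical, and conditions a and b are paraphrased from the quoted text. Because both the entry point and the instance list are read while the lock is held, the stale-entry-point race in the quoted scenario cannot occur during classification.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for RGW bucket state; names are illustrative only.
enum class ReshardStatus { NONE, IN_PROGRESS, DONE };

struct BucketInstance {
  std::string id;
  ReshardStatus reshard_status;
};

struct BucketEntryPoint {
  std::string current_instance_id;  // instance the bucket currently points at
};

// Phase 1: classify instances while holding the (assumed) reshard lock.
// The entry point and instance list are both read under the lock, so no
// reshard can start between reading them and deciding what is stale.
std::vector<std::string> collect_stale_instances(
    std::mutex& reshard_lock,  // assumption: models the per-bucket reshard lock
    const std::function<BucketEntryPoint()>& read_entry_point,
    const std::function<std::vector<BucketInstance>()>& list_instances) {
  std::lock_guard<std::mutex> guard(reshard_lock);
  const BucketEntryPoint ep = read_entry_point();
  std::vector<std::string> to_delete;
  for (const auto& inst : list_instances()) {
    // Condition a: reshard finished, so this instance id is never reused.
    const bool done = inst.reshard_status == ReshardStatus::DONE;
    // Condition b: not resharding and not referenced by the entry point.
    const bool orphaned = inst.reshard_status == ReshardStatus::NONE &&
                          inst.id != ep.current_instance_id;
    if (done || orphaned) to_delete.push_back(inst.id);
  }
  return to_delete;
}  // lock released here, before any deletion happens

// Phase 2: remove the collected instances without holding the lock.
// `remove` is a hypothetical callback standing in for the real deletion.
void remove_instances(const std::vector<std::string>& ids,
                      const std::function<void(const std::string&)>& remove) {
  for (const auto& id : ids) remove(id);
}
```

Keeping the deletion outside the lock means a slow cleanup never blocks a new reshard; the list is already fixed, and an instance chosen in phase 1 cannot become live again under either condition.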