On Mon, Aug 19, 2024 at 05:39:38PM +0200, Patrick Steinhardt wrote: > This patch series fixes that issue by starting to refcount the readers. > Each iterator will bump the refcount, thus avoiding the problem. While > at it, I also found a second issue where we segfault when reloading a > table fails while reusing one of the table readers. In this scenario, we > would end up releasing the reader of the stack itself, even though it > would still be used by it. I gave a fairly cursory look over this, as I'm not all that familiar with the reftable code. But everything looked pretty sensible to me. I wondered how we might test this. It looks like you checked the --stress output (which I confirmed seems fixed for me), along with a synthetic test directly calling various reftable functions. Both seem good to me. I had hoped to be able to have a non-racy external test, where running actual Git commands showed the segfault in a deterministic way. But I couldn't come up with one. One trick we've used for "pausing" a reading process is to run "cat-file --batch" from a fifo. You can ask it to do some ref lookups, then while it waits for more input, update the reftable, and then ask it to do more. But I don't think that's sufficient here, as the race happens while we are actually iterating. You'd really need some for_each_ref() callback that blocks in some externally controllable way. Possibly you could do something clever with partial-clone lazy fetches (where you stall the fetch and then do ref updates in the middle), but that is getting pretty convoluted. So I think the tests you included seem like a good place to stop. I did have a little trouble applying this for testing. I wanted to do it on 'next', which has the maintenance changes to cause the race. But merging it there ended up with a lot of conflicts with other reftable topics (especially the tests). I was able to resolve them all, but you might be able to make Junio's life easier by coordinating the topics a bit. -Peff