Jeff King <peff@xxxxxxxx> writes: > Thanks. Here it is rolled up with a commit message. > > -- >8 -- > Subject: clear_delta_base_cache(): don't modify hashmap while iterating > > Removing entries while iterating causes fast-import to > access an already-freed `struct packed_git`, leading to > various confusing errors. > > What happens is that clear_delta_base_cache() drops the > whole contents of the cache by iterating over the hashmap, > calling release_delta_base_cache() on each entry. That > function removes the item from the hashmap. The hashmap code > may then shrink the table, but the hashmap_iter struct > retains an offset from the old table. > > As a result, the next call to hashmap_iter_next() may claim > that the iteration is done, even though some items haven't > been visited. > > The only caller of clear_delta_base_cache() is fast-import, > which wants to clear the cache because it is discarding the > packed_git struct for its temporary pack. So by failing to > remove all of the entries, we still have references to the > freed packed_git. > > To make things even more confusing, this doesn't seem to > trigger with the test suite, because it depends on > complexities like the size of the hash table, which entries > got cleared, whether we try to access them before they're > evicted from the cache, etc. > > So I've been able to identify the problem with large > imports like freebsd's svn import, or a fast-export of > linux.git. But nothing that would be reasonable to run as > part of the normal test suite. > > We can fix this easily by iterating over the lru linked list > instead of the hashmap. They both contain the same entries, > and we can use the "safe" variant of the list iterator, > which exists for exactly this case. > > Let's also add a warning to the hashmap API documentation to > reduce the chances of getting bit by this again. > > Reported-by: Ulrich Spörlein <uqs@xxxxxxxxxxx> > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- Makes sense. Thanks, both, for reporting, finding and fixing. Will apply. > Documentation/technical/api-hashmap.txt | 4 +++- > sha1_file.c | 9 ++++----- > 2 files changed, 7 insertions(+), 6 deletions(-) > > diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt > index 28f5a8b71..a3f020cd9 100644 > --- a/Documentation/technical/api-hashmap.txt > +++ b/Documentation/technical/api-hashmap.txt > @@ -188,7 +188,9 @@ Returns the removed entry, or NULL if not found. > `void *hashmap_iter_next(struct hashmap_iter *iter)`:: > `void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`:: > > - Used to iterate over all entries of a hashmap. > + Used to iterate over all entries of a hashmap. Note that it is > + not safe to add or remove entries to the hashmap while > + iterating. > + > `hashmap_iter_init` initializes a `hashmap_iter` structure. > + > diff --git a/sha1_file.c b/sha1_file.c > index 1eb47f611..d20714d6b 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -2342,11 +2342,10 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent) > > void clear_delta_base_cache(void) > { > - struct hashmap_iter iter; > - struct delta_base_cache_entry *entry; > - for (entry = hashmap_iter_first(&delta_base_cache, &iter); > - entry; > - entry = hashmap_iter_next(&iter)) { > + struct list_head *lru, *tmp; > + list_for_each_safe(lru, tmp, &delta_base_cache_lru) { > + struct delta_base_cache_entry *entry = > + list_entry(lru, struct delta_base_cache_entry, lru); > release_delta_base_cache(entry); > } > }