Re: [PATCH v2 13/16] refs/iterator: implement seeking for ref-cache iterators

shejialuo <shejialuo@xxxxxxxxx> · Mon, 24 Feb 2025 22:49:14 +0800

On Wed, Feb 19, 2025 at 02:23:40PM +0100, Patrick Steinhardt wrote:
> Implement seeking of ref-cache iterators. This is done by splitting most
> of the logic to seek iterators out of `cache_ref_iterator_begin()` and
> putting it into `cache_ref_iterator_seek()` so that we can reuse the
> logic.
> 
> Note that we cannot use the optimization anymore where we return an
> empty ref iterator when there aren't any references, as otherwise it
> wouldn't be possible to reseek the iterator to a different prefix that
> may exist. This shouldn't be much of a performance corncern though as we
> now start to bail out early in case `advance()` sees that there are no
> more directories to be searched.
> 

Nit: s/corncern/concern/. Not worth a reroll on its own.

> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  refs/ref-cache.c | 74 ++++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 48 insertions(+), 26 deletions(-)
> 
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 6457e02c1ea..b54547d71ee 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -362,9 +362,7 @@ struct cache_ref_iterator {
>  	struct ref_iterator base;
>  
>  	/*
> -	 * The number of levels currently on the stack. This is always
> -	 * at least 1, because when it becomes zero the iteration is
> -	 * ended and this struct is freed.
> +	 * The number of levels currently on the stack.
>  	 */

So this value can now be zero? I suppose that is what we use for the
early bail-out, now that we no longer return the empty ref iterator.

>  	size_t levels_nr;
>  
> @@ -389,6 +387,9 @@ struct cache_ref_iterator {
>  	struct cache_ref_iterator_level *levels;
>  
>  	struct repository *repo;
> +	struct ref_cache *cache;
> +
> +	int prime_dir;

We need these two new fields because `cache_ref_iterator_begin()` takes
the `ref_cache` and `prime_dir` as parameters. To reseek an existing
iterator we have to remember both of them in the iterator itself.

>  };
>  
>  static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
> @@ -396,6 +397,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  	struct cache_ref_iterator *iter =
>  		(struct cache_ref_iterator *)ref_iterator;
>  
> +	if (!iter->levels_nr)
> +		return ITER_DONE;
> +

OK, so we bail out early when the cache ref iterator has no levels left.

>  	while (1) {
>  		struct cache_ref_iterator_level *level =
>  			&iter->levels[iter->levels_nr - 1];
> @@ -444,6 +448,40 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  	}
>  }
>  
> +static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
> +				   const char *prefix)
> +{
> +	struct cache_ref_iterator *iter =
> +		(struct cache_ref_iterator *)ref_iterator;
> +	struct ref_dir *dir;
> +
> +	dir = get_ref_dir(iter->cache->root);
> +	if (prefix && *prefix)
> +		dir = find_containing_dir(dir, prefix);
> +
> +	if (dir) {
> +		struct cache_ref_iterator_level *level;
> +
> +		if (iter->prime_dir)
> +			prime_ref_dir(dir, prefix);
> +		iter->levels_nr = 1;
> +		level = &iter->levels[0];
> +		level->index = -1;
> +		level->dir = dir;
> +
> +		if (prefix && *prefix) {
> +			iter->prefix = xstrdup(prefix);

Should we free the original `iter->prefix` before we assign the new
`prefix`? I have seen this pattern in a previous patch, too. If the
caller seeks more than once, the previous prefix would be leaked.
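
To illustrate what I mean, here is a minimal standalone sketch, not
the actual ref-cache code. `toy_iter`, `toy_iter_seek()` and
`toy_strdup()` are hypothetical names, and plain malloc() stands in for
git's xstrdup(). The only point is that the old prefix is released
before the new one is stored:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the prefix-owning part of the iterator. */
struct toy_iter {
	char *prefix; /* owned by the iterator, NULL when unset */
};

/* Portable string copy; git itself would use xstrdup() here. */
static char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Replace the stored prefix, releasing what an earlier seek left behind. */
static int toy_iter_seek(struct toy_iter *iter, const char *prefix)
{
	free(iter->prefix); /* free(NULL) is a no-op */
	iter->prefix = NULL;

	if (prefix && *prefix) {
		iter->prefix = toy_strdup(prefix);
		if (!iter->prefix)
			return -1;
	}
	return 0;
}
```

With something along these lines, seeking twice in a row no longer
leaks, and seeking to an empty prefix also clears a stale one.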

> +			level->prefix_state = PREFIX_WITHIN_DIR;
> +		} else {
> +			level->prefix_state = PREFIX_CONTAINS_DIR;
> +		}
> +	} else {
> +		iter->levels_nr = 0;
> +	}

When we cannot find the dir, we set `iter->levels_nr = 0`. Could we
check for that first?

    if (!dir) {
    	iter->levels_nr = 0;
    	return 0;
    }

That would save one level of indentation. However, we always return 0
anyway, so maybe it is not worth changing.

> +
> +	return 0;

I understand the motivation: you want to always hand back a real cache
ref iterator so that it can be reused later. The original behavior was
to return an empty ref iterator, but an empty ref iterator cannot be
reused. So now we always get a cache ref iterator, and it is still
valid even when `levels_nr` is 0. Makes sense.
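
For my own understanding I sketched the resulting behavior as a toy
model. Everything here is made up for illustration (`toy_cache_iter`,
`toy_seek()`, `toy_advance()`, a flat sorted array instead of the real
directory levels); it only mirrors the idea that a seek which finds
nothing leaves `levels_nr` at 0, `advance()` then reports "done", and a
later seek can revive the same iterator:

```c
#include <stddef.h>
#include <string.h>

/* Toy return codes; named after ITER_OK / ITER_DONE for readability only. */
enum { TOY_ITER_OK = 0, TOY_ITER_DONE = -1 };

struct toy_cache_iter {
	const char **refs;   /* sorted ref names, borrowed */
	size_t refs_nr;
	const char *prefix;  /* borrowed in this sketch, never NULL after seek */
	size_t levels_nr;    /* 0 means "nothing to walk", still a valid iterator */
	size_t pos;          /* next candidate in refs[] */
	const char *refname; /* ref yielded by the last successful advance */
};

/* Always succeeds; an unknown prefix simply leaves levels_nr at 0. */
static int toy_seek(struct toy_cache_iter *iter, const char *prefix)
{
	size_t i, len;

	if (!prefix)
		prefix = "";
	len = strlen(prefix);

	iter->prefix = prefix;
	iter->levels_nr = 0;
	iter->pos = 0;
	iter->refname = NULL;

	for (i = 0; i < iter->refs_nr; i++) {
		if (!strncmp(iter->refs[i], prefix, len)) {
			iter->levels_nr = 1;
			iter->pos = i;
			break;
		}
	}
	return 0;
}

static int toy_advance(struct toy_cache_iter *iter)
{
	/* Early bail-out, analogous to the new check in advance(). */
	if (!iter->levels_nr)
		return TOY_ITER_DONE;

	if (iter->pos >= iter->refs_nr ||
	    strncmp(iter->refs[iter->pos], iter->prefix, strlen(iter->prefix))) {
		iter->levels_nr = 0;
		return TOY_ITER_DONE;
	}

	iter->refname = iter->refs[iter->pos++];
	return TOY_ITER_OK;
}
```

In this model, seeking to a prefix that does not exist and then to one
that does works on the same iterator, which is exactly what the empty
ref iterator could not offer.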

> +}
> +

Thanks,
Jialuo