Re: [PATCH v8 4/5] dir_iterator: refactor state machine model

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/06/2017 03:39 AM, Daniel Ferreira wrote:
> Perform major refactor of dir_iterator_advance(). dir_iterator has
> ceased to rely on a convoluted state machine mechanism of two loops and
> two state variables (level.initialized and level.dir_state). This serves
> to ease comprehension of the iterator mechanism and ease addition of new
> features to the iterator.
> 
> Create an option for the dir_iterator API to iterate over subdirectories
> only after having iterated through their contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18). This is useful for
> recursively removing a directory and calling rmdir() on a directory only
> after all of its contents have been wiped.
> 
> Add an option for the dir_iterator API to iterate over the root
> directory (the one it was initialized with) as well.
> 
> Add the "flags" parameter to dir_iterator_create, allowing for the
> aforementioned new features to be enabled. The new default behavior
> (i.e. flags set to 0) does not iterate over directories. Flag
> DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
> so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
> directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
> iterates over the root directory. These flags do not conflict with each
> other and may be used simultaneously.
> 
> Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
> the flags parameter introduced.
> 
> Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
> test "post-order" and "iterate-over-root" modes.
> 
> Signed-off-by: Daniel Ferreira <bnmvco@xxxxxxxxx>
> ---
>  dir-iterator.c               | 216 ++++++++++++++++++++++++++++++-------------
>  dir-iterator.h               |  35 +++++--
>  refs/files-backend.c         |   2 +-
>  t/helper/test-dir-iterator.c |  27 +++++-
>  t/t0065-dir-iterator.sh      |  44 ++++++++-
>  5 files changed, 245 insertions(+), 79 deletions(-)
> 
> diff --git a/dir-iterator.c b/dir-iterator.c
> index 9e073a0..4c919d1 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -4,8 +4,6 @@
>  #include "dir-iterator.h"
>  
>  struct dir_iterator_level {
> -	int initialized;
> -
>  	DIR *dir;
>  
>  	/*
> @@ -20,9 +18,20 @@ struct dir_iterator_level {
>  	 * iteration and also iterated into):
>  	 */
>  	enum {
> -		DIR_STATE_ITER,
> -		DIR_STATE_RECURSE
> +		DIR_STATE_PUSHED,
> +		DIR_STATE_PRE_ITERATION,
> +		DIR_STATE_ITERATING,
> +		DIR_STATE_POST_ITERATION,
> +		DIR_STATE_EXHAUSTED
>  	} dir_state;
> +
> +	/*
> +	 * The stat structure for the directory this level represents.
> +	 * It comes with a st_is_set flag which indicates whether it is,
> +	 * in fact, set.
> +	 */
> +	unsigned st_is_set : 1;

In another email I just agreed with you that it would probably be
cleaner to call `lstat()` on the top-level directory from
`dir_iterator_begin()`. But if that weren't an option, rather than
adding a new state variable `st_is_set`, it might be more natural to add
a new value, say `DIR_STATE_INITIALIZING`, to `dir_state`, and do the
special case initialization in the main loop in `dir_iterator_advance()`
rather than in `set_iterator_data()`.

> +	struct stat st;
>  };
>  
>  /*
> @@ -48,15 +57,23 @@ struct dir_iterator_int {
>  	 * that will be included in this iteration.
>  	 */
>  	struct dir_iterator_level *levels;
> +
> +	/* Holds the flags passed to dir_iterator_begin(). */
> +	unsigned flags;
>  };
>  
> -static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
> +static void push_dir_level(struct dir_iterator_int *iter,
> +		struct dir_iterator_level *level, struct stat *st)

The `level` argument is no longer used.

>  {
> -	level->dir_state = DIR_STATE_RECURSE;
>  	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
>  		   iter->levels_alloc);
> +
> +	/* Push a new level */
>  	level = &iter->levels[iter->levels_nr++];
> -	level->initialized = 0;
> +	level->dir = NULL;
> +	level->dir_state = DIR_STATE_PUSHED;
> +	level->st_is_set = 1;
> +	level->st = *st;
>  }
>  
> [...]
> +/*
> + * This function uses a state machine with the following states:
> + * -> DIR_STATE_PUSHED: the directory has been pushed to the
> + * iterator traversal tree.
> + * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The
> + * dirpath has already been returned if pre-order traversal is set.
> + * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing
> + * through it.
> + * -> DIR_STATE_POST_ITERATION: the directory has been iterated through.
> + * We are ready to close it.
> + * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
> + */
>  int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  {
>  	struct dir_iterator_int *iter =
> @@ -97,7 +158,26 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			&iter->levels[iter->levels_nr - 1];
>  		struct dirent *de;

`de` is only used in one branch of the `if` statement. Its definition
could be moved there.

>  
> -		if (!level->initialized) {
> +		if (level->dir_state == DIR_STATE_PUSHED) {
> +			level->dir_state = DIR_STATE_PRE_ITERATION;
> +
> +			if (iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) {
> +				/* We may not want the root directory to be iterated over */
> +				if (iter->levels_nr != 1 ||
> +					(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) {

These two `if` statements can be combined into one:

			/* We may not want the root directory to be iterated over */
			if ((iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) &&
			    (iter->levels_nr != 1 ||
			     (iter->flags & DIR_ITERATOR_LIST_ROOT_DIR))) {
				/* ... */


> +					/*
> +					 * This will only error if we fail to lstat() the
> +					 * root directory. In this case, we bail.
> +					 */
> +					if (set_iterator_data(iter, level)) {
> +						level->dir_state = DIR_STATE_EXHAUSTED;
> +						continue;
> +					}
> +
> +					return ITER_OK;
> +				}
> +			}
> +		} else if (level->dir_state == DIR_STATE_PRE_ITERATION) {
>  			/*
>  			 * Note: dir_iterator_begin() ensures that
>  			 * path is not the empty string.
> [...]
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 50188e9..c29dc68 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
>  	files_downcast(ref_store, 0, "reflog_iterator_begin");
>  
>  	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
> -	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
> +	iter->dir_iterator = dir_iterator_begin(git_path("logs"), DIR_ITERATOR_PRE_ORDER_TRAVERSAL);

Actually, this iterator *doesn't* care about directories. If you look up
in `files_reflog_iterator_advance()`, only entries for which
`S_ISREG(diter->st.st_mode)` are processed. So you can set the flag to
0. (You still need to retain the `S_ISREG()` check, though, to exclude
other types of files.)

>  	return ref_iterator;
>  }
>  
> diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
> index 0394a13..9ac3fa7 100644
> --- a/t/helper/test-dir-iterator.c
> +++ b/t/helper/test-dir-iterator.c
> @@ -4,15 +4,34 @@
>  #include "dir-iterator.h"
>  
>  int cmd_main(int argc, const char **argv) {
> +	const char **myargv = argv;
> +	int myargc = argc;
> +
>  	struct strbuf path = STRBUF_INIT;
>  	struct dir_iterator *diter;
>  
> -	if (argc < 2)
> -		die("BUG: test-dir-iterator needs one argument");
> +	unsigned flag = 0;
> +
> +	while (--myargc && starts_with(*++myargv, "--")) {
> +		if (!strcmp(*myargv, "--pre-order"))
> +			flag |= DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
> +		else if (!strcmp(*myargv, "--post-order"))
> +			flag |= DIR_ITERATOR_POST_ORDER_TRAVERSAL;
> +		else if (!strcmp(*myargv, "--list-root-dir"))
> +			flag |= DIR_ITERATOR_LIST_ROOT_DIR;
> +		else if (!strcmp(*myargv, "--")) {
> +			myargc--;
> +			myargv++;
> +			break;
> +		} else
> +		   die("Unrecognized option: %s", *myargv);

The indentation above is wrong.

> +	}
>  
> -	strbuf_add(&path, argv[1], strlen(argv[1]));
> +	if (myargc != 1)
> +		die("expected exactly one non-option argument");
> +	strbuf_addstr(&path, *myargv);
>  
> -	diter = dir_iterator_begin(path.buf);
> +	diter = dir_iterator_begin(path.buf, flag);
>  
>  	while (dir_iterator_advance(diter) == ITER_OK) {
>  		if (S_ISDIR(diter->st.st_mode))
> [...]

Michael




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]