Re: [PATCH 03/11] util: Add a util to traverse directory tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 06/07/2013 01:03 PM, Osier Yang wrote:
> There is POSIX calls to walk through direcotry tree, nftw(3), but

s/is/are
s/direcotry/directory
s/tree/trees  or s/tree/a tree/

> there is no way to allow one to pass user data to the callback:
> 
> int nftw(const char *dirpath,
>          int (*fn) (const char *fpath, const struct stat *sb,
>                     int typeflag, struct FTW *ftwbuf),
>          int nopenfd, int flags);
> 
> Assuming a simple requirement: one wants to simply find out all the
> files which have name "vendor" and it has a specific value, under a
> directory, It's not possible to achieve with nftw(3). To solve the
> requirements like this, this new util function comes.

Not sure the last sentence is necessary

> 
> int virTraverseDirectory(const char *dirpath,
>                          int depth,
>                          unsigned int flags,
>                          virTraverseDirectoryCallback cb,
>                          virTraverseDirectorySkipCallback skipcb,
>                          void *opaque,
>                          char ***filepaths);
> 
>  * Which allow to traverse a directory limited to specified @depth
> (relative to the @dirpath)
> 
>  * Two callbacks are provided, callback @cb is to handle the file path
> passed to it, with the user data @opaque; callback @skipcb is to
> allow one to make the rules to skip the passed file path. For example,
> one can define regex patterns in @skipcb to skip all the file path
> which matches it.
> 
>  * @cb should return 0 to indicate the file path is successfully
> handled, otherwise -1 should be returned.
> 
>  * Like @cb, @skipcb also should return 0 to indiate the file path
> will be skipped to handle, otherwise -1 should be returned.
> 
>  * If passed @filepath is not NULL, it will be filled with the file
> paths which are successfully handled by @cb.

What use would a NULL filepath be?  The whole purpose of the code is to
generate a list of files with a specific pattern.  The end result may be
an empty list, but passing NULL
> 
>  * @flags can be used to control the general behaviour of the
> traversing. E.g. Don't follow symbol links.
> 
> A simple example:
> 
> /*
>  * utiltest.c: Prints all the file paths whose basename is
>  *             "vendor" under directory "./testdir".
>  */
> static int
> traverseCallback(const char *path,
>                  void *opaque)
> {
>     const char *name = opaque;
>     char *p = NULL;
> 
>     if ((p = strrchr(path, '/')))
>         p++;
> 
>     if (STRNEQ(p, name))
>         return -1;
> 
>     return 0;
> }
> 
> static int
> findVendor(void)
> {
>     char **filepaths = NULL;
>     unsigned int flags = 0;
>     const char *name = "vendor";
>     int n;
>     int i;
> 
>     flags |= VIR_TRAVERSE_DIRECTORY_FALL_THROUGH |
>              VIR_TRAVERSE_DIRECTORY_IGNORE_HIDDEN_FILES ;
> 
>     if ((n = virTraverseDirectory("./testdir", 2, flags,
>                                   traverseCallback, NULL,
>                                   (void *)name, &filepaths)) < 0)
>         return -1;
> 
>     for (i = 0; i < n; i++)
>         printf("Match%d: %s\n", i, filepaths[i]);
> 
>     for (i = 0; i < n; i++)
>         VIR_FREE(filepaths[i]);
>     VIR_FREE(filepaths);
> 
>     return 0;
> }
> 
> Tree view of "./testdir":
> % tree -a ./testdir/
> ./testdir/
> |-- device
> |-- dir1
> |   |-- device
> |   |-- dir2
> |   |   |-- device
> |   |   |-- dir3
> |   |   |   `-- vendor
> |   |   `-- vendor
> |   |-- .dir7
> |   |   `-- vendor
> |   |-- dir8 -> ../dir4
> |   `-- vendor
> |-- dir4
> |   |-- device
> |   |-- dir5
> |   |   |-- device
> |   |   `-- vendor
> |   `-- vendor
> `-- vendor
> 
> 7 directories, 12 files
> 
> % ./utiltest
> Match0: ./testdir/dir1/dir2/vendor
> Match1: ./testdir/dir4/dir5/vendor
> 
> Only "vendor" in second depth are returned (flag *_FALL_THROUGH took
> effect), and the symbol link is not followed.
> 
> "virTraverseDirectory" is implemented using recursion method, which
> is generally not good for performance and memory use, and even may
> eats up the file descriptors. But since we allow to specify the
> tree "depth", and I don't think libvirt has use cases which need to
> traverse a very deep directory tree. And on the other hand,
> Implementing it using either iteration or stack will be much less
> readable.
> 
> Later patches will take use of virTraverseDirectory, and tests
> will be added.
> ---
>  src/libvirt_private.syms |   1 +
>  src/util/virutil.c       | 151 ++++++++++++++++++++++++++++++++++++++++++++++-
>  src/util/virutil.h       |  52 ++++++++++++++++
>  3 files changed, 203 insertions(+), 1 deletion(-)
> 
> diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
> index 1ea7467..fe182e8 100644
> --- a/src/libvirt_private.syms
> +++ b/src/libvirt_private.syms
> @@ -1965,6 +1965,7 @@ virSetNonBlock;
>  virSetUIDGID;
>  virSetUIDGIDWithCaps;
>  virStrIsPrint;
> +virTraverseDirectory;
>  virValidateWWN;
>  
>  
> diff --git a/src/util/virutil.c b/src/util/virutil.c
> index c5246bc..c1938f9 100644
> --- a/src/util/virutil.c
> +++ b/src/util/virutil.c
> @@ -2048,7 +2048,6 @@ virFindFCHostCapableVport(const char *sysfs_prefix ATTRIBUTE_UNUSED)
>      virReportSystemError(ENOSYS, "%s", _("Not supported on this platform"));
>      return NULL;
>  }
> -
>  #endif /* __linux__ */
>  
>  /**
> @@ -2070,3 +2069,153 @@ virCompareLimitUlong(unsigned long long a, unsigned long b)
>  
>      return -1;
>  }
> +
> +/*
> + * virTraverseDirectory:
> + * @dirpath: Directory path, (relative or absolute)
> + * @depth: The max directory tree depth for traversing
> + * @cb: Callback to handle file (not directory) during traversing
> + * @skibcb: Callback to define rules to skip entry

s/skibcb/skipcb

> + * @opaque: User data to pass to @cb
> + * @filepaths: Pointer to array to store the file paths, the file
> + *             path passed to @cb is stored into @filepaths as long
> + *             as @cb returns 0.
> + *
> + * Traverse a directory tree into specified @depth, handing the file

s/handing/handling

> + * with callback in the process. Caller must free @filepaths and
> + * its elements after use.
> + *
> + * Returns the number of file paths successfully handled by @cb on
> + * success, with @fillpaths (if passed @fillpath is not NULL) filled,

s/@fillpaths/@filepaths  (there's 2 of them)

> + * or -1 on failure.
> + */

NOTE: If you had run 'make -C tests valgrind' as part of your process
you would have seen there's a memory leak here and the 'utiltest' fails.
The output from my run has the following footprints repeated a few times:

==29216== 8 bytes in 1 blocks are definitely lost in loss record 2 of 39
==29216==    at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==29216==    by 0x4A089F0: realloc (vg_replace_malloc.c:662)
==29216==    by 0x4C6C7EE: virReallocN (viralloc.c:184)
==29216==    by 0x4CBC251: virTraverseDirectory (virutil.c:2592)
==29216==    by 0x4CBC384: virTraverseDirectory (virutil.c:2562)
==29216==    by 0x4CBC384: virTraverseDirectory (virutil.c:2562)
==29216==    by 0x4CBCC7D: virFindPCIDeviceByVPD (virutil.c:2084)
==29216==    by 0x40202D: testFindPCIDeviceByVPD (utiltest.c:175)
==29216==    by 0x40326F: virtTestRun (testutils.c:158)
==29216==    by 0x401D20: mymain (utiltest.c:305)
==29216==    by 0x4038AA: virtTestMain (testutils.c:722)
==29216==    by 0x37C1021A04: (below main) (libc-start.c:225)
==29216==
==29216== 8 bytes in 1 blocks are definitely lost in loss record 3 of 39
==29216==    at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==29216==    by 0x4A089F0: realloc (vg_replace_malloc.c:662)
==29216==    by 0x4C6C7EE: virReallocN (viralloc.c:184)
==29216==    by 0x4CBC3B4: virTraverseDirectory (virutil.c:2569)
==29216==    by 0x4CBC384: virTraverseDirectory (virutil.c:2562)
==29216==    by 0x4CBCC7D: virFindPCIDeviceByVPD (virutil.c:2084)
==29216==    by 0x40202D: testFindPCIDeviceByVPD (utiltest.c:175)
==29216==    by 0x40326F: virtTestRun (testutils.c:158)
==29216==    by 0x401D20: mymain (utiltest.c:305)
==29216==    by 0x4038AA: virtTestMain (testutils.c:722)
==29216==    by 0x37C1021A04: (below main) (libc-start.c:225)


Beyond that the traversal seems OK to me - I think you're covering the
basics with hidden files and links. I do remember a recent discussion
regarding virFileDeleteTree() which may be useful to revisit to ensure
nothing else is forgotten.

> +int
> +virTraverseDirectory(const char *dirpath,
> +                     int depth,
> +                     unsigned int flags,
> +                     virTraverseDirectoryCallback cb,
> +                     virTraverseDirectorySkipCallback skipcb,
> +                     void *opaque,
> +                     char ***filepaths)
> +{
> +    struct dirent *entry = NULL;
> +    DIR *dir = NULL;
> +    char *fpath = NULL;
> +    struct stat st;
> +    int nfilepaths = 0;
> +    int ret = -1;
> +    int i;
> +
> +    if (depth < 0)
> +        return 0;
> +
> +    if (filepaths)
> +        *filepaths = NULL;
> +
> +    if (!(dir = opendir(dirpath))) {
> +        virReportSystemError(errno,
> +                             _("Failed to open directory '%s'"),
> +                             dirpath);
> +        return -1;
> +    }
> +
> +    while ((entry = readdir(dir))) {
> +        /* Always Ignore "." and ".." */
> +        if (STREQ(entry->d_name, ".") || STREQ(entry->d_name, ".."))
> +            continue;
> +
> +        /* Ignore hidden files if it's required */
> +        if (flags & VIR_TRAVERSE_DIRECTORY_IGNORE_HIDDEN_FILES &&
> +            entry->d_name[0] == '.')
> +            continue;
> +
> +        if (virAsprintf(&fpath, "%s/%s", dirpath, entry->d_name) < 0) {
> +            virReportOOMError();
> +            goto error;
> +        }
> +
> +        /* Skip to handle the entry as long as @skipcb returns 0 for it */
> +        if (skipcb && skipcb(fpath, opaque) == 0) {
> +            VIR_FREE(fpath);
> +            continue;
> +        }
> +
> +        if (lstat(fpath, &st) < 0) {
> +            virReportSystemError(errno,
> +                                 _("Failed to stat '%s'"),

technically failed to 'lstat'

> +                                 fpath);
> +            goto error;
> +        }
> +
> +        /* Don't follow symbol link unless it's explicitly required */
> +        if (S_ISLNK(st.st_mode) &&
> +            !(flags & VIR_TRAVERSE_DIRECTORY_FOLLOW_LINK)) {
> +                VIR_FREE(fpath);
> +                continue;
> +        }
> +
> +        if (S_ISDIR(st.st_mode)) {
> +            char **tmp = NULL;
> +            int rc;
> +
> +            if ((rc = virTraverseDirectory(fpath, depth - 1, flags, cb, skipcb,
> +                                           opaque, filepaths ? &tmp : NULL)) < 0)
> +                goto error;
> +
> +            if (rc > 0) {
> +                /* Copy the filepaths returned by recursive call */
> +                if (filepaths) {
> +                    if (VIR_REALLOC_N((*filepaths), nfilepaths + rc) < 0) {
> +                        for (i = 0; i < rc; i++)
> +                            VIR_FREE(tmp[i]);
> +                        VIR_FREE(tmp);
> +                        virReportOOMError();
> +                        goto error;
> +                    }
> +
> +                    for (i = 0; i < rc; i++)
> +                        (*filepaths)[nfilepaths + i] = tmp[i];
> +                }
> +                nfilepaths += rc;
> +            }

Perhaps the valgrind error is that the else clause here or the one for
"if (filepaths)" in the (rc > 0) condition.

In either case, tmp wouldn't be free()'d properly.

If you required filepaths to be non NULL - I believe the whole
mess/issue would be irrelevant...

> +        } else {
> +             /* Don't handle the file unless it's the last depth */
> +             if ((flags & VIR_TRAVERSE_DIRECTORY_FALL_THROUGH) &&
> +                 depth != 0) {
> +                 VIR_FREE(fpath);
> +                 continue;
> +             }
> +
> +             if (cb(fpath, opaque) == 0) {
> +                if (filepaths) {
> +                    if (VIR_REALLOC_N((*filepaths), nfilepaths + 1) < 0) {
> +                        virReportOOMError();
> +                        goto error;
> +                    }
> +
> +                    if (VIR_STRDUP((*filepaths)[nfilepaths], fpath) < 0)
> +                        goto error;
> +
> +                }
> +                nfilepaths++;
> +            }
> +        }
> +
> +        VIR_FREE(fpath);
> +    }
> +
> +    ret = nfilepaths;
> +
> +cleanup:
> +    closedir(dir);
> +    return ret;
> +
> +error:
> +    VIR_FREE(fpath);
> +    if (filepaths) {
> +        for (i = 0; i < nfilepaths; i++)
> +            VIR_FREE((*filepaths)[i]);
> +        VIR_FREE(*filepaths);
> +    }
> +    goto cleanup;
> +}
> diff --git a/src/util/virutil.h b/src/util/virutil.h
> index 280a18d..6c46f23 100644
> --- a/src/util/virutil.h
> +++ b/src/util/virutil.h
> @@ -166,4 +166,56 @@ char *virFindFCHostCapableVport(const char *sysfs_prefix);
>  
>  int virCompareLimitUlong(unsigned long long a, unsigned long b);
>  
> +/**
> + * virTraverseDirectoryCallback:
> + * @fpath: file path to handle
> + * @opaque: User data to pass to the callback
> + *
> + * Callback for "virTraverseDirectory".
> + *
> + * Returns 0 on success, -1 on failure.
> + */
> +typedef int (*virTraverseDirectoryCallback)(const char *fpath,
> +                                            void *opaque);
> +
> +/**
> + * virTraverseDirectorySkipCallback:
> + * @fpath: file path to handle
> + * @opaque: User data to pass to the callback
> + *
> + * Callback to control whether virTraverseDirectory should ship
> + * @fpath E.g. Skipping files whose name match a regex pattern.
> + *
> + * Return 0 to let "virTraverseDirectory" skip handling the @fpath,
> + * -1 to not skip.
> + */
> +typedef int (*virTraverseDirectorySkipCallback)(const char *fpath,
> +                                                void *opaque);
> +
> +/**
> + * virTraverseDirectoryFlags:
> + *
> + * Except virTraverseDirectorySkipCallback, one can use these flags to
> + * control the general behaviour of virTraverseDirectory
> + */
> +enum {
> +    /* Follow symbol links */
> +    VIR_TRAVERSE_DIRECTORY_FOLLOW_LINK = 1 << 0,
> +
> +    /* Ignore hidden files or directory */
> +    VIR_TRAVERSE_DIRECTORY_IGNORE_HIDDEN_FILES = 1 << 1,
> +
> +    /* Don't handle files unless it's in the last depth */
> +    VIR_TRAVERSE_DIRECTORY_FALL_THROUGH = 1 << 2,

Better named "MATCH_DEPTH_ONLY"?  FALL_THROUGH just has other

> +} virTraverseDirectoryFlags;
> +
> +int virTraverseDirectory(const char *dirpath,
> +                         int depth,
> +                         unsigned int flags,
> +                         virTraverseDirectoryCallback cb,
> +                         virTraverseDirectorySkipCallback skipcb,
> +                         void *opaque,
> +                         char ***filepaths);
> +    ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3)

Not sure "3" is what you mean - perhaps you meant 4 (eg, cb).

It also stands to reason filepaths shouldn't be NULL.  I guess I just
don't see the reason to return just a count, but perhaps there's a use
case I'm not thinking about or will be apparent in future patches.

John

> +
>  #endif /* __VIR_UTIL_H__ */
> 

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]