Re: [PATCH 03/11] util: Add a util to traverse directory tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 19/06/13 21:44, John Ferlan wrote:
On 06/07/2013 01:03 PM, Osier Yang wrote:
There is POSIX calls to walk through direcotry tree, nftw(3), but
s/is/are
s/direcotry/directory
s/tree/trees  or s/tree/a tree/

hm... I reviewed it times before posting it. :(


there is no way to allow one to pass user data to the callback:

int nftw(const char *dirpath,
          int (*fn) (const char *fpath, const struct stat *sb,
                     int typeflag, struct FTW *ftwbuf),
          int nopenfd, int flags);

Assuming a simple requirement: one wants to simply find out all the
files which have name "vendor" and it has a specific value, under a
directory, It's not possible to achieve with nftw(3). To solve the
requirements like this, this new util function comes.
Not sure the last sentence is necessary

int virTraverseDirectory(const char *dirpath,
                          int depth,
                          unsigned int flags,
                          virTraverseDirectoryCallback cb,
                          virTraverseDirectorySkipCallback skipcb,
                          void *opaque,
                          char ***filepaths);

  * Which allow to traverse a directory limited to specified @depth
(relative to the @dirpath)

  * Two callbacks are provided, callback @cb is to handle the file path
passed to it, with the user data @opaque; callback @skipcb is to
allow one to make the rules to skip the passed file path. For example,
one can define regex patterns in @skipcb to skip all the file path
which matches it.

  * @cb should return 0 to indicate the file path is successfully
handled, otherwise -1 should be returned.

  * Like @cb, @skipcb also should return 0 to indiate the file path
will be skipped to handle, otherwise -1 should be returned.

  * If passed @filepath is not NULL, it will be filled with the file
paths which are successfully handled by @cb.
What use would a NULL filepath be?  The whole purpose of the code is to
generate a list of files with a specific pattern.  The end result may be
an empty list, but passing NULL

I think there are some use cases won't care about the file paths it
processed, it's a future consideration though, I don't use it in that
way currently.

  * @flags can be used to control the general behaviour of the
traversing. E.g. Don't follow symbol links.

A simple example:

/*
  * utiltest.c: Prints all the file paths whose basename is
  *             "vendor" under directory "./testdir".
  */
static int
traverseCallback(const char *path,
                  void *opaque)
{
     const char *name = opaque;
     char *p = NULL;

     if ((p = strrchr(path, '/')))
         p++;

     if (STRNEQ(p, name))
         return -1;

     return 0;
}

static int
findVendor(void)
{
     char **filepaths = NULL;
     unsigned int flags = 0;
     const char *name = "vendor";
     int n;
     int i;

     flags |= VIR_TRAVERSE_DIRECTORY_FALL_THROUGH |
              VIR_TRAVERSE_DIRECTORY_IGNORE_HIDDEN_FILES ;

     if ((n = virTraverseDirectory("./testdir", 2, flags,
                                   traverseCallback, NULL,
                                   (void *)name, &filepaths)) < 0)
         return -1;

     for (i = 0; i < n; i++)
         printf("Match%d: %s\n", i, filepaths[i]);

     for (i = 0; i < n; i++)
         VIR_FREE(filepaths[i]);
     VIR_FREE(filepaths);

     return 0;
}

Tree view of "./testdir":
% tree -a ./testdir/
./testdir/
|-- device
|-- dir1
|   |-- device
|   |-- dir2
|   |   |-- device
|   |   |-- dir3
|   |   |   `-- vendor
|   |   `-- vendor
|   |-- .dir7
|   |   `-- vendor
|   |-- dir8 -> ../dir4
|   `-- vendor
|-- dir4
|   |-- device
|   |-- dir5
|   |   |-- device
|   |   `-- vendor
|   `-- vendor
`-- vendor

7 directories, 12 files

% ./utiltest
Match0: ./testdir/dir1/dir2/vendor
Match1: ./testdir/dir4/dir5/vendor

Only "vendor" in second depth are returned (flag *_FALL_THROUGH took
effect), and the symbol link is not followed.

"virTraverseDirectory" is implemented using recursion method, which
is generally not good for performance and memory use, and even may
eats up the file descriptors. But since we allow to specify the
tree "depth", and I don't think libvirt has use cases which need to
traverse a very deep directory tree. And on the other hand,
Implementing it using either iteration or stack will be much less
readable.

Later patches will take use of virTraverseDirectory, and tests
will be added.
---
  src/libvirt_private.syms |   1 +
  src/util/virutil.c       | 151 ++++++++++++++++++++++++++++++++++++++++++++++-
  src/util/virutil.h       |  52 ++++++++++++++++
  3 files changed, 203 insertions(+), 1 deletion(-)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 1ea7467..fe182e8 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1965,6 +1965,7 @@ virSetNonBlock;
  virSetUIDGID;
  virSetUIDGIDWithCaps;
  virStrIsPrint;
+virTraverseDirectory;
  virValidateWWN;
diff --git a/src/util/virutil.c b/src/util/virutil.c
index c5246bc..c1938f9 100644
--- a/src/util/virutil.c
+++ b/src/util/virutil.c
@@ -2048,7 +2048,6 @@ virFindFCHostCapableVport(const char *sysfs_prefix ATTRIBUTE_UNUSED)
      virReportSystemError(ENOSYS, "%s", _("Not supported on this platform"));
      return NULL;
  }
-
  #endif /* __linux__ */
/**
@@ -2070,3 +2069,153 @@ virCompareLimitUlong(unsigned long long a, unsigned long b)
return -1;
  }
+
+/*
+ * virTraverseDirectory:
+ * @dirpath: Directory path, (relative or absolute)
+ * @depth: The max directory tree depth for traversing
+ * @cb: Callback to handle file (not directory) during traversing
+ * @skibcb: Callback to define rules to skip entry
s/skibcb/skipcb

+ * @opaque: User data to pass to @cb
+ * @filepaths: Pointer to array to store the file paths, the file
+ *             path passed to @cb is stored into @filepaths as long
+ *             as @cb returns 0.
+ *
+ * Traverse a directory tree into specified @depth, handing the file
s/handing/handling

+ * with callback in the process. Caller must free @filepaths and
+ * its elements after use.
+ *
+ * Returns the number of file paths successfully handled by @cb on
+ * success, with @fillpaths (if passed @fillpath is not NULL) filled,
s/@fillpaths/@filepaths  (there's 2 of them)

+ * or -1 on failure.
+ */
NOTE: If you had run 'make -C tests valgrind' as part of your process
you would have seen there's a memory leak here and the 'utiltest' fails.
The output from my run has the following footprints repeated a few times:

==29216== 8 bytes in 1 blocks are definitely lost in loss record 2 of 39
==29216==    at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==29216==    by 0x4A089F0: realloc (vg_replace_malloc.c:662)
==29216==    by 0x4C6C7EE: virReallocN (viralloc.c:184)
==29216==    by 0x4CBC251: virTraverseDirectory (virutil.c:2592)
==29216==    by 0x4CBC384: virTraverseDirectory (virutil.c:2562)
==29216==    by 0x4CBC384: virTraverseDirectory (virutil.c:2562)
==29216==    by 0x4CBCC7D: virFindPCIDeviceByVPD (virutil.c:2084)
==29216==    by 0x40202D: testFindPCIDeviceByVPD (utiltest.c:175)
==29216==    by 0x40326F: virtTestRun (testutils.c:158)
==29216==    by 0x401D20: mymain (utiltest.c:305)
==29216==    by 0x4038AA: virtTestMain (testutils.c:722)
==29216==    by 0x37C1021A04: (below main) (libc-start.c:225)
==29216==
==29216== 8 bytes in 1 blocks are definitely lost in loss record 3 of 39
==29216==    at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==29216==    by 0x4A089F0: realloc (vg_replace_malloc.c:662)
==29216==    by 0x4C6C7EE: virReallocN (viralloc.c:184)
==29216==    by 0x4CBC3B4: virTraverseDirectory (virutil.c:2569)
==29216==    by 0x4CBC384: virTraverseDirectory (virutil.c:2562)
==29216==    by 0x4CBCC7D: virFindPCIDeviceByVPD (virutil.c:2084)
==29216==    by 0x40202D: testFindPCIDeviceByVPD (utiltest.c:175)
==29216==    by 0x40326F: virtTestRun (testutils.c:158)
==29216==    by 0x401D20: mymain (utiltest.c:305)
==29216==    by 0x4038AA: virtTestMain (testutils.c:722)
==29216==    by 0x37C1021A04: (below main) (libc-start.c:225)


Beyond that the traversal seems OK to me - I think you're covering the
basics with hidden files and links. I do remember a recent discussion
regarding virFileDeleteTree() which may be useful to revisit to ensure
nothing else is forgotten.

Hm, what virFileDeleteTree does should be able to achieve with

virTraverseDirectory, that's the use case which doesn't have to care
about the returned file paths I think.



+int
+virTraverseDirectory(const char *dirpath,
+                     int depth,
+                     unsigned int flags,
+                     virTraverseDirectoryCallback cb,
+                     virTraverseDirectorySkipCallback skipcb,
+                     void *opaque,
+                     char ***filepaths)
+{
+    struct dirent *entry = NULL;
+    DIR *dir = NULL;
+    char *fpath = NULL;
+    struct stat st;
+    int nfilepaths = 0;
+    int ret = -1;
+    int i;
+
+    if (depth < 0)
+        return 0;
+
+    if (filepaths)
+        *filepaths = NULL;
+
+    if (!(dir = opendir(dirpath))) {
+        virReportSystemError(errno,
+                             _("Failed to open directory '%s'"),
+                             dirpath);
+        return -1;
+    }
+
+    while ((entry = readdir(dir))) {
+        /* Always Ignore "." and ".." */
+        if (STREQ(entry->d_name, ".") || STREQ(entry->d_name, ".."))
+            continue;
+
+        /* Ignore hidden files if it's required */
+        if (flags & VIR_TRAVERSE_DIRECTORY_IGNORE_HIDDEN_FILES &&
+            entry->d_name[0] == '.')
+            continue;
+
+        if (virAsprintf(&fpath, "%s/%s", dirpath, entry->d_name) < 0) {
+            virReportOOMError();
+            goto error;
+        }
+
+        /* Skip to handle the entry as long as @skipcb returns 0 for it */
+        if (skipcb && skipcb(fpath, opaque) == 0) {
+            VIR_FREE(fpath);
+            continue;
+        }
+
+        if (lstat(fpath, &st) < 0) {
+            virReportSystemError(errno,
+                                 _("Failed to stat '%s'"),
technically failed to 'lstat'

+                                 fpath);
+            goto error;
+        }
+
+        /* Don't follow symbol link unless it's explicitly required */
+        if (S_ISLNK(st.st_mode) &&
+            !(flags & VIR_TRAVERSE_DIRECTORY_FOLLOW_LINK)) {
+                VIR_FREE(fpath);
+                continue;
+        }
+
+        if (S_ISDIR(st.st_mode)) {
+            char **tmp = NULL;
+            int rc;
+
+            if ((rc = virTraverseDirectory(fpath, depth - 1, flags, cb, skipcb,
+                                           opaque, filepaths ? &tmp : NULL)) < 0)
+                goto error;
+
+            if (rc > 0) {
+                /* Copy the filepaths returned by recursive call */
+                if (filepaths) {
+                    if (VIR_REALLOC_N((*filepaths), nfilepaths + rc) < 0) {
+                        for (i = 0; i < rc; i++)
+                            VIR_FREE(tmp[i]);
+                        VIR_FREE(tmp);
+                        virReportOOMError();
+                        goto error;
+                    }
+
+                    for (i = 0; i < rc; i++)
+                        (*filepaths)[nfilepaths + i] = tmp[i];
+                }
+                nfilepaths += rc;
+            }
Perhaps the valgrind error is that the else clause here or the one for
"if (filepaths)" in the (rc > 0) condition.

In either case, tmp wouldn't be free()'d properly.

If you required filepaths to be non NULL - I believe the whole
mess/issue would be irrelevant...

+        } else {
+             /* Don't handle the file unless it's the last depth */
+             if ((flags & VIR_TRAVERSE_DIRECTORY_FALL_THROUGH) &&
+                 depth != 0) {
+                 VIR_FREE(fpath);
+                 continue;
+             }
+
+             if (cb(fpath, opaque) == 0) {
+                if (filepaths) {
+                    if (VIR_REALLOC_N((*filepaths), nfilepaths + 1) < 0) {
+                        virReportOOMError();
+                        goto error;
+                    }
+
+                    if (VIR_STRDUP((*filepaths)[nfilepaths], fpath) < 0)
+                        goto error;
+
+                }
+                nfilepaths++;
+            }
+        }
+
+        VIR_FREE(fpath);
+    }
+
+    ret = nfilepaths;
+
+cleanup:
+    closedir(dir);
+    return ret;
+
+error:
+    VIR_FREE(fpath);
+    if (filepaths) {
+        for (i = 0; i < nfilepaths; i++)
+            VIR_FREE((*filepaths)[i]);
+        VIR_FREE(*filepaths);
+    }
+    goto cleanup;
+}
diff --git a/src/util/virutil.h b/src/util/virutil.h
index 280a18d..6c46f23 100644
--- a/src/util/virutil.h
+++ b/src/util/virutil.h
@@ -166,4 +166,56 @@ char *virFindFCHostCapableVport(const char *sysfs_prefix);
int virCompareLimitUlong(unsigned long long a, unsigned long b); +/**
+ * virTraverseDirectoryCallback:
+ * @fpath: file path to handle
+ * @opaque: User data to pass to the callback
+ *
+ * Callback for "virTraverseDirectory".
+ *
+ * Returns 0 on success, -1 on failure.
+ */
+typedef int (*virTraverseDirectoryCallback)(const char *fpath,
+                                            void *opaque);
+
+/**
+ * virTraverseDirectorySkipCallback:
+ * @fpath: file path to handle
+ * @opaque: User data to pass to the callback
+ *
+ * Callback to control whether virTraverseDirectory should ship
+ * @fpath E.g. Skipping files whose name match a regex pattern.
+ *
+ * Return 0 to let "virTraverseDirectory" skip handling the @fpath,
+ * -1 to not skip.
+ */
+typedef int (*virTraverseDirectorySkipCallback)(const char *fpath,
+                                                void *opaque);
+
+/**
+ * virTraverseDirectoryFlags:
+ *
+ * Except virTraverseDirectorySkipCallback, one can use these flags to
+ * control the general behaviour of virTraverseDirectory
+ */
+enum {
+    /* Follow symbol links */
+    VIR_TRAVERSE_DIRECTORY_FOLLOW_LINK = 1 << 0,
+
+    /* Ignore hidden files or directory */
+    VIR_TRAVERSE_DIRECTORY_IGNORE_HIDDEN_FILES = 1 << 1,
+
+    /* Don't handle files unless it's in the last depth */
+    VIR_TRAVERSE_DIRECTORY_FALL_THROUGH = 1 << 2,
Better named "MATCH_DEPTH_ONLY"?  FALL_THROUGH just has other

Not sure, I know FALL_THROUGH is not a good name, but I don't see
MATCH_DEPTH_ONLY makes it better either.. :-), since whether it's
matching or anything else, completely depends on what @cb does.



+} virTraverseDirectoryFlags;
+
+int virTraverseDirectory(const char *dirpath,
+                         int depth,
+                         unsigned int flags,
+                         virTraverseDirectoryCallback cb,
+                         virTraverseDirectorySkipCallback skipcb,
+                         void *opaque,
+                         char ***filepaths);
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3)
Not sure "3" is what you mean - perhaps you meant 4 (eg, cb).

yes.. I made too many typos..


It also stands to reason filepaths shouldn't be NULL.  I guess I just
don't see the reason to return just a count, but perhaps there's a use
case I'm not thinking about or will be apparent in future patches.

John

+
  #endif /* __VIR_UTIL_H__ */


--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]