[PATCH] Lower memory requirements for layouts with duplicate files

mkfs.cramfs allocates memory based on a calculated upper bound
of the required filesystem size. If there are duplicate files
or hard links, the current implementation unnecessarily increases
the upper bound for each copy of the file, even though cramfs
stores the contents of identical files only once.

This patch improves the calculation of fslen_ub, the upper bound
of the required filesystem size, by subtracting the size of each
file that is identified as a duplicate.

This is particularly helpful for layouts with many hard links,
which mkfs.cramfs treats as duplicate files. For example, it
drastically reduces the memory required to create a standard
Busybox layout.

Signed-off-by: Roy Peled <the.roy.peled@xxxxxxxxx>

---
diff -urNp util-linux-ng-2.14.1.org/disk-utils/mkfs.cramfs.c util-linux-ng-2.14.1/disk-utils/mkfs.cramfs.c
--- util-linux-ng-2.14.1.org/disk-utils/mkfs.cramfs.c	2008-11-13 05:57:01.000000000 +0200
+++ util-linux-ng-2.14.1/disk-utils/mkfs.cramfs.c	2008-11-13 06:26:41.000000000 +0200
@@ -247,7 +247,7 @@ identical_file(struct entry *e1, struct 
  */
 #define MAX_INPUT_NAMELEN 255
 
-static int find_identical_file(struct entry *orig, struct entry *new)
+static int find_identical_file(struct entry *orig, struct entry *new, loff_t *fslen_ub)
 {
         if (orig == new)
 		return 1;
@@ -264,19 +264,20 @@ static int find_identical_file(struct en
 		    !memcmp(orig->md5sum, new->md5sum, 16) &&
 		    identical_file(orig, new)) {
 			new->same = orig;
+			*fslen_ub -= new->size;
 			return 1;
 		}
         }
-        return find_identical_file(orig->child, new) ||
-                   find_identical_file(orig->next, new);
+        return find_identical_file(orig->child, new, fslen_ub) ||
+                   find_identical_file(orig->next, new, fslen_ub);
 }
 
-static void eliminate_doubles(struct entry *root, struct entry *orig) {
+static void eliminate_doubles(struct entry *root, struct entry *orig, loff_t *fslen_ub) {
         if (orig) {
                 if (orig->size && orig->path)
-			find_identical_file(root,orig);
-                eliminate_doubles(root,orig->child);
-                eliminate_doubles(root,orig->next);
+			find_identical_file(root,orig, fslen_ub);
+                eliminate_doubles(root,orig->child, fslen_ub);
+                eliminate_doubles(root,orig->next, fslen_ub);
         }
 }
 
@@ -859,7 +860,7 @@ int main(int argc, char **argv)
 	}
 
         /* find duplicate files */
-        eliminate_doubles(root_entry,root_entry);
+        eliminate_doubles(root_entry,root_entry, &fslen_ub);
 
 	/* TODO: Why do we use a private/anonymous mapping here
            followed by a write below, instead of just a shared mapping