On Fri, 9 Mar 2007, Anton Tropashko wrote:
>
> but your /usr should be large enough if /usr/local and /usr/local/src
> are not!!!

I don't like the size distribution. My /usr has 181585 files, but is
4.0G in size, which doesn't match yours.

Also, I've wanted to generate bogus data for a while, just for testing,
so I wrote this silly program whose size distribution I can tweak. It
gives me something that approaches your distribution (I ran it a few
times; I now have 110402 files and 5.7GB of space according to 'du').
It's totally unrealistic wrt packing, though (no deltas, and no
compression, since the data itself is all random), and I don't know how
to approximate that kind of detail sanely.

I'll need to call it a day for the kids' dinner etc, so I'm probably
done for the day. I'll play with this a bit more to see if I can find
various scalability issues (and just ignore the delta/compression
problem - you probably don't have many deltas either, so I'm hoping
that the fact that I only have 5.7GB will approximate your data thanks
to it not being compressible).

		Linus

---
#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/fcntl.h>

/*
 * Create a file with a random size in the range 0 to half
 * a meg, but with a "pink noise"ish distribution (ie
 * roughly equally many files in the 1-2kB range as in
 * the 256kB-512kB range).
 */
static void create_file(const char *name)
{
	int i;
	int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0666);
	static char buffer[1000];
	unsigned long size = random() % (1 << (10 + (random() % 10)));

	if (fd < 0)
		return;
	for (i = 0; i < sizeof(buffer); i++)
		buffer[i] = random();
	while (size) {
		int len = sizeof(buffer);
		if (len > size)
			len = size;
		write(fd, buffer, len);
		size -= len;
	}
	close(fd);
}

static void start(const char *base, float dir_likely, float dir_expand,
		  float end_likely, float end_expand)
{
	int len = strlen(base);
	char *name = malloc(len + 10);

	mkdir(base, 0777);
	memcpy(name, base, len);
	name[len++] = '/';
	dir_likely *= dir_expand;
	end_likely *= end_expand;
	for (;;) {
		float rand = (random() & 65535) / 65536.0;

		sprintf(name + len, "%ld", random() % 1000000);
		rand -= dir_likely;
		if (rand < 0) {
			start(name, dir_likely, dir_expand,
			      end_likely, end_expand);
			continue;
		}
		rand -= end_likely;
		if (rand < 0)
			break;
		create_file(name);
	}
	free(name);	/* release this level's pathname buffer */
}

int main(int argc, char **argv)
{
	/*
	 * Tune the numbers to your liking..
	 *
	 * The floats are:
	 *  - dir_likely (likelihood of creating a recursive directory)
	 *  - dir_expand (how dir_likely behaves as we move down recursively)
	 *  - end_likely (likelihood of ending file creation in a directory)
	 *  - end_expand (how end_likely behaves as we move down recursively)
	 *
	 * The numbers 0.3/0.6 0.02/1.1 are totally made up, and for me
	 * generate a tree of between a few hundred files and a few tens
	 * of thousands of files.
	 *
	 * Re-run several times to generate more files in the tree.
	 */
	srandom(time(NULL));
	start("tree", 0.3, 0.6, 0.02, 1.1);
	return 0;
}
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html