Re: Errors cloning large repo

On Fri, 9 Mar 2007, Anton Tropashko wrote:
> 
> but your /usr should be large enough if /usr/local and /usr/local/src 
> are not!!!

I don't like the size distribution.

My /usr has 181585 files, but is 4.0G in size, which doesn't match yours. 
Also, I've wanted to generate bogus data for a while, just for testing, so 
I wrote this silly program that I can tweak the size distribution for.

It gives me something that approaches your distribution (I ran it a few 
times; I now have 110402 files and 5.7GB of space according to 'du').

It's totally unrealistic wrt packing, though (no deltas, and no 
compression, since the data itself is all random), and I don't know how to 
approximate that kind of detail sanely.
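
One possible tweak to the program below (just a sketch, untested, and not 
what produced the numbers above): instead of filling create_file()'s buffer 
with pure random bytes, fill it with a mostly-constant pattern plus a few 
random bytes (the 'x' filler and the count of 20 are arbitrary choices), so 
the packed data at least has something to compress:

	/* sketch: mostly-constant buffer instead of pure random bytes */
	memset(buffer, 'x', sizeof(buffer));
	for (i = 0; i < 20; i++)
		buffer[random() % sizeof(buffer)] = random();

How compressible and delta-friendly to make it is the hard part, of course; 
a tweak like this would err in the other direction and pack almost too well.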

I need to call it a day for the kids' dinner etc, so I'm probably done 
for today. I'll play with this a bit more to see if I can find various 
scalability issues (and just ignore the delta/compression problem - you 
probably don't have many deltas either, so I'm hoping that the fact 
that I only have 5.7GB will approximate your data thanks to it not being 
compressible).

		Linus

---
#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>

/*
 * Create a file with a random size in the range
 * 0-512kB, but with a "pink noise"-ish distribution:
 * small files dominate, but each size octave (1-2kB,
 * 2-4kB, ..., 256-512kB) gets a broadly similar share
 * of the files.
 */
static void create_file(const char *name)
{
	int i;
	int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0666);
	static char buffer[1000];
	unsigned long size = random() % (1 << (10+(random() % 10)));

	if (fd < 0)
		return;
	for (i = 0; i < sizeof(buffer); i++)
		buffer[i] = random();
	while (size) {
		int len = sizeof(buffer);
		if (len > size)
			len = size;
		write(fd, buffer, len);
		size -= len;
	}
	close(fd);
}

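/*
 * Create directory "base" and populate it with randomly named
 * entries.  Each iteration either descends into a new subdirectory
 * (with probability dir_likely), stops populating this directory
 * (with probability end_likely), or creates a regular file.  The
 * dir_expand/end_expand factors rescale those probabilities at
 * each level of recursion.
 */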
static void start(const char *base,
	float dir_likely, float dir_expand,
	float end_likely, float end_expand)
{
	int len = strlen(base);
	char *name = malloc(len + 10);

	mkdir(base, 0777);

	memcpy(name, base, len);
	name[len++] = '/';

	dir_likely *= dir_expand;
	end_likely *= end_expand;

	for (;;) {
		float rand = (random() & 65535) / 65536.0;

		sprintf(name + len, "%ld", random() % 1000000);
		rand -= dir_likely;
		if (rand < 0) {
			start(name, dir_likely, dir_expand, end_likely, end_expand);
			continue;
		}
		rand -= end_likely;
		if (rand < 0)
			break;
		create_file(name);
	}
	free(name);
}

int main(int argc, char **argv)
{
	/*
	 * Tune the numbers to your liking..
	 *
	 * The floats are:
	 *  - dir_likely (likelihood of creating a recursive directory)
	 *  - dir_expand (how dir_likely behaves as we move down recursively)
	 *  - end_likely (likelihood of ending file creation in a directory)
	 *  - end_expand (how end_likely behaves as we move down recursively)
	 *
	 * The numbers 0.3/0.6 0.02/1.1 are totally made up, and for me
	 * generate a tree of between a few hundred files and a few tens 
	 * of thousands of files.
	 *
	 * Re-run several times to generate more files in the tree.
	 */
	srandom(time(NULL));
	start("tree",
		0.3, 0.6,
		0.02, 1.1);
	return 0;
}