Re: Switching from CVS to GIT

Eli Zaretskii <eliz@xxxxxxx> · Tue, 16 Oct 2007 06:30:11 +0200

> Date: Mon, 15 Oct 2007 20:45:02 -0400 (EDT)
> From: Daniel Barkalow <barkalow@xxxxxxxxxxxx>
> cc: Alex Riesen <raa.lkml@xxxxxxxxx>, Johannes.Schindelin@xxxxxx, ae@xxxxxx, 
>     tsuna@xxxxxxxxxxxxx, git@xxxxxxxxxxxxxxx, make-w32@xxxxxxx
> 
> I believe the hassle is that readdir doesn't necessarily report a README in 
> a directory which is supposed to have a README, when it has a readme 
> instead.

Sorry for asking potentially stupid questions out of ignorance: why
would you want readdir to return `README' when you have `readme'?

> I think we want O(n) comparison of sorted lists, which doesn't 
> work if equivalent names don't sort the same.

Your comparison function should be case-insensitive on Windows, or am I
missing something?
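
A minimal sketch of that idea in C, not taken from git's sources: if
both lists are sorted with the same case-insensitive comparison, the
O(n) walk still works.  The names `name_cmp_icase' and `count_missing'
are hypothetical, and the folding is ASCII-only via `tolower', which is
an assumption; the case folding a real Windows filesystem applies is
more involved.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive string comparison (ASCII-only folding; an assumption
   of this sketch).  Returns <0, 0 or >0 like strcmp. */
int name_cmp_icase(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0;
    }
}

/* Count the names in `expected' that have no equivalent in `found'.
   Both arrays must already be sorted with name_cmp_icase; the walk
   visits each element at most once, so it is linear in the total size. */
size_t count_missing(const char *const *expected, size_t n_expected,
                     const char *const *found, size_t n_found)
{
    size_t i = 0, j = 0, missing = 0;

    while (i < n_expected) {
        /* Once `found' is exhausted, every remaining name is missing. */
        int c = (j < n_found) ? name_cmp_icase(expected[i], found[j]) : -1;

        if (c == 0) {           /* same name up to case: present */
            i++;
            j++;
        } else if (c < 0) {     /* expected name sorts first: missing */
            missing++;
            i++;
        } else {                /* extra name on disk: skip it */
            j++;
        }
    }
    return missing;
}
```

With this comparison, `README' and `readme' compare equal, so an
expected `README' is reported as present when the directory holds
`readme'.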

> > > - no acceptable level of performance in filesystem and VFS (readdir,
> > >   stat, open and read/write are annoyingly slow)
> > 
> > With what libraries?  Native `stat' and `readdir' are quite fast.
> > Perhaps you mean the ported glibc (libgw32c), where `readdir' is
> > indeed painfully slow, but then you don't need to use it.
> 
> We want getting stat info, and using readdir to figure out which files exist, 
> for 106083 files in 1603 directories with a hot cache, to take under 1s; 
> otherwise "git status" takes a noticeable amount of time with a medium-big 
> project, and we want people to be able to get info on what's changed 
> effectively instantly. My impression is that Windows' native stat and 
> readdir are plenty fast for what normal Windows programs want, but we 
> actually expect reasonable performance on an unreasonably-big 
> metadata-heavy input.

If that's the issue, then it's not a good idea to call `stat' and
`readdir' on Windows at all.  `stat' is a single system call on Posix
systems, while on Windows it usually needs to go out of its way
calling half a dozen system services to gather the `struct stat' info.
You need to call something like FindFirstFile, which can do the job of
`stat' and `readdir' together (and of `fnmatch', if you need to filter
only some files) in one go.  I don't know whether this will scan 100K
files in under one second (maybe I will try it one of these days), but
it should be much faster than `readdir'+`stat', perhaps by as much as
an order of magnitude.
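
A rough sketch of such a scan, under stated assumptions: `scan_dir' is
a hypothetical helper, not something from git or any library, and no
timing claim is attached to it.  The Windows branch gets names,
attributes and sizes from `FindFirstFileA'/`FindNextFileA' alone; the
other branch is there only so the file compiles elsewhere, and shows
the `readdir'+`stat' pattern being replaced.

```c
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>

/* Count the entries of `dir' and sum the sizes of its non-directory
   entries.  Returns the entry count, or -1 on error. */
long scan_dir(const char *dir, unsigned long long *total_size)
{
    char pattern[MAX_PATH];
    WIN32_FIND_DATAA fd;
    HANDLE h;
    long count = 0;
    int n;

    *total_size = 0;
    /* The wildcard does the filtering; "\\*.c" would select C files. */
    n = snprintf(pattern, sizeof pattern, "%s\\*", dir);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return -1;
    h = FindFirstFileA(pattern, &fd);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    do {
        if (strcmp(fd.cFileName, ".") == 0 || strcmp(fd.cFileName, "..") == 0)
            continue;
        count++;
        /* Attributes and size come with the entry: no per-file call. */
        if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
            *total_size += ((unsigned long long)fd.nFileSizeHigh << 32)
                           | fd.nFileSizeLow;
    } while (FindNextFileA(h, &fd));
    FindClose(h);
    return count;
}
#else
#include <dirent.h>
#include <sys/stat.h>

/* Fallback for other systems: one readdir plus one stat per entry. */
long scan_dir(const char *dir, unsigned long long *total_size)
{
    char path[4096];
    struct dirent *de;
    struct stat st;
    long count = 0;
    DIR *d = opendir(dir);

    *total_size = 0;
    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        int n;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;
        n = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;           /* name too long for this sketch */
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
            *total_size += (unsigned long long)st.st_size;
    }
    closedir(d);
    return count;
}
#endif
```

Note that WIN32_FIND_DATA carries attributes, size and timestamps, but
not every field of `struct stat' (there is no inode number or link
count), so this only helps if those fields are enough.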

> > > - no real "mmap" (which kills performance and complicates code)
> > 
> > You only need mmap because you are accustomed to using it on GNU/Linux.
> 
> I believe the need here is quick setup and fast access to sparse portions 
> of several 100M files. It's hard to beat a page fault for read speed.

If you need memory-mapped files, they are available on Windows.  I
thought the original comment about `mmap' was because it was used to
allocate memory, not read files into memory.
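
For illustration, a minimal sketch of mapping a whole file read-only.
The names `map_file_ro' and `unmap_file' are hypothetical; the Windows
branch uses `CreateFileMappingA' and `MapViewOfFile', and the other
branch uses `mmap' only so that the same interface can be compared on
both systems.  It assumes the file fits in the address space and treats
an empty file as an error.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Map all of `path' read-only.  Returns the base address and stores
   the length in *len, or returns NULL on failure. */
const void *map_file_ro(const char *path, size_t *len)
{
    HANDLE file, mapping;
    LARGE_INTEGER size;
    const void *base = NULL;

    file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return NULL;
    if (GetFileSizeEx(file, &size) && size.QuadPart > 0) {
        mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
        if (mapping != NULL) {
            base = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
            /* The view holds its own reference to the mapping object. */
            CloseHandle(mapping);
        }
    }
    CloseHandle(file);
    if (base != NULL)
        *len = (size_t)size.QuadPart;
    return base;
}

void unmap_file(const void *base, size_t len)
{
    (void)len;                  /* not needed on Windows */
    UnmapViewOfFile(base);
}
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Same interface on top of mmap, for comparison. */
const void *map_file_ro(const char *path, size_t *len)
{
    struct stat st;
    void *base;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return base;
}

void unmap_file(const void *base, size_t len)
{
    munmap((void *)base, len);
}
#endif
```

In both branches pages are read in on first access, so touching sparse
portions of a large file does not read the whole file.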

> We also expect to be able to make a sequence of file system operations 
> such that programs starting at any time see the same database as the files 
> containing the database get restructured.

Sorry, I don't understand this; please say more about the operations,
the ``same database'' issue (what database?), and what you mean by
``the files containing the database get restructured''.

> A unixy pipeline was convenient

Windows supports pipelines with almost exactly the same functionality
as Posix.  Again, perhaps I'm missing something.