Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list

Jakub Narebski <jnareb@xxxxxxxxx> · Thu, 14 Sep 2006 09:59:03 +0200

On Thursday, 14 September 2006 09:37, you wrote:
> Jakub Narebski <jnareb@xxxxxxxxx> writes:
> 
> > +		sub wanted {
> > +			# skip dot files (hidden files), check only directories
> > +			#return if (/^\./);
> 
> Leftover comment?

Yes, it is a leftover comment, from copying the anonymous 'wanted'
subroutine from git_get_refs_list. But I have realized that a gitweb
serving only one project could have ".git" as the project name,
e.g. when $projectroot is set to a live git repository (the working
directory of a git repository).

> > +			my $subdir = substr($File::Find::name, $pfxlen + 1);
> > +			# we check related file in $projectroot
> > +			if (-e "$projectroot/$subdir/HEAD") {
> > +				push @list, { path => $subdir };
> > +				$File::Find::prune = 1;
> 
> We might want to do an extra cheap check to make what we found
> is sane, to prevent us getting confused by a random file whose
> name happens to be HEAD.

That is what we did before. It is the simplest check, and it also
keeps us from claiming the top directory as a git repository and
tells us when to cut off (prune) the search.

I think it was intended to avoid adding '.' and '..' as git
repositories, not stray directories. Well, perhaps also the index
file, if it was used.

> For example, it is a regular file whose contents is a single
> line and begins with "ref: refs/heads/" (16 bytes) or it is a
> symlink and readlink result begins with "refs/heads/" (11
> bytes).

We can do that, but I think it is unnecessary. Let's assume that
$projectroot contains _only_ git repositories, perhaps in subdirs
(a directory hierarchy), and perhaps some stray files like the index
file, which is no longer used.

> If you feel opening and reading the file is an added overhead,
> checking for $project/$subdir/{objects,refs}/ directories might
> be a good alternative.

Probably overkill.
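
For reference, the HEAD check described above could look roughly like
the sketch below. It is not part of the patch: looks_like_git_head is
a hypothetical helper name, and it accepts only the two cases quoted
above (a one-line "ref: refs/heads/..." file, or a symlink into
refs/heads/).

```perl
use strict;
use warnings;

# Hypothetical helper, not in the patch: true if "$dir/HEAD" looks like
# the HEAD of a git repository, per the two cases quoted above.
sub looks_like_git_head {
	my ($dir) = @_;
	my $head = "$dir/HEAD";

	# symlink HEAD: readlink result must begin with "refs/heads/"
	if (-l $head) {
		my $target = readlink($head);
		return (defined $target && $target =~ m!^refs/heads/!) ? 1 : 0;
	}

	# symref HEAD: regular file with a single line "ref: refs/heads/..."
	return 0 unless -f $head;
	open my $fd, '<', $head or return 0;
	my $line = <$fd>;
	my $rest = <$fd>; # undef if the file has only one line
	close $fd;
	return 0 unless defined $line && !defined $rest;
	return ($line =~ m!^ref: refs/heads/!) ? 1 : 0;
}
```

In 'wanted' it would replace the plain -e test, at the cost of one
open and read per candidate directory.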

> > +		File::Find::find({
> > +			no_chdir => 1, # do not change directory
> > +			follow_fast => 1, # follow symbolic links
> 
> What is the reason behind choosing follow_fast?  By saying
> follow_anything, you choose to care about cases where there are
> symlinks under projectroot to point at various projects.  If
> that is the case, don't you want to make sure you include the
> same project only once?

First, it is faster. Second, to test whether it works I used a "copy"
of the one "live" git repository I have (the git.git repository), made
by adding a second symlink to it.
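
If we wanted to keep follow_fast and still list each project only
once, one possibility is to remember the device and inode of every
repository found. This is only a sketch under assumptions, not what
the patch does: list_projects_once and %seen are hypothetical names,
and it assumes a filesystem where stat() returns meaningful device
and inode numbers.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper, not in the patch: list repositories below
# $projectroot, reporting each underlying directory only once even
# if several symlinks point at it.
sub list_projects_once {
	my ($projectroot) = @_;
	my $pfxlen = length($projectroot);
	my (@list, %seen);

	File::Find::find({
		no_chdir    => 1, # do not change directory
		follow_fast => 1, # follow symbolic links
		wanted => sub {
			my $name = $File::Find::name;
			# do not claim the top directory itself
			return if $name eq $projectroot;
			# -d uses stat(), so symlinks to directories pass too
			return unless -d $name;
			return unless -e "$name/HEAD";
			# stat() follows symlinks: same repository, same dev/inode
			my ($dev, $ino) = stat($name);
			push @list, { path => substr($name, $pfxlen + 1) }
				unless $seen{"$dev:$ino"}++;
			$File::Find::prune = 1;
		},
	}, $projectroot);

	return @list;
}
```

Which of the names gets reported for a repository reachable under
several paths depends on the order in which the directory entries
are read.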

> > +			#follow_skip => 2, # ignore duplicated files and directories
> 
> Leftover comment?

Leftover from benchmarking which set of options is faster.

By the way, if we choose to use 'follow' instead of 'follow_fast' we
might want to uncomment it, so as not to spew errors into the log.

> About these two leftover comments, if you decided you did not
> want them, please do not leave them behind.

O.K.

> If on the other hand you wanted to hint others that you are not
> sure about your decision, it would probably be better to say
> that honestly in the comment, perhaps mark the message as RFC,
> and describe what the issues are, like so:
> 
> 	sub wanted {
> 		# We might want to also ignore dot files, by
>                 # saying "return if /^\./;" here, but there is
>                 # no inherent reason for us to forbid a repository
>                 # name from starting with a dot.

True.

>                 # We check only if a directory looks like a git
>                 # repo and do not care about non directories.
>                 # Note that this cannot be done with "-d _"
>                 # because we are using follow_fast and the last
>                 # stat was done with lstat(); we want to catch a
>                 # symlink that points at a directory.
>                 return unless -d $_;
>                 ...

Not true. A symlink to a directory is both -d $_ and -l $_, so

	return unless (-d $_ || (-l $_ && -d readlink($_)));

is not needed.
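
A quick way to check this, as a sketch using a scratch directory
(the names repo.git and link.git are made up for the test, and it
assumes a system with symlink support):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# scratch directory holding one real directory and one symlink to it
my $tmp = tempdir(CLEANUP => 1);
mkdir "$tmp/repo.git" or die "mkdir: $!";
symlink("repo.git", "$tmp/link.git") or die "symlink: $!";

my $link = "$tmp/link.git";
my $is_dir  = -d $link ? 1 : 0; # -d uses stat(), which follows the link
my $is_link = -l $link ? 1 : 0; # -l uses lstat(), which does not
print "-d: $is_dir, -l: $is_link\n";
```

Note also that readlink() returns the link target as stored, so for
a relative link like the one above "-d readlink($_)" would be tested
relative to the current directory, not to the directory holding the
link.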
-- 
Jakub Narebski
Poland