Re: [PATCH/RFC] gitweb: Restructure projects list generation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 27 Feb 2011, Jonathan Nieder wrote:
> Jakub Narebski wrote:
> 
> > Extract filtering out forks (which is done if 'forks' feature is
> > enabled) into filter_forks_from_projects_list subroutine
> 
> Context (to remind myself): the git_get_projects_list function decides
> which repositories to show on the homepage.

Actually it git_get_projects_list generates list of all exported
repositories, whether they are visible in projects list page (homepage)
or not... at least after this change.

Well... list of directories that look like repositories; it is actually
checked further with git_get_last_activity (running actual git command)
in fill_project_list_info if they are really git repositories.

>                                               git_project_list_body 
> takes care of actually showing them in a table; it displays a "+" sign
> in front of projects with forks and generally the project list it is
> passed omits the forks themselves.

Well, forks are hidden (listed on 'forks' pages linked to by those "+"
signs) only if 'forks' feature is enabled.

> 
> git_get_projects_list needs to be reasonably fast, since it is used
> to validate the "project" parameter in every request.

Only with $strict_export set to true (it defaults to false).

> [...]
> > Both are now run _before_ displaying projects, and not while printing;
> > this allow to know upfront if there were any found projects.  Gitweb
> > now can and do print 'No such projects found' if user searches for
> > phrase which does not correspond to any project (any repository).
> 
> Aren't forks filtered out by git-get-projects-list already?

After this commit they are not, and they should not be.  If forks were
filtered out, then with $strict_export you wouldn't be able to access
pages of repositories which are forks.

> [...]
> > Filtering out forks and marking repository (project) as having forks
> > is now consolidated into one subroutine (special case of handling
> > forks in git_get_projects_list only for $projects_list being file is
> > now removed).
> 
> Ah, no, they aren't.  That's a good change for consistency already.
> 
> > Forks handling is also cleaned up and simplified, which
> > might mean some changes in output.
> 
> (Not having read the code yet) this part is not obvious to me.  Maybe
> an example could help.

Well, the commit message is quite large anyway, so I didn't describe in
detailwhat one can read in patch itself...

But see also comment below about algorithm used.

> [...]
> > The interaction between forks, content tags and searching is now made
> > more explicit: searching whether by tag, or via search form turns off
> > fork filtering (gitweb searches also forks, and will show all
> > results).
> 
> Hm, or maybe this is it.
> 
> > IMVHO the code is
> > cleaner.
> 
> So some general code tidying is intermixed.
> 
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -2646,18 +2646,20 @@ sub git_get_projects_list {
> [...]
> > -		my $dir = $projects_list . ($filter ? "/$filter" : '');
> > +		my $dir = $projects_list;
> >  		# remove the trailing "/"
> >  		$dir =~ s!/+$!!;
> > -		my $pfxlen = length("$dir");
> > -		my $pfxdepth = ($dir =~ tr!/!!);
> > +		my $pfxlen = length("$projects_list");
> > +		my $pfxdepth = ($projects_list =~ tr!/!!);
> [...]
> 
> $dir is the same as before except that trailing slashes are removed,
> and meanwhile the stripped names of directories returned from find()
> will include $filter.  Should be almost a no-op (i.e., a nice
> simplification).

Thanks.

This code could be further simplified if not for the fact that at least
theoretically with $projects_list being directory to scan, $projectroot
can be different from $projects_list.

I'll try to resend fixed version soon.

> > @@ -2672,14 +2674,14 @@ sub git_get_projects_list {
> >  				# only directories can be git repositories
> >  				return unless (-d $_);
> >  				# don't traverse too deep (Find is super slow on os x)
> > +				# $project_maxdepth excludes depth of $projectroot
> >  				if (($File::Find::name =~ tr!/!!) - $pfxdepth > $project_maxdepth) {
> 
> A bugfix.  $project_maxdepth is advertised to be relative to the
> projectroot but it was relative to $projects_list/$filter.

I'm not sure if it was advertised in such detail, but I think consistency
is good to have: we don't go beyond some depth regardless whether $filter
is set or not, i.e. whether we are getting list of projects for homepage
('projects_list' action) or for some 'forks' page.

> [...]
> > @@ -2703,32 +2704,9 @@ sub git_get_projects_list {
> >  			if (!defined $path) {
> >  				next;
> >  			}
> > -			if ($filter ne '') {
> > -				# looking for forks;
> > -				my $pfx = substr($path, 0, length($filter));
> > -				if ($pfx ne $filter) {
> > -					next PROJECT;
> > -				}
> > -				my $sfx = substr($path, length($filter));
> > -				if ($sfx !~ /^\/.*\.git$/) {
> > -					next PROJECT;
> > -				}
> [...]
> > +			# if $filter is rpovided, check if $path begins with $filter
> 
> s/rpovided/provided/ :)

I should have run spellchecker...

> > +			if ($filter && $path !~ m!^\Q$filter\E/!) {
> > +				next;
> 
> Why did the code check for \Q.git\E$ before?  If I'm not missing
> something, I'm happy to see the condition go.

This was about looking for forks, which is now moved out of this
subroutine into filter_forks_from_projects_list subroutine.

> What if $filter is 0?  Silly question, I guess.

Not a problem.  \Q and \E are for quoting metacharacters, i.e. treating
$filter as string (as prefix to be matched) rather than as regexp.

> [...]
> > @@ -2745,6 +2721,67 @@ sub git_get_projects_list {
> >  	return @list;
> >  }
> >  
> > +# this assumes that forks follow immediately forked projects:
> > +# [ ..., 'repo.git', 'repo/fork.git', ...]; if not, set second
> > +# parameter to true value to sort projects by path first.
> > +sub filter_forks_from_projects_list {
> > +	my ($projects, $need_sorting) = @_;
> > +
> > +	@$projects = sort { $a->{'path'} cmp $b->{'path'} } @$projects
> > +		if ($need_sorting);
> 
> What happens in this scenario?
> 
> 	git.git
> 	git.related.project.git
> 	git/platforms.git

Thanks for catching this.

It looks like I have oversimplified the algorithm by requiring that 
forked project directly precedes any of its forks (if they exists).
I'd have to redo this part of patch.

> [...]
> > @@ -4747,23 +4784,36 @@ sub fill_project_list_info {
> >  		if (!defined $pr->{'owner'}) {
> >  			$pr->{'owner'} = git_get_project_owner("$pr->{'path'}") || "";
> >  		}
> > -		if ($check_forks) {
> > -			my $pname = $pr->{'path'};
> > -			if (($pname =~ s/\.git$//) &&
> > -			    ($pname !~ /\/$/) &&
> > -			    (-d "$projectroot/$pname")) {
> > -				$pr->{'forks'} = "-d $projectroot/$pname";
> 
> So currently we treat project.git as having forks when the
> corresponding project/ directory exists, even when the latter is
> empty.  The proposed new rule is that project.git has forks if and
> only if there is at least one project/foo.git repository.  Sensible.

Actually it is marked as having 0 forks, which visually results in "+"
sign which is not a hyperlink to 'forks' view.

Though I am not sure if it is worth it: without mentioned feature code
would be slightly simpler.


Note also that previous code put _string_ into 'forks' field, with
contents "-d <path to project>", instead of 1 as one might expect.

Earlier code contained a few leftover debugging things; I am not sure
if I have removed everything, at least wrt. projects list.

> Hope that helps,

Thanks again for your comments.

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]