Re: [PATCH] Added sub get_owner_file which checks if there's a file with project owner name

On Wed, 30 Jan 2008, Nagy Balázs wrote:
> Jakub Narebski wrote:
>> Nagy Balázs wrote:
>>   
>>> Are you talking about I/O of an all-in CGI script?  
>>>     
>>
>> I am talking here about the I/O difference between the situation
>> (configuration) when $projects_list is a directory (default),
>> or is a file. If $projects_list is a directory, gitweb scans
>> directory structure to find git repositories, which for large
>> number of repositories might take time, even with filesystem
>> cache, and with depth of searching bound by $project_maxdepth.
>> Add to that finding symbolic name of the owner of repository
>> directory, or (with the patch) reading a file per repo with repo
>> owner.
>>   
> We have two configurable options here: $projectroot and $projects_list.  
> If $projects_list is a directory, we'll end up using a directory to get 
> project list info, and using another one to actually handle the 
> projects.  In small repo area it's safe to have $projects_list empty.  
> This is why I reference $projects_list as a file.

Besides the fact that using a $projects_list file can speed up generating
the 'projects_list' page, it can also be used either to just restrict
visibility of certain projects (some projects will not be visible
on the projects list page, but will still be available when requested
by project name), or to restrict/refuse access (if GITWEB_STRICT_EXPORT
aka $strict_export is true, only projects shown on the projects list page
are available to browse; this can be further restricted using the
"export-ok" mechanism).

You can use $projects_list pointing to a directory with symlinks to
selected repositories residing under $projectroot for that. So the
choice is not only between $projects_list as a file and $projects_list
undefined (falling back to $projectroot, i.e. $projects_list as a
directory). $projects_list as a directory different from $projectroot
makes sense in some cases too.

> If $projects_list is a file, we'll rely on a file which was generated 
> some time ago and can't reflect the latest changes of $projectroot (but 
> see below).

Creating projects is a rare event. You cannot do this remotely with git
tools only. So I think it would be neither very difficult nor very
surprising to use some script to add a new project, a script which would
ensure proper project configuration, perhaps set up proper SSH keys, and
regenerate the $projects_list file if that is what gitweb is using.
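
To make the "regenerate $projects_list" step concrete, here is a minimal
sketch in Python (not part of gitweb). It assumes the index file holds one
project per line, as URL-encoded path, a space, and URL-encoded owner; the
function names, the repository detection heuristic, and the depth handling
are illustrative assumptions and only approximate what gitweb itself does.

```python
import os
from urllib.parse import quote_plus


def find_repos(projectroot, maxdepth=3):
    """Yield paths, relative to projectroot, of directories that look like repos."""
    projectroot = os.path.abspath(projectroot)
    for dirpath, dirnames, filenames in os.walk(projectroot):
        rel = os.path.relpath(dirpath, projectroot)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        # Heuristic (assumption): a repository has a HEAD file and an objects dir.
        if "HEAD" in filenames and "objects" in dirnames:
            yield rel
            dirnames[:] = []  # do not descend into the repository itself
        elif depth >= maxdepth:
            dirnames[:] = []  # stop searching below the depth bound


def format_projects_list(entries):
    """Format (path, owner) pairs as '<encoded path> <encoded owner>' lines."""
    lines = [
        f"{quote_plus(path, safe='/')} {quote_plus(owner)}"
        for path, owner in sorted(entries)
    ]
    return "".join(line + "\n" for line in lines)


def write_projects_list(list_file, entries):
    """Write the index through a temporary file, then rename it into place."""
    tmp = list_file + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(format_projects_list(entries))
    os.replace(tmp, list_file)  # readers never see a half-written file
```

A project creation script could call write_projects_list() as its last
step, pairing each path from find_repos() with whatever owner information
the site keeps.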

[...]
>>> What if this script creates the $projects_list file, for example when 
>>> the $projectroot's mtime changes?  We can even hold mtime info for every 
>>> project's config file.
>>>     
>>
>> I don't understand what you wanted to say here. $projects_list file
>> lists only project path (project name) and project owner.
>>   
> I mean it would be better to add this kind of metadata like description 
> and owner's shoesize to config instead of a raw file.  I understand raw 
> files are easier to read but reading a single cache file and doing some 
> stat()s are much easier.  I can think of $projects_list as a cache file 
> name, which can be maintained by gitweb.cgi, and these mtime values 
> could be saved to $projects_list to verify entries' validity.

Err... I think that having some kind of cache for the 'projects_list' page
is a separate issue from using a $projects_list file for a list
(and owners) of projects.

Besides, I'd rather opt for the other end of the spectrum: instead of
gitweb checking for freshness of a 'cache', regenerate the cache
or just delete it when you know that the contents change: from a script
adding a repository, from a script renaming a repository or changing its
description or owner, from a script deleting a repository or
removing it from the list, from a post-update / post-receive hook if
the cached info includes last change, etc.
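
As a rough illustration of "just delete it", the following Python sketch
shows the one call every such script or hook would make. The function name
and the idea of a single cache file are assumptions for illustration;
gitweb has no such cache file here.

```python
import os


def invalidate_cache(cache_file):
    """Delete the cache file; return True if a file was actually removed."""
    try:
        os.remove(cache_file)
        return True
    except FileNotFoundError:
        return False  # nothing cached, so nothing to do
```

Whatever produces the page would then rebuild the cache on the next
request that finds it missing, so no freshness check is needed.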
 
> All we have to do is to maintain $projects_list to be up to date.  The 
> best would be to have a separate projects list maintainer script which 
> handles two scenarios:
> 
> 1| repo addition/deletion
> 2| repo config changes
> 
> I don't have a solution for the first scenario which would be a speed 
> improvement in gitweb.cgi, this is why I suggest putting the $projects_list 
> updater in a separate script.  The second scenario could be handled by 
> gitweb.cgi though, but it would be mere code duplication.

I was thinking about a gitconfig file, but with limited syntax so it is
easily parseable from Perl, like git-cvsserver does, put in $projectroot,
e.g. $projectroot/gitconfig, which would contain the parts of repo config
relevant to the 'projects_list' page.  It would use gitweb.<repo>.<key>
syntax, where <key> is one of owner, description, and perhaps url.

Or we could put it in gitweb_config.perl file, in the form of parsed
config hash... well, it should be fairly easy to combine those two
approaches with current code: use %config hash, and fill it from
$projectroot/gitconfig if not set.
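
A sketch of that combination, written in Python rather than Perl just to
show the shape: parse a restricted gitconfig subset and fill a config
mapping only where no value is set yet. The assumption that
gitweb.<repo>.<key> is written as a [gitweb "<repo>"] section, and the
function names, are mine; a real implementation would also have to handle
quoting and case-insensitive key names.

```python
import re

# Only the restricted syntax described above is accepted (assumption).
SECTION_RE = re.compile(r'^\[gitweb\s+"([^"]+)"\]$')
KEY_RE = re.compile(r'^(owner|description|url)\s*=\s*(.*)$')


def parse_gitweb_config(text):
    """Parse the restricted subset into {repo: {key: value}}."""
    result = {}
    repo = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        m = SECTION_RE.match(line)
        if m:
            repo = m.group(1)
            result.setdefault(repo, {})
            continue
        if line.startswith("["):
            repo = None  # some other section: ignore its keys
            continue
        m = KEY_RE.match(line)
        if m and repo is not None:
            result[repo][m.group(1)] = m.group(2).strip()
    return result


def fill_missing(config, parsed):
    """Copy parsed values into config only where config has no value yet."""
    for repo, values in parsed.items():
        target = config.setdefault(repo, {})
        for key, value in values.items():
            target.setdefault(key, value)
    return config
```

Values already present in the config mapping win, so the file only
supplies what was not set elsewhere.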

Of course you would have the usual danger when dealing with data
duplication, namely that the copies would get out of sync. And the usual
danger when dealing with caches: that the validation needed, together
with other system caches, would make it perform *worse* than without
the cache.

-- 
Jakub Narebski
Poland