Re: Gitweb caching: Google Summer of Code project

On Wed, 28 May 2008, Lea Wiemann wrote:
> Jakub Narebski wrote:
> >
> > 1. Caching data
> >  * disadvantages:
> >    - more CPU
> >    - need to serialize and deserialize (parse) data
> >    - more complicated
> 
> CPU: John told me that so far CPU has *never* been an issue on k.org. 
> Unless someone tells me they've had CPU problems, I'll assume that CPU 
> is a non-issue until I actually run into it (and then I can optimize the 
> particular pieces where CPU is actually an issue).

True.

What you do have to take care of (although I don't think it would be
particularly difficult) is not repeating bad I/O patterns with the
cache...

> Serialization: I was planning to use Storable (memcached's Perl API uses 
> it transparently I think).  I'm hoping that this'll just solve it.

While Storable is part of, I think, any modern Perl installation, there
might be a problem with the memcached API, and with memcached API wrappers
such as the CHI one.  Namely, you cannot assume that the memcached API is
installed, so you have to provide some kind of fallback.
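
A rough sketch of what such a fallback could look like, written in Python
for brevity rather than in gitweb's Perl.  Everything here is an assumption
made for illustration: the InProcessCache class, the load_cache_backend()
helper, and the make_cache() factory expected from the optional client
module are hypothetical names, and pickle merely stands in for Storable.

```python
import importlib
import pickle
import time


class InProcessCache:
    """Fallback cache: a plain dict of serialized values with expiry times."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, expires_in):
        # Serialize on write, the way Storable would be used in Perl.
        self._store[key] = (time.time() + expires_in, pickle.dumps(value))

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        deadline, blob = entry
        if time.time() >= deadline:
            # Expired entries are dropped lazily, on lookup.
            del self._store[key]
            return None
        return pickle.loads(blob)


def load_cache_backend(module_name):
    """Return (backend_name, cache), falling back if the module is missing.

    module_name is a hypothetical optional client module that is assumed
    to expose a make_cache() factory.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "fallback", InProcessCache()
    return module_name, module.make_cache()
```

The point is only that the rest of the code talks to get() and set() and
never needs to know which backend it ended up with.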
 
> It's true that it's more complicated.  It'll require quite a bit of 
> refactoring, and maybe I'll just back off if I find that it's too hard.

What's more, if you want to support If-Modified-Since and
If-None-Match, you would have to implement them yourself, while
for static pages (caching HTML output) the web server would do this
for us "for free".
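
To give an idea of the work involved, here is a minimal sketch of the
conditional-request check, in Python rather than gitweb's Perl.  The
function names and the choice of a SHA-1 of the body as ETag are
assumptions for illustration only; a real implementation would have to
derive the validators from the repository state.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def make_etag(body):
    """Build a quoted entity tag from the page body (illustrative choice)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def _strip_weak(tag):
    # If-None-Match uses weak comparison, so the W/ prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True if a 304 Not Modified response would be appropriate.

    last_modified must be a timezone-aware datetime.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored.
        candidates = [_strip_weak(t) for t in if_none_match.split(",")]
        return "*" in candidates or _strip_weak(etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # an unparsable date means the header is ignored
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since

    return False
```

When the check returns True the script would send "304 Not Modified"
with no body, and otherwise the full page together with the ETag and
Last-Modified headers.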

> > I'm afraid that implementing kernel.org caching in mainline in
> > a generic way would be enough work for a whole GSoC 2008.
> 
> I probably won't reimplement the current caching mechanism.  Do you 
> think that a solution using memcached is generic enough?  I'll still 
> need to add some abstraction layer in the code, but when I'm finished 
> the user will either get the normal uncached gitweb, or activate 
> memcached caching with some configuration setting.

That's good enough, although I think that the current caching mechanism in
kernel.org's gitweb (your implementation follows more what repo.or.cz's
gitweb does) has some good ideas, for example adaptive (load-dependent)
expiry time.
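
The idea of adaptive expiry can be illustrated with a small sketch, in
Python for brevity.  This is not the formula kernel.org's gitweb uses;
the linear scaling, the function name, and all the default numbers below
are assumptions chosen only to show the principle that cached pages live
longer when the server is busier.

```python
def adaptive_expiry(load_average, min_seconds=20, max_seconds=1200,
                    busy_load=10.0):
    """Scale cache lifetime linearly with load, clamped to [min, max].

    All parameters are illustrative; busy_load is the load average at
    which the maximum lifetime is reached.
    """
    fraction = min(max(load_average / busy_load, 0.0), 1.0)
    return min_seconds + fraction * (max_seconds - min_seconds)
```

On a Unix system the current load could be taken from os.getloadavg()[0],
e.g. adaptive_expiry(os.getloadavg()[0]).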

By the way, what do you think about adding (as an option) information
about gitweb performance to the output, in the form of
  "Site generated in 0.01 seconds, 2 calls to git commands"
or
  "Site generated in 0.0023 seconds, cached output, 1m31s old"
line somewhere in the page footer?
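
A minimal sketch of how such a line could be produced, in Python rather
than gitweb's Perl.  The PageStats class and the format_footer() and
format_age() helpers are hypothetical names; the only thing taken from
above is the wording of the two footer variants.

```python
import time


class PageStats:
    """Collects per-request timing and the number of git commands run."""

    def __init__(self):
        self.started = time.perf_counter()
        self.git_calls = 0

    def count_git_call(self):
        # To be called from wherever a git command is spawned.
        self.git_calls += 1

    def elapsed(self):
        return time.perf_counter() - self.started


def format_age(seconds):
    """Render an age such as 91 seconds as '1m31s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs:02d}s" if minutes else f"{secs}s"


def format_footer(elapsed_seconds, git_calls=0, cache_age_seconds=None):
    """Build the footer line for a generated or a cached page."""
    timing = f"Site generated in {round(elapsed_seconds, 4):g} seconds"
    if cache_age_seconds is not None:
        return f"{timing}, cached output, {format_age(cache_age_seconds)} old"
    noun = "call" if git_calls == 1 else "calls"
    return f"{timing}, {git_calls} {noun} to git commands"
```

For a freshly generated page one would call
format_footer(stats.elapsed(), stats.git_calls) just before printing the
footer, and for a cache hit pass the age of the cached entry instead.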

I hope you have some ideas about gitweb access statistics from kernel.org,
repo.or.cz, and perhaps other large git hosting sites (e.g.
freedesktop.org), and that you plan on benchmarking gitweb caching using
the average / amortized time to generate a page, ApacheBench or equivalent,
load average on the server depending on the number of requests, I/O load
(using the fio tool, for example) depending on the number of requests, etc.

> By the way, I'll be posting about gitweb on this mailing list 
> occasionally.  If any of you would like to receive CC's on such 
> messages, please let me know, otherwise I'll assume you get them through 
> the mailing list.

I read the git mailing list via the Usenet / news interface (NNTP gateway)
from GMane.

-- 
Jakub Narebski
Poland