Re: why doesn't yum cache anything?

Daniel Veillard <veillard@xxxxxxxxxx> · Fri, 31 Dec 2004 08:04:59 -0500

On Fri, Dec 31, 2004 at 01:27:49PM +0100, Farkas Levente wrote:
> Daniel Veillard wrote:
> >  Parsing the XML file and building the associated Python objects.
> >
> >And before bashing XML and the cost of parsing: it's only a very small
> >fraction of the time spent. Building the Python strings and objects is
> >the really costly part, as we found with Seth when doing basic tests.
> >My own tests led me to believe that Python string interning (taking a
> >string from the C layer or XML and getting the copy from Python's own
> >string implementation) is extremely costly, and of course we are
> >manipulating a very large number of strings when collecting the repodata.
> 
> have you already made some real measurements?

  Of what? Yes, I know exactly how long it takes libxml2 to parse
the data:

[root@localhost ~]# xmllint --stream --timing /var/cache/yum/base/primary.xml.gz
Parsing took 1094 ms

  That is using the reader at the C level, and it includes decompressing
the archive and walking through all nodes. The main cost is turning the
parsed data into Python's internal representation, as I said.
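
  For anyone who wants to see the same split from the Python side, here is
a rough sketch, not the code yum uses. It assumes Python's standard expat
binding instead of libxml2, and a synthetic document whose element names
(package, name, version, summary) are made up for illustration and are not
the real primary.xml layout. One pass parses with no handlers installed, so
as far as I know no Python callbacks run; the other builds a dict of Python
strings per package. Timings will vary by machine, so none are quoted.

```python
import time
import xml.parsers.expat


def make_sample(n_packages=2000):
    """Build a synthetic, repodata-like document (hypothetical layout)."""
    parts = ["<metadata>"]
    for i in range(n_packages):
        parts.append(
            '<package type="rpm"><name>pkg%d</name>'
            '<version epoch="0" ver="1.%d" rel="1"/>'
            "<summary>Example package %d</summary></package>" % (i, i, i)
        )
    parts.append("</metadata>")
    return "".join(parts).encode("utf-8")


def parse_only(data):
    """Parse with no handlers set, so no Python callbacks are invoked."""
    parser = xml.parsers.expat.ParserCreate()
    parser.Parse(data, True)


def parse_and_build(data):
    """Parse and build one dict of Python strings per <package> element."""
    packages = []
    state = {"current": None, "text": []}

    def start(name, attrs):
        if name == "package":
            state["current"] = dict(attrs)
        state["text"] = []  # reset the text buffer for the new element

    def chars(chunk):
        state["text"].append(chunk)  # expat may deliver text in pieces

    def end(name):
        current = state["current"]
        if current is None:
            return  # outside any <package>, e.g. </metadata>
        if name == "package":
            packages.append(current)
            state["current"] = None
        else:
            current[name] = "".join(state["text"])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(data, True)
    return packages


def timed(fn, data):
    """Return (result, elapsed milliseconds) for one call of fn(data)."""
    t0 = time.perf_counter()
    result = fn(data)
    return result, (time.perf_counter() - t0) * 1000.0


if __name__ == "__main__":
    sample = make_sample()
    _, ms_parse = timed(parse_only, sample)
    pkgs, ms_build = timed(parse_and_build, sample)
    print("parse only:       %.1f ms" % ms_parse)
    print("parse and build:  %.1f ms (%d packages)" % (ms_build, len(pkgs)))
```

  The difference between the two numbers is only a rough proxy for the
cost of handing strings to Python; it also includes the callback overhead,
so treat it as an indication and not a precise measurement.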

> then wouldn't it be useful to
> implement that small portion in C? or isn't it such a small part?

  The string interning is in the Python lib, probably in C as it's a C API
as far as I can tell. And no, I didn't look at Python's internal code.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard@xxxxxxxxxx  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/