On Fri, Dec 31, 2004 at 11:48:19AM -0500, seth vidal wrote:
> > > using the reader at the C level, this include decompressing the archive
> > and walking though all nodes. The main cost is to turn the parsed data into
> > Python's internal representation as I said.
> >
> > > than wouldn't be useful to
> > > implement that small portion in C? or it isn't so small part?
> >
> > The string interning is in the Python lib, probably in C as it's a C API
> > as far as I can tell. And no I din't looked at python internal code.
>
> I'm talking from ignorance here:
> Would it be possible to speed up the string interning by providing your
> own __repr__ methods in the libxml2 python module?

  Unfortunately, that's not where the problem lies, assuming I understand
your suggestion. __repr__ is used to build a string representation of a
Python object, while our problem is building the Python object itself
(which happens to be a string) from the C string.
  We should double-check where the time is actually spent. (k)cachegrind
is very useful for that kind of analysis.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard@xxxxxxxxxx  | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/