On Fri, Jan 28, 2005 at 09:12:42AM -0500, ANDREW MARLOW, BLOOMBERG/ LONDON OF wrote:
> I have a large XML file and it takes quite a long time
> for gscanner to read it. Using quantify shows that
> several calls to read are made. I wonder if things can
> be sped up by allowing the caller to specify the
> buffer size used for read? Currently this is set
> by a macro in gscanner.c to 4000 bytes.
> I would like to use a larger value when I am
> using a larger XML file. Any thoughts?

I've never used the gscanner before. (All this is pretty new to me.)
I was reading about it last night, though. (There's a lot of really
cool stuff in GLib, isn't there? GTK+ too. I'm amazed.) So, take this
with a grain of salt:

I looked at the source just a little, and I agree with you that one
possible bottleneck is the file handling code, though it also looks
like it's been written more or less as efficiently as possible.

Increasing the buffer size could help, but I'm thinking a better
solution would be to stop using gscanner's reading code altogether.
That would make the read buffer size a non-issue. Instead, use mmap()
to map the entire file into memory before you start feeding it to the
scanner.

For those who've never used mmap() before: make sure you call munmap()
once for each time you call mmap() on a given file. I ran into this bug
once where a file was mmap()'d and kept in memory by one program and
periodically updated by another program. Whenever the mmap()ing program
determined that the file had been updated, it reopened the file and
mmap()ed it again to the same memory address, without calling munmap()
on the old mapping first. It looked like valid code, but caused what
looked like a file descriptor leak. This wasn't mentioned in the docs
last I checked.

- Ben
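
For illustration, here is a minimal sketch of the mmap() approach described above, with every successful mmap() paired with exactly one munmap(). It is untested against gscanner itself. The MappedFile struct and the mapped_file_open() / mapped_file_close() helpers are hypothetical names made up for this example and are not part of GLib; only the POSIX calls are real.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type (not part of GLib): one read-only file mapping. */
typedef struct {
    void  *data;  /* start of the mapping, or NULL if nothing is mapped */
    size_t len;   /* length of the mapping in bytes */
} MappedFile;

/* Map `path` read-only into memory. Returns 0 on success, -1 on failure. */
static int mapped_file_open(MappedFile *mf, const char *path)
{
    struct stat st;
    void *p;
    int fd;

    mf->data = NULL;
    mf->len = 0;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* mmap() rejects zero-length mappings, so an empty file is an error here. */
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    mf->data = p;
    mf->len = (size_t) st.st_size;
    return 0;
}

/* Release the mapping. Call once for every successful mapped_file_open(). */
static void mapped_file_close(MappedFile *mf)
{
    if (mf->data != NULL)
        munmap(mf->data, mf->len);
    mf->data = NULL;
    mf->len = 0;
}
```

With GLib, the mapped bytes could then presumably be handed to g_scanner_input_text(scanner, mf.data, mf.len) in place of g_scanner_input_file(). As far as I can tell the scanner reads from that buffer rather than copying it, so mapped_file_close() should only be called once scanning is finished. If the file changes and needs to be mapped again, call mapped_file_close() on the old mapping before calling mapped_file_open() a second time.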