On Fri, Jul 24, 2015 at 04:54:57PM +0100, Daniel P. Berrange wrote:
> On Fri, Jul 24, 2015 at 04:50:34PM +0200, Christophe Fergeau wrote:
> > > - Should we restructure the database ?
> > >
> > >   eg, we have a single data/oses/fedora.xml file that contains
> > >   the data for every Fedora release. This is already 200kb in
> > >   size and will grow forever. If we split up all the files
> > >   so there is only ever one entity (os, hypervisor, device, etc)
> > >   in each XML file, each file will be smaller in size. This would
> > >   also let us potentially do database minimization. eg we could
> > >   provide a download that contains /all/ OS, and another download
> > >   that contains only non-end-of-life OS.
> >
> > I was about to make the same comment as Zeeshan, GNOME has had issues in
> > the past with data scattered among too many small files, in general this
> > is solved by adding a cache file containing a concatenated version of
> > all the files (possibly pre-parsed to some domain-specific format).
>
> If we can avoid loading the entire database, and only load the subset
> of files we want info on, we'd hopefully not have such problems. I
> could see benefit in having some "index" file perhaps which says
> which entity is defined in which file, as a way to avoid dictating
> a filename/dirname convention.

FYI, I wrote a simple perl script to process our current XML files
and split them up into 1 file per entity. This resulted in 438
individual XML files.

I timed how long libosinfo takes to load the database with the current
database structure, and with the split structure. There was no
measurable difference in load time. I repeated the test using
vm.drop_caches=3 to clear the FS cache between timings, and still
found no difference in load time.

So I think our load time is not dominated by the number of files we
have - most likely the XML parsing & object allocation is our main
timesink.
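For anyone who wants to try a similar split, here is a minimal Python
sketch of the idea. It is not the perl script mentioned above. It
assumes each database file has a single root element whose direct
children are the entities, and that each entity carries an "id"
attribute. The split_entities() helper, the file naming scheme and the
sample data are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET


def split_entities(xml_text):
    """Return {filename: xml_document}, one top-level entity per document."""
    root = ET.fromstring(xml_text)
    out = {}
    for child in root:
        # Derive a file name from the entity's id attribute (assumed present
        # and unique; entities without one would collide on the same name).
        ident = child.get("id", "")
        slug = re.sub(r"[^A-Za-z0-9.]+", "-", ident.split("://")[-1]).strip("-")
        # Wrap the single entity in a copy of the original root element.
        wrapper = ET.Element(root.tag, root.attrib)
        wrapper.append(child)
        out[f"{child.tag}/{slug}.xml"] = ET.tostring(wrapper, encoding="unicode")
    return out


# Hypothetical sample input, loosely modelled on an OS database file.
sample = """<libosinfo version="0.0.1">
  <os id="http://fedoraproject.org/fedora/21"><short-id>fedora21</short-id></os>
  <os id="http://fedoraproject.org/fedora/22"><short-id>fedora22</short-id></os>
</libosinfo>"""

files = split_entities(sample)
for name in sorted(files):
    print(name)
```

The function returns the documents in memory rather than writing them
out, so the resulting layout can be inspected before touching the
real data directory.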

FWIW, with a warm cache the load took ~250ms, and with a cold cache it
took 1.9s, though for the latter number I don't know how much of that
time comes from loading the ELF libraries vs the database. Either way,
the timings did not differ between the split and unsplit layouts.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|