At 9:50 AM +0100 3/2/07, Karel Zak wrote:
>On Fri, Mar 02, 2007 at 01:07:05AM -0500, Tony Nelson wrote:
>> Also, if it were to always run:
>>
>> Readahead-collector allocates memory in big chunks. It uses lots of
>> memory -- when I ran it, it produced 39 MB of /var/log/readahead-rac.log
>> (which yielded about 0.33 MB of /etc/readahead.d/custom.* -- but see
>> bz 230687). (I note that readahead-collector will collect without
>> limit, but that readahead will only use the first 32K entries.) Thus,
>> while readahead-collect uses too much memory now to run every time, if
>> it used a better data structure, say a balanced tree, and parsed the
>> audit data into the tree as the data arrived, it could use about 2% of
>> what it currently does.
>
> It's not so easy. My first implementation collected only paths, but
> that way is not reliable. You need to collect all events and parse them
> with libauparse, because every syscall produces three events (syscall,
> cwd and path) and the collector requires data from all three events.
> The order of events could be *random*, and before parsing you need to
> collect all events for the syscall.

I suggested parsing as the events are received. The order may vary, but
all events for a particular file come in a group, according to
parse_events and man auparse_next_record. It's the same event while the
strings match up to the first ")" ("audit(1234567890.123:1): "). Collect
strings until it changes and then auparse the preceding event. (For
"balanced tree" read whatever mapping type you prefer.) There is a
sketch of this grouping after the file listings below.

> I think a simple solution is to reduce the number of fields in events
> and store simplified event strings in memory. I hope libauparse
> doesn't care about the number of fields. This could save 80% of the
> memory used (I think). I'll try to implement it.

That would help, but it seems to rely on more of auparse's internals
than would looking at the start of the string.

> Frankly, I'm not sure that 30MB of RAM is such a big problem in this
> particular case, as readahead is an effective solution for machines
> with a lot of memory for the kernel cache.

ISTM that it is too much memory if readahead-collector is to run every
boot.

> But you're right that there is room for optimization.
>
>> Neither program seems to take account of the memory used by the files
>> that are read, though readahead can report it. (Possibly
>> readahead-collect should avoid the largest files, as most of their
>> contents probably aren't used and they don't cause so much seeking.)
>
> Any example of a really large file (during boot)?

default.early
    54204 KB  /usr/lib/locale/locale-archive

default.later
    54204 KB  /usr/lib/locale/locale-archive
    46556 KB  /var/lib/rpm/Packages
    13748 KB  /usr/share/icons/Bluecurve/icon-theme.cache

custom.early
    54204 KB  /usr/lib/locale/locale-archive
    10240 KB  /var/lib/mysql/ibdata1
     7373 KB  /usr/share/fonts/japanese/TrueType/sazanami-gothic.ttf
     5120 KB  /var/lib/mysql/ib_logfile0
     5120 KB  /var/lib/mysql/ib_logfile1

custom.later
    54204 KB  /usr/lib/locale/locale-archive
    25254 KB  /usr/share/icons/crystalsvg/icon-theme.cache
    13748 KB  /usr/share/icons/Bluecurve/icon-theme.cache
     7373 KB  /usr/share/fonts/japanese/TrueType/sazanami-gothic.ttf
     7356 KB  /usr/lib/firefox-1.5.0.10/components/libgklayout.so
     6518 KB  /usr/share/icons/gnome/icon-theme.cache
     4756 KB  /etc/gconf/gconf.xml.defaults/%gconf-tree.xml
     4623 KB  /usr/share/icons/hicolor/icon-theme.cache

(I wrote a tool at <http://georgeanelson.com/readaheadsize.py>, but this
is hand-copied.)

Note that locale-archive is read twice, in both early and later. The
early files list should be subtracted from the later files list, by a
merge after sorting; the second sketch below shows the idea. (Ask me to
do it?)
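To make the incremental parsing concrete, here is a minimal sketch (in
Python, for brevity; the collector itself is C and would hand each
completed group to libauparse): group raw records by the
"audit(timestamp:serial)" prefix described above, and flush a group to
the parser as soon as the prefix changes. The parse_event callback is a
stand-in of my own, not part of any real API.

def event_id(line):
    """Return the 'audit(timestamp:serial)' prefix, or None."""
    start = line.find('audit(')
    if start == -1:
        return None             # not an audit record
    end = line.find(')', start)
    if end == -1:
        return None
    return line[start:end + 1]

def group_events(lines, parse_event):
    """Call parse_event(records) once per complete event."""
    current_id, group = None, []
    for line in lines:
        eid = event_id(line)
        if eid != current_id and group:
            parse_event(group)  # the previous event is complete
            group = []
        current_id = eid
        group.append(line)
    if group:
        parse_event(group)      # flush the final event

The parser would pull the path out of each completed group and insert
it into the balanced tree (or whatever mapping), so memory grows with
the number of distinct files rather than with the size of the audit
stream.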
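And a minimal sketch of the sorted-merge subtraction; the file names
here are placeholders, and a real version would read the actual lists
from /etc/readahead.d/.

def read_paths(filename):
    """Read one path per line, sorted."""
    return sorted(line.strip() for line in open(filename)
                  if line.strip())

def subtract_sorted(early, later):
    """Yield the paths in 'later' that are absent from 'early'.
    Both lists must already be sorted (a single merge pass)."""
    i = 0
    for path in later:
        while i < len(early) and early[i] < path:
            i += 1
        if i == len(early) or early[i] != path:
            yield path          # only in the later list

if __name__ == '__main__':
    early = read_paths('custom.early')   # placeholder names
    later = read_paths('custom.later')
    for path in subtract_sorted(early, later):
        print(path)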
Packages is for yum-updatesd, which I'm not running, and which, as a
daemon, doesn't need to be sped up anyway. The mysql stuff is also for
a daemon. Probably all daemons' files should be skipped?

I think it unlikely that reading an icon-theme.cache is as useful as
its size would suggest. I'm using gnome and Bluecurve, but I have some
KDE stuff installed, so that's where the crystalsvg stuff comes from.
It's clearly not worth its weight.

Another issue is files opened for writing and not reading. There's no
use reading them in at all. I don't have any examples right away. I
expect that the open mode is in the audit message somewhere, but I
can't read it well enough to find it.

>> Readahead-collector runs for 5 minutes, so its output might need
>> pruning if it ran each boot. When run manually, one knows to start
>> stuff up and then wait for readahead to finish. BTW, the collection
>> loop has a 30 second timeout that isn't being used. It might be
>> reasonable to stop collecting if no event has come in in that time.
>
> Good idea, but I'm pessimistic that there is a 30 s period when the
> system doesn't call open() :-)

In that case, readahead isn't going to help anyway. 8-b

>> If readahead-collect could run automatically, readahead might request
>> it for the next boot if "too many" files are not found (say, after a
>> firefox update).
>
> Very good point.
>
> TODO updated:
>
> http://git.fedoraproject.org/?p=hosted/readahead;a=blob_plain;f=TODO;hb=HEAD
>
> Thanks.

You're welcome.
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson@xxxxxxxxxxxxxxxxx>
      '                              <http://www.georgeanelson.com/>

--
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-devel-list