Andreas,

Thanks for your helpful reply.

On Thu, 2006-01-12 at 12:52 -0700, Andreas Dilger wrote:
> On Jan 12, 2006 12:07 -0500, Charles P. Wright wrote:
> > I'm writing an application that makes pretty extensive use of
> > extended attributes to store file attributes on Ext2. I used a
> > profiling tool developed by my colleague Nikolai Joukov at SUNY
> > Stony Brook to dig a bit deeper into the performance of my
> > application.
>
> Presumably you are using ext3 and not ext2, given that you're posting
> to this list?

Actually, this test case was on Ext2, not Ext3. I did a quick search
for an ext2-users list and didn't immediately find one; since Ext2 and
Ext3 have similar EA implementations, I figured this list would be
appropriate.

> > In the course of my benchmark, there are 54247 setxattr operations
> > during a 54-second run. They use about 10.56 seconds of the time,
> > which seemed a rather outsized performance toll to me (~40k writes
> > took only 10% as long).
> >
> > After looking at the profile, 27 of those writes end up taking 7.74
> > seconds. That works out to roughly 286 ms per call, which seems a
> > bit high.
> >
> > The workload is not memory constrained (the working set is 50MB +
> > 5000 files). Each file has one extended attribute block that
> > contains two attributes totaling 32 bytes. The attributes are
> > unique (random, actually), so there isn't any sharing.
> >
> > Can someone provide me with some intuition as to why so many writes
> > reach the disk, and why they take so long? I would expect an
> > operation to take about as long as a seek (on the order of 10ms,
> > not 200+ms).
>
> I suspect the reason is that the journal is getting full and jbd is
> doing a full journal checkpoint because it has run out of space for
> new transactions. This is because using external EA blocks consumes
> a lot of space (4kB each) regardless of how small the EA is, and this
> can eat up the journal quickly. 54247 * 4kB = 211MB, much larger than
> the default 32MB (or maybe 128MB with newer e2fsprogs) journal size.
>
> Solutions to your specific problem are to use large inodes and the
> fast EA space ("mke2fs -j -I 256 ..." makes 256-byte inodes, with 128
> bytes left for EAs) and/or to increase the journal size ("mke2fs -J
> size=400", though even 400MB won't be enough for this test case).

Increasing the inode size to 256 bytes made a huge difference under
Ext3. The spikes that I mentioned for Ext2 also existed in Ext3, and
this change eliminated them. My application's performance increased by
about 40%, and the standard deviations dropped from around 20% to 4%.

However, for Ext2 it made very little difference. I still have a
handful of operations (0.05%) that account for 73% of the time. I know
that Ext2 is optimized for shared attribute blocks (the ACL case). Is
there something about having many unique attributes that results in
poor performance?

> We implemented the large inodes + fast EAs (included in 2.6.12+
> kernels) to avoid the need to do any seeking when reading/writing
> EAs, in addition to the benefit of not writing so much (mostly
> unused) data to disk. This showed a huge performance increase for
> Lustre metadata servers (which use EAs on every file) and also in
> Samba4 testing.

I can see why, especially on a journalled file system.

Thanks,
Charles
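
P.S. In case the access pattern isn't clear from the description above,
here is a minimal sketch of what the benchmark does per file. The
attribute names, the 16-byte value sizes, and the error handling are
illustrative assumptions for this sketch, not the actual benchmark
code.

    /* ea_demo.c -- set two small, unique EAs on each named file.
     * Build: gcc -o ea_demo ea_demo.c
     * Note: "user.*" attributes on Ext2/Ext3 require the filesystem
     * to be mounted with -o user_xattr. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/xattr.h>  /* setxattr(2); <attr/xattr.h> on older systems */

    int main(int argc, char *argv[])
    {
            char val1[16], val2[16];        /* two attrs, 32 bytes total */
            int i, j;

            if (argc < 2) {
                    fprintf(stderr, "usage: %s file...\n", argv[0]);
                    return 1;
            }
            for (i = 1; i < argc; i++) {
                    /* Random values, so no two files share an EA block. */
                    for (j = 0; j < 16; j++) {
                            val1[j] = rand() & 0xff;
                            val2[j] = rand() & 0xff;
                    }
                    /* Each call dirties the inode and, without in-inode
                     * fast-EA space, a separate 4kB external EA block. */
                    if (setxattr(argv[i], "user.attr1", val1,
                                 sizeof(val1), 0) != 0 ||
                        setxattr(argv[i], "user.attr2", val2,
                                 sizeof(val2), 0) != 0)
                            perror(argv[i]);
            }
            return 0;
    }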
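
P.P.S. For anyone who wants to check whether an existing filesystem
already has the larger inodes before reformatting, tune2fs can report
it (the device name below is just a placeholder):

    # 128 means no in-inode fast-EA space; 256 leaves room for EAs.
    tune2fs -l /dev/hda1 | grep 'Inode size'

    # Andreas's suggestions combined: 256-byte inodes plus a 400MB
    # journal. This reformats the device and destroys its contents!
    mke2fs -j -I 256 -J size=400 /dev/hda1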