On Sat, Feb 2, 2013, at 12:55 AM, Theodore Ts'o wrote:
> On Fri, Feb 01, 2013 at 10:33:21PM +1100, Bron Gondwana wrote:
> >
> > In particular, the way that Cyrus works seems entirely suboptimal for ext4.
> > The index and database files receive very small appends (108 bytes per message
> > for the index, and probably just a few hundred per write for most of the
> > twoskip databases), and they happen pretty much randomly to one of tens of
> > thousands of these little files, depending which mailbox received the message.
>
> Are all of these files in a single directory?  If so, that's part of
> the problem, since ext[34] uses the directory structure to try to
> spread apart unrelated files, so that heuristic can't be easily used
> if all of the files are in a single directory.

No, but the vast majority of them are 2-3 files per directory which will be
appended to at the same time, so they probably interleave :(

> > Here's the same experiment on a "fresh" filesystem.  I created this by taking
> > a server down, copying the entire contents of the SSD to a spare piece of rust,
> > reformatting, and copying it all back (cp -a).  So the data on there is the
> > same, just the allocations have changed.
> >
> > [brong@imap15 conf]$ fallocate -l 20m testfile
> > [brong@imap15 conf]$ filefrag -v testfile
> > Filesystem type is: ef53
> > File size of testfile is 20971520 (20480 blocks, blocksize 1024)
> >  ext logical physical expected length flags
> >    0       0 22913025            8182 unwritten
> >    1    8182 22921217 22921207   8182 unwritten
> >    2   16364 22929409 22929399   4116 unwritten,eof
> > testfile: 3 extents found
> >
> > As you can see, that's slightly more optimal.  I'm assuming 8182 is the
> > maximum number of contiguous blocks before you hit an assigned metadata
> > location and have to skip over it.
>
> Is there a reason why you are using a 1k block size?
> The size of a block group is 8192 blocks for 1k blocks (or 8 megabytes),
> while with a 4k block size, the size of a block group is 32768 blocks
> (or 128 megabytes).  In general the ext4 file system is going to be far
> more efficient with a 4k block size.

Mostly because a lot of our files are quite small.  Here's a set of file
sizes and counts for that filesystem.

 72055 zero
501435 <=512
 32004 <=1k
 46447 <=4k
 38411 <=16k
 49435 >16k

As you can see, the vast majority are significantly less than 1k in size,
so a 4k block size would add significant space overhead.  Basically, we
wouldn't be able to fit everything on there.

There are plans afoot to merge most of those smaller files into a single
larger per-user file, which should help eventually.  Meanwhile, this is
what we have.

We were actually considering 1k block size for our email spools as well,
which are currently 4k block size, because most emails are smaller than 4k
as well, so we would reduce the space wastage there.

Bron.

-- 
  Bron Gondwana
  brong@xxxxxxxxxxx
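For anyone wanting to check the arithmetic in this thread, here is a rough
sketch in Python.  It reproduces the block group sizes quoted above (one
bitmap block tracks block_size * 8 blocks) and gives a lower bound on the
extra space the small files would need with 4k blocks.  The function names
are made up for illustration.  The estimate assumes every non-empty file of
at most 1k occupies exactly one data block at either block size, and it
ignores inodes, directories, and the larger size buckets, so treat it as an
approximation rather than a measurement.

```python
# File counts taken from the size histogram in the message above.
FILES_UP_TO_512 = 501435
FILES_UP_TO_1K = 32004


def blocks_per_group(block_size: int) -> int:
    """One bitmap block holds block_size * 8 bits, one bit per block."""
    return block_size * 8


def group_bytes(block_size: int) -> int:
    """Size of one block group in bytes."""
    return blocks_per_group(block_size) * block_size


def allocated_bytes(file_size: int, block_size: int) -> int:
    """File data rounded up to whole blocks (assumption: no inline data)."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def min_extra_bytes(small: int = 1024, large: int = 4096) -> int:
    """Lower bound on extra space when moving from `small` to `large` blocks.

    Only counts the non-empty files of at most 1k, each of which needs one
    block at either block size.
    """
    count = FILES_UP_TO_512 + FILES_UP_TO_1K
    per_file = allocated_bytes(1, large) - allocated_bytes(1, small)
    return count * per_file


if __name__ == "__main__":
    for bs in (1024, 4096):
        print(f"blocksize {bs}: {blocks_per_group(bs)} blocks per group, "
              f"{group_bytes(bs) // (1024 * 1024)} MiB")
    extra = min_extra_bytes()
    print(f"extra space for files <=1k with 4k blocks: "
          f"at least {extra / 2**30:.2f} GiB")
```

The files between 1k and 16k would add to that figure, since each of them
can waste up to most of a 4k block in its tail.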