On Sun, Jun 25, 2006 at 03:00:53PM -0700, Valerie Henson wrote:
> I foolishly signed up to give a talk at OSCON in about a month about
> choosing and tuning Linux file systems for different workloads. I
> have some ideas about which file system to use when, but I'd rather
> get recommendations from the experts on each file system. Below is a
> straw man outline of my current recommendations, please take a look
> and comment. I will make a summary freely available when I'm done.
> At long last, I'll have an easy answer when someone asks me, "But
> which file system should I use?" Answer: "Go read this web page..."

Here are some comments.

> Choosing a file system
>
> Laptop: ext3 with noatime
> General purpose server: ext3 or reiser
> Lots of small files: reiser, ext2/3 with 1k blocks

Small files usually imply lots of files in a directory, so be sure to
use htree with ext3.

> More than ~32,000 files in one directory: XFS or reiser

Ext3 can easily have more than 32000 *files* in a directory. However, it
can only have 32000 *subdirectories* in a directory. This limit is from
struct ext3_inode->i_links_count, which is an __le16: each subdirectory
has an entry ".." that links back to its parent, increasing the parent's
i_links_count.

> Fast lookups in large directories: XFS, reiser, ext3 with htree (?)
> File size more than 2TB: XFS, reiser up to 8TB
> File system size more than 2TB: XFS, reiser up to 16TB
> Ease of data recovery after corruption: ext2, ext3
>
> Tuning a file system
>
> Use "noatime" mount option

Can also be combined with the "nodiratime" mount option.

> - atime makes read workloads into random write workloads, yuck
> - This is Ubuntu installation default
> - I have a report that mutt doesn't work with this because atime is
>   never updated but mtime is, maybe some kind of lazy atime is better?

Mutt does indeed think that a mailbox always has new content. However,
this only happens with mbox style mailboxes; maildir or mh style
mailboxes just work.
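
To illustrate the mbox case, here is a minimal sketch. It assumes the
mail reader flags an mbox as having new mail when the file's mtime is
later than its atime; the function names and timestamps are made up for
illustration, and this is not mutt's actual code.

```python
import os
import tempfile


def mbox_looks_new(path):
    # Assumed heuristic: the mbox has unread mail if it was written
    # (mtime) more recently than it was last read (atime).
    st = os.stat(path)
    return st.st_mtime > st.st_atime


def demo():
    # Create an empty stand-in for an mbox file.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
    try:
        # Mail delivered at t=2000, mailbox last read at t=1000.
        os.utime(path, (1000, 2000))
        delivered = mbox_looks_new(path)

        # Normal mount: reading the mailbox moves atime forward.
        os.utime(path, (3000, 2000))
        after_read = mbox_looks_new(path)

        # noatime mount: reading leaves atime untouched, which we model
        # by restoring the old atime.
        os.utime(path, (1000, 2000))
        after_read_noatime = mbox_looks_new(path)

        return delivered, after_read, after_read_noatime
    finally:
        os.unlink(path)
```

With noatime the third check can never turn false on its own, which
would match the report above. Maildir and mh keep one file per message
and do not need this comparison.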
> - Don't do if you want to e.g., track down hackers
>
> Choosing journaling mode in ext3
> - Default is "ordered", usually the right choice
> - "journal" is slower but guarantees data is on-disk as well
> - "writeback" is faster but may result in garbage/security leaks in
>   your file data
>
> Choosing block size
> - You can do this at mkfs time
> - tradeoff is space wasted vs. max file/fs size (other considerations?)
> - limitation is system page size

NTFS has support for block sizes larger than the page size. There were
some patches from Anton Altaparmakov to allow such block sizes, but IIRC
they are NTFS-only and were not made generically available for all
filesystems.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands