On 2011-11-01 00:22, Ted Ts'o wrote:
> On Mon, Oct 31, 2011 at 10:08:20AM -0600, Andreas Dilger wrote:
>> On 2011-10-31, at 4:22 AM, Theodore Tso <tytso@xxxxxxx> wrote:
[snip]
> I'm curious why TaoBao is so interested in changing the extent
> encoding for bigalloc file systems. Currently we can support up to 1
> EB worth of physical block numbers, and 16TB of logical block numbers.
> Are you concerned about bumping into the 1 EB file system limit? Or
> the 16 TB file size limit? Or something else?
>

In some applications, we allocate one big file that occupies most of the
space in a file system, and the file system is built on (expensive) SSD.
In such a configuration, we want fewer blocks allocated to inode tables
and bitmaps. If the maximum extent length could be much larger, there is
a chance to have far fewer block groups, which leaves more blocks for
regular files. The current bigalloc code already does well, but there is
still room to do better.

The sys-admin team believes cluster-based extents can help Ext4 consume
as little metadata memory as a raw disk does, and gain as many available
data blocks as a raw disk does, too. This is a small number on a single
SSD, but in our cluster environment this effort can save a noticeable
amount of capex.

Furthermore, consider HDFS with 128MB data block files, on a file system
formatted with 1MB-cluster bigalloc. In the worst case, only one extent
block read is needed to access a 128MB data block file. (However, this
case is about a chunk size larger than 64K; cluster-based extents are
not compulsory for it.)

With inline data and cluster-based extents added to bigalloc, we get
closer to the above goal.

P.S. When I finished typing this email, I found that Andreas had also
explained a similar reason in his email, much more simply and clearly :-)

-- 
Coly Li
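
The numbers in this mail (1 EB, 16TB, one extent block for a 128MB file) can be checked with a little arithmetic. What follows is only a rough sketch, not anything from the patches under discussion. It assumes a 4KB block size, which the mail does not state, and uses the sizes of the ext4 on-disk extent structures (12-byte header, 12-byte entries, 60 bytes of extent space in the inode, 15-bit length for an initialized extent). The function names are made up for illustration, and reading "cluster-based extent" as "extent length counted in clusters instead of blocks" is an interpretation of the proposal.

```python
BLOCK_BYTES = 4096           # ASSUMPTION: 4KB blocks (not stated in the mail)
EXTENT_HEADER_BYTES = 12     # size of struct ext4_extent_header
EXTENT_ENTRY_BYTES = 12      # size of struct ext4_extent
INODE_I_BLOCK_BYTES = 60     # extent area stored inside the inode
MAX_EXTENT_LEN = 1 << 15     # longest initialized extent, in allocation units

KIB, MIB, GIB, TIB, EIB = 1 << 10, 1 << 20, 1 << 30, 1 << 40, 1 << 60


def entries_per_node(node_bytes):
    """Extent entries that fit in a tree node after its header."""
    return (node_bytes - EXTENT_HEADER_BYTES) // EXTENT_ENTRY_BYTES


def worst_case_extents(file_bytes, unit_bytes):
    """Fully fragmented file: one extent per allocation unit."""
    return -(-file_bytes // unit_bytes)  # integer ceiling division


def leaf_blocks_needed(n_extents):
    """Leaf blocks needed to hold n_extents (hypothetical helper).

    Returns 0 when the extents fit in the inode itself. Index blocks
    above the leaves are not counted.
    """
    if n_extents <= entries_per_node(INODE_I_BLOCK_BYTES):
        return 0
    return -(-n_extents // entries_per_node(BLOCK_BYTES))


if __name__ == "__main__":
    # Limits Ted quotes: 48-bit physical and 32-bit logical block numbers.
    print("physical limit (EiB):", (1 << 48) * BLOCK_BYTES // EIB)
    print("logical limit (TiB):", (1 << 32) * BLOCK_BYTES // TIB)

    # Longest extent when the length counts blocks vs. 1MB clusters.
    print("max extent, block units (MiB):", MAX_EXTENT_LEN * BLOCK_BYTES // MIB)
    print("max extent, 1MB cluster units (GiB):", MAX_EXTENT_LEN * MIB // GIB)

    # The HDFS case: 128MB file, worst-case fragmentation.
    for unit in (4 * KIB, 1 * MIB):
        n = worst_case_extents(128 * MIB, unit)
        print("unit", unit, "bytes:", n, "extents,",
              leaf_blocks_needed(n), "leaf block(s)")
```

Under these assumptions a fully fragmented 128MB file on 1MB clusters has at most 128 extents, which is more than the 4 that fit in the inode but fewer than the 340 that fit in one 4KB extent block, so a single extent block is enough. That matches the "only one extent block read" remark.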