Re: [PATCH 3/3] mke2fs: document bigalloc and cluster-size


On Tue, Jan 15, 2013 at 03:38:47PM -0500, Phillip Susi wrote:
> 
> If it is only to get around the mm pagesize limit, then why not just
> have the fs automatically lie to the kernel about the block size and
> shift the references back and forth on the fly when it detects a
> larger blocksize?

Because of the pain of handling random writes into a sparse file.  We
would either need to track which blocks within the large block have
been initialized, or erase the entire large block before writing the
first page into it (and even then you still need to track whether you
are writing the first or a subsequent page into that large block).
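To make the tradeoff concrete, here is a minimal sketch (in Python,
with hypothetical names; this is not ext4 code) of the two options for
writing one page into a larger allocation block: zeroing the whole
block on first write, versus tracking which pages are initialized.

```python
PAGE_SIZE = 4096
BIG_BLOCK_PAGES = 16  # e.g. a 64 KiB "large block" of 4 KiB pages


class ZeroFillBlock:
    """Option 1: erase the entire large block on the first write."""

    def __init__(self):
        self.data = None

    def write_page(self, index, page):
        if self.data is None:
            # First write into the block: zero the whole thing first,
            # so stale on-disk contents are never exposed by later reads.
            self.data = bytearray(PAGE_SIZE * BIG_BLOCK_PAGES)
        self.data[index * PAGE_SIZE:(index + 1) * PAGE_SIZE] = page


class TrackedBlock:
    """Option 2: track which pages within the block are initialized."""

    def __init__(self):
        self.initialized = set()
        self.data = bytearray(PAGE_SIZE * BIG_BLOCK_PAGES)

    def write_page(self, index, page):
        self.initialized.add(index)
        self.data[index * PAGE_SIZE:(index + 1) * PAGE_SIZE] = page

    def read_page(self, index):
        # Pages never written must read back as zeros, not stale data.
        if index not in self.initialized:
            return bytes(PAGE_SIZE)
        return bytes(self.data[index * PAGE_SIZE:(index + 1) * PAGE_SIZE])
```

Option 1 pays extra write bandwidth up front; option 2 pays with
per-page bookkeeping that has to live somewhere persistent.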

What we're doing with bigalloc is effectively tracking which blocks in
the cluster have been initialized by using entries in the extent tree,
since entries in the allocation bitmaps are in units of clusters, while
entries in the extent tree are in units of blocks.
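The unit mismatch is just a power-of-two shift.  A sketch (hypothetical
names, not the actual ext4 macros) of converting between the block
numbers used by the extent tree and the cluster numbers used by the
allocation bitmaps, assuming a cluster ratio of 16 blocks per cluster:

```python
BLOCK_SIZE = 4096
CLUSTER_RATIO = 16  # blocks per cluster; a power of two, as with bigalloc
CLUSTER_BITS = CLUSTER_RATIO.bit_length() - 1  # log2(CLUSTER_RATIO)


def block_to_cluster(block_no):
    # Allocation-bitmap granularity: one bit covers a whole cluster.
    return block_no >> CLUSTER_BITS


def cluster_to_first_block(cluster_no):
    # First block of a cluster, in extent-tree (block) units.
    return cluster_no << CLUSTER_BITS
```

Because extent entries stay in block units, a file can map individual
blocks even though allocation is only tracked per cluster.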

Looking back at how complicated it has been to get delalloc right, it
might have been simpler to just use a brute-force sb_issue_zeroout
whenever a block is freshly allocated, unless the write request passed
to ext4_writepages() exactly covered the large block.  Getting the
Direct I/O path right would have been messy, but perhaps it would have
been less work in the end.
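The brute-force approach can be sketched as follows (hypothetical
names; the real decision lives in the ext4 write paths, and
sb_issue_zeroout operates on block ranges, not a callback):

```python
def allocate_cluster_for_write(write_offset, write_len, cluster_size,
                               zeroout):
    """On allocating a fresh cluster, zero the whole thing unless the
    incoming write exactly covers it.  `zeroout` stands in for a
    sb_issue_zeroout-style helper."""
    covers_whole_cluster = (write_offset % cluster_size == 0
                            and write_len == cluster_size)
    if not covers_whole_cluster:
        # Wipe stale data so reads of the unwritten parts of the
        # cluster return zeros instead of old disk contents.
        zeroout()
    return covers_whole_cluster
```

The cost is redundant zeroing on partial writes, in exchange for never
having to track per-block initialization state at all.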

       	   	      	    	      - Ted

