Re: [PATCH] ext4: Do not normalize request from fallocate

Lukáš Czerner <lczerner@xxxxxxxxxx> · Mon, 25 Mar 2013 14:26:54 +0100 (CET)

On Mon, 25 Mar 2013, Theodore Ts'o wrote:

> Date: Mon, 25 Mar 2013 08:53:09 -0400
> From: Theodore Ts'o <tytso@xxxxxxx>
> To: Lukáš Czerner <lczerner@xxxxxxxxxx>
> Cc: linux-ext4@xxxxxxxxxxxxxxx, gharm@xxxxxxxxxx
> Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
> 
> On Mon, Mar 25, 2013 at 11:09:35AM +0100, Lukáš Czerner wrote:
> > 
> > Sorry for being dense, but I am trying to understand why this is so
> > bad and what is the "expected" column there.
> > 
> > The physical offset of each extent below starts at the start of the
> > block group and it seems to me that it's perfectly aligned for every
> > power of two up to the block group size.
> 
> Yes, but the logical offset isn't aligned.  Consider the simplest
> workload, which is where we are writing the 1GB file sequentially.
> Let's assume that the raid stripe size is 8M.  So ideally, we would
> want each write to be a multiple of 8M, starting at logical block 0.
> 
> But look what happens here:
> 
> > > File size of 1 is 1073741824 (262144 blocks of 4096 bytes)
> > >  ext:     logical_offset:        physical_offset: length:   expected: flags:
> > >    0:        0..   32766:     458752..    491518:  32767:             unwritten
> > >    1:    32767..   65533:     491520..    524286:  32767:     491519: unwritten
> > >    2:    65534..   98300:     589824..    622590:  32767:     524287: unwritten
> 
> If we do 8M writes, then we would want to write in chunks of 2048
> blocks.  So consider what happens when we write the 2048 block chunk
> starting with logical block 30720.  The fact that there is a
> discontinuity between logical blocks 32766 and 32767 means that we
> will have to do a read-modify-write cycle for that particular RAID
> stripe.
> 
> Does that make more sense?

Oh, now I get it :) Thanks a lot for the explanation. I kept thinking
about the physical layout and forgot that the logical offset is
actually misaligned.
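
To make the arithmetic concrete, here is a rough sketch of the example
above. It maps a logical range onto the three extents from the quoted
listing and checks whether one 2048-block write lands on a single full
stripe. The helper names (map_range, is_full_stripe) are made up for
illustration, and it assumes physical block 0 sits on a stripe boundary,
which the thread does not state.

```python
BLOCK_SIZE = 4096
STRIPE_BYTES = 8 * 1024 * 1024               # 8M RAID stripe from the example
STRIPE_BLOCKS = STRIPE_BYTES // BLOCK_SIZE   # 2048 blocks

# (logical_start, physical_start, length), copied from the quoted listing
EXTENTS = [
    (0, 458752, 32767),
    (32767, 491520, 32767),
    (65534, 589824, 32767),
]


def map_range(start, count):
    """Return the physical (start, length) pieces backing a logical range."""
    pieces = []
    end = start + count
    for lstart, pstart, length in EXTENTS:
        lo = max(start, lstart)
        hi = min(end, lstart + length)
        if lo < hi:
            pieces.append((pstart + (lo - lstart), hi - lo))
    return pieces


def is_full_stripe(pieces):
    """True if the range is one contiguous, stripe-aligned, stripe-sized run.

    Assumption: physical block 0 is aligned to a stripe boundary.
    """
    if len(pieces) != 1:
        return False
    pstart, length = pieces[0]
    return pstart % STRIPE_BLOCKS == 0 and length == STRIPE_BLOCKS


first = map_range(0, STRIPE_BLOCKS)       # first 8M chunk of the file
split = map_range(30720, STRIPE_BLOCKS)   # chunk crossing 32766/32767

print(first, is_full_stripe(first))
print(split, is_full_stripe(split))
```

The chunk starting at logical block 30720 comes back as two pieces,
2047 blocks at the end of the first extent and 1 block at the start of
the second, so neither stripe gets a full write.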

> 
> Another reason for keeping the file as physically contiguous as
> possible is that we can now do extent caching using the extent status
> tree.  So if we can allocate the file using 2 physically contiguous
> extents instead of 9 or 10 physically contiguous extents, it means
> the extent status tree uses less memory, too.  For a 1GB file, that
> might not make that much difference, but if we are caching 2048 of
> these 1G files (on a 2TB disk, for example), keeping the files as
> physically contiguous as possible means we can cache the logical to
> physical block mapping of all of these files much more easily.

Yes, that makes sense too.
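
Just to put numbers on it, a back-of-the-envelope sketch that only
counts cached extents, not bytes (I am not assuming any particular
per-entry size here). The function names are hypothetical, and 32767 is
simply the extent length seen in the listing above.

```python
FILE_BLOCKS = 262144      # 1GB file in 4096-byte blocks, from the listing
EXTENT_LEN = 32767        # extent length seen in the listing
NUM_FILES = 2048          # 2048 x 1G files on a 2TB disk


def extents_needed(file_blocks, extent_len):
    """Minimum number of extents of at most extent_len blocks (ceiling)."""
    return (file_blocks + extent_len - 1) // extent_len


def cached_entries(num_files, extents_per_file):
    """Total extents to cache if every file's mapping is kept in memory."""
    return num_files * extents_per_file


per_file = extents_needed(FILE_BLOCKS, EXTENT_LEN)
print("extents per file at 32767 blocks each:", per_file)
print("entries with fragmented files:", cached_entries(NUM_FILES, per_file))
print("entries with 2 extents per file:", cached_entries(NUM_FILES, 2))
```

So roughly 18432 entries versus 4096 for the same set of files, which
is where the memory saving comes from.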

> 
> Regards,
> 
> 						- Ted
> 

Thanks!
-Lukas