On Mon, 25 Mar 2013, Theodore Ts'o wrote:

> Date: Mon, 25 Mar 2013 08:53:09 -0400
> From: Theodore Ts'o <tytso@xxxxxxx>
> To: Lukáš Czerner <lczerner@xxxxxxxxxx>
> Cc: linux-ext4@xxxxxxxxxxxxxxx, gharm@xxxxxxxxxx
> Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
>
> On Mon, Mar 25, 2013 at 11:09:35AM +0100, Lukáš Czerner wrote:
> >
> > Sorry for being dense, but I am trying to understand why this is so
> > bad and what is the "expected" column there.
> >
> > The physical offset of each extent below starts on the start of the
> > block group and it seems to me that it's perfectly aligned for every
> > power of two up to the block group size.
>
> Yes, but the logical offset isn't aligned.  Consider the simplest
> workload, which is where we are writing the 1GB file sequentially.
> Let's assume that the RAID stripe size is 8M.  So ideally, we would
> want each write to be a multiple of 8M, starting at logical block 0.
>
> But look what happens here:
>
> > > File size of 1 is 1073741824 (262144 blocks of 4096 bytes)
> > >  ext:     logical_offset:        physical_offset: length:   expected: flags:
> > >    0:        0..   32766:     458752..    491518:  32767:             unwritten
> > >    1:    32767..   65533:     491520..    524286:  32767:     491519: unwritten
> > >    2:    65534..   98300:     589824..    622590:  32767:     524287: unwritten
>
> If we do 8M writes, then we would want to write in chunks of 2048
> blocks.  So consider what happens when we write the 2048 block chunk
> starting with logical block 30720.  The fact that there is a
> discontinuity between logical blocks 32766 and 32767 means that we
> will have to do a read-modify-write cycle for that particular RAID
> stripe.
>
> Does that make more sense?

Oh, now I get it :) Thanks a lot for the explanation. I kept thinking
about the physical layout and forgot that the logical layout is
actually misaligned.

>
> Another reason for keeping the file as physically contiguous as
> possible is that we now do extent caching using the extent status
> tree.
> So if we can allocate the file using 2 physically contiguous
> extents instead of 9 or 10 physically contiguous extents, it means
> the extent status tree uses less memory, too.  For a 1GB file, that
> might not make that much difference, but if we are caching 2048 of
> these 1G files (on a 2TB disk, for example), keeping the files as
> physically contiguous as possible means we can cache the logical to
> physical block mapping of all of these files much more easily.

Yes, that makes sense too.

>
> Regards,
>
> 						- Ted
>

Thanks!
-Lukas
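
For reference, the arithmetic in Ted's example can be checked with a
short script. This is only a sketch under the assumptions stated in the
thread (4096-byte blocks, an 8M RAID stripe, and the three extents
quoted above); the names CHUNK_BLOCKS, EXTENTS and split_chunks are
made up for illustration and are not part of ext4 or any tool.

```python
BLOCK_SIZE = 4096                    # bytes per block, from the quoted listing
STRIPE_BYTES = 8 * 1024 * 1024       # assumed RAID stripe size (8M)
CHUNK_BLOCKS = STRIPE_BYTES // BLOCK_SIZE  # blocks per stripe-sized write

# (logical_start, physical_start, length) for the three quoted extents
EXTENTS = [
    (0, 458752, 32767),
    (32767, 491520, 32767),
    (65534, 589824, 32767),
]


def split_chunks(extents, chunk_blocks):
    """Return the first logical block of every stripe-sized chunk that
    is cut by a physical discontinuity between consecutive extents."""
    result = []
    for (l0, p0, n0), (l1, p1, _n1) in zip(extents, extents[1:]):
        # Physically contiguous neighbours do not split a write.
        if l0 + n0 == l1 and p0 + n0 == p1:
            continue
        # A boundary on a chunk edge is harmless; one inside a chunk is not.
        if l1 % chunk_blocks != 0:
            result.append((l1 // chunk_blocks) * chunk_blocks)
    return result


if __name__ == "__main__":
    print("blocks per 8M write:", CHUNK_BLOCKS)
    print("chunks split by an extent boundary:",
          split_chunks(EXTENTS, CHUNK_BLOCKS))
```

With the quoted extents, the chunk starting at logical block 30720 is
the first one reported, which is the case Ted describes. Had each
extent been 32768 blocks long, every boundary would fall on a multiple
of 2048 and nothing would be reported.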