fragmentation optimization

Hi Ted, Everyone,

During our last discussions you mentioned the following (2017/08/16 5:06 SAST/GMT+2):

"One other thought.  There is an ext4 block allocator optimization
"feature" which is biting us here.  At the moment we have an
optimization where if there is small "hole" in the logical block
number space, we leave a "hole" in the physical blocks allocated to
the file."

You went on to give the example of how binutils (ld specifically) writes object files.

As the data I sent you previously shows, rsync (with --sparse) generates a lot of these "holes" for us. As a result I end up with a rather insane amount of fragmentation:

Blocksize: 4096 bytes
Total blocks: 13153337344
Free blocks: 1272662587 (9.7%)

Min. free extent: 4 KB
Max. free extent: 17304 KB
Avg. free extent: 44 KB
Num. free extent: 68868260

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    4K...    8K-  :      28472490      28472490    2.24%
    8K...   16K-  :      27005860      55030426    4.32%
   16K...   32K-  :       2595993      14333888    1.13%
   32K...   64K-  :       2888720      32441623    2.55%
   64K...  128K-  :       2745121      62071861    4.88%
  128K...  256K-  :       2303439     103166554    8.11%
  256K...  512K-  :       1518463     134776388   10.59%
  512K... 1024K-  :        902691     163108612   12.82%
    1M...    2M-  :        314858     105445496    8.29%
    2M...    4M-  :         97174      64620009    5.08%
    4M...    8M-  :         22501      28760501    2.26%
    8M...   16M-  :           945       2069807    0.16%
   16M...   32M-  :             5         21155    0.00%

Based on what I observe of rsync's behaviour [1], I strongly suspect that its writes are sequential from the start of the file to the end. Regarding the above "feature", you went on to say:

"However, it obviously doesn't do the right thing for rsync --sparse,
and these days, thanks to delayed allocation, so long as binutils can
finish writing the blocks within 30 seconds, it doesn't matter if GNU
ld writes the blocks in a completely random order, since we will only
attempt to do the writeback to the disk after all of the holes in the
.o file have been filled in.  So perhaps we should turn off this ext4
block allocator optimization if delayed allocation is enabled (which
is the default these days)."

You mentioned a few pros and cons of this approach, and also that it won't help my existing filesystem. However, I suspect it might help in combination with an e4defrag sweep (if that takes a few weeks in the background, that's fine by me). I also suspect that disabling this would help avoid future holes, and since my files persist for anywhere from a week to a year, it may slowly improve performance over time.

I'm also relatively comfortable making the 30s write limit even longer (as you pointed out, the files causing the problems are typically 300GB+, even though my files are very small on average), provided that doing so doesn't introduce additional filesystem corruption risk. Keep in mind, too, that I run anywhere from 10 to 20 concurrent rsync instances at any point in time.

I would like to attempt such a patch, so if you (or someone else) could point me in the right direction on where to start, I would really appreciate the help.

Another approach for me may be to simply switch off --sparse, especially since I'm now unsure of its benefit. I'm guessing that I could do a sweep of all inodes to determine how much space is really being saved by this.
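
A rough sketch of what such a sweep could look like, in Python. It compares each regular file's apparent size (st_size) with the space actually allocated to it (st_blocks, counted in 512-byte units on Linux). The function name sparse_savings is my own, and the "saved" figure is only an estimate: allocation is rounded up to whole blocks, and very small files may report few or no blocks, so treat it as an upper bound rather than an exact number.

```python
import os
import stat


def sparse_savings(root):
    """Walk root and return (apparent, allocated, saved) in bytes.

    apparent  -- sum of st_size over regular files
    allocated -- sum of st_blocks * 512 over the same files
    saved     -- sum of per-file max(0, st_size - allocated bytes),
                 a rough estimate of what the holes are saving
    """
    apparent = allocated = saved = 0
    seen = set()  # (st_dev, st_ino) pairs so hard links are counted once
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            on_disk = st.st_blocks * 512  # st_blocks is in 512-byte units
            apparent += st.st_size
            allocated += on_disk
            saved += max(0, st.st_size - on_disk)
    return apparent, allocated, saved
```

Calling sparse_savings() on the backup root and comparing "saved" against "apparent" should give an idea of whether --sparse is worth keeping.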

Kind Regards,
Jaco

[1] My observed behaviour when syncing a file (without --inplace, which is in my opinion a bad idea in general unless you're severely space constrained, and in that case I honestly don't know how this situation would be affected) is that rsync creates a new file, and the size of this file then grows slowly (note: not disk usage, but size as reported by ls) until it reaches its final size. At that point rsync uses rename(2) to replace the old file with the new one (which is the right approach).
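
To make the pattern I'm describing concrete, here is a simplified Python sketch of it. This is not rsync's actual code, just my assumption of the general shape: write a temporary file sequentially, seek over all-zero chunks instead of writing them (which is what leaves the holes), then rename over the old file. The function name sparse_replace, the ".tmp" suffix and the 4096-byte chunk size are made up for illustration.

```python
import os

BLOCK = 4096  # assumed chunk size, chosen to match the filesystem block size


def sparse_replace(src_path, dst_path, block=BLOCK):
    """Copy src to a temp file beside dst, skipping zero chunks, then rename."""
    tmp_path = dst_path + ".tmp"  # hypothetical temp name
    zero = bytes(block)
    with open(src_path, "rb") as src, open(tmp_path, "wb") as tmp:
        while True:
            chunk = src.read(block)
            if not chunk:
                break
            if chunk == zero[:len(chunk)]:
                # All zeros: move the file offset forward without writing,
                # leaving a hole in the logical block space.
                tmp.seek(len(chunk), os.SEEK_CUR)
            else:
                tmp.write(chunk)
        # If the file ends in a hole, extend it to the final size.
        tmp.truncate(tmp.tell())
    # rename(2) semantics: atomically replace the old file with the new one.
    os.replace(tmp_path, dst_path)
```

Every seek in the loop corresponds to a gap in the logical block numbers, which is where the allocator optimization discussed above would come into play.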




