2015-05-06 20:51 GMT+03:00 Lionel Bouton <lionel+ceph@xxxxxxxxxxx>:
> On 05/05/15 02:24, Lionel Bouton wrote:
>> On 05/04/15 01:34, Sage Weil wrote:
>>> On Mon, 4 May 2015, Lionel Bouton wrote:
>>>> Hi,
>>>>
>>>> we began testing one Btrfs OSD volume last week, and for this first
>>>> test we disabled autodefrag and launched manual btrfs fi defrag
>>>> instead.
>>>> [...]
>>> Cool.. let us know how things look after it ages!
>> [...]
>>
>> It worked for the past day. Before the algorithm change, the Btrfs
>> OSD disk was the slowest on the system, trailing the three XFS ones
>> by a large margin. This was confirmed both by iostat %util (often at
>> 90-100%) and by monitoring the disks' average read/write latencies
>> over time, which often spiked one order of magnitude above the other
>> disks' (as high as 3 seconds). Now the Btrfs OSD disk is at least
>> comparable to the other disks, if not a bit faster (comparing
>> latencies).
>>
>> It is still too early to tell, but this is very encouraging.
>
> Still going well; I added two new OSDs, which are behaving correctly
> too.
>
> The first of the two has finished catching up. There's a big
> difference in the number of extents on XFS and on Btrfs. The files
> backing rbd (4MB files with "rbd" in their names) often have only 1
> or 2 extents on XFS. On Btrfs they seem to start at 32 extents when
> they are created, and Btrfs doesn't seem to mind (i.e., calling
> btrfs fi defrag <file> doesn't reduce the number of extents, at least
> not in the following 30s, where it should go down). The extents
> aren't far from each other on disk, though, at least initially.
>
> When my simple algorithm computes the fragmentation cost (the
> expected overhead of reading a file vs. its optimized version), it
> shows that just after an OSD finishes catching up (between 3 hours
> and 1 day depending on the cluster load and settings) the content is
> already heavily fragmented: files are expected to take more than 6x
> the read time of their optimized versions. My defragmentation
> scheduler then manages to bring the maximum fragmentation cost
> (according to its own definition) down by a factor of ~0.66 (the very
> first OSD volume is currently sitting at a ~4x cost and occasionally
> reaches the 3.25-3.5 range).
>
> Is there something that would explain why Btrfs initially creates the
> 4MB files with 128k extents (32 extents / file)? Is it a bad thing
> for performance?

This kind of behaviour is why I asked you about compression. From the
Btrfs wiki: "You can use filefrag to locate heavily fragmented files
(may not work correctly with compression)."
https://btrfs.wiki.kernel.org/index.php/Gotchas

filefrag reports each compressed chunk as a separate extent even when
the chunks are laid out contiguously on disk, so the extent count it
prints overstates real fragmentation. Btrfs compresses data in chunks
of at most 128k of uncompressed data, which matches the 32 extents per
4MB file you are seeing. This is a shortcoming of filefrag.
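
To estimate fragmentation in spite of this, you can merge extents
whose physical ranges are back to back before counting them. Below is
a minimal sketch (my own, not Lionel's tool); it assumes the extent
line format printed by "filefrag -v" from recent e2fsprogs:

#!/usr/bin/env python
"""Count extents of a file after merging physically adjacent ones."""
import re
import subprocess
import sys

# Matches "filefrag -v" extent lines such as:
#   0:        0..      31:    1234567..   1234598:     32:
_EXTENT = re.compile(r"^\s*\d+:\s*\d+\.\.\s*\d+:\s*(\d+)\.\.\s*(\d+):")

def merged_extent_count(path):
    out = subprocess.check_output(["filefrag", "-v", path]).decode()
    count, prev_end = 0, None
    for line in out.splitlines():
        m = _EXTENT.match(line)
        if not m:
            continue
        start, end = int(m.group(1)), int(m.group(2))
        # Only a physical discontinuity costs a seek; a chunk that
        # starts right after the previous one is merged into it.
        if prev_end is None or start != prev_end + 1:
            count += 1
        prev_end = end
    return count

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print("%d\t%s" % (merged_extent_count(path), path))

On one of the freshly written 4MB rbd files this should print a number
close to 1 if the 32 compressed extents really are contiguous.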
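
Lionel, since you didn't post your cost function, here is a
hypothetical model of what I understand it to be: one seek per
physical discontinuity plus the sequential transfer time, so an ideal
file pays a single seek. The 10ms seek and 100MB/s throughput are
placeholder numbers for a 7200rpm disk, not measured values:

def fragmentation_cost(extents, block_size=4096, seek_s=0.010,
                       throughput=100.0 * 1024 * 1024):
    """Expected read time of a file vs. its optimally laid out
    version.  `extents` is a list of (physical_start_block,
    physical_end_block) pairs, e.g. collected with the filefrag
    parser above."""
    nblocks = sum(end - start + 1 for start, end in extents)
    transfer_s = nblocks * block_size / throughput
    seeks = 1  # even a perfectly linear file pays the initial seek
    prev_end = None
    for start, end in extents:
        if prev_end is not None and start != prev_end + 1:
            seeks += 1
        prev_end = end
    return (seeks * seek_s + transfer_s) / (seek_s + transfer_s)

With these constants, a 4MB file split into 32 scattered extents
scores about 7x (0.32s of seeks + 0.04s of transfer vs. 0.05s ideal),
in the same ballpark as the >6x you report after catch-up, while the
same file with 32 physically adjacent extents scores 1x.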
> During normal operation, Btrfs OSD volumes continue to behave the
> same way the XFS ones do on the same system (sometimes faster,
> sometimes slower). What is really slow, though, is the OSD process
> startup. I've yet to run serious tests (unmounting the filesystems to
> clear the caches), but I've already seen 3 minutes of delay reading
> the pgs. Example:
>
> 2015-05-05 16:01:24.854504 7f57c518b780 0 osd.17 22428 load_pgs
> 2015-05-05 16:01:24.936111 7f57ae7fc700 0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-17) destroy_checkpoint:
> ioctl SNAP_DESTROY got (2) No such file or directory
> 2015-05-05 16:01:24.936137 7f57ae7fc700 -1
> filestore(/var/lib/ceph/osd/ceph-17) unable to destroy snap
> 'snap_1671188' got (2) No such file or directory
> 2015-05-05 16:01:24.991629 7f57ae7fc700 0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-17) destroy_checkpoint:
> ioctl SNAP_DESTROY got (2) No such file or directory
> 2015-05-05 16:01:24.991654 7f57ae7fc700 -1
> filestore(/var/lib/ceph/osd/ceph-17) unable to destroy snap
> 'snap_1671189' got (2) No such file or directory
> 2015-05-05 16:04:25.413110 7f57c518b780 0 osd.17 22428 load_pgs
> opened 160 pgs
>
> The filesystem might not have reached its balance between
> fragmentation and defragmentation rate at this point (so this may
> change), but it mirrors our initial experience with Btrfs, where this
> was the first symptom of bad performance.
>
> Best regards,
>
> Lionel

--
Have a nice day,
Timofey.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com