2015-05-06 20:51 GMT+03:00 Lionel Bouton <lionel+ceph@xxxxxxxxxxx>:
> On 05/05/15 02:24, Lionel Bouton wrote:
>> On 05/04/15 01:34, Sage Weil wrote:
>>> On Mon, 4 May 2015, Lionel Bouton wrote:
>>>> Hi,
>>>>
>>>> we began testing one Btrfs OSD volume last week, and for this first
>>>> test we disabled autodefrag and launched manual btrfs fi defrag
>>>> instead.
>>>> [...]
>>> Cool.. let us know how things look after it ages!
>> [...]
>>
>> It worked for the past day. Before the algorithm change, the Btrfs
>> OSD disk was the slowest on the system, trailing the three XFS ones
>> by a large margin. This was confirmed both by iostat %util (often at
>> 90-100%) and by monitoring the disks' average read/write latencies
>> over time, which often spiked one order of magnitude above the other
>> disks' (as high as 3 seconds). Now the Btrfs OSD disk is at least
>> comparable to the other disks, if not a bit faster (comparing
>> latencies).
>>
>> It is still too early to tell, but this is very encouraging.
>
> Still going well; I added two new OSDs, which are behaving correctly
> too.
>
> The first of the two has finished catching up. There's a big
> difference in the number of extents on XFS and on Btrfs. The files
> backing rbd (4MB files with "rbd" in their names) often have only 1
> or 2 extents on XFS. On Btrfs they seem to start at 32 extents when
> they are created, and Btrfs doesn't seem to mind (i.e., calling
> btrfs fi defrag <file> doesn't reduce the number of extents, at least
> not in the following 30s, where it should go down). The extents
> aren't far from each other on disk, though, at least initially.
>
> When my simple algorithm computes the fragmentation cost (the
> expected overhead of reading a file vs. its optimized version), it
> shows that just after an OSD finishes catching up (between 3 hours
> and 1 day depending on the cluster load and settings) the content is
> already heavily fragmented: files are expected to take more than 6x
> the read time of their optimized versions. My defragmentation
> scheduler then manages to bring the maximum fragmentation cost
> (according to its own definition) down by a factor of ~0.66 (the very
> first OSD volume is currently sitting at a ~4x cost and occasionally
> reaches the 3.25-3.5 range).
>
> Is there something that would explain why Btrfs initially creates the
> 4MB files with 128k extents (32 extents / file)? Is it a bad thing
> for performance?

This kind of behaviour is why I asked you about compression. From the
Btrfs wiki: "You can use filefrag to locate heavily fragmented files
(may not work correctly with compression)."
https://btrfs.wiki.kernel.org/index.php/Gotchas

filefrag reports each compressed chunk as a separate extent even when
the chunks are laid out contiguously on disk, so the extent count it
prints overstates real fragmentation. Btrfs compresses data in chunks
of at most 128k of uncompressed data, which matches the 32 extents per
4MB file you are seeing. This is a shortcoming of filefrag.
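
To estimate fragmentation in spite of this, you can merge extents
whose physical ranges are back to back before counting them. Below is
a minimal sketch (my own, not Lionel's tool); it assumes the extent
line format printed by "filefrag -v" from recent e2fsprogs:

#!/usr/bin/env python
"""Count extents of a file after merging physically adjacent ones."""
import re
import subprocess
import sys

# Matches "filefrag -v" extent lines such as:
#   0:        0..      31:    1234567..   1234598:     32:
_EXTENT = re.compile(r"^\s*\d+:\s*\d+\.\.\s*\d+:\s*(\d+)\.\.\s*(\d+):")

def merged_extent_count(path):
    out = subprocess.check_output(["filefrag", "-v", path]).decode()
    count, prev_end = 0, None
    for line in out.splitlines():
        m = _EXTENT.match(line)
        if not m:
            continue
        start, end = int(m.group(1)), int(m.group(2))
        # Only a physical discontinuity costs a seek; a chunk that
        # starts right after the previous one is merged into it.
        if prev_end is None or start != prev_end + 1:
            count += 1
        prev_end = end
    return count

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print("%d\t%s" % (merged_extent_count(path), path))

On one of the freshly written 4MB rbd files this should print a number
close to 1 if the 32 compressed extents really are contiguous.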
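
Lionel, since you didn't post your cost function, here is a
hypothetical model of what I understand it to be: one seek per
physical discontinuity plus the sequential transfer time, so an ideal
file pays a single seek. The 10ms seek and 100MB/s throughput are
placeholder numbers for a 7200rpm disk, not measured values:

def fragmentation_cost(extents, block_size=4096, seek_s=0.010,
                       throughput=100.0 * 1024 * 1024):
    """Expected read time of a file vs. its optimally laid out
    version.  `extents` is a list of (physical_start_block,
    physical_end_block) pairs, e.g. collected with the filefrag
    parser above."""
    nblocks = sum(end - start + 1 for start, end in extents)
    transfer_s = nblocks * block_size / throughput
    seeks = 1  # even a perfectly linear file pays the initial seek
    prev_end = None
    for start, end in extents:
        if prev_end is not None and start != prev_end + 1:
            seeks += 1
        prev_end = end
    return (seeks * seek_s + transfer_s) / (seek_s + transfer_s)

With these constants, a 4MB file split into 32 scattered extents
scores about 7x (0.32s of seeks + 0.04s of transfer vs. 0.05s ideal),
in the same ballpark as the >6x you report after catch-up, while the
same file with 32 physically adjacent extents scores 1x.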
> During normal operation, Btrfs OSD volumes continue to behave the
> same way the XFS ones do on the same system (sometimes faster,
> sometimes slower). What is really slow, though, is the OSD process
> startup. I've yet to run serious tests (unmounting the filesystems to
> clear the caches), but I've already seen 3 minutes of delay reading
> the pgs. Example:
>
> 2015-05-05 16:01:24.854504 7f57c518b780 0 osd.17 22428 load_pgs
> 2015-05-05 16:01:24.936111 7f57ae7fc700 0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-17) destroy_checkpoint:
> ioctl SNAP_DESTROY got (2) No such file or directory
> 2015-05-05 16:01:24.936137 7f57ae7fc700 -1
> filestore(/var/lib/ceph/osd/ceph-17) unable to destroy snap
> 'snap_1671188' got (2) No such file or directory
> 2015-05-05 16:01:24.991629 7f57ae7fc700 0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-17) destroy_checkpoint:
> ioctl SNAP_DESTROY got (2) No such file or directory
> 2015-05-05 16:01:24.991654 7f57ae7fc700 -1
> filestore(/var/lib/ceph/osd/ceph-17) unable to destroy snap
> 'snap_1671189' got (2) No such file or directory
> 2015-05-05 16:04:25.413110 7f57c518b780 0 osd.17 22428 load_pgs
> opened 160 pgs
>
> The filesystem might not have reached its balance between
> fragmentation and defragmentation rate at this point (so this may
> change), but it mirrors our initial experience with Btrfs, where this
> was the first symptom of bad performance.
>
> Best regards,
>
> Lionel

--
Have a nice day,
Timofey.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com