Btrfs defragmentation

Hi,

we began testing a Btrfs OSD volume last week, and for this first test
we disabled autodefrag and launched btrfs fi defrag manually.

During the tests I monitored the number of extents of the journal
(10 GB) and it went through the roof (it currently sits at 8000+
extents, for example). I was tempted to defragment it, but after
thinking about it a bit I don't believe it would be a good idea.
With Btrfs, by default the data written to the journal isn't copied
again to its final destination on disk: Ceph uses the clone_range
feature to make the destination reference the same extents instead of
copying them. So if you defragment both the journal and the final
destination, you are moving data around trying to reach a
single-extent layout for both references, and most of the time you
can't satisfy both at once (unless the destination is a whole file
rather than a fragment of one).
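
For reference, a file's extent count can be read with filefrag(8). A
minimal Ruby sketch of that kind of check (the helper name and the
journal path are only examples, not taken from our setup):

require 'shellwords'

# Count a file's extents by parsing filefrag(8) output, which looks like
# "/var/lib/ceph/osd/ceph-0/journal: 8123 extents found".
def extent_count(path)
  `filefrag #{path.shellescape}`[/:\s+(\d+)\s+extents? found/, 1].to_i
end

puts extent_count('/var/lib/ceph/osd/ceph-0/journal')  # example path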

I assume the journal probably doesn't benefit from defragmentation at
all: it is overwritten constantly, and since Btrfs uses CoW the
previous extents won't be reused; new extents are created for the new
data instead of overwriting the old in place. The final destination
files, on the other hand, are reused (reread) and do benefit from
defragmentation.

Under these assumptions we excluded the journal file from
defragmentation; in fact we only defragment the "current" directory
(the snapshot directories are ephemeral and probably only read from in
rare cases, so optimizing them is not worthwhile).

The filesystem is only one week old, so we will have to wait a bit to
see whether this strategy is better than mounting with autodefrag (I
couldn't find much about it, but last year it gave us unmanageable
latencies).
We have a small Ruby script which triggers defragmentation based on
the number of extents and by default limits the rate of calls to btrfs
fi defrag to a negligible level, to avoid thrashing the filesystem. If
someone is interested I can attach it or push it to GitHub after a bit
of cleanup.
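
To give an idea, here is a stripped-down sketch of such a script (not
the actual one; the directory, threshold and pause values are only
illustrative):

#!/usr/bin/env ruby
# Walk the OSD "current" directory, defragment files whose extent count
# exceeds a threshold, and sleep between calls so the extra I/O from
# defragmentation stays negligible.
require 'find'
require 'shellwords'

CURRENT_DIR   = ARGV[0] || '/var/lib/ceph/osd/ceph-0/current'  # example path
MAX_EXTENTS   = 50   # illustrative threshold
PAUSE_SECONDS = 10   # crude rate limit between defrag calls

# Extent count via filefrag(8), as in the earlier sketch.
def extent_count(path)
  `filefrag #{path.shellescape}`[/:\s+(\d+)\s+extents? found/, 1].to_i
end

Find.find(CURRENT_DIR) do |path|
  next unless File.file?(path)
  next if extent_count(path) <= MAX_EXTENTS
  system('btrfs', 'filesystem', 'defragment', path)
  sleep PAUSE_SECONDS
end

(btrfs filesystem defragment is simply the long form of btrfs fi
defrag.)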

Best regards,

Lionel