Hi,

On 05/07/2015 12:04 PM, Lionel Bouton wrote:
On 05/06/15 19:51, Lionel Bouton wrote:
*snipsnap*
We've seen progress on this front. Unfortunately for us we had 2 power outages and they seem to have damaged the disk controller of the system we are testing Btrfs on: we just had a system crash. On the positive side this gives us an update on the OSD boot time. With a freshly booted system without anything in cache:
- the first Btrfs OSD we installed loaded its pgs in ~1min30s, which is half of the previous time,
- the second Btrfs OSD, where defragmentation was disabled for some time and which was considered more fragmented by our tool, took nearly 10 minutes to load its pgs (and even spent 1 minute before starting to load them),
- the third Btrfs OSD, which was always defragmented, took 4min30s to load its pgs (it was considered more fragmented than the first and less fragmented than the second).

My current assumption is that the defragmentation process we use can't handle large spikes of writes (at least when originally populating the OSD with data through backfills), but that it can then at least partially repair the damage they cause to performance (it's still slower to boot than the 3 XFS OSDs on the same system, where loading pgs took 6-9 seconds). In the current setup defragmentation is very slow to make progress because I set it up to generate very little load on the filesystems it processes: there may be room for improvement.
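The actual defragmentation tool isn't shown in the thread, so as an illustration only, a deliberately low-impact pass along those lines might look like the sketch below (paths and the throttle interval are assumptions, not taken from the thread):

```shell
#!/bin/sh
# Hypothetical low-load defragmentation pass over one OSD's data directory.
# ionice class 3 (idle) plus a short sleep per file keeps the I/O impact
# small, at the cost of taking a long time to cover the whole filesystem.
OSD_DIR=/var/lib/ceph/osd/ceph-0/current   # assumed path, adjust to your setup

find "$OSD_DIR" -type f -print0 |
while IFS= read -r -d '' f; do
    ionice -c3 btrfs filesystem defragment -- "$f"
    sleep 0.1   # throttle between files to limit load
done
```

This matches the trade-off described above: a gentle pass repairs fragmentation only slowly, so a large burst of writes (e.g. backfills) can outrun it.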
Part of the OSD boot-up process is also the handling of existing snapshots and journal replay. I've also had several btrfs-based OSDs that took up to 20-30 minutes to start, especially after a crash. During journal replay the OSD daemon creates a number of new snapshots for its operations (newly created snap_XYZ directories that vanish after a short time). This snapshotting probably also adds overhead to the OSD startup time. I have disabled snapshots in my setup now, since the stock Ubuntu Trusty kernel had some stability problems with btrfs.
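For reference, disabling the filestore's btrfs snapshotting is done in ceph.conf; the option below existed in the Ceph releases of that era:

```ini
[osd]
; stop the filestore from creating/rolling back btrfs snapshots
filestore btrfs snap = false
```

The OSD then falls back to write-ahead journaling semantics instead of snapshot-based consistency, which trades some crash-recovery guarantees for stability on kernels with shaky btrfs snapshot support.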
I also had to establish cron jobs for rebalancing the btrfs partitions. It compacts the extents and may reduce the total amount of space taken. Unfortunately this procedure is not a default in most distributions (it definitely should be!). The problems associated with unbalanced extents should have been solved in kernel 3.18, but I haven't had the time to check that yet.
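A cron entry for such a rebalance might look like the fragment below. The schedule, OSD path, and usage thresholds are illustrative assumptions, not values from this thread; the `-dusage`/`-musage` filters restrict the balance to data/metadata block groups that are below the given usage percentage, which keeps the run much cheaper than a full balance:

```crontab
# /etc/cron.d/btrfs-balance -- weekly compaction of sparsely used block groups
# m  h dom mon dow user command
30 3  *   *   0   root /sbin/btrfs balance start -dusage=50 -musage=50 /var/lib/ceph/osd/ceph-0
```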
As a side note: I had several OSDs with dangling snapshots (more than the two usually handled by the OSD). They are probably due to crashed OSD daemons. You have to remove them manually, otherwise they start to consume disk space.
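Finding and removing such leftovers can be done with the standard btrfs subvolume commands; the OSD path and the snapshot name below are placeholders:

```shell
# List subvolumes under the OSD data directory; a healthy filestore OSD
# keeps the current/ subvolume plus at most two snap_* snapshots.
btrfs subvolume list /var/lib/ceph/osd/ceph-0

# Delete a dangling snapshot by path (with the OSD daemon stopped);
# snap_12345 is a placeholder for whatever the listing shows.
btrfs subvolume delete /var/lib/ceph/osd/ceph-0/snap_12345
```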
Best regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com