Re: Btrfs defragmentation

Lionel Bouton <lionel+ceph@xxxxxxxxxxx> · Mon, 04 May 2015 02:33:30 +0200

On 05/04/15 01:34, Sage Weil wrote:
> On Mon, 4 May 2015, Lionel Bouton wrote:
>> Hi, we began testing one Btrfs OSD volume last week and for this
>> first test we disabled autodefrag and began to launch manual btrfs fi
>> defrag. During the tests, I monitored the number of extents of the
>> journal (10GB) and it went through the roof (it currently sits at
>> 8000+ extents for example). I was tempted to defragment it but after
>> thinking a bit about it I think it might not be a good idea. With
>> Btrfs, by default the data written to the journal on disk isn't
>> copied to its final destination. Ceph is using a clone_range feature
>> to reference the same data instead of copying it. 
> We've discussed this possibility but have never implemented it. The
> data is written twice: once to the journal and once to the object file.

That's odd. Here's an extract of filefrag output:

Filesystem type is: 9123683e
File size of /var/lib/ceph/osd/ceph-17/journal is 10485760000 (2560000
blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:  155073097.. 155073097:      1:           
   1:        1..    1254:  155068587.. 155069840:   1254:  155073098: shared
   2:     1255..    2296:  155071149.. 155072190:   1042:  155069841: shared
   3:     2297..    2344:  148124256.. 148124303:     48:  155072191: shared
   4:     2345..    4396:  148129654.. 148131705:   2052:  148124304: shared
   5:     4397..    6446:  148137117.. 148139166:   2050:  148131706: shared
   6:     6447..    6451:  150414237.. 150414241:      5:  148139167: shared
   7:     6452..   10552:  150432040.. 150436140:   4101:  150414242: shared
   8:    10553..   12603:  150477824.. 150479874:   2051:  150436141: shared

Almost all of the journal's extents are flagged as shared with another
file (only once did I find 3 consecutive extents without the shared
flag). I first thought they might be shared with a copy in a snapshot,
but the snapshots are only of the "current" subvolume, and the journal
file sits outside it (see the path above).
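
To put a number on "almost all", the shared extents can be counted from
the filefrag output instead of eyeballing it. The following is only a
minimal sketch, assuming the verbose column layout shown in the extract
above; the names summarize_extents, EXTENT_RE and SAMPLE are made up
for this illustration and are not part of Ceph or e2fsprogs.

```python
import re

# One extent row of verbose filefrag output, e.g.
#    1:        1..    1254:  155068587.. 155069840:   1254:  155073098: shared
# The "expected" column and the flags column may both be empty.
EXTENT_RE = re.compile(
    r"^\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):"
    r"\s*(\d+):\s*(\d*):?\s*(.*)$"
)


def summarize_extents(filefrag_text):
    """Count extents and blocks, and how many carry the 'shared' flag."""
    stats = {"extents": 0, "shared_extents": 0,
             "blocks": 0, "shared_blocks": 0}
    for line in filefrag_text.splitlines():
        match = EXTENT_RE.match(line)
        if match is None:
            continue  # header, file size line, or anything else
        length = int(match.group(6))
        flags = {f.strip() for f in match.group(8).split(",") if f.strip()}
        stats["extents"] += 1
        stats["blocks"] += length
        if "shared" in flags:
            stats["shared_extents"] += 1
            stats["shared_blocks"] += length
    return stats


# The extract quoted in this message, reused as sample input.
SAMPLE = """\
Filesystem type is: 9123683e
File size of /var/lib/ceph/osd/ceph-17/journal is 10485760000 (2560000
blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:  155073097.. 155073097:      1:
   1:        1..    1254:  155068587.. 155069840:   1254:  155073098: shared
   2:     1255..    2296:  155071149.. 155072190:   1042:  155069841: shared
   3:     2297..    2344:  148124256.. 148124303:     48:  155072191: shared
   4:     2345..    4396:  148129654.. 148131705:   2052:  148124304: shared
   5:     4397..    6446:  148137117.. 148139166:   2050:  148131706: shared
   6:     6447..    6451:  150414237.. 150414241:      5:  148139167: shared
   7:     6452..   10552:  150432040.. 150436140:   4101:  150414242: shared
   8:    10553..   12603:  150477824.. 150479874:   2051:  150436141: shared
"""

if __name__ == "__main__":
    print(summarize_extents(SAMPLE))
```

On a live OSD the same function could be fed the text printed by
filefrag -v for the journal file. Note that the shared flag only says
an extent is referenced more than once; it does not say by which file,
so this would not by itself settle where the sharing comes from.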

Lionel