On 05/04/15 01:34, Sage Weil wrote:
> On Mon, 4 May 2015, Lionel Bouton wrote:
>> Hi, we began testing one Btrfs OSD volume last week and for this
>> first test we disabled autodefrag and began to launch manual btrfs fi
>> defrag. During the tests, I monitored the number of extents of the
>> journal (10GB) and it went through the roof (it currently sits at
>> 8000+ extents for example). I was tempted to defragment it but after
>> thinking a bit about it I think it might not be a good idea. With
>> Btrfs, by default the data written to the journal on disk isn't
>> copied to its final destination. Ceph is using a clone_range feature
>> to reference the same data instead of copying it.
> We've discussed this possibility but have never implemented it. The
> data is written twice: once to the journal and once to the object file.

That's odd. Here's an extract of the filefrag output:

Filesystem type is: 9123683e
File size of /var/lib/ceph/osd/ceph-17/journal is 10485760000 (2560000 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:  155073097.. 155073097:      1:
   1:        1..    1254:  155068587.. 155069840:   1254:  155073098: shared
   2:     1255..    2296:  155071149.. 155072190:   1042:  155069841: shared
   3:     2297..    2344:  148124256.. 148124303:     48:  155072191: shared
   4:     2345..    4396:  148129654.. 148131705:   2052:  148124304: shared
   5:     4397..    6446:  148137117.. 148139166:   2050:  148131706: shared
   6:     6447..    6451:  150414237.. 150414241:      5:  148139167: shared
   7:     6452..   10552:  150432040.. 150436140:   4101:  150414242: shared
   8:    10553..   12603:  150477824.. 150479874:   2051:  150436141: shared

Almost all extents of the journal are shared with another file (on one
occasion I found 3 consecutive extents without the shared flag). I
thought they might be shared with a copy in a snapshot, but the snapshots
are of the "current" subvolume.
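For anyone who wants to check this on their own journal, the counting can be scripted instead of eyeballed. The following is only a rough sketch: the function name summarize_extents and the regular expression are my own, and the parser assumes the `filefrag -v` column layout shown in the extract above (other e2fsprogs versions may print it differently). SAMPLE is simply the first rows of that extract.

```python
import re

# One extent row of `filefrag -v` output, in the layout shown in the extract:
#   ext: logical_start..logical_end: physical_start..physical_end: length: [expected:] [flags]
# The "expected" column and the flags may both be absent (see extent 0).
EXTENT_RE = re.compile(
    r"^\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):"
    r"(?:\s*(\d+):)?\s*(.*)$"
)


def summarize_extents(text):
    """Return (total_extents, shared_extents, shared_blocks) for filefrag -v text."""
    total = shared = shared_blocks = 0
    for line in text.splitlines():
        m = EXTENT_RE.match(line)
        if m is None:
            continue  # header, file size line, summary line, etc.
        length = int(m.group(6))
        # Flags are comma-separated, e.g. "last,shared,eof".
        flags = {f.strip() for f in m.group(8).split(",") if f.strip()}
        total += 1
        if "shared" in flags:
            shared += 1
            shared_blocks += length
    return total, shared, shared_blocks


SAMPLE = """\
Filesystem type is: 9123683e
File size of /var/lib/ceph/osd/ceph-17/journal is 10485760000 (2560000 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:  155073097.. 155073097:      1:
   1:        1..    1254:  155068587.. 155069840:   1254:  155073098: shared
   2:     1255..    2296:  155071149.. 155072190:   1042:  155069841: shared
   3:     2297..    2344:  148124256.. 148124303:     48:  155072191: shared
   4:     2345..    4396:  148129654.. 148131705:   2052:  148124304: shared
   5:     4397..    6446:  148137117.. 148139166:   2050:  148131706: shared
   6:     6447..    6451:  150414237.. 150414241:      5:  148139167: shared
   7:     6452..   10552:  150432040.. 150436140:   4101:  150414242: shared
   8:    10553..   12603:  150477824.. 150479874:   2051:  150436141: shared
"""

if __name__ == "__main__":
    total, shared, shared_blocks = summarize_extents(SAMPLE)
    print(f"{shared}/{total} extents shared, {shared_blocks} blocks")
```

To run it against a real journal, feed it the text of `filefrag -v /var/lib/ceph/osd/ceph-17/journal` instead of SAMPLE.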

Lionel