Re: Btrfs defragmentation

Hi list,
Excuse me if this is a bit off topic.

@Lionel, since you are using btrfs, have you already tried btrfs compression for the OSDs?
If so, can you share your experience?

2015-05-05 3:24 GMT+03:00 Lionel Bouton <lionel+ceph@xxxxxxxxxxx>:
> On 05/04/15 01:34, Sage Weil wrote:
>> On Mon, 4 May 2015, Lionel Bouton wrote:
>>> Hi,
>>>
>>> we began testing one Btrfs OSD volume last week and for this first test
>>> we disabled autodefrag and began to launch manual btrfs fi defrag.
>>> [...]
>> Cool.. let us know how things look after it ages!
>
> We had the first signs of Btrfs aging yesterday morning. Latencies
> went up noticeably. The journal was at ~3000 extents, down from a maximum
> of ~13000 the day before. To verify my assumption that journal
> fragmentation was not the cause of the latencies, I defragmented it. It
> took more than 7 minutes (10GB journal), left it at ~2300 extents (probably
> because it was heavily used during the defragmentation), and didn't solve
> the high latencies at all.
>
> The initial algorithm selected files to defragment based solely on the
> number of extents (files with more extents were processed first). This
> was a simple approach to the problem that I had hoped would be enough,
> but it wasn't, so I had to make it more clever.
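
For illustration, a minimal sketch of that first approach in Ruby (not the
actual script; it simply counts extents with plain filefrag and sorts on
that, and rank_by_extent_count is an invented name):

    # Count a file's extents with `filefrag` (prints "<path>: N extents found").
    def extent_count(path)
      IO.popen(["filefrag", path], &:read)[/(\d+) extents? found/, 1].to_i
    end

    # First cut: process the files with the most extents first.
    def rank_by_extent_count(paths)
      paths.sort_by { |path| -extent_count(path) }
    end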
>
> filefrag -v conveniently outputs each fragment's position on the
> device and the total file size. So I changed the algorithm so that it
> still uses the result of a periodic find | xargs filefrag call (which
> is relatively cheap and ends up fitting in a <100MB Ruby process) but
> models the fragmentation cost better.
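
A minimal sketch of that parsing step (again not the original script; it
assumes the extent-table layout printed by a reasonably recent filefrag -v,
and filefrag_extents is an invented name):

    # Parse `filefrag -v <path>` into the file size in bytes, the block size,
    # and a list of [physical_start_block, length_in_blocks] extents.
    def filefrag_extents(path)
      size, block, extents = 0, 4096, []
      IO.popen(["filefrag", "-v", path]) do |io|
        io.each_line do |line|
          if line =~ /^File size of .* is (\d+) \((\d+) blocks? of (\d+) bytes\)/
            size, block = $1.to_i, $3.to_i
          elsif line =~ /^\s*\d+:\s+\d+\.\.\s*\d+:\s+(\d+)\.\.\s*\d+:\s+(\d+):/
            extents << [$1.to_i, $2.to_i]   # physical start block, length in blocks
          end
        end
      end
      [size, block, extents]
    end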
>
> The new one computes the total cost of reading every file, counting an
> initial seek, the total time based on sequential read speed, and the time
> associated with each seek from one extent to the next (which can be 0
> when Btrfs managed to put an extent just after another, or very small if
> it is not far from the previous one on the same HDD track). This total
> cost is compared with the ideal defragmented case to estimate the speedup
> a defragmentation could bring. Finally the result is normalized by dividing
> it by the total size of each file. The normalization is done because
> in the case of RBD (and probably most other uses) what is interesting is
> how long a 128kB or 1MB read would take, whatever file and offset it
> hits, not how long a whole-file read would take (there's an
> assumption that each file has the same probability of being read, which
> might need to be revisited). There are approximations in the cost
> computation and it's HDD-centric, but it's not very far from reality.
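
Roughly, the kind of cost model described above could look like this (a
sketch only; the seek time, throughput and disk size constants, the seek
scaling, and all the method names are illustrative assumptions, not values
from the actual implementation):

    SEEK_S       = 0.008                    # assumed average seek time (s)
    READ_BPS     = 120 * 1024 * 1024        # assumed sequential read rate (B/s)
    DEVICE_BYTES = 4_000_000_000_000.0      # assumed disk size, to scale short seeks

    # Cost of hopping from the end of one extent to the start of the next:
    # free if they are contiguous, a fraction of a full seek if they are
    # close, a full seek otherwise.
    def seek_cost(gap_bytes)
      return 0.0 if gap_bytes.zero?
      SEEK_S * [0.2 + 0.8 * gap_bytes / DEVICE_BYTES, 1.0].min
    end

    # Estimated time to read the whole file as currently laid out: one
    # initial seek, the sequential read time, plus one hop per extent gap.
    def fragmented_cost(size, block, extents)
      cost = SEEK_S + size.to_f / READ_BPS
      extents.each_cons(2) do |(a_start, a_len), (b_start, _)|
        cost += seek_cost((b_start - (a_start + a_len)).abs * block)
      end
      cost
    end

    # Ideal (defragmented) case: a single seek and one sequential read.
    def ideal_cost(size)
      SEEK_S + size.to_f / READ_BPS
    end

    # Potential speedup, normalized by file size so files of different sizes
    # can be compared for a fixed-size (128kB/1MB) read anywhere in them.
    def normalized_gain(size, block, extents)
      (fragmented_cost(size, block, extents) - ideal_cost(size)) / size
    end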
>
> The idea was that it would find the files where fragmentation is the
> most painful more quickly, instead of wasting time on less interesting
> files. This would make the defragmentation more efficient even if it
> didn't process as many files (the less defragmentation takes place, the
> less load we add).
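
Selection then reduces to a sort on that normalized figure, for example
(still a sketch building on the snippets above; the limit of 20 files and
the direct defragment call are placeholders):

    # Rank candidates by how much defragmentation would help per byte and
    # defragment only the worst offenders, to keep the added load small.
    def defrag_worst(paths, limit = 20)
      ranked = paths.reject  { |p| File.zero?(p) }
                    .map     { |p| [normalized_gain(*filefrag_extents(p)), p] }
                    .sort_by { |gain, _| -gain }
      ranked.first(limit).each do |_, path|
        system("btrfs", "filesystem", "defragment", path)
      end
    end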
>
> It worked for the past day. Before the algorithm change, the Btrfs OSD
> disk was the slowest in the system, slower than the three XFS ones by a
> large margin. This was confirmed both by iostat %util (often at 90-100%)
> and by monitoring the disk's average read/write latencies over time, which
> often spiked an order of magnitude above the other disks' (as high as 3
> seconds). Now the Btrfs OSD disk is at least comparable to the other
> disks, if not a bit faster (comparing latencies).
>
> It's still too early to tell, but this is very encouraging.
>
> Best regards,
>
> Lionel
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Have a nice day,
Timofey.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




