Re: BlueStore fragmentation woes

Hi Hector,

Not related to fragmentation, but I see you mentioned CephFS, and your OSDs are at high utilization. Is your pool NEAR FULL? CephFS write performance degrades severely when the pool is NEAR FULL: buffered writes are disabled, and every single write() system call has to wait for a reply from the OSD.

If this is the case, use "ceph osd set-nearfull-ratio" to restore normal performance.
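For example, something along these lines (the 0.90 value is only illustrative; pick a value below your full ratio, and lower it again once you have freed space):

```shell
# Check the currently configured full/backfillfull/nearfull ratios
ceph osd dump | grep ratio

# See whether any OSD is currently flagged nearfull
ceph health detail | grep -i nearfull

# Temporarily raise the nearfull ratio (default is 0.85)
ceph osd set-nearfull-ratio 0.90
```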

Weiwen Hu

> On 24 May 2023, at 20:19, Hector Martin <marcan@xxxxxxxxx> wrote:
> 
> Hi,
> 
> I've been seeing relatively large fragmentation numbers on all my OSDs:
> 
> ceph daemon osd.13 bluestore allocator score block
> {
>    "fragmentation_rating": 0.77251526920454427
> }
> 
> These aren't that old, as I recreated them all around July last year.
> They mostly hold CephFS data with erasure coding, with a mix of large
> and small files. The OSDs are at around 80%-85% utilization right now.
> Most of the data was written sequentially when the OSDs were created (I
> rsynced everything from a remote backup). Since then more data has been
> added, but not particularly quickly.
> 
> At some point I noticed pathologically slow writes, and I couldn't
> figure out what was wrong. Eventually I did some block tracing and
> noticed the I/Os were very small, even though CephFS-side I was just
> writing one large file sequentially, and that's when I stumbled upon the
> free space fragmentation problem. Indeed, deleting some large files
> opened up some larger free extents and resolved the problem, but only
> until those get filled up and I'm back to fragmented tiny extents. So
> effectively I'm stuck at the current utilization, as trying to fill them
> up any more just slows down to an absolute crawl.
> 
> I'm adding a few more OSDs and plan on doing the dance of removing one
> OSD at a time and replacing it with another one to hopefully improve the
> situation, but obviously this is going to take forever.
> 
> Is there any plan for offering a defrag tool of some sort for bluestore?
> 
> - Hector
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
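By the way, the allocator score command you quoted is per OSD; if it helps, a quick sketch for surveying several OSDs on one host (the OSD IDs below are placeholders, and this must run on the node hosting those OSDs, since it uses the admin socket):

```shell
# Print the BlueStore free-space fragmentation rating for each OSD
# (0.0 = no fragmentation, 1.0 = maximal fragmentation)
for osd in 11 12 13; do
  echo -n "osd.$osd: "
  ceph daemon osd.$osd bluestore allocator score block
done
```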