Re: SMR disks go 100% busy after ~15 minutes

Then you're not aware of how SMR disks behave. They are slow for all
writes: because the tracks overlap, the drive has to read the
surrounding tracks and write them all back again, instead of just the
one piece of data you actually wanted to write. To partially mitigate
this they have a small write buffer (something like 8GB of flash), which
gives you the "normal" speed, and once that is full you crawl (at least
this is what the Seagate ones do). Journals aren't designed to solve
that... they take the sync load off the OSD, but they don't somehow make
the throughput higher (at least not sustained). Even a perfectly
designed journal would do absolutely nothing once it is full and the
disk is still busy flushing the old data.


On 02/13/17 15:49, Wido den Hollander wrote:
> Hi,
>
> I have an odd case with SMR disks in a Ceph cluster. Before I continue: yes, I am fully aware that SMR and Ceph do not play along well, but there is something happening here which I'm not able to fully explain.
>
> On a 2x replica cluster with 8TB Seagate SMR disks I can write with about 30MB/sec to each disk using a simple RADOS bench:
>
> $ rados bench -t 1
> $ time rados put 1GB.bin
>
> Both ways I found that the disk can write at that rate.
>
> Now, when I start a benchmark with 32 threads it writes fine. Not super fast, but it works.
>
> After 15 minutes or so various disks go to 100% busy and just stay there. Those OSDs get marked down and some even commit suicide due to threads timing out.
>
> Stopping the RADOS bench and starting the OSDs again resolves the situation.
>
> I am trying to explain what's happening. I'm aware that SMR isn't very good at random writes. To partially overcome this there are Intel DC S3510s in there as journal SSDs.
>
> Can anybody explain why this 100% busy pops up after 15 minutes or so?
>
> Obviously it would be best if BlueStore had SMR support, but for now it's just FileStore with XFS on there.
>
> Wido


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


