Re: SMR disks go 100% busy after ~15 minutes

Hi,

We are using SMR disks for backup purposes in our Ceph cluster.
We had massive problems with those disks before upgrading to kernel 4.9.x. We also dropped XFS as the filesystem and now use btrfs (only on those disks).
Since then we haven't had such problems anymore.

If you don't want to use btrfs, you could try putting the XFS log on a separate disk, in addition to a separate journal disk for Ceph. I assume this would also solve many of the problems, as the XFS journal is rewritten frequently and SMR disks don't cope well with rewrites.
I think that is one reason why btrfs runs more smoothly on those disks.
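For example, XFS supports an external log device; a minimal sketch, assuming /dev/sdb is the SMR data disk and /dev/sda1 is a small partition on a conventional disk or SSD (all device names and the mount point are placeholders):

$ mkfs.xfs -f -l logdev=/dev/sda1,size=64m /dev/sdb
$ mount -o logdev=/dev/sda1 /dev/sdb /var/lib/ceph/osd/ceph-0

Note that the logdev option must be passed on every mount (e.g. via fstab), or XFS will refuse to mount the filesystem.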

Hope this helps

Bernhard

Wido den Hollander <wido@xxxxxxxx> wrote on Mon, 13 Feb 2017 at 16:11:

> On 13 February 2017 at 15:57, Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>
> Then you're not aware of what SMR disks do. They are simply slow for
> all writes: because of the track overlap, they have to read the
> surrounding tracks and write them all back again, instead of just the
> one thing you really wanted to write. To partially mitigate this, they
> have a small write buffer, something like 8 GB of flash, which gives
> them "normal" speed, and when it's full, you crawl (at least this is
> what the Seagate ones do). Journals aren't designed to solve that:
> they help reduce the sync load on the OSD, but they don't somehow
> make the throughput higher (at least not sustained). Even if the
> journal were perfectly designed for performance, it would still do
> absolutely nothing once it's full and the disk is still busy flushing
> the old data.
>

Well, that does explain it. I wasn't aware of the additional buffer inside an SMR disk.

I was asked to look at this system for somebody who bought SMR disks without knowing. Since I never touch these disks, I found the behavior odd.

The buffer explains it a lot better.
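As a rough sanity check (all numbers here are assumptions, not measurements): if the bench pushes, say, 17 MB/s to a disk that can only sustain around 8 MB/s of shingled rewrites, the buffer fills at a net ~9 MB/s, so an 8 GB buffer lasts about 8192 MB / 9 MB/s ≈ 900 s, roughly the 15 minutes after which the disks went to 100% busy.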

SMR disks shouldn't be used with Ceph until there is proper SMR support in BlueStore or an SMR-aware XFS.

Wido

>
> On 02/13/17 15:49, Wido den Hollander wrote:
> > Hi,
> >
> > I have an odd case with SMR disks in a Ceph cluster. Before I continue: yes, I am fully aware that SMR and Ceph don't play along well, but something is happening here that I can't fully explain.
> >
> > On a 2x replica cluster with 8TB Seagate SMR disks, I can write at about 30 MB/s to each disk using a simple RADOS bench:
> >
> > $ rados bench -p <pool> <seconds> write -t 1
> > $ time rados -p <pool> put 1GB.bin 1GB.bin
> >
> > Both ways I found that the disks can write at that rate.
> >
> > Now, when I start a benchmark with 32 threads, it writes fine. Not super fast, but it works.
> >
> > After 15 minutes or so, various disks go to 100% busy and just stay there. These OSDs get marked down and some even commit suicide due to threads timing out.
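> >
> > The busy state shows up clearly in iostat on the OSD host; a sketch, watch the %util column:
> >
> > $ iostat -x 5
> >
> > The affected disks sit near 100 in %util while their actual write throughput drops to almost nothing.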
> >
> > Stopping the RADOS bench and restarting the OSDs resolves the situation.
> >
> > I am trying to explain what's happening. I'm aware that SMR isn't very good at random writes. To partially overcome this, there are Intel DC S3510 SSDs in there as journals.
> >
> > Can anybody explain why the disks become 100% busy after 15 minutes or so?
> >
> > Obviously it would be best if BlueStore had SMR support, but for now it's just FileStore with XFS on there.
> >
> > Wido
--
Kind regards

Bernhard J. M. Grün, Püttlingen, Germany
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
