Re: SMR disks go 100% busy after ~15 minutes

> On 13 February 2017 at 16:49, "Bernhard J. M. Grün" <bernhard.gruen@xxxxxxxxx> wrote:
> 
> 
> Hi,
> 
> we are using SMR disks for backup purposes in our Ceph cluster.
> We have had massive problems with those disks prior to upgrading to Kernel
> 4.9.x. We also dropped XFS as filesystem and we now use btrfs (only for
> those disks).
> Since we did this we don't have such problems anymore.
> 

We are running kernel 4.9 there, but XFS is not SMR-aware, so that alone doesn't help.

I saw posts that SMR support for XFS is on its way, but it doesn't appear to be actively developed. What I did see, however, is that you need to pass some flags at mkfs time.

Did you need to do that when formatting btrfs on the SMR disks?

Wido

> If you don't like btrfs you could try to use a separate journal disk for XFS
> itself and also a journal disk for Ceph. I assume this would also solve many
> problems, as the XFS journal is rewritten often and SMR disks don't like
> rewrites.
> I think that is one reason why btrfs works more smoothly with those disks.
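
(As a rough illustration of the "journal disk for XFS" suggestion above, and only a sketch: the device names, mount point and log size below are placeholders, not details from this thread. XFS supports an external log device at mkfs and mount time:

  $ mkfs.xfs -l logdev=/dev/sdX1,size=512m /dev/sdY1
  $ mount -o logdev=/dev/sdX1 /dev/sdY1 /var/lib/ceph/osd/ceph-$ID

Here /dev/sdX1 would be a partition on an SSD and /dev/sdY1 the SMR data disk; the FileStore journal would still live on its own SSD partition as usual.)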
> 
> Hope this helps
> 
> Bernhard
> 
> Wido den Hollander <wido@xxxxxxxx> wrote on Mon, 13 Feb 2017 at
> 16:11:
> 
> >
> > > On 13 February 2017 at 15:57, Peter Maloney
> > > <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > > Then you're not aware of what SMR disks do. They are simply slow for all
> > > writes: because the tracks overlap, the drive has to read the surrounding
> > > tracks and write them all back again, instead of just the one thing you
> > > actually wanted to write. To partially mitigate this they have a small
> > > write buffer, something like 8GB of flash, which gives them "normal"
> > > speed until it fills up; once it is full, the drive crawls (at least
> > > that is what the Seagate ones do). Journals aren't designed to solve
> > > that: they help absorb the sync load on the OSD, but they don't make the
> > > throughput any higher (at least not sustained). Even a journal perfectly
> > > designed for performance would still do absolutely nothing if it's full
> > > and the disk is still busy flushing the old data.
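
(A rough back-of-the-envelope check of that explanation, combining the ~8GB cache figure above with the ~30MB/sec per-disk write rate mentioned further down, and assuming, purely as a guess, that the shingled media can drain this kind of write pattern at around 20MB/sec: the cache would then fill at about 10MB/sec net, so roughly 8192 / 10 = 820 seconds, i.e. 13 to 14 minutes. That is in the same ballpark as the ~15 minutes after which the disks go 100% busy.)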
> > >
> >
> > Well, that does explain it. I wasn't aware of the additional buffer inside
> > an SMR disk.
> >
> > I was asked to look at this system for somebody who bought SMR disks
> > without knowing it. As I never touch these disks, I found the behavior odd.
> >
> > The buffer explains it a lot better; I wasn't aware that SMR disks have that.
> >
> > SMR shouldn't be used with Ceph without proper support, either in BlueStore
> > or in an SMR-aware XFS.
> >
> > Wido
> >
> > >
> > > On 02/13/17 15:49, Wido den Hollander wrote:
> > > > Hi,
> > > >
> > > > I have an odd case with SMR disks in a Ceph cluster. Before I continue:
> > > > yes, I am fully aware that SMR and Ceph do not play along well, but there is
> > > > something happening which I'm not able to fully explain.
> > > >
> > > > On a 2x replica cluster with 8TB Seagate SMR disks I can write at about
> > > > 30MB/sec to each disk using a simple RADOS bench:
> > > >
> > > > $ rados bench -t 1
> > > > $ time rados put 1GB.bin
> > > >
> > > > Both ways showed me that the disk can write at that rate.
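
(For reference, a fuller form of those two invocations; the pool name and the 60-second duration are placeholders here, not the values actually used:

  $ rados bench -p <pool> 60 write -t 1
  $ time rados -p <pool> put 1GB.bin 1GB.bin

With -t 1 only one write is in flight at a time.)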
> > > >
> > > > Now, when I start a benchmark with 32 threads it writes fine. Not
> > > > super fast, but it works.
> > > >
> > > > After 15 minutes or so various disks go to 100% busy and just stay
> > > > there. These OSDs then get marked down and some even commit suicide due
> > > > to threads timing out.
> > > >
> > > > Stopping the RADOS bench and starting the OSDs again resolves the
> > > > situation.
> > > >
> > > > I am trying to explain what's happening. I'm aware that SMR isn't very
> > > > good at random writes. To partially overcome this there are Intel DC 3510s
> > > > in there as journal SSDs.
> > > >
> > > > Can anybody explain why this 100% busy state pops up after 15 minutes or so?
> > > >
> > > > Obviously it would be best if BlueStore had SMR support, but for now
> > > > it's just FileStore with XFS on there.
> > > >
> > > > Wido
> -- 
> Kind regards
> 
> Bernhard J. M. Grün, Püttlingen, Germany
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



