I had a recent battle with performance on two of our nodes and it turned out to be a result of using non-RAID mode. We ended up rebuilding them one by one in RAID-0 with controller cache enabled on the OSD disks. I discussed it on the mailing list: https://www.spinics.net/lists/ceph-users/msg42756.html. The H730 controller has a battery-backed cache, so I don't think there's a reason to be concerned about moving to RAID-0 with cache.
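For reference, a rough sketch of what such a rebuild looks like with the `perccli` tool (Dell's branding of Broadcom's storcli; the controller ID and enclosure:slot address below are placeholders, check your own inventory first):

```shell
# List controller, enclosures, and physical drives to find enclosure:slot IDs
perccli /c0 show

# Create a single-drive RAID-0 virtual disk with write-back controller cache
# (wb), read-ahead (ra), and the drive's own volatile cache disabled
# (pdcache=off) -- enclosure:slot 32:4 is a hypothetical example
perccli /c0 add vd type=raid0 drives=32:4 wb ra pdcache=off

# Verify the cache policy actually applied to the new virtual disk
perccli /c0/vall show
```

With `wb` (as opposed to `awb`), the controller should fall back to write-through automatically if the battery fails, which is the safer default.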
On Thu, Mar 15, 2018 at 3:09 PM, Tim Bishop <tim-lists@xxxxxxxxxxx> wrote:
Thank you Christian, David, and Reed for your responses.
My servers have the Dell H730 RAID controller in them, but I have the
OSD disks in Non-RAID mode. When initially testing I compared single
RAID-0 containers with Non-RAID and the Non-RAID performance was
acceptable, so I opted for the configuration with fewer components
between Ceph and the disks. This seemed to be the safer approach at the
time.
What I obviously hadn't realised was that the drive caches were enabled.
Without those caches the difference is much greater, and the latency
is now becoming a problem.
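For anyone checking their own setup, the per-drive volatile write cache can be inspected and toggled with standard tools (device names here are examples; for disks behind a RAID controller in RAID mode, these settings usually have to be managed through the controller's own utility instead):

```shell
# SATA drives: query whether the write cache is enabled
hdparm -W /dev/sda

# Disable or enable it
hdparm -W0 /dev/sda   # disable write cache
hdparm -W1 /dev/sda   # enable write cache

# SAS/SCSI drives: inspect and clear the WCE (write cache enable) bit
sdparm --get=WCE /dev/sdb
sdparm --clear=WCE /dev/sdb
```

Note that some drives revert the setting on power cycle, so a persistent change may need a udev rule or the controller-level `pdcache` setting.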
My reading of the documentation led me to think along the lines
Christian mentions below - that is, that data in flight would be lost,
but that the disks should be consistent and still usable. But it would
be nice to get confirmation of whether that holds for Bluestore.
However, it looks like this wasn't the case for Reed, although perhaps
that was at an earlier time when Ceph and/or Linux didn't handle things
as well?
I had also thought that our power supply was pretty safe - redundant
PSUs with independent feeds, redundant UPSs, and a generator. But Reed's
experiences certainly highlight that even that can fail, so it was good
to hear that from someone else rather than experience it first hand.
I do have tape backups, but recovery would be a pain, so based on all
your comments I'll leave the drive caches off and look at using the RAID
controller cache with its BBU instead.
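Before relying on write-back controller cache, it is worth confirming the battery is actually healthy. A sketch with `perccli` (assumed tool name and controller ID; the H730 may carry either a BBU or a flash-backed CacheVault module):

```shell
# Battery or CacheVault status on controller 0
perccli /c0/bbu show status
perccli /c0/cv show status

# Most controllers drop to write-through automatically when the battery
# degrades; confirm the current vs. default cache policy on all VDs
perccli /c0/vall show
```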
Tim.
Tim Bishop
On Thu, Mar 15, 2018 at 04:13:49PM +0900, Christian Balzer wrote:
> Hello,
>
> what has been said by others before is essentially true, as in if you want:
>
> - as much data conservation as possible and have
> - RAID controllers with decent amounts of cache and a BBU
>
> then disabling the on disk cache is the way to go.
>
> But as you found out, w/o those caches and a controller cache to replace
> them, performance will tank.
>
> And of course any data only in the pagecache (dirty) and not yet flushed
> to the controller/disks is lost anyway in a power failure.
>
> All current FS _should_ be powerfail safe (barriers) in the sense that you
> may lose the data in the disk caches (if properly exposed to the OS and
> the controller or disk not lying about having written data to disk) but
> the FS will be consistent and not "all will be lost".
>
> I'm hoping that this is true for Bluestore, but somebody needs to do that
> testing.
>
> So if you can live with the loss of the in-transit data in the disk caches
> in addition to the pagecache and/or you trust your DC never to lose
> power, go ahead and re-enable the disk caches.
>
> If you have the money and need for a sound happy sleep, do the BBU
> controller cache dance.
> Some controllers (Areca comes to mind) actually manage IT-mode-style
> exposure of the disks while still using their HW cache.
>
> Christian
--
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com