Re: Disk write cache - safe?

I had a recent battle with performance on two of our nodes, and it turned out to be a result of using non-RAID mode. We ended up rebuilding them one by one in RAID-0 with the controller cache enabled on the OSD disks. I discussed it on the mailing list: https://www.spinics.net/lists/ceph-users/msg42756.html. The R730's controller has a battery, so I don't think there's a reason to be concerned about moving to RAID-0 with cache.

 

John Petrini
Platforms Engineer
CoreDial 751 Arbor Way, Hillcrest I, Suite 150 Blue Bell, PA 19422

On Thu, Mar 15, 2018 at 3:09 PM, Tim Bishop <tim-lists@xxxxxxxxxxx> wrote:
Thank you Christian, David, and Reed for your responses.

My servers have the Dell H730 RAID controller in them, but I have the
OSD disks in Non-RAID mode. When initially testing I compared single
RAID-0 containers with Non-RAID and the Non-RAID performance was
acceptable, so I opted for the configuration with fewer components
between Ceph and the disks. This seemed to be the safer approach at the
time.
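
(For anyone wanting to repeat that kind of comparison, a short synchronous 4k random-write fio run is usually enough to show the difference between configurations. The sketch below is only illustrative: it assumes fio 3.x and Python 3 are available, the device path is a placeholder, and the test will overwrite whatever is on that disk.)

#!/usr/bin/env python3
# Rough sketch: run a short synchronous 4k random-write test with fio and
# print the mean write latency. Assumes fio >= 3.x is installed and that
# DEVICE is a scratch disk you can safely overwrite (placeholder below).
import json
import subprocess

DEVICE = "/dev/sdX"  # placeholder - pick a disk with no data on it

result = subprocess.run(
    [
        "fio",
        "--name=synctest",
        "--filename=" + DEVICE,
        "--rw=randwrite",
        "--bs=4k",
        "--direct=1",
        "--sync=1",          # O_SYNC writes, roughly what the OSDs do for journal/WAL
        "--runtime=30",
        "--time_based",
        "--output-format=json",
    ],
    check=True,
    capture_output=True,
    text=True,
)

job = json.loads(result.stdout)["jobs"][0]
# Key names differ slightly between fio versions (lat_ns vs older lat/usec).
lat_ns = job["write"].get("lat_ns", {}).get("mean")
if lat_ns is not None:
    print("mean write latency: %.2f ms" % (lat_ns / 1e6))
else:
    print(json.dumps(job["write"], indent=2))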

What I obviously hadn't realised was that the drive caches were enabled.
Without those caches the difference is much greater, and the latency
is now becoming a problem.
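
(If it helps anyone checking their own setup: the per-drive volatile write cache can be inspected and toggled with hdparm for SATA drives or sdparm for SAS. A minimal sketch, assuming both tools are installed and using placeholder device names; note the setting is not necessarily persistent across reboots, depending on the controller and firmware.)

#!/usr/bin/env python3
# Sketch: report and (optionally) disable the volatile write cache on a set
# of drives. Uses hdparm for SATA and sdparm for SAS; both must be installed.
# DEVICES is a placeholder - adjust for your OSD disks.
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]  # placeholders
DISABLE = False  # set to True to actually turn the caches off

for dev in DEVICES:
    # hdparm -W with no value reports the current write-cache state.
    subprocess.run(["hdparm", "-W", dev], check=False)
    if DISABLE:
        # SATA: hdparm -W0 disables the drive's volatile write cache.
        subprocess.run(["hdparm", "-W0", dev], check=False)
        # SAS: clearing the WCE bit in the caching mode page does the same
        # (add sdparm's --save option if the drive should keep it after reset).
        subprocess.run(["sdparm", "--clear=WCE", dev], check=False)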

My reading of the documentation led me to think along the lines
Christian mentions below - that is, that data in flight would be lost,
but that the disks should be consistent and still usable. But it would
be nice to get confirmation of whether that holds for Bluestore.
However, it looks like this wasn't the case for Reed, although perhaps
that was at an earlier time when Ceph and/or Linux didn't handle things
as well?

I had also thought that our power supply was pretty safe - redundant
PSUs with independent feeds, redundant UPSs, and a generator. But Reed's
experiences certainly highlight that even that can fail, so it was good
to hear that from someone else rather than experience it first hand.

I do have tape backups, but recovery would be a pain, so based on all
your comments I'll leave the drive caches off and look at using the RAID
controller cache with its BBU instead.

Tim.

On Thu, Mar 15, 2018 at 04:13:49PM +0900, Christian Balzer wrote:
> Hello,
>
> what has been said by others before is essentially true, as in if you want:
>
> - as much data conservation as possible and have
> - RAID controllers with decent amounts of cache and a BBU
>
> then disabling the on disk cache is the way to go.
>
> But as you found out, w/o those caches and a controller cache to replace
> them, performance will tank.
>
> And of course any data only in the pagecache (dirty) and not yet flushed
> to the controller/disks is lost anyway in a power failure.
>
> All current FS _should_ be powerfail safe (barriers) in the sense that you
> may lose the data in the disk caches (if properly exposed to the OS and
> the controller or disk not lying about having written data to disk) but
> the FS will be consistent and not "all will be lost".
>
> I'm hoping that this is true for Bluestore, but somebody needs to do that
> testing.
>
> So if you can live with the loss of the in-transit data in the disk caches
> in addition to the pagecache and/or you trust your DC never to lose
> power, go ahead and re-enable the disk caches.
>
> If you have the money and need for a sound happy sleep, do the BBU
> controller cache dance.
> Some controllers (Areca comes to mind) actually manage IT-mode-style
> exposure of the disks while still using their HW cache.
>
> Christian
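
(To make Christian's pagecache point concrete: a plain write() only lands in kernel memory, and it is fsync()/fdatasync(), together with the filesystem's barrier/flush handling, that pushes data down to the device and asks for it to be made durable. A minimal sketch in Python, with a placeholder path and nothing Ceph-specific about it:)

#!/usr/bin/env python3
# Sketch of the "dirty pagecache vs. durable on disk" distinction.
# The path is a placeholder; nothing here is Ceph-specific.
import os

PATH = "/var/tmp/durability-demo"  # placeholder path

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
try:
    os.write(fd, b"important payload\n")
    # At this point the data typically sits only in the dirty pagecache;
    # a power cut now can lose it no matter how the disk caches are set.
    os.fsync(fd)
    # fsync() asks the kernel to write the data out and (with barriers /
    # cache-flush commands honoured by the controller and disk) to make
    # it durable on stable storage.
finally:
    os.close(fd)

# For a newly created file, the directory entry also needs to be durable,
# so fsync the containing directory as well.
dirfd = os.open(os.path.dirname(PATH), os.O_DIRECTORY)
try:
    os.fsync(dirfd)
finally:
    os.close(dirfd)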

--
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

