Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

Hi all,

> I did a quick test with wcache off [1], and have the impression that the
> simple 2-minute rados bench performed a bit worse on my slow HDDs.

This probably depends on whether or not the drive actually has a non-volatile write cache. I have noticed that from many vendors you can buy what looks like the exact same drive at a price difference of something like $20. My best bet is that the slightly more expensive ones have power loss protection hardware that passed the quality test, while it is disabled in the cheaper drives (probably among other things). Always going for the cheapest version can have its price.

For the disks we are using, my impression is that disabling the volatile write cache actually adds the volatile cache capacity to the non-volatile write cache. The disks start consuming more power, but they also perform better with ceph.
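For anyone who wants to verify what their drives report before and after toggling, a few commands like these do the job on SATA and SAS (sketch only, /dev/sdX is a placeholder):

    # Query the current volatile write cache state
    hdparm -W /dev/sdX             # SATA
    smartctl -g wcache /dev/sdX    # SATA and SAS
    # Toggle it for a comparison run
    smartctl -s wcache,off /dev/sdX
    smartctl -s wcache,on /dev/sdX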

For our HDDs I have fortunately never seen a degradation - or one could say that maybe they are so crappy that it couldn't get any worse :). In case our vendor reads this: that was a practical joke :)

The main question here is: do you want to risk data loss on power loss? Ceph is extremely sensitive to data that the firmware acknowledged as "on disk" disappearing after a power outage. This is different from journaling file systems like ext4, which manage to roll back to an earlier consistent state. One loses data, but the fs is not damaged. XFS still has problems with that, though. With ceph you can lose entire pools without a viable recovery option, as was described earlier in this thread.

> Couldn't we just set (uncomment)
>   write_cache = off
> in /etc/hdparm.conf?

I was pondering that. The problem is that on CentOS systems it seems to be ignored, that it generally does not apply to SAS drives, for example, and that it offers no working way of configuring which drives to exclude.

For example, while for ceph data disks we have certain minimum requirements, like functioning power loss protection, for an OS boot drive I really don't care. Power outages on cheap drives that lose writes have not been a problem since ext4. A few log entries or the contents of swap - who cares? Here, performance is more important than data security on power loss.

What I would need is a configurable option that works in the same way for all protocols: SATA, SAS, NVMe, you name it. At the time of writing, I don't know of any.
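The closest workaround I can offer is a udev rule per drive model; a rough sketch (the rule file name and model string are made up for illustration, smartctl handles SATA and SAS here, and NVMe would still need a separate rule using nvme set-feature):

    # /etc/udev/rules.d/99-wcache.rules (hypothetical)
    # Disable the volatile write cache on ceph data disks only, matched
    # by model; OS boot drives with other models are left alone.
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*[!0-9]", \
      ENV{ID_MODEL}=="TOSHIBA_MG07ACA14TE", \
      RUN+="/usr/sbin/smartctl -s wcache,off /dev/%k"

That gives per-drive selection at least, but it is still not the single uniform knob I am asking for.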

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
Sent: 25 June 2020 00:01:51
To: paul.emmerich; vitalif
Cc: bknecht; ceph-users; s.priebe
Subject:  Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

I did a quick test with wcache off [1], and have the impression that the
simple 2-minute rados bench performed a bit worse on my slow HDDs.

[1]
# For each OSD mount ("/dev/sdX1 /var/lib/ceph/osd/ceph-N"), strip the
# partition number and path prefix, then cycle the OSD with wcache off:
IFS=$'\n'
for line in $(mount | grep 'osd/ceph' | awk '{print $1" "$3}' \
    | sed -e 's/1 / /' -e 's#/var/lib/ceph/osd/ceph-##'); do
  IFS=' ' arr=($line)  # arr[0] = whole-disk device, arr[1] = OSD id
  service ceph-osd@${arr[1]} stop \
    && smartctl -s wcache,off ${arr[0]} \
    && service ceph-osd@${arr[1]} start
done


-----Original Message-----
To: Paul Emmerich
Cc: Benoît Knecht; s.priebe@xxxxxxxxxxxx; ceph-users@xxxxxxx
Subject:  Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

Hi, https://yourcmc.ru/wiki/Ceph_performance author here %)

Disabling the write cache is REALLY bad for SSDs without capacitors
[consumer SSDs], and it's also bad for HDDs whose firmware doesn't have
this bug-o-feature. The bug is really common though - I have no idea
where it comes from, but it's really common. When you "disable" the
write cache you actually "enable" the non-volatile write cache on those
drives. Seagate EXOS drives also behave like that... It seems most EXOS
drives have an SSD cache even though it's not mentioned in the specs, and
it gets enabled when you do hdparm -W 0. In theory, though, hdparm -W 0
may hurt linear write performance even on those HDDs.
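For reference, you can measure the effect with a single-threaded sync
write test like the one described on the wiki (sketch only: /dev/sdX is
a placeholder and the run overwrites whatever is on that device):

    # Compare IOPS with hdparm -W 1 vs hdparm -W 0 on the same drive.
    # iodepth=1 sync writes approximate an OSD journal/WAL commit pattern.
    fio --name=wcache-test --filename=/dev/sdX --ioengine=libaio \
        --direct=1 --sync=1 --rw=randwrite --bs=4k --iodepth=1 \
        --numjobs=1 --runtime=30 --time_based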

> Well, what I was saying was "does it hurt to unconditionally run
> hdparm -W 0 on all disks?"
>
> Which disk would suffer from this? I haven't seen any disk where this
> would be a bad idea
>
> Paul


