Hello,

On Thu, 5 Jan 2017 23:02:51 +0100 Oliver Dzombic wrote:

I've never seen hung qemu tasks, though slow/hung I/O tasks inside VMs
with a broken/slow cluster I have seen.
That's because mine are all RBD librbd backed.

I think your approach with cephfs probably isn't the way forward.
Also, with cephfs you probably want to run the latest and greatest kernel
there is (4.8?).

Is your cluster logging slow request warnings during that time?

> In the night, that's when these issues occur primarily/(only?), we run
> the scrubs and deep scrubs.
>
> In this time the HDD utilization of the cold storage peaks at 80-95%.
>
Never a good thing if they are also expected to do something useful.
Do the HDD OSDs have their journals inline?

> But we have a SSD hot storage in front of this, which is buffering
> writes and reads.
>
With that, do you mean a cache tier in writeback mode?

> In our ceph.conf we already have these settings active:
>
> osd max scrubs = 1
> osd scrub begin hour = 20
> osd scrub end hour = 7
> osd op threads = 16
> osd client op priority = 63
> osd recovery op priority = 1
> osd op thread timeout = 5
>
> osd disk thread ioprio class = idle
> osd disk thread ioprio priority = 7
>
You're missing the most powerful scrub dampener there is:
osd_scrub_sleep = 0.1

> All in all I do not think that there is too little IO for the clients
> on the cold storage (even if it looks like that at first view).
>
I find that one of the best ways to understand and thus manage your
cluster is to run something like collectd with graphite (or grafana or
whatever cranks your tractor).
This should, in combination with detailed spot analysis by atop or
similar, give a very good idea of what is going on.

So in this case, watch cache-tier promotions and flushes, and see whether
your clients' I/Os really are covered by the cache, or whether during the
night your VMs do log rotates or access other cold data and thus have to
go to the HDD based OSDs...

> And if it's really as simple as too little IO for the clients, my
> question would be how to avoid it?
>
> Turning off scrub/deep scrub completely? That should not be needed and
> is also not very advisable.
>
From where I'm standing, deep-scrub is a luxury bling thing of limited
value when compared to something with integrated live checksums as in
Bluestore (so we hope) and BTRFS/ZFS.

That said, your cluster NEEDS to be able to survive scrubs, or it will be
in even bigger trouble when OSDs/nodes fail.

Christian

> We simply can not run less than
>
> osd max scrubs = 1
>
> So if scrub is eating away all IO, the scrub algorithm is simply too
> aggressive.
>
> Or, and that's most probable I guess, I have some kind of config
> mistake.

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
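
A minimal sketch of wiring in the suggested scrub dampener (assuming a
Hammer/Jewel-era cluster; injectargs changes are not persistent across
OSD restarts, so the setting should also be mirrored in ceph.conf):

    # apply to all running OSDs immediately (non-persistent)
    ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'

    # and persist it in ceph.conf on the OSD nodes
    [osd]
    osd scrub sleep = 0.1

The sleep is applied between scrub chunks, so even small values tend to
take noticeable pressure off the spinners without stretching scrub runs
out excessively.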
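
A sketch of the kind of spot check meant above for the cache tier
(assuming access to the ceph CLI and the admin socket on a cache-tier OSD
node; osd.0 is a placeholder for one of the cache-tier OSDs):

    # per-pool client and recovery I/O rates, cache pool included
    ceph osd pool stats

    # promotion/flush/evict counters of a cache-tier OSD
    ceph daemon osd.0 perf dump | grep -E '"tier_(promote|flush|evict)'

If those counters climb steeply during the night while the HDD pool is
busy scrubbing, the VMs are indeed reaching past the cache to the cold
storage.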