Re: cephfs ata1.00: status: { DRDY }

Hi Christian,

thank you for your comments.

Unfortunately I have some software restrictions here, so I cannot go
with the librbd backend and have to go with cephfs.

I understand that a newer kernel might help solve the issues. But we
cannot upgrade to every new kernel that gets released.

If cephfs is that sensitive, it is simply not really usable. And I
think it is usable, so I don't think the kernel is the issue here.

Between 4.5 and 4.8 there is not so much of a time gap that I would
look for the issue there.

----

Right now, unfortunately, I have some trouble correlating log warnings
within ceph with the issues inside the VMs, because the VMs are not run
by us. So I lack exact information.

I will have to set up our own VMs there and do some logging/checking.

But as far as the logs tell me, there are no slow requests.

----

The HDDs have their journals on separate HW RAID-10 SSDs (SM863s).

----

Yes, writeback mode.

----

I have inserted osd_scrub_sleep = 0.1 now. But we are currently also
doing an extension of the ceph cluster, so it is busy with recovery
anyway (without any issues so far, even though the utilization is
basically the same as with (deep) scrubbing).

The SSD cache we run is quite big and quite fast, so most of the heavy
IO is handled there.

The HDDs usually have nothing to do (200-500 KB/s throughput per device
at ~5% utilization and 5-6 IOPS).

That is during production time.

At night, when scrubbing runs, the stats rise to 50 MB/s throughput and
80-95% utilization at 400 read IOPS.
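
(Numbers like these can be read with iostat from the sysstat package;
the device names below are just placeholders:)

# extended per-device stats, sampled every 5 seconds:
# rkB/s ~ read throughput, r/s ~ read IOPS, %util ~ device utilization
iostat -x 5 /dev/sdb /dev/sdc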


-----

I hope this osd_scrub_sleep = 0.1 will have some impact as soon as the
cluster is back to normal and I turn scrubbing on again.
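
(A sketch of how I intend to apply it at runtime and turn scrubbing
back on, assuming scrubbing was paused with the noscrub/nodeep-scrub
flags; if injectargs does not pick up this option, an OSD restart would
be needed:)

ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
ceph osd unset noscrub
ceph osd unset nodeep-scrub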


Again, thank you very much!

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 at the district court (Amtsgericht) Hanau
Managing director: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107


On 06.01.2017 at 01:56, Christian Balzer wrote:
> 
> Hello,
> 
> On Thu, 5 Jan 2017 23:02:51 +0100 Oliver Dzombic wrote:
> 
> 
> I've never seen hung qemu tasks; slow/hung I/O tasks inside VMs with a
> broken/slow cluster I have seen.
> That's because mine are all backed by RBD via librbd.
> 
> I think your approach with cephfs probably isn't the way forward.
> Also with cephfs you probably want to run the latest and greatest kernel
> there is (4.8?).
> 
> Is your cluster logging slow request warnings during that time?
> 
>>
>> At night, which is when these issues occur primarily/(only?), we run the
>> scrubs and deep scrubs.
>>
>> During this time the HDD utilization of the cold storage peaks at 80-95%.
>>
> Never a good thing, if they are also expected to do something useful.
> HDD OSDs have their journals inline?
> 
>> But we have an SSD hot storage tier in front of this, which is buffering
>> writes and reads.
>>
> With that you mean cache-tier in writeback mode?
>  
>> In our ceph.conf we already have this settings active:
>>
>> osd max scrubs = 1
>> osd scrub begin hour = 20
>> osd scrub end hour = 7
>> osd op threads = 16
>> osd client op priority = 63
>> osd recovery op priority = 1
>> osd op thread timeout = 5
>>
>> osd disk thread ioprio class = idle
>> osd disk thread ioprio priority = 7
>>
> You're missing the most powerful scrub dampener there is:
> osd_scrub_sleep = 0.1
> 
>>
>>
>> All in all, I do not think that the clients are getting too little IO on
>> the cold storage (even if it looks like that at first glance).
>>
> I find that one of the best ways to understand and thus manage your
> cluster is to run something like collectd with graphite (or grafana or
> whatever cranks your tractor).
> 
> This should in combination with detailed spot analysis by atop or similar
> give a very good idea of what is going on.
> 
> So in this case, watch cache-tier promotions and flushes, and see if your
> clients' I/Os really are covered by the cache or if during the night your
> VMs may do log rotates or access other cold data and thus have to go to
> the HDD based OSDs...
>  
>> And if it is really as simple as too little IO being left for the clients,
>> my question would be: how to avoid it?
>>
>> Turning off scrub/deep scrub completely? That should not be needed and
>> is also not really advisable.
>>
> From where I'm standing deep-scrub is a luxury bling thing of limited
> value when compared to something with integrated live checksums as in
> Bluestore (so we hope) and BTRFS/ZFS. 
> 
> That said, your cluster NEEDs to be able to survive scrubs or it will be
> in even bigger trouble when OSDs/nodes fail.
> 
> Christian
> 
>> We simply cannot run less than
>>
>> osd max scrubs = 1
>>
>>
>> So if scrubbing is eating away all the IO, the scrub algorithm is simply
>> too aggressive.
>>
>> Or, and that is most probable I guess, I have some kind of config mistake.
>>
>>
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



