Performance really drops from 700MB/s to 10MB/s

I use nmon on each OSD server; it's a really good tool for finding out
what is going on regarding CPU, memory, disks, and networking.
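
For example, something like this (a minimal sketch; the package name and
flags below are the usual ones, but check your distro):

$ sudo apt-get install nmon     # Debian/Ubuntu package name
$ nmon                          # interactive: press c, m, d, n for CPU/mem/disk/net views
$ nmon -f -s 10 -c 360          # or log a snapshot every 10s, 360 times (~1 hour), to a file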



German Anders

> --- Original message ---
> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
> 10MB/s
> From: Craig Lewis <clewis at centraldesktop.com>
> To: Mariusz Gronczewski <mariusz.gronczewski at efigence.com>
> Cc: German Anders <ganders at despegar.com>, Ceph Users
> <ceph-users at lists.ceph.com>
> Date: Thursday, 14/08/2014 15:42
>
> I find graphs really help here.  One screen that shows disk I/O and
> latency for all OSDs makes it easy to pinpoint the bottleneck.
>
> If you don't have that, I'd go low tech: Watch the blinky lights. It's
> really easy to see which disk is the hotspot.
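>
> If you want numbers rather than LEDs, something along these lines on
> each OSD host works too (a sketch; iostat is in the sysstat package,
> and exact column names vary a bit between versions):
>
> $ iostat -x 2    # extended per-device stats; watch %util and await for the hotspot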
>
>
>
> On Thu, Aug 14, 2014 at 6:56 AM, Mariusz Gronczewski
> <mariusz.gronczewski at efigence.com> wrote:
>>
>> Actual OSD (/var/log/ceph/ceph-osd.$id) logs would be more useful.
>>
>> A few ideas:
>>
>> * run 'ceph health detail' to get details on which OSD is stalling
>> * run 'ceph osd perf' to see the latency of each OSD
>> * 'ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok
>> dump_historic_ops' shows recent slow ops
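>>
>> A quick way to catch a stall in the act is to keep the latency stats
>> on a loop and watch for an OSD whose commit latency spikes (a rough
>> sketch; the second column of 'ceph osd perf' is fs_commit_latency in
>> ms):
>>
>> $ watch -n 2 'ceph osd perf | sort -nk2 | tail -5'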
>>
>> I actually have a very similar problem: the cluster goes full speed
>> (sometimes even for hours), then suddenly everything stops for a
>> minute or five. No disk I/O, no I/O wait (so the disks are fine), no
>> I/O errors in the kernel log, and the OSDs only complain that another
>> OSD's subop is slow (but on that OSD everything looks fine too).
>>
>> On Wed, 13 Aug 2014 16:04:30 -0400, German Anders
>> <ganders at despegar.com> wrote:
>>
>>>
>>> Also, even an "ls -ltr" run inside the mount point of the RBD
>>> freezes the prompt. Any ideas? I've attached some syslogs from one
>>> of the OSD servers and also from the client. Both are running Ubuntu
>>> 14.04 LTS with kernel 3.15.8.
>>> The cluster is not usable at this point, since I can't even run an
>>> "ls" on the RBD.
>>>
>>> Thanks in advance,
>>>
>>> Best regards,
>>>
>>>
>>> German Anders
>>>
>>>>
>>>> --- Original message ---
>>>> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>>>> 10MB/s
>>>> From: German Anders <ganders at despegar.com>
>>>> To: Mark Nelson <mark.nelson at inktank.com>
>>>> Cc: <ceph-users at lists.ceph.com>
>>>> Date: Wednesday, 13/08/2014 11:09
>>>>
>>>>
>>>> Actually it's very strange: if I run the fio test on the client
>>>> and, in parallel, run iostat on all the OSD servers, I don't see
>>>> any workload going to the disks, I mean... nothing! 0.00... and the
>>>> fio job on the client is behaving very oddly too:
>>>>
>>>>
>>>> $ sudo fio --filename=/dev/rbd1 --direct=1 --rw=write --bs=4m
>>>> --size=10G --iodepth=16 --ioengine=libaio --runtime=60
>>>> --group_reporting --name=file99
>>>> file99: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio,
>>>> iodepth=16
>>>> fio-2.1.3
>>>> Starting 1 process
>>>> Jobs: 1 (f=1): [W] [2.1% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>>>> 01h:26m:43s]
>>>>
>>>> It seems like it's doing nothing...
>>>>
>>>>
>>>>
>>>> German Anders
>>>>
>>>>>
>>>>> --- Original message ---
>>>>> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>>>>> 10MB/s
>>>>> From: Mark Nelson <mark.nelson at inktank.com>
>>>>> To: <ceph-users at lists.ceph.com>
>>>>> Date: Wednesday, 13/08/2014 11:00
>>>>>
>>>>> On 08/13/2014 08:19 AM, German Anders wrote:
>>>>>>
>>>>>>
>>>>>> Hi to all,
>>>>>>
>>>>>> I'm seeing some peculiar behavior on a new Ceph cluster. I've
>>>>>> mapped an RBD to a client and run some performance tests with
>>>>>> fio, and at that point everything goes just fine (the results
>>>>>> too :) ). But then I try to run another test on a new RBD on the
>>>>>> same client, and suddenly the performance drops below 10MB/s and
>>>>>> it takes almost 10 minutes to complete a 10G file test. If I run
>>>>>> *ceph -w* I don't see anything suspicious. Any idea what could be
>>>>>> happening here?
>>>>>
>>>>> When things are going fast, are your disks actually writing data
>>>>> out as fast as your client IO would indicate? (Don't forget to
>>>>> count replication!)  It may be that the great speed is just
>>>>> writing data into the tmpfs journals (if the test is only 10GB and
>>>>> spread across 36 OSDs, it could finish pretty quickly writing to
>>>>> tmpfs!).  FWIW, tmpfs journals aren't very safe.  They're not
>>>>> something you want to use outside of testing except in unusual
>>>>> circumstances.
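>>>>>
>>>>> (Back of the envelope with your numbers: a 10GB test at 2x
>>>>> replication is ~20GB of journal writes spread over 36 OSDs, i.e.
>>>>> roughly 0.6GB per journal, which fits easily in a 4GB tmpfs
>>>>> journal, so the whole burst can complete without the data disks
>>>>> doing much at all.)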
>>>>>
>>>>> In your tests, when things are bad, it's generally worth checking
>>>>> whether any one disk/OSD is backed up relative to the others.
>>>>> There are a couple of ways to accomplish this.  The Ceph admin
>>>>> socket can tell you information about each OSD, i.e. how many
>>>>> outstanding IOs it has and a history of slow ops.  You can also
>>>>> look at per-disk statistics with something like iostat or
>>>>> collectl.
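>>>>>
>>>>> For example, something like this on each OSD host (a sketch;
>>>>> iostat ships with the sysstat package, and dump_ops_in_flight is
>>>>> the admin-socket command for outstanding ops):
>>>>>
>>>>> $ iostat -xm 2    # per-disk stats; look for one device pegged on %util/await
>>>>> $ ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok dump_ops_in_flight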
>>>>>
>>>>> Hope this helps!
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The cluster is made of:
>>>>>>
>>>>>> 3 x MON servers
>>>>>> 4 x OSD servers (3TB SAS 6G disks for the OSD daemons, and tmpfs
>>>>>> for the journals -> on each server there's one 36GB tmpfs shared
>>>>>> by 9 OSD daemons)
>>>>>> 2 x network switches (cluster and public)
>>>>>> 10GbE on both networks
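>>>>>>
>>>>>> For reference, the tmpfs is mounted along these lines (a sketch;
>>>>>> I'm assuming the /mnt/ramdisk mount point from the osd_journal
>>>>>> entries below):
>>>>>>
>>>>>> $ sudo mount -t tmpfs -o size=36g tmpfs /mnt/ramdisk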
>>>>>>
>>>>>> The ceph.conf file is the following:
>>>>>>
>>>>>> [global]
>>>>>> fsid = 56e56e4c-ea59-4157-8b98-acae109bebe1
>>>>>> mon_initial_members = cephmon01, cephmon02, cephmon03
>>>>>> mon_host = 10.97.10.1,10.97.10.2,10.97.10.3
>>>>>> auth_client_required = cephx
>>>>>> auth_cluster_required = cephx
>>>>>> auth_service_required = cephx
>>>>>> filestore_xattr_use_omap = true
>>>>>> public_network = 10.97.0.0/16
>>>>>> cluster_network = 192.168.10.0/24
>>>>>> osd_pool_default_size = 2
>>>>>> glance_api_version = 2
>>>>>>
>>>>>> [mon]
>>>>>> debug_optracker = 0
>>>>>>
>>>>>> [mon.cephmon01]
>>>>>> host = cephmon01
>>>>>> mon_addr = 10.97.10.1:6789
>>>>>>
>>>>>> [mon.cephmon02]
>>>>>> host = cephmon02
>>>>>> mon_addr = 10.97.10.2:6789
>>>>>>
>>>>>> [mon.cephmon03]
>>>>>> host = cephmon03
>>>>>> mon_addr = 10.97.10.3:6789
>>>>>>
>>>>>> [osd]
>>>>>> journal_dio = false
>>>>>> osd_journal_size = 4096
>>>>>> fstype = btrfs
>>>>>> debug_optracker = 0
>>>>>>
>>>>>> [osd.0]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdc1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.1]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdd1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.2]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdf1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.3]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdg1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.4]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdi1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.5]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdj1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.6]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdl1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.7]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdm1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.8]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdn1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.9]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdc1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.10]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdd1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.11]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdf1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.12]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdg1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.13]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdi1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.14]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdj1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.15]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdl1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.16]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdm1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.17]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdn1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.18]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdc1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.19]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdd1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.20]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdf1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.21]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdg1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.22]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdi1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.23]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdj1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.24]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdl1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.25]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdm1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.26]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdn1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.27]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdc1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.28]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdd1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.29]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdf1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.30]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdg1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.31]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdi1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.32]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdj1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.33]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdl1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.34]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdm1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.35]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdn1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [client.volumes]
>>>>>> keyring = /etc/ceph/ceph.client.volumes.keyring
>>>>>>
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> German Anders
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Mariusz Gronczewski, Administrator
>>
>> Efigence S. A.
>> ul. Wołoska 9a, 02-583 Warszawa
>> T: [+48] 22 380 13 13
>> F: [+48] 22 380 13 14
>> E: mariusz.gronczewski at efigence.com
>>
>>


