Performance really drops from 700MB/s to 10MB/s

Hi Mariusz,

      Thanks a lot for the ideas. I rebooted the client server, mapped 
the RBD again and launched the fio test again, and this time it 
worked... very strange. While the test was running I also ran:

ceph@cephmon01:~$ ceph osd perf
osdid fs_commit_latency(ms) fs_apply_latency(ms)
    0                   506                   22
    1                   465                   26
    2                   490                    3
    3                   623                   13
    4                   548                   68
    5                   484                   16
    6                   448                    2
    7                   523                   27
    8                   489                   30
    9                   498                   52
   10                   472                   12
   11                   407                    7
   12                   315                    0
   13                   540                   17
   14                   599                   18
   15                   420                   14
   16                   515                    7
   17                   395                    3
   18                   565                   14
   19                   557                   59
   20                   515                    7
   21                   689                   56
   22                   474                   10
   23                   142                    1
   24                   364                    7
   25                   390                    6
   26                   507                  107
   27                   573                   20
   28                   158                    1
   29                   490                   25
   30                   301                    0
   31                   381                   15
   32                   440                   27
   33                   482                   16
   34                   323                    9
   35                   414                   21
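
In case it's useful, I can also keep an eye on those numbers while fio 
runs, e.g. (assuming watch(1) is available on the monitor host):

$ watch -n 5 ceph osd perf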

I don't see anything suspicious here. The fio command was:


$ sudo fio --filename=/dev/rbd0 --direct=1 --rw=write --bs=4m --size=10G --iodepth=16 --ioengine=libaio --runtime=60 --group_reporting --name=fileB
fileB: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=16
fio-2.1.3
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0KB/748.0MB/0KB /s] [0/187/0 iops] [eta 00m:00s]
fileB: (groupid=0, jobs=1): err= 0: pid=2172: Thu Aug 14 10:21:13 2014
  write: io=10240MB, bw=741672KB/s, iops=181, runt= 14138msec
    slat (usec): min=569, max=2747, avg=1741.44, stdev=507.08
    clat (msec): min=19, max=465, avg=86.55, stdev=35.16
     lat (msec): min=20, max=466, avg=88.30, stdev=34.92
    clat percentiles (msec):
     |  1.00th=[   39],  5.00th=[   54], 10.00th=[   60], 20.00th=[   64],
     | 30.00th=[   69], 40.00th=[   75], 50.00th=[   81], 60.00th=[   85],
     | 70.00th=[   92], 80.00th=[  102], 90.00th=[  124], 95.00th=[  147],
     | 99.00th=[  217], 99.50th=[  258], 99.90th=[  424], 99.95th=[  441],
     | 99.99th=[  465]
    bw (KB  /s): min=686754, max=783298, per=99.81%, avg=740262.96, stdev=19845.43
    lat (msec) : 20=0.04%, 50=3.36%, 100=75.51%, 250=20.51%, 500=0.59%
  cpu          : usr=6.18%, sys=12.97%, ctx=11554, majf=0, minf=2225
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=99.4%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=2560/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=10240MB, aggrb=741672KB/s, minb=741672KB/s, maxb=741672KB/s, mint=14138msec, maxt=14138msec

Disk stats (read/write):
  rbd0: ios=182/20459, merge=0/0, ticks=92/1213748, in_queue=1214796, util=99.80%
ceph@mail02-old:~$
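
Next time it stalls I'll also grab the recent slow ops from the admin 
sockets on each OSD host, along the lines of your suggestion, roughly:

$ for s in /var/run/ceph/ceph-osd.*.asok; do
      echo "== $s =="; sudo ceph --admin-daemon "$s" dump_historic_ops
  done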





German Anders

> --- Original message ---
> Subject: Re: [ceph-users] Performance really drops from 700MB/s to 
> 10MB/s
> From: Mariusz Gronczewski <mariusz.gronczewski at efigence.com>
> To: German Anders <ganders at despegar.com>
> Cc: <ceph-users at lists.ceph.com>
> Date: Thursday, 14/08/2014 10:56
>
> Actual OSD (/var/log/ceph/ceph-osd.$id) logs would be more useful.
>
> A few ideas:
>
> * do 'ceph health detail' to get detail on which OSD is stalling (a 
>   quick filter is sketched below)
> * 'ceph osd perf' to see the latency of each OSD
> * 'ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok 
>   dump_historic_ops' to show recent "slow" ops
>
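> For example, something like this should point at the OSDs with 
> blocked/slow requests (the exact wording of the health output may 
> differ between versions):
>
> $ ceph health detail | grep -iE 'blocked|slow'
>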
> I actually have a very similar problem: the cluster goes full speed 
> (sometimes even for hours) and then suddenly everything stops for a 
> minute or five, with no disk IO, no IO wait (so the disks are fine), 
> no IO errors in the kernel log, and the OSDs only complaining that 
> another OSD's subop is slow (but on that OSD everything looks fine 
> too)
>
> On Wed, 13 Aug 2014 16:04:30 -0400, German Anders
> <ganders at despegar.com> wrote:
>
>>
>> Also, even an "ls -ltr" inside the mount point of the RBD freezes the
>> prompt. Any ideas? I've attached some syslogs from one of the OSD
>> servers and also from the client. Both are running Ubuntu 14.04 LTS
>> with kernel 3.15.8.
>> The cluster is not usable at this point, since I can't even run an
>> "ls" on the RBD.
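>>
>> The client kernel log can also be checked for hung-task or libceph
>> messages while the prompt is frozen, with something like:
>>
>> $ dmesg | grep -iE 'libceph|blocked for more than'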
>>
>> Thanks in advance,
>>
>> Best regards,
>>
>>
>> German Anders
>>
>>>
>>> --- Original message ---
>>> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>>> 10MB/s
>>> From: German Anders <ganders at despegar.com>
>>> To: Mark Nelson <mark.nelson at inktank.com>
>>> Cc: <ceph-users at lists.ceph.com>
>>> Date: Wednesday, 13/08/2014 11:09
>>>
>>>
>>> Actually it's very strange: if I run the fio test on the client and,
>>> in parallel, run iostat on all the OSD servers, I don't see any
>>> workload going on over the disks, I mean... nothing! 0.00... and the
>>> fio job on the client is behaving very strangely too:
>>>
>>>
>>> $ sudo fio --filename=/dev/rbd1 --direct=1 --rw=write --bs=4m --size=10G --iodepth=16 --ioengine=libaio --runtime=60 --group_reporting --name=file99
>>> file99: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=16
>>> fio-2.1.3
>>> Starting 1 process
>>> Jobs: 1 (f=1): [W] [2.1% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 01h:26m:43s]
>>>
>>> It seems like it's doing nothing...
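>>>
>>> One way to see whether the kernel RBD client actually has requests
>>> outstanding is the osdc file in debugfs (assuming debugfs is
>>> mounted), e.g.:
>>>
>>> $ sudo cat /sys/kernel/debug/ceph/*/osdc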
>>>
>>>
>>>
>>> German Anders
>>>
>>>>
>>>> --- Original message ---
>>>> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>>>> 10MB/s
>>>> From: Mark Nelson <mark.nelson at inktank.com>
>>>> To: <ceph-users at lists.ceph.com>
>>>> Date: Wednesday, 13/08/2014 11:00
>>>>
>>>> On 08/13/2014 08:19 AM, German Anders wrote:
>>>>>
>>>>>
>>>>> Hi to all,
>>>>>
>>>>> I'm seeing some peculiar behavior on a new Ceph cluster. I've
>>>>> mapped an RBD to a client and run some performance tests with fio,
>>>>> and at this point everything goes just fine (the results too :) ),
>>>>> but then I try to run another new test on a new RBD on the same
>>>>> client, and suddenly the performance drops below 10MB/s and it
>>>>> takes almost 10 minutes to complete a 10G file test. If I issue a
>>>>> *ceph -w* I don't see anything suspicious; any idea what could be
>>>>> happening here?
>>>>
>>>> When things are going fast, are your disks actually writing data
>>>> out as fast as your client IO would indicate? (don't forget to
>>>> count replication!)  It may be that the great speed is just writing
>>>> data into the tmpfs journals (if the test is only 10GB and spread
>>>> across 36 OSDs, it could finish pretty quickly writing to tmpfs!).
>>>> FWIW, tmpfs journals aren't very safe.  It's not something you want
>>>> to use outside of testing except in unusual circumstances.
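>>>>
>>>> (Rough numbers for this cluster: a 10GB test x 2 replicas = 20GB
>>>> written, spread over 36 OSDs that's only ~0.6GB per OSD, well under
>>>> the 4GB tmpfs journal each OSD has, so the whole run can complete
>>>> against RAM before much of it ever reaches the disks.)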
>>>>
>>>> In your tests, when things are bad it's generally worth checking to
>>>> see if any one disk/OSD is backed up relative to the others.  There
>>>> are a couple of ways to accomplish this: the Ceph admin socket can
>>>> tell you information about each OSD, i.e. how many outstanding IOs
>>>> and a history of slow ops, and you can also look at per-disk
>>>> statistics with something like iostat or collectl (a couple of
>>>> example invocations below).
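>>>>
>>>> For example (the admin socket path may differ depending on how the
>>>> OSDs were deployed):
>>>>
>>>> ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
>>>> ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops
>>>> iostat -xm 2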
>>>>
>>>> Hope this helps!
>>>>
>>>>>
>>>>>
>>>>>
>>>>> The cluster is made of:
>>>>>
>>>>> 3 x MON servers
>>>>> 4 x OSD servers (3TB SAS 6G disks for the OSD daemons & tmpfs for
>>>>> the journals -> there's one 36GB tmpfs shared by 9 OSD daemons on
>>>>> each server)
>>>>> 2 x network switches (cluster and public)
>>>>> 10GbE speed on both networks
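>>>>>
>>>>> (The journal tmpfs on each OSD server is mounted at /mnt/ramdisk,
>>>>> created with something like: mount -t tmpfs -o size=36g tmpfs
>>>>> /mnt/ramdisk)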
>>>>>
>>>>> The ceph.conf file is the following:
>>>>>
>>>>> [global]
>>>>> fsid = 56e56e4c-ea59-4157-8b98-acae109bebe1
>>>>> mon_initial_members = cephmon01, cephmon02, cephmon03
>>>>> mon_host = 10.97.10.1,10.97.10.2,10.97.10.3
>>>>> auth_client_required = cephx
>>>>> auth_cluster_required = cephx
>>>>> auth_service_required = cephx
>>>>> filestore_xattr_use_omap = true
>>>>> public_network = 10.97.0.0/16
>>>>> cluster_network = 192.168.10.0/24
>>>>> osd_pool_default_size = 2
>>>>> glance_api_version = 2
>>>>>
>>>>> [mon]
>>>>> debug_optracker = 0
>>>>>
>>>>> [mon.cephmon01]
>>>>> host = cephmon01
>>>>> mon_addr = 10.97.10.1:6789
>>>>>
>>>>> [mon.cephmon02]
>>>>> host = cephmon02
>>>>> mon_addr = 10.97.10.2:6789
>>>>>
>>>>> [mon.cephmon03]
>>>>> host = cephmon03
>>>>> mon_addr = 10.97.10.3:6789
>>>>>
>>>>> [osd]
>>>>> journal_dio = false
>>>>> osd_journal_size = 4096
>>>>> fstype = btrfs
>>>>> debug_optracker = 0
>>>>>
>>>>> [osd.0]
>>>>> host = cephosd01
>>>>> devs = /dev/sdc1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.1]
>>>>> host = cephosd01
>>>>> devs = /dev/sdd1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.2]
>>>>> host = cephosd01
>>>>> devs = /dev/sdf1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.3]
>>>>> host = cephosd01
>>>>> devs = /dev/sdg1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.4]
>>>>> host = cephosd01
>>>>> devs = /dev/sdi1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.5]
>>>>> host = cephosd01
>>>>> devs = /dev/sdj1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.6]
>>>>> host = cephosd01
>>>>> devs = /dev/sdl1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.7]
>>>>> host = cephosd01
>>>>> devs = /dev/sdm1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.8]
>>>>> host = cephosd01
>>>>> devs = /dev/sdn1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.9]
>>>>> host = cephosd02
>>>>> devs = /dev/sdc1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.10]
>>>>> host = cephosd02
>>>>> devs = /dev/sdd1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.11]
>>>>> host = cephosd02
>>>>> devs = /dev/sdf1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.12]
>>>>> host = cephosd02
>>>>> devs = /dev/sdg1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.13]
>>>>> host = cephosd02
>>>>> devs = /dev/sdi1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.14]
>>>>> host = cephosd02
>>>>> devs = /dev/sdj1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.15]
>>>>> host = cephosd02
>>>>> devs = /dev/sdl1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.16]
>>>>> host = cephosd02
>>>>> devs = /dev/sdm1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.17]
>>>>> host = cephosd02
>>>>> devs = /dev/sdn1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.18]
>>>>> host = cephosd03
>>>>> devs = /dev/sdc1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.19]
>>>>> host = cephosd03
>>>>> devs = /dev/sdd1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.20]
>>>>> host = cephosd03
>>>>> devs = /dev/sdf1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.21]
>>>>> host = cephosd03
>>>>> devs = /dev/sdg1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.22]
>>>>> host = cephosd03
>>>>> devs = /dev/sdi1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.23]
>>>>> host = cephosd03
>>>>> devs = /dev/sdj1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.24]
>>>>> host = cephosd03
>>>>> devs = /dev/sdl1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.25]
>>>>> host = cephosd03
>>>>> devs = /dev/sdm1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.26]
>>>>> host = cephosd03
>>>>> devs = /dev/sdn1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.27]
>>>>> host = cephosd04
>>>>> devs = /dev/sdc1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.28]
>>>>> host = cephosd04
>>>>> devs = /dev/sdd1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.29]
>>>>> host = cephosd04
>>>>> devs = /dev/sdf1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.30]
>>>>> host = cephosd04
>>>>> devs = /dev/sdg1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.31]
>>>>> host = cephosd04
>>>>> devs = /dev/sdi1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.32]
>>>>> host = cephosd04
>>>>> devs = /dev/sdj1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.33]
>>>>> host = cephosd04
>>>>> devs = /dev/sdl1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.34]
>>>>> host = cephosd04
>>>>> devs = /dev/sdm1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.35]
>>>>> host = cephosd04
>>>>> devs = /dev/sdn1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [client.volumes]
>>>>> keyring = /etc/ceph/ceph.client.volumes.keyring
>>>>>
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Best regards,
>>>>>
>>>>> German Anders
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users at lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users at lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users at lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
>
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczewski at efigence.com
> <mailto:mariusz.gronczewski at efigence.com>
>
