Performance really drops from 700MB/s to 10MB/s

Also, I'm attaching a screenshot of nmon running on node cephosd01 while
the fio test was in progress.


German Anders

> --- Original message ---
> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
> 10MB/s
> From: German Anders <ganders at despegar.com>
> To: Mariusz Gronczewski <mariusz.gronczewski at efigence.com>
> Cc: <ceph-users at lists.ceph.com>
> Date: Thursday, 14/08/2014 11:25
>
>
>
> Hi Mariusz,
>
> Thanks a lot for the ideas. I rebooted the client server, mapped the
> rbd again and launched the fio test again, and this time it worked...
> very strange. While the test was running I also ran:
>
> ceph at cephmon01:~$ ceph osd perf
> osdid fs_commit_latency(ms) fs_apply_latency(ms)
>    0                   506                   22
>    1                   465                   26
>    2                   490                    3
>    3                   623                   13
>    4                   548                   68
>    5                   484                   16
>    6                   448                    2
>    7                   523                   27
>    8                   489                   30
>    9                   498                   52
>   10                   472                   12
>   11                   407                    7
>   12                   315                    0
>   13                   540                   17
>   14                   599                   18
>   15                   420                   14
>   16                   515                    7
>   17                   395                    3
>   18                   565                   14
>   19                   557                   59
>   20                   515                    7
>   21                   689                   56
>   22                   474                   10
>   23                   142                    1
>   24                   364                    7
>   25                   390                    6
>   26                   507                  107
>   27                   573                   20
>   28                   158                    1
>   29                   490                   25
>   30                   301                    0
>   31                   381                   15
>   32                   440                   27
>   33                   482                   16
>   34                   323                    9
>   35                   414                   21
>
> I don't see anything suspicious here. The fio command was:
>
>
> $ sudo fio --filename=/dev/rbd0 --direct=1 --rw=write --bs=4m 
> --size=10G --iodepth=16 --ioengine=libaio --runtime=60 
> --group_reporting --name=fileB
> fileB: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, 
> iodepth=16
> fio-2.1.3
> Starting 1 process
> Jobs: 1 (f=1): [W] [100.0% done] [0KB/748.0MB/0KB /s] [0/187/0 iops] 
> [eta 00m:00s]
> fileB: (groupid=0, jobs=1): err= 0: pid=2172: Thu Aug 14 10:21:13 2014
>  write: io=10240MB, bw=741672KB/s, iops=181, runt= 14138msec
>    slat (usec): min=569, max=2747, avg=1741.44, stdev=507.08
>    clat (msec): min=19, max=465, avg=86.55, stdev=35.16
>     lat (msec): min=20, max=466, avg=88.30, stdev=34.92
>    clat percentiles (msec):
>     |  1.00th=[   39],  5.00th=[   54], 10.00th=[   60], 20.00th=[   64],
>     | 30.00th=[   69], 40.00th=[   75], 50.00th=[   81], 60.00th=[   85],
>     | 70.00th=[   92], 80.00th=[  102], 90.00th=[  124], 95.00th=[  147],
>     | 99.00th=[  217], 99.50th=[  258], 99.90th=[  424], 99.95th=[  441],
>     | 99.99th=[  465]
>    bw (KB  /s): min=686754, max=783298, per=99.81%, avg=740262.96, stdev=19845.43
>    lat (msec) : 20=0.04%, 50=3.36%, 100=75.51%, 250=20.51%, 500=0.59%
>  cpu          : usr=6.18%, sys=12.97%, ctx=11554, majf=0, minf=2225
>  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=99.4%, 32=0.0%, >=64=0.0%
>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued    : total=r=0/w=2560/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
>  WRITE: io=10240MB, aggrb=741672KB/s, minb=741672KB/s, 
> maxb=741672KB/s, mint=14138msec, maxt=14138msec
>
> Disk stats (read/write):
>  rbd0: ios=182/20459, merge=0/0, ticks=92/1213748, in_queue=1214796, 
> util=99.80%
> ceph at mail02-old:~$
>
>
>
>
>
> German Anders
>
>> --- Original message ---
>> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>> 10MB/s
>> From: Mariusz Gronczewski <mariusz.gronczewski at efigence.com>
>> To: German Anders <ganders at despegar.com>
>> Cc: <ceph-users at lists.ceph.com>
>> Date: Thursday, 14/08/2014 10:56
>>
>> Actual OSD (/var/log/ceph/ceph-osd.$id) logs would be more useful.
>>
>> A few ideas:
>>
>> * run 'ceph health detail' to see exactly which OSDs are stalling
>> * run 'ceph osd perf' to see the latency of each OSD
>> * run 'ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok
>> dump_historic_ops' to show recent slow ops (see the sketch below)
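>>
>> For example, to sweep all OSDs on one node in a single pass, a loop
>> along these lines should work (a rough sketch):
>>
>> # assumes the default admin socket path /var/run/ceph/ceph-osd.$id.asok
>> for sock in /var/run/ceph/ceph-osd.*.asok; do
>>     echo "== $sock =="
>>     ceph --admin-daemon "$sock" dump_historic_ops
>> done | less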
>>
>> I actually have a very similar problem: the cluster goes at full speed
>> (sometimes even for hours) and then suddenly everything stops for a
>> minute or five; no disk IO, no IO wait (so the disks are fine), no IO
>> errors in the kernel log, and the OSDs only complain that another OSD's
>> subop is slow (but on that OSD everything looks fine too).
>>
>> On Wed, 13 Aug 2014 16:04:30 -0400, German Anders
>> <ganders at despegar.com> wrote:
>>
>>>
>>> Also, even an "ls -ltr" inside the RBD's mount point under /mnt
>>> freezes the prompt. Any ideas? I've attached some syslogs from one of
>>> the OSD servers and also from the client. Both are running Ubuntu
>>> 14.04 LTS with kernel 3.15.8.
>>> The cluster is not usable at this point, since I can't even run an
>>> "ls" on the rbd.
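>>>
>>> To check whether the client kernel itself is reporting the stall, the
>>> hung-task messages can be grepped on the client, something like this
>>> (a rough sketch):
>>>
>>> # the hung-task watchdog logs "task ... blocked for more than 120
>>> # seconds" for processes stuck in uninterruptible I/O, e.g. waiting
>>> # on a stalled rbd device
>>> dmesg | grep -i "blocked for more than"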
>>>
>>> Thanks in advance,
>>>
>>> Best regards,
>>>
>>>
>>> German Anders
>>>
>>>>
>>>> --- Original message ---
>>>> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>>>> 10MB/s
>>>> From: German Anders <ganders at despegar.com>
>>>> To: Mark Nelson <mark.nelson at inktank.com>
>>>> Cc: <ceph-users at lists.ceph.com>
>>>> Date: Wednesday, 13/08/2014 11:09
>>>>
>>>>
>>>> Actually it's very strange: if I run the fio test on the client and
>>>> in parallel run iostat on all the OSD servers, I don't see any
>>>> workload at all hitting the disks, I mean... nothing! 0.00... and
>>>> the fio job on the client is behaving very strangely too:
>>>>
>>>>
>>>> $ sudo fio --filename=/dev/rbd1 --direct=1 --rw=write --bs=4m
>>>> --size=10G --iodepth=16 --ioengine=libaio --runtime=60
>>>> --group_reporting --name=file99
>>>> file99: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio,
>>>> iodepth=16
>>>> fio-2.1.3
>>>> Starting 1 process
>>>> Jobs: 1 (f=1): [W] [2.1% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>>>> 01h:26m:43s]
>>>>
>>>> It seems like it's doing nothing...
>>>>
>>>>
>>>>
>>>> German Anders
>>>>
>>>>>
>>>>> --- Original message ---
>>>>> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>>>>> 10MB/s
>>>>> From: Mark Nelson <mark.nelson at inktank.com>
>>>>> To: <ceph-users at lists.ceph.com>
>>>>> Date: Wednesday, 13/08/2014 11:00
>>>>>
>>>>> On 08/13/2014 08:19 AM, German Anders wrote:
>>>>>>
>>>>>>
>>>>>> Hi to all,
>>>>>>
>>>>>> I'm seeing some odd behavior on a new Ceph cluster. I mapped an RBD
>>>>>> to a client and ran some performance tests with fio, and at that
>>>>>> point everything went just fine (the results too :) ). But when I
>>>>>> try to run another test on a new RBD on the same client, the
>>>>>> performance suddenly drops below 10MB/s and it takes almost 10
>>>>>> minutes to complete a 10G file test. If I run *ceph -w* I don't see
>>>>>> anything suspicious. Any idea what could be happening here?
>>>>>
>>>>> When things are going fast, are your disks actually writing data out
>>>>> as fast as your client IO would indicate? (don't forget to count
>>>>> replication!)  It may be that the great speed is just from writing
>>>>> data into the tmpfs journals (if the test is only 10GB and spread
>>>>> across 36 OSDs, it could finish pretty quickly writing to tmpfs!).
>>>>> FWIW, tmpfs journals aren't very safe.  It's not something you want
>>>>> to use outside of testing except in unusual circumstances.
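>>>>>
>>>>> (Back-of-the-envelope, using the osd_pool_default_size = 2 from the
>>>>> config below: a 10GB test writes 10GB x 2 replicas = 20GB of journal
>>>>> data, and spread over the 4 OSD nodes that is roughly 5GB per node,
>>>>> which fits comfortably in each node's 36GB tmpfs, so the whole run
>>>>> can land in RAM before much of it ever reaches the disks.)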
>>>>>
>>>>> In your tests, when things are bad, it's generally worth checking to
>>>>> see if any one disk/OSD is backed up relative to the others.  There
>>>>> are a couple of ways to accomplish this: the Ceph admin socket can
>>>>> tell you information about each OSD, i.e. how many outstanding IOs
>>>>> and a history of slow ops.  You can also look at per-disk statistics
>>>>> with something like iostat or collectl.
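>>>>>
>>>>> For example (a rough sketch; substitute the real OSD id for $id):
>>>>>
>>>>> # per-disk view; assumes the sysstat package is installed on the OSD
>>>>> # nodes, watch for one disk with high await/%util while the rest idle
>>>>> iostat -xm 2
>>>>> # per-OSD view of outstanding IOs via the admin socket
>>>>> ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok dump_ops_in_flight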
>>>>>
>>>>> Hope this helps!
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The cluster is made of:
>>>>>>
>>>>>> 3 x MON servers
>>>>>> 4 x OSD servers (3TB SAS 6G disks for the OSD daemons & tmpfs for the
>>>>>> journals -> there's one 36GB tmpfs shared by 9 OSD daemons on each
>>>>>> server; see the fstab sketch below)
>>>>>> 2 x network switches (cluster and public)
>>>>>> 10GbE on both networks
>>>>>>
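>>>>>> The journal tmpfs is mounted at /mnt/ramdisk on each OSD server,
>>>>>> roughly via an fstab entry like this (a sketch; the real size and
>>>>>> options may differ):
>>>>>>
>>>>>> # 36GB tmpfs backing the OSD journals (sketch, adjust size/mountpoint)
>>>>>> tmpfs  /mnt/ramdisk  tmpfs  size=36g  0  0
>>>>>>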
>>>>>> The ceph.conf file is the following:
>>>>>>
>>>>>> [global]
>>>>>> fsid = 56e56e4c-ea59-4157-8b98-acae109bebe1
>>>>>> mon_initial_members = cephmon01, cephmon02, cephmon03
>>>>>> mon_host = 10.97.10.1,10.97.10.2,10.97.10.3
>>>>>> auth_client_required = cephx
>>>>>> auth_cluster_required = cephx
>>>>>> auth_service_required = cephx
>>>>>> filestore_xattr_use_omap = true
>>>>>> public_network = 10.97.0.0/16
>>>>>> cluster_network = 192.168.10.0/24
>>>>>> osd_pool_default_size = 2
>>>>>> glance_api_version = 2
>>>>>>
>>>>>> [mon]
>>>>>> debug_optracker = 0
>>>>>>
>>>>>> [mon.cephmon01]
>>>>>> host = cephmon01
>>>>>> mon_addr = 10.97.10.1:6789
>>>>>>
>>>>>> [mon.cephmon02]
>>>>>> host = cephmon02
>>>>>> mon_addr = 10.97.10.2:6789
>>>>>>
>>>>>> [mon.cephmon03]
>>>>>> host = cephmon03
>>>>>> mon_addr = 10.97.10.3:6789
>>>>>>
>>>>>> [osd]
>>>>>> journal_dio = false
>>>>>> osd_journal_size = 4096
>>>>>> fstype = btrfs
>>>>>> debug_optracker = 0
>>>>>>
>>>>>> [osd.0]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdc1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.1]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdd1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.2]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdf1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.3]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdg1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.4]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdi1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.5]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdj1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.6]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdl1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.7]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdm1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.8]
>>>>>> host = cephosd01
>>>>>> devs = /dev/sdn1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.9]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdc1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.10]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdd1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.11]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdf1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.12]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdg1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.13]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdi1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.14]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdj1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.15]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdl1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.16]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdm1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.17]
>>>>>> host = cephosd02
>>>>>> devs = /dev/sdn1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.18]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdc1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.19]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdd1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.20]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdf1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.21]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdg1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.22]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdi1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.23]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdj1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.24]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdl1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.25]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdm1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.26]
>>>>>> host = cephosd03
>>>>>> devs = /dev/sdn1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.27]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdc1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.28]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdd1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.29]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdf1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.30]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdg1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.31]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdi1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.32]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdj1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.33]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdl1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.34]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdm1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [osd.35]
>>>>>> host = cephosd04
>>>>>> devs = /dev/sdn1
>>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>>
>>>>>> [client.volumes]
>>>>>> keyring = /etc/ceph/ceph.client.volumes.keyring
>>>>>>
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> German Anders
>>>>>>
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> ceph-users at lists.ceph.com
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users at lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users at lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>
>>
>>
>> --
>> Mariusz Gronczewski, Administrator
>>
>> Efigence S. A.
>> ul. Wołoska 9a, 02-583 Warszawa
>> T: [+48] 22 380 13 13
>> F: [+48] 22 380 13 14
>> E: mariusz.gronczewski at efigence.com
>> <mailto:mariusz.gronczewski at efigence.com>
>>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cephosd01-perf-while-running-fio.png
Type: image/png
Size: 116300 bytes
Desc: not available
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140814/8ec0b308/attachment.png>

