Hi Mariusz,

Thanks a lot for the ideas. I rebooted the client server, mapped the RBD again, and launched the fio test again; this time it worked, which is very strange. While the test was running I also ran:

ceph@cephmon01:~$ ceph osd perf
osdid  fs_commit_latency(ms)  fs_apply_latency(ms)
0      506                    22
1      465                    26
2      490                    3
3      623                    13
4      548                    68
5      484                    16
6      448                    2
7      523                    27
8      489                    30
9      498                    52
10     472                    12
11     407                    7
12     315                    0
13     540                    17
14     599                    18
15     420                    14
16     515                    7
17     395                    3
18     565                    14
19     557                    59
20     515                    7
21     689                    56
22     474                    10
23     142                    1
24     364                    7
25     390                    6
26     507                    107
27     573                    20
28     158                    1
29     490                    25
30     301                    0
31     381                    15
32     440                    27
33     482                    16
34     323                    9
35     414                    21

I don't see anything suspicious there. The fio command was:

$ sudo fio --filename=/dev/rbd0 --direct=1 --rw=write --bs=4m --size=10G --iodepth=16 --ioengine=libaio --runtime=60 --group_reporting --name=fileB
fileB: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=16
fio-2.1.3
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0KB/748.0MB/0KB /s] [0/187/0 iops] [eta 00m:00s]
fileB: (groupid=0, jobs=1): err= 0: pid=2172: Thu Aug 14 10:21:13 2014
  write: io=10240MB, bw=741672KB/s, iops=181, runt= 14138msec
    slat (usec): min=569, max=2747, avg=1741.44, stdev=507.08
    clat (msec): min=19, max=465, avg=86.55, stdev=35.16
     lat (msec): min=20, max=466, avg=88.30, stdev=34.92
    clat percentiles (msec):
     |  1.00th=[   39],  5.00th=[   54], 10.00th=[   60], 20.00th=[   64],
     | 30.00th=[   69], 40.00th=[   75], 50.00th=[   81], 60.00th=[   85],
     | 70.00th=[   92], 80.00th=[  102], 90.00th=[  124], 95.00th=[  147],
     | 99.00th=[  217], 99.50th=[  258], 99.90th=[  424], 99.95th=[  441],
     | 99.99th=[  465]
    bw (KB  /s): min=686754, max=783298, per=99.81%, avg=740262.96, stdev=19845.43
    lat (msec) : 20=0.04%, 50=3.36%, 100=75.51%, 250=20.51%, 500=0.59%
  cpu          : usr=6.18%, sys=12.97%, ctx=11554, majf=0, minf=2225
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=99.4%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=2560/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=10240MB, aggrb=741672KB/s, minb=741672KB/s, maxb=741672KB/s, mint=14138msec, maxt=14138msec

Disk stats (read/write):
  rbd0: ios=182/20459, merge=0/0, ticks=92/1213748, in_queue=1214796, util=99.80%
ceph@mail02-old:~$

German Anders

> --- Original message ---
> Subject: Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s
> From: Mariusz Gronczewski <mariusz.gronczewski at efigence.com>
> To: German Anders <ganders at despegar.com>
> Cc: <ceph-users at lists.ceph.com>
> Date: Thursday, 14/08/2014 10:56
>
> Actual OSD logs (/var/log/ceph/ceph-osd.$id) would be more useful.
>
> A few ideas:
>
> * run 'ceph health detail' to get details on which OSD is stalling
> * run 'ceph osd perf' to see the latency of each OSD
> * 'ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok dump_historic_ops' shows "recent slow" ops
>
> I actually have a very similar problem: the cluster goes at full speed (sometimes even for hours) and then suddenly everything stops for a minute or five. No disk IO, no IO wait (so the disks are fine), no IO errors in the kernel log, and the OSDs only complain that a subop on another OSD is slow (but on that OSD everything looks fine too).
>
> On Wed, 13 Aug 2014 16:04:30 -0400, German Anders <ganders at despegar.com> wrote:
>
>> Also, even an "ls -ltr" inside the RBD mount point (/mnt) freezes the prompt. Any ideas? I've attached some syslogs from one of the OSD servers and also from the client. Both are running Ubuntu 14.04 LTS with kernel 3.15.8. The cluster is not usable at this point, since I can't even run an "ls" on the RBD.
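A minimal sketch of the admin-socket check suggested above, assuming the default socket paths under /var/run/ceph and root access on each OSD host; dump_ops_in_flight shows what the OSD is working on right now, dump_historic_ops its history of recent (slow) ops:

# run on each OSD host
for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo "== $sock =="
    # ops currently queued or executing on this OSD
    ceph --admin-daemon "$sock" dump_ops_in_flight
    # recent ops the OSD kept around, including the slow ones
    ceph --admin-daemon "$sock" dump_historic_ops
done

Any op that has been sitting there for tens of seconds points at the OSD (and disk) worth looking at.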
>>
>> Thanks in advance,
>>
>> Best regards,
>>
>> German Anders
>>
>>> --- Original message ---
>>> Subject: Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s
>>> From: German Anders <ganders at despegar.com>
>>> To: Mark Nelson <mark.nelson at inktank.com>
>>> Cc: <ceph-users at lists.ceph.com>
>>> Date: Wednesday, 13/08/2014 11:09
>>>
>>> It is actually very strange: if I run the fio test on the client and in parallel run iostat on all the OSD servers, I don't see any workload at all on the disks, I mean... nothing! 0.00. And the fio job on the client is behaving very strangely too:
>>>
>>> $ sudo fio --filename=/dev/rbd1 --direct=1 --rw=write --bs=4m --size=10G --iodepth=16 --ioengine=libaio --runtime=60 --group_reporting --name=file99
>>> file99: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=16
>>> fio-2.1.3
>>> Starting 1 process
>>> Jobs: 1 (f=1): [W] [2.1% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 01h:26m:43s]
>>>
>>> It seems like it is doing nothing.
>>>
>>> German Anders
>>>
>>>> --- Original message ---
>>>> Subject: Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s
>>>> From: Mark Nelson <mark.nelson at inktank.com>
>>>> To: <ceph-users at lists.ceph.com>
>>>> Date: Wednesday, 13/08/2014 11:00
>>>>
>>>> On 08/13/2014 08:19 AM, German Anders wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm seeing some peculiar behavior on a new Ceph cluster. I mapped an RBD to a client and ran some performance tests with fio; at that point everything went just fine (the results too :) ). But when I then run another test on a new RBD on the same client, performance suddenly drops below 10MB/s and it takes almost 10 minutes to complete a 10G file test. If I run *ceph -w* I don't see anything suspicious. Any idea what could be happening here?
>>>>
>>>> When things are going fast, are your disks actually writing data out as fast as your client IO would indicate? (Don't forget to count replication!) It may be that the great speed is just writing data into the tmpfs journals (if the test is only 10GB and spread across 36 OSDs, it could finish pretty quickly writing to tmpfs!). FWIW, tmpfs journals aren't very safe. It's not something you want to use outside of testing except in unusual circumstances.
>>>>
>>>> In your tests, when things are bad, it's generally worth checking whether any one disk/OSD is backed up relative to the others. There are a couple of ways to accomplish this: the Ceph admin socket can tell you information about each OSD, i.e. how many outstanding IOs it has and a history of slow ops. You can also look at per-disk statistics with something like iostat or collectl.
>>>>
>>>> Hope this helps!
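A quick way to act on the iostat suggestion above, sketched under the assumption that the four OSD hosts are cephosd01 through cephosd04 (as in the ceph.conf quoted below) and that passwordless ssh and the sysstat package are available on them:

# run from an admin box while the fio test is running on the client
for h in cephosd01 cephosd02 cephosd03 cephosd04; do
    # extended per-device stats in MB/s, ten samples at 2s intervals
    ssh "$h" 'iostat -xm 2 10' > "iostat-$h.log" &
done
wait

If one disk shows ~100% util and a large await while the rest sit near idle, the OSD on that disk is the one holding everything back.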
>>>>>
>>>>> The cluster is made of:
>>>>>
>>>>> 3 x MON servers
>>>>> 4 x OSD servers (3TB SAS 6G disks for the OSD daemons and tmpfs for the journals; on each server there is one 36GB tmpfs shared by its 9 OSD daemons)
>>>>> 2 x network switches (cluster and public)
>>>>> 10GbE on both networks
>>>>>
>>>>> The ceph.conf file is the following:
>>>>>
>>>>> [global]
>>>>> fsid = 56e56e4c-ea59-4157-8b98-acae109bebe1
>>>>> mon_initial_members = cephmon01, cephmon02, cephmon03
>>>>> mon_host = 10.97.10.1,10.97.10.2,10.97.10.3
>>>>> auth_client_required = cephx
>>>>> auth_cluster_required = cephx
>>>>> auth_service_required = cephx
>>>>> filestore_xattr_use_omap = true
>>>>> public_network = 10.97.0.0/16
>>>>> cluster_network = 192.168.10.0/24
>>>>> osd_pool_default_size = 2
>>>>> glance_api_version = 2
>>>>>
>>>>> [mon]
>>>>> debug_optracker = 0
>>>>>
>>>>> [mon.cephmon01]
>>>>> host = cephmon01
>>>>> mon_addr = 10.97.10.1:6789
>>>>>
>>>>> [mon.cephmon02]
>>>>> host = cephmon02
>>>>> mon_addr = 10.97.10.2:6789
>>>>>
>>>>> [mon.cephmon03]
>>>>> host = cephmon03
>>>>> mon_addr = 10.97.10.3:6789
>>>>>
>>>>> [osd]
>>>>> journal_dio = false
>>>>> osd_journal_size = 4096
>>>>> fstype = btrfs
>>>>> debug_optracker = 0
>>>>>
>>>>> [osd.0]
>>>>> host = cephosd01
>>>>> devs = /dev/sdc1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.1]
>>>>> host = cephosd01
>>>>> devs = /dev/sdd1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.2]
>>>>> host = cephosd01
>>>>> devs = /dev/sdf1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.3]
>>>>> host = cephosd01
>>>>> devs = /dev/sdg1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.4]
>>>>> host = cephosd01
>>>>> devs = /dev/sdi1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.5]
>>>>> host = cephosd01
>>>>> devs = /dev/sdj1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.6]
>>>>> host = cephosd01
>>>>> devs = /dev/sdl1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.7]
>>>>> host = cephosd01
>>>>> devs = /dev/sdm1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.8]
>>>>> host = cephosd01
>>>>> devs = /dev/sdn1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.9]
>>>>> host = cephosd02
>>>>> devs = /dev/sdc1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.10]
>>>>> host = cephosd02
>>>>> devs = /dev/sdd1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.11]
>>>>> host = cephosd02
>>>>> devs = /dev/sdf1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.12]
>>>>> host = cephosd02
>>>>> devs = /dev/sdg1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.13]
>>>>> host = cephosd02
>>>>> devs = /dev/sdi1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.14]
>>>>> host = cephosd02
>>>>> devs = /dev/sdj1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.15]
>>>>> host = cephosd02
>>>>> devs = /dev/sdl1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.16]
>>>>> host = cephosd02
>>>>> devs = /dev/sdm1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.17]
>>>>> host = cephosd02
>>>>> devs = /dev/sdn1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.18]
>>>>> host = cephosd03
>>>>> devs = /dev/sdc1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.19]
>>>>> host = cephosd03
>>>>> devs = /dev/sdd1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.20]
>>>>> host = cephosd03
>>>>> devs = /dev/sdf1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.21]
>>>>> host = cephosd03
>>>>> devs = /dev/sdg1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.22]
>>>>> host = cephosd03
>>>>> devs = /dev/sdi1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.23]
>>>>> host = cephosd03
>>>>> devs = /dev/sdj1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.24]
>>>>> host = cephosd03
>>>>> devs = /dev/sdl1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.25]
>>>>> host = cephosd03
>>>>> devs = /dev/sdm1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.26]
>>>>> host = cephosd03
>>>>> devs = /dev/sdn1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.27]
>>>>> host = cephosd04
>>>>> devs = /dev/sdc1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.28]
>>>>> host = cephosd04
>>>>> devs = /dev/sdd1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.29]
>>>>> host = cephosd04
>>>>> devs = /dev/sdf1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.30]
>>>>> host = cephosd04
>>>>> devs = /dev/sdg1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.31]
>>>>> host = cephosd04
>>>>> devs = /dev/sdi1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.32]
>>>>> host = cephosd04
>>>>> devs = /dev/sdj1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.33]
>>>>> host = cephosd04
>>>>> devs = /dev/sdl1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.34]
>>>>> host = cephosd04
>>>>> devs = /dev/sdm1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [osd.35]
>>>>> host = cephosd04
>>>>> devs = /dev/sdn1
>>>>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>>>>
>>>>> [client.volumes]
>>>>> keyring = /etc/ceph/ceph.client.volumes.keyring
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Best regards,
>>>>>
>>>>> *German Anders*
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczewski at efigence.com
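A note on the tmpfs journals in the configuration above: with osd_journal_size = 4096 (MB) and nine OSDs per host, the nine 4 GB journals add up to the whole 36 GB tmpfs, leaving no headroom, and a 10 GB fio write with osd_pool_default_size = 2 amounts to only about 20 GB of journal traffic spread across 36 OSDs (roughly 570 MB per OSD), so the fast runs can be absorbed almost entirely by RAM, as Mark points out above. A rough way to sanity-check this during a test, assuming the journal path and the /dev/sd* data disks from the ceph.conf above:

# on each OSD host, while the client fio test is running
df -h /mnt/ramdisk        # how full the ramdisk holding the journals is
ls -lh /mnt/ramdisk/      # the per-OSD journal files
iostat -xm 2              # write throughput of the sd* data disks

Summed over all hosts, the data-disk write throughput should roughly equal the client throughput times the replication factor (2) once the journals flush; if the data disks stay idle while the client reports hundreds of MB/s, the numbers are only measuring the ramdisk.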