I wouldn't trust the 3.15.x kernel; it's already EOL and has issues. One of
them, http://tracker.ceph.com/issues/8818, hit me; I switched to a 3.14
kernel and my problems went away. The fix is supposed to be released in
3.16.2, but I looked at the changelog and couldn't find any reference to
it, so I'm not sure if it made it into that release.

mr.npp

On Wed, Aug 13, 2014 at 1:04 PM, German Anders <ganders at despegar.com> wrote:

> Also, even an "ls -ltr" inside the /mnt of the RBD freezes the prompt.
> Any ideas? I've attached some syslogs from one of the OSD servers and
> also from the client. Both are running Ubuntu 14.04 LTS with kernel
> 3.15.8. The cluster is not usable at this point, since I can't run an
> "ls" on the RBD.
>
> Thanks in advance,
>
> Best regards,
>
> *German Anders*
>
> --- Original message ---
> *Subject:* Re: Performance really drops from 700MB/s to 10MB/s
> *From:* German Anders <ganders at despegar.com>
> *To:* Mark Nelson <mark.nelson at inktank.com>
> *Cc:* <ceph-users at lists.ceph.com>
> *Date:* Wednesday, 13/08/2014 11:09
>
> Actually it's very strange: if I run the fio test on the client and in
> parallel run iostat on all the OSD servers, I don't see any workload
> going on over the disks, I mean... nothing! 0.00. And the fio script on
> the client is also behaving strangely:
>
> $ sudo fio --filename=/dev/rbd1 --direct=1 --rw=write --bs=4m --size=10G
> --iodepth=16 --ioengine=libaio --runtime=60 --group_reporting --name=file99
> file99: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=16
> fio-2.1.3
> Starting 1 process
> Jobs: 1 (f=1): [W] [2.1% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
> 01h:26m:43s]
>
> It seems like it is doing nothing.
>
> *German Anders*
>
> --- Original message ---
> *Subject:* Re: Performance really drops from 700MB/s to 10MB/s
> *From:* Mark Nelson <mark.nelson at inktank.com>
> *To:* <ceph-users at lists.ceph.com>
> *Date:* Wednesday, 13/08/2014 11:00
>
> On 08/13/2014 08:19 AM, German Anders wrote:
>
>> Hi to all,
>>
>> I'm seeing a particular behavior on a new Ceph cluster. I've mapped an
>> RBD to a client and run some performance tests with fio; at this point
>> everything goes just fine (also the results :) ), but then I try to run
>> another new test on a new RBD on the same client, and suddenly the
>> performance drops below 10MB/s and it takes almost 10 minutes to
>> complete a 10G file test. If I issue a *ceph -w* I don't see anything
>> suspicious. Any idea what can be happening here?
>
> When things are going fast, are your disks actually writing data out as
> fast as your client IO would indicate? (Don't forget to count
> replication!) It may be that the great speed is just writing data into
> the tmpfs journals (if the test is only 10GB and spread across 36 OSDs,
> it could finish pretty quickly writing to tmpfs!). FWIW, tmpfs journals
> aren't very safe. It's not something you want to use outside of testing
> except in unusual circumstances.
>
> In your tests, when things are bad it's generally worth checking to see
> if any one disk/OSD is backed up relative to the others. There are a
> couple of ways to accomplish this. The Ceph admin socket can tell you
> information about each OSD, i.e. how many outstanding IOs and a history
> of slow ops. You can also look at per-disk statistics with something
> like iostat or collectl.
>
> Hope this helps!
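For concreteness, the per-OSD checks Mark describes might look something
like this on one of the OSD hosts (osd.0 and the default admin socket path
are placeholders here, not commands taken from this thread):

$ # outstanding IOs and recent slow ops for one OSD, via the admin socket
$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops
$ # per-disk utilization and latency on the OSD servers
$ iostat -x 1

If one OSD reports far more ops in flight than the rest, or one disk sits
near 100% utilization in iostat while the others are idle, that is the
backed-up disk/OSD Mark is referring to.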
>> The cluster is made of:
>>
>> 3 x MON servers
>> 4 x OSD servers (3TB SAS 6G disks for the OSD daemons & tmpfs for the
>> journals -> there's one tmpfs of 36GB that is shared by the 9 OSD
>> daemons on each server; a sketch of such a mount follows after the
>> config below)
>> 2 x network switches (cluster and public)
>> 10GbE speed on both networks
>>
>> The ceph.conf file is the following:
>>
>> [global]
>> fsid = 56e56e4c-ea59-4157-8b98-acae109bebe1
>> mon_initial_members = cephmon01, cephmon02, cephmon03
>> mon_host = 10.97.10.1,10.97.10.2,10.97.10.3
>> auth_client_required = cephx
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> filestore_xattr_use_omap = true
>> public_network = 10.97.0.0/16
>> cluster_network = 192.168.10.0/24
>> osd_pool_default_size = 2
>> glance_api_version = 2
>>
>> [mon]
>> debug_optracker = 0
>>
>> [mon.cephmon01]
>> host = cephmon01
>> mon_addr = 10.97.10.1:6789
>>
>> [mon.cephmon02]
>> host = cephmon02
>> mon_addr = 10.97.10.2:6789
>>
>> [mon.cephmon03]
>> host = cephmon03
>> mon_addr = 10.97.10.3:6789
>>
>> [osd]
>> journal_dio = false
>> osd_journal_size = 4096
>> fstype = btrfs
>> debug_optracker = 0
>>
>> [osd.0]
>> host = cephosd01
>> devs = /dev/sdc1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.1]
>> host = cephosd01
>> devs = /dev/sdd1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.2]
>> host = cephosd01
>> devs = /dev/sdf1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.3]
>> host = cephosd01
>> devs = /dev/sdg1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.4]
>> host = cephosd01
>> devs = /dev/sdi1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.5]
>> host = cephosd01
>> devs = /dev/sdj1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.6]
>> host = cephosd01
>> devs = /dev/sdl1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.7]
>> host = cephosd01
>> devs = /dev/sdm1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.8]
>> host = cephosd01
>> devs = /dev/sdn1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.9]
>> host = cephosd02
>> devs = /dev/sdc1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.10]
>> host = cephosd02
>> devs = /dev/sdd1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.11]
>> host = cephosd02
>> devs = /dev/sdf1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.12]
>> host = cephosd02
>> devs = /dev/sdg1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.13]
>> host = cephosd02
>> devs = /dev/sdi1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.14]
>> host = cephosd02
>> devs = /dev/sdj1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.15]
>> host = cephosd02
>> devs = /dev/sdl1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.16]
>> host = cephosd02
>> devs = /dev/sdm1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.17]
>> host = cephosd02
>> devs = /dev/sdn1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.18]
>> host = cephosd03
>> devs = /dev/sdc1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.19]
>> host = cephosd03
>> devs = /dev/sdd1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.20]
>> host = cephosd03
>> devs = /dev/sdf1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.21]
>> host = cephosd03
>> devs = /dev/sdg1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.22]
>> host = cephosd03
>> devs = /dev/sdi1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.23]
>> host = cephosd03
>> devs = /dev/sdj1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.24]
>> host = cephosd03
>> devs = /dev/sdl1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.25]
>> host = cephosd03
>> devs = /dev/sdm1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.26]
>> host = cephosd03
>> devs = /dev/sdn1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.27]
>> host = cephosd04
>> devs = /dev/sdc1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.28]
>> host = cephosd04
>> devs = /dev/sdd1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.29]
>> host = cephosd04
>> devs = /dev/sdf1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.30]
>> host = cephosd04
>> devs = /dev/sdg1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.31]
>> host = cephosd04
>> devs = /dev/sdi1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.32]
>> host = cephosd04
>> devs = /dev/sdj1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.33]
>> host = cephosd04
>> devs = /dev/sdl1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.34]
>> host = cephosd04
>> devs = /dev/sdm1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [osd.35]
>> host = cephosd04
>> devs = /dev/sdn1
>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>>
>> [client.volumes]
>> keyring = /etc/ceph/ceph.client.volumes.keyring
>>
>> Thanks in advance,
>>
>> Best regards,
>>
>> *German Anders*
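For reference, the 36GB tmpfs referred to in the cluster description would
typically be created along these lines on each OSD server before the OSDs
start (the exact mount options here are an assumption, not taken from this
thread), keeping in mind Mark's warning that tmpfs journals aren't safe
outside of testing:

$ # example only: one 36GB tmpfs per OSD server, shared by its 9 journals
$ sudo mkdir -p /mnt/ramdisk
$ sudo mount -t tmpfs -o size=36g tmpfs /mnt/ramdisk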