I find graphs really help here. One screen that shows disk I/O and latency for all OSDs makes it easy to pinpoint the bottleneck. If you don't have that, I'd go low tech: watch the blinky lights. It's really easy to see which disk is the hotspot.

On Thu, Aug 14, 2014 at 6:56 AM, Mariusz Gronczewski <mariusz.gronczewski at efigence.com> wrote:
> Actual OSD logs (/var/log/ceph/ceph-osd.$id) would be more useful.
>
> A few ideas:
>
> * run 'ceph health detail' to see which OSD is stalling
> * run 'ceph osd perf' to see the latency of each OSD
> * 'ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok dump_historic_ops' shows the "recent slow" ops
>
> I actually have a very similar problem: the cluster goes full speed (sometimes even for hours) and then suddenly everything stops for a minute or five. No disk I/O, no I/O wait (so the disks are fine), no I/O errors in the kernel log, and the OSDs only complain that another OSD's subop is slow (but on that OSD everything looks fine too).
>
> On Wed, 13 Aug 2014 16:04:30 -0400, German Anders
> <ganders at despegar.com> wrote:
>
>> Also, even an "ls -ltr" inside the /mnt of the RBD freezes the prompt. Any ideas? I've attached some syslogs from one of the OSD servers and also from the client. Both are running Ubuntu 14.04 LTS with kernel 3.15.8.
>> The cluster is not usable at this point, since I can't even run an "ls" on the RBD.
>>
>> Thanks in advance,
>>
>> Best regards,
>>
>> German Anders
>>
>> > --- Original message ---
>> > Subject: Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s
>> > From: German Anders <ganders at despegar.com>
>> > To: Mark Nelson <mark.nelson at inktank.com>
>> > Cc: <ceph-users at lists.ceph.com>
>> > Date: Wednesday, 13/08/2014 11:09
>> >
>> > It is actually very strange: if I run the fio test on the client and, in parallel, run iostat on all the OSD servers, I don't see any workload at all on the disks, I mean... nothing! 0.00... and the fio job on the client is also behaving very strangely:
>> >
>> > $ sudo fio --filename=/dev/rbd1 --direct=1 --rw=write --bs=4m --size=10G --iodepth=16 --ioengine=libaio --runtime=60 --group_reporting --name=file99
>> > file99: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=16
>> > fio-2.1.3
>> > Starting 1 process
>> > Jobs: 1 (f=1): [W] [2.1% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 01h:26m:43s]
>> >
>> > It seems like it is doing nothing...
>> >
>> > German Anders
>> >
>> >> --- Original message ---
>> >> Subject: Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s
>> >> From: Mark Nelson <mark.nelson at inktank.com>
>> >> To: <ceph-users at lists.ceph.com>
>> >> Date: Wednesday, 13/08/2014 11:00
>> >>
>> >> On 08/13/2014 08:19 AM, German Anders wrote:
>> >>>
>> >>> Hi to all,
>> >>>
>> >>> I'm seeing some odd behavior on a new Ceph cluster. I've mapped an RBD to a client and run some performance tests with fio, and at that point everything goes just fine (the results too :) ). But when I then run another test against a new RBD on the same client, the performance suddenly drops below 10MB/s and it takes almost 10 minutes to complete a 10G file test. If I run *ceph -w* I don't see anything suspicious. Any idea what could be happening here?
>> >>
>> >> When things are going fast, are your disks actually writing data out as fast as your client I/O would indicate? (Don't forget to count replication!) It may be that the great speed is just from writing data into the tmpfs journals (if the test is only 10GB and spread across 36 OSDs, it could finish pretty quickly writing to tmpfs!). FWIW, tmpfs journals aren't very safe. It's not something you want to use outside of testing except in unusual circumstances.
>> >>
>> >> In your tests, when things are bad, it's generally worth checking whether any one disk/OSD is backed up relative to the others. There are a couple of ways to accomplish this: the Ceph admin socket can tell you information about each OSD, i.e. how many outstanding I/Os and a history of slow ops, and you can also look at per-disk statistics with something like iostat or collectl.
>> >>
>> >> Hope this helps!
>> >>
>> >>>
>> >>> The cluster is made of:
>> >>>
>> >>> 3 x MON servers
>> >>> 4 x OSD servers (3TB SAS 6G disks for the OSD daemons & tmpfs for the journals -> one 36GB tmpfs per server, shared by its 9 OSD daemons)
>> >>> 2 x network switches (cluster and public)
>> >>> 10GbE on both networks
>> >>>
>> >>> The ceph.conf file is the following:
>> >>>
>> >>> [global]
>> >>> fsid = 56e56e4c-ea59-4157-8b98-acae109bebe1
>> >>> mon_initial_members = cephmon01, cephmon02, cephmon03
>> >>> mon_host = 10.97.10.1,10.97.10.2,10.97.10.3
>> >>> auth_client_required = cephx
>> >>> auth_cluster_required = cephx
>> >>> auth_service_required = cephx
>> >>> filestore_xattr_use_omap = true
>> >>> public_network = 10.97.0.0/16
>> >>> cluster_network = 192.168.10.0/24
>> >>> osd_pool_default_size = 2
>> >>> glance_api_version = 2
>> >>>
>> >>> [mon]
>> >>> debug_optracker = 0
>> >>>
>> >>> [mon.cephmon01]
>> >>> host = cephmon01
>> >>> mon_addr = 10.97.10.1:6789
>> >>>
>> >>> [mon.cephmon02]
>> >>> host = cephmon02
>> >>> mon_addr = 10.97.10.2:6789
>> >>>
>> >>> [mon.cephmon03]
>> >>> host = cephmon03
>> >>> mon_addr = 10.97.10.3:6789
>> >>>
>> >>> [osd]
>> >>> journal_dio = false
>> >>> osd_journal_size = 4096
>> >>> fstype = btrfs
>> >>> debug_optracker = 0
>> >>>
>> >>> [osd.0]
>> >>> host = cephosd01
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.1]
>> >>> host = cephosd01
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.2]
>> >>> host = cephosd01
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.3]
>> >>> host = cephosd01
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.4]
>> >>> host = cephosd01
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.5]
>> >>> host = cephosd01
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.6]
>> >>> host = cephosd01
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.7]
>> >>> host = cephosd01
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.8]
>> >>> host = cephosd01
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.9]
>> >>> host = cephosd02
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.10]
>> >>> host = cephosd02
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.11]
>> >>> host = cephosd02
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.12]
>> >>> host = cephosd02
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.13]
>> >>> host = cephosd02
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.14]
>> >>> host = cephosd02
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.15]
>> >>> host = cephosd02
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.16]
>> >>> host = cephosd02
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.17]
>> >>> host = cephosd02
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.18]
>> >>> host = cephosd03
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.19]
>> >>> host = cephosd03
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.20]
>> >>> host = cephosd03
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.21]
>> >>> host = cephosd03
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.22]
>> >>> host = cephosd03
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.23]
>> >>> host = cephosd03
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.24]
>> >>> host = cephosd03
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.25]
>> >>> host = cephosd03
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.26]
>> >>> host = cephosd03
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.27]
>> >>> host = cephosd04
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.28]
>> >>> host = cephosd04
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.29]
>> >>> host = cephosd04
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.30]
>> >>> host = cephosd04
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.31]
>> >>> host = cephosd04
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.32]
>> >>> host = cephosd04
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.33]
>> >>> host = cephosd04
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.34]
>> >>> host = cephosd04
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.35]
>> >>> host = cephosd04
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [client.volumes]
>> >>> keyring = /etc/ceph/ceph.client.volumes.keyring
>> >>>
>> >>> Thanks in advance,
>> >>>
>> >>> Best regards,
>> >>>
>> >>> German Anders
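
To make the suggestions in this thread easier to try, here is a minimal sketch of the checks Mariusz and Mark describe. The $id placeholder is the OSD number, and the admin socket path assumes the default location on the OSD host; dump_ops_in_flight is the admin socket command for the "outstanding I/Os" Mark mentions.

# which OSDs the cluster itself flags as slow or stalling
ceph health detail

# commit/apply latency per OSD; a single outlier is usually the hotspot
ceph osd perf

# on the OSD host: recent slow ops and currently outstanding ops for one OSD
ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok dump_historic_ops
ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok dump_ops_in_flight

# per-disk statistics on each OSD host; one disk pegged near 100% utilisation
# while the others sit idle points at the backed-up disk/OSD
iostat -xm 2

Running the iostat pass during both the fast phase and the stalled phase also answers Mark's first question: whether the disks are really writing at the rate the client reports, or whether the data is only landing in the tmpfs journals.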
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczewski at efigence.com

_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
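
A note on the tmpfs journals discussed above: the posted ceph.conf only shows the journal path (/mnt/ramdisk/$cluster-$id-journal), not how the ramdisk is created. A setup like the one described (one 36GB tmpfs per OSD server, shared by its nine OSDs) would typically be mounted with something along these lines; this is an assumption for illustration, not taken from the thread.

# hypothetical: create the shared RAM-backed journal directory on an OSD host
mount -t tmpfs -o size=36g tmpfs /mnt/ramdisk

# or persistently via /etc/fstab (again, an assumption, not from the thread):
# tmpfs  /mnt/ramdisk  tmpfs  size=36g,mode=0755  0  0

As Mark notes above, journals kept in RAM vanish on reboot or power loss, which is why this is only suitable for testing.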