Re: very different performance on two volumes in the same pool

Hi Nik,
Thanks for the perf data. It seems innocuous; I am not seeing a single tcmalloc trace of note. Are you running with tcmalloc, by the way?
What about my other question: does the performance of the slow volume increase if you stop IO on the other volume?
Are you using the default ceph.conf? You probably want to try different values of osd_op_num_shards (maybe = 10, based on your OSD server config) and osd_op_num_threads_per_shard (maybe = 1). Also, you may want to see the effect of setting osd_enable_op_tracker = false. A sketch of these settings follows below.
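
For reference, a minimal sketch of what that tuning could look like in the [osd] section of ceph.conf (the values are starting points based on your setup, not tested recommendations; the hammer defaults are 5 shards x 2 threads, if I remember right):

    [osd]
    # more, single-threaded shards can reduce lock contention on fast SSDs
    osd_op_num_shards = 10
    osd_op_num_threads_per_shard = 1
    # disabling the op tracker removes its per-op bookkeeping overhead
    osd_enable_op_tracker = false

The OSDs need a restart for these to take effect.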

Are you seeing similar resource consumption on both servers while IO is going on?

I need some information about your client: are the volumes exposed with krbd, or are you running in a librbd environment? If krbd, and both volumes are mapped on the same physical box, I hope you mapped the images with 'noshare' enabled; see the example below.
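
If it is krbd, a mapping with noshare could look roughly like this (the exact option syntax may differ with your rbd CLI and kernel version):

    # 'noshare' gives each mapped image its own client instance,
    # so IO on one busy image does not throttle the other
    rbd map ssd3r/vmtst23-6 -o noshare
    rbd map ssd3r/vmtst23-7 -o noshare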

Too many questions :-) But the answers may give some indication of what is going on there.

Thanks & Regards
Somnath

-----Original Message-----
From: Nikola Ciprich [mailto:nikola.ciprich@xxxxxxxxxxx] 
Sent: Sunday, April 26, 2015 7:32 AM
To: Somnath Roy
Cc: ceph-users@xxxxxxxxxxxxxx; nik@xxxxxxxxxxx
Subject: Re:  very different performance on two volumes in the same pool

Hello Somnath,

On Fri, Apr 24, 2015 at 04:23:19PM +0000, Somnath Roy wrote:
> This could be again because of tcmalloc issue I reported earlier.
> 
> Two things to observe.
> 
> 1. Is the performance improving if you stop IO on other volume ? If so, it could be different issue.
there is no other IO... only cephfs is mounted, but it has no users.

> 
> 2. Run perf top in the OSD node and see if tcmalloc traces are popping up.

I don't see anything special:

  3.34%  libc-2.12.so                  [.] _int_malloc
  2.87%  libc-2.12.so                  [.] _int_free
  2.79%  [vdso]                        [.] __vdso_gettimeofday
  2.67%  libsoftokn3.so                [.] 0x000000000001fad9
  2.34%  libfreeblpriv3.so             [.] 0x00000000000355e6
  2.33%  libpthread-2.12.so            [.] pthread_mutex_unlock
  2.19%  libpthread-2.12.so            [.] pthread_mutex_lock
  1.80%  libc-2.12.so                  [.] malloc
  1.43%  [kernel]                      [k] do_raw_spin_lock
  1.42%  libc-2.12.so                  [.] memcpy
  1.23%  [kernel]                      [k] __switch_to
  1.19%  [kernel]                      [k] acpi_processor_ffh_cstate_enter
  1.09%  libc-2.12.so                  [.] malloc_consolidate
  1.08%  [kernel]                      [k] __schedule
  1.05%  libtcmalloc.so.4.1.0          [.] 0x0000000000017e6f
  0.98%  libc-2.12.so                  [.] vfprintf
  0.83%  libstdc++.so.6.0.13           [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char,
  0.76%  libstdc++.so.6.0.13           [.] 0x000000000008092a
  0.73%  libc-2.12.so                  [.] __memset_sse2
  0.72%  libc-2.12.so                  [.] __strlen_sse42
  0.70%  libstdc++.so.6.0.13           [.] std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long)
  0.68%  libpthread-2.12.so            [.] pthread_mutex_trylock
  0.67%  librados.so.2.0.0             [.] ceph_crc32c_sctp
  0.63%  libpython2.6.so.1.0           [.] 0x000000000007d823
  0.55%  libnss3.so                    [.] 0x0000000000056d2a
  0.52%  libc-2.12.so                  [.] free
  0.50%  libstdc++.so.6.0.13           [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)

Should I check anything else?
BR
nik


> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Nikola Ciprich
> Sent: Friday, April 24, 2015 7:10 AM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: nik@xxxxxxxxxxx
> Subject:  very different performance on two volumes in the same pool
> 
> Hello,
> 
> I'm trying to solve a somewhat mysterious situation:
> 
> I've got a 3-node Ceph cluster with a pool made of 3 OSDs (one per node); the OSDs are 1TB SSD drives.
> 
> The pool is set to 3 replicas. I'm measuring random IO performance using fio:
> 
> fio  --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 --name=test --pool=ssd3r --rbdname=${rbdname} --invalidate=1 --bs=4k --iodepth=64 --readwrite=randread --output=randio.log
> 
> It's giving very nice performance of ~186K IOPS for random read.
> 
> The problem is, I've got one volume on which it gives only ~20K IOPS, and I can't figure out why. It was created using python, so I first suspected something similar to the missing-layering problem I was asking about here a few days ago, but when I tried to reproduce that, I got ~180K IOPS even for other volumes created using python. See the sketch below for how the volumes are created.
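> 
> For reference, the creation is roughly like this (a minimal sketch with the python rbd bindings; the pool and image names are illustrative):
> 
>     import rados
>     import rbd
> 
>     # connect using the default cluster config
>     cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>     cluster.connect()
>     ioctx = cluster.open_ioctx('ssd3r')
> 
>     # create a 30 GB format-2 image; not passing 'features' is what
>     # leaves the 'features:' line empty in the rbd info output below
>     rbd.RBD().create(ioctx, 'vmtst23-6', 30 * 1024 ** 3, old_format=False)
> 
>     ioctx.close()
>     cluster.shutdown()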
> 
> So only this one volume is problematic; the others are fine. Since there is only one SSD in each box and I'm using 3 replicas, there should not be any difference in the physical storage used between the volumes.
> 
> I'm using hammer 0.94.1 and fio 2.2.6.
> 
> Here's the rbd info:
> 
> "slow" volume:
> 
> [root@vfnphav1a fio]# rbd info ssd3r/vmtst23-6
> rbd image 'vmtst23-6':
>     size 30720 MB in 7680 objects
>     order 22 (4096 kB objects)
>     block_name_prefix: rbd_data.1376d82ae8944a
>     format: 2
>     features:
>     flags:
> 
> "fast" volume:
> [root@vfnphav1a fio]# rbd info ssd3r/vmtst23-7
> rbd image 'vmtst23-7':
>     size 30720 MB in 7680 objects
>     order 22 (4096 kB objects)
>     block_name_prefix: rbd_data.13d01d2ae8944a
>     format: 2
>     features:
>     flags:
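> 
> I can double-check that the two images really land on the same OSDs with something like ceph osd map (the object names below are derived from the block_name_prefix values above; each image's first object carries the all-zero 16-digit suffix):
> 
>     ceph osd map ssd3r rbd_data.1376d82ae8944a.0000000000000000
>     ceph osd map ssd3r rbd_data.13d01d2ae8944a.0000000000000000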
> 
> Any idea what could be wrong here?
> 
> thanks a lot in advance!
> 
> BR
> 
> nik
> 
> --
> -------------------------------------
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:    +420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: servis@xxxxxxxxxxx
> -------------------------------------
> 
> ________________________________
> 
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
> 
> 

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@xxxxxxxxxxx
-------------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



