Re: Significant slowdown of osds since v0.67 Dumpling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Saw your logs.  I thought it might be enabling
filestore_xattr_use_omap, but it isn't.  PGLog::undirty() doesn't seem
to be using very much cpu.
-Sam

On Mon, Aug 26, 2013 at 2:04 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> Hey Samuel,
>
> I have been trying to get it reproduced on my test-cluster and seem to
> have found a way.  Try: `rbd bench-write test --io-threads 80
> --io-pattern=rand'.  On my test-cluster, this closely replicates what I
> see during profiling on my production-cluster, including the extra
> CPU-usage by leveldb, which doesn't show up on Cuttlefish.  It's very
> curious that "PGLog::undirty()" is also still showing up near the top,
> even in 0.67.2.
>
> I'll send you the logs by private mail.
>
>
>    Regards,
>
>       Oliver
>
> On ma, 2013-08-26 at 13:35 -0700, Samuel Just wrote:
>> Can you attach a log from the startup of one of the dumpling osds on
>> your production machine (no need for logging, just need some of the
>> information dumped on every boot)?
>>
>> libleveldb is leveldb.  We've used leveldb for a few things since
>> bobtail.  If anything, the load on leveldb should be lighter in
>> dumpling, I would think...  I'll have to try to reproduce it locally.
>> I'll keep you posted.
>> -Sam
>>
>> On Sat, Aug 24, 2013 at 10:11 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> > Hey Samuel,
>> >
>> > Unfortunately, disabling "wbthrottle" made almost no difference on my
>> > production-cluster.  OSD-load was still much higher on Dumpling.
>> >
>> > I've mentioned this several times already, but when profiling with `perf
>> > top' on my production-cluster, any time I'm running a Dumpling-OSD,
>> > several "libleveldb"-related entries come up near the top, that don't
>> > show up when running the Cuttlefish-OSD at all.  Let's concentrate on
>> > that for a moment, as it's a clearly visible difference on my
>> > production-cluster, which shows the actual problem.
>> >
>> > Dumpling OSDs:
>> >  17.23%  [kernel]                     [k] intel_idle
>> >   6.35%  [kernel]                     [k] find_busiest_group
>> >   4.36%  kvm                          [.] 0x2cdbb0
>> >   3.38%  libleveldb.so.1.9            [.] 0x22821
>> >   2.40%  libc-2.11.3.so               [.] memcmp
>> >   2.04%  ceph-osd                     [.] ceph_crc32c_le_intel
>> >   1.90%  [kernel]                     [k] _raw_spin_lock
>> >   1.87%  [kernel]                     [k] copy_user_generic_string
>> >   1.35%  [kernel]                     [k]
>> > default_send_IPI_mask_sequence_phys
>> >   1.34%  [kernel]                     [k] __hrtimer_start_range_ns
>> >   1.14%  libc-2.11.3.so               [.] memcpy
>> >   1.03%  [kernel]                     [k] hrtimer_interrupt
>> >   1.01%  [kernel]                     [k] do_select
>> >   1.00%  [kernel]                     [k] __schedule
>> >   0.99%  [kernel]                     [k] _raw_spin_unlock_irqrestore
>> >   0.97%  [kernel]                     [k] cpumask_next_and
>> >   0.97%  [kernel]                     [k] find_next_bit
>> >   0.96%  libleveldb.so.1.9            [.]
>> > leveldb::InternalKeyComparator::Compar
>> >   0.91%  [kernel]                     [k] _raw_spin_lock_irqsave
>> >   0.91%  [kernel]                     [k] fget_light
>> >   0.89%  [kernel]                     [k] clockevents_program_event
>> >   0.79%  [kernel]                     [k] sync_inodes_sb
>> >   0.78%  libleveldb.so.1.9            [.] leveldb::Block::Iter::Next()
>> >   0.75%  [kernel]                     [k] apic_timer_interrupt
>> >   0.70%  [kernel]                     [k] native_write_cr0
>> >   0.60%  [kvm_intel]                  [k] vmx_vcpu_run
>> >   0.58%  [kernel]                     [k] load_balance
>> >   0.57%  [kernel]                     [k] rcu_needs_cpu
>> >   0.56%  ceph-osd                     [.] PGLog::undirty()
>> >   0.51%  libpthread-2.11.3.so         [.] pthread_mutex_lock
>> >   0.50%  [vdso]                       [.] 0x7fff6dbff6ce
>> >
>> > Same load, but with Cuttlefish-OSDs:
>> >  19.23%  [kernel]                     [k] intel_idle
>> >   6.43%  [kernel]                     [k] find_busiest_group
>> >   5.25%  kvm                          [.] 0x152a75
>> >   2.70%  ceph-osd                     [.] ceph_crc32c_le
>> >   2.44%  [kernel]                     [k] _raw_spin_lock
>> >   1.95%  [kernel]                     [k] copy_user_generic_string
>> >   1.53%  [kernel]                     [k]
>> > default_send_IPI_mask_sequence_phys
>> >   1.28%  [kernel]                     [k] __hrtimer_start_range_ns
>> >   1.21%  [kernel]                     [k] do_select
>> >   1.19%  [kernel]                     [k] hrtimer_interrupt
>> >   1.19%  [kernel]                     [k] _raw_spin_unlock_irqrestore
>> >   1.16%  [kernel]                     [k] fget_light
>> >   1.12%  [kernel]                     [k] cpumask_next_and
>> >   1.11%  [kernel]                     [k] clockevents_program_event
>> >   1.08%  [kernel]                     [k] __schedule
>> >   1.08%  [kernel]                     [k] find_next_bit
>> >   0.99%  [kernel]                     [k] _raw_spin_lock_irqsave
>> >   0.90%  [kernel]                     [k] native_write_cr0
>> >   0.83%  [kernel]                     [k] native_write_msr_safe
>> >   0.82%  [kernel]                     [k] apic_timer_interrupt
>> >   0.70%  libc-2.11.3.so               [.] memcpy
>> >   0.68%  [kernel]                     [k] sync_inodes_sb
>> >   0.63%  [kernel]                     [k] tg_load_down
>> >   0.63%  [kernel]                     [k] load_balance
>> >   0.61%  libpthread-2.11.3.so         [.] pthread_mutex_lock
>> >   0.58%  [kernel]                     [k] rcu_needs_cpu
>> >   0.57%  [kernel]                     [k] fput
>> >   0.56%  libc-2.11.3.so               [.] 0x7fb29
>> >   0.54%  [vdso]                       [.] 0x7fff2afb873a
>> >   0.50%  [kernel]                     [k] iput
>> >   0.50%  [kernel]                     [k] reschedule_interrupt
>> >
>> > It seems to me like "libleveldb" is accounting for significant extra
>> > CPU-loading on Dumpling.  Another interesting fact, is that I only see
>> > it use so much CPU on my production-cluster, which is serving around 80
>> > KVMs over RBD, over 2x10Gbit/s ethernet per node and so has a much
>> > higher and more diverse load than what I can generate on my
>> > test-cluster.
>> >
>> > Any ideas about this particular one?  Did anything change between
>> > Cuttlefish and Dumpling that would account for higher CPU-usage by
>> > "libleveldb", particularly under this client-load?  Anything I can try
>> > to bring it down?  Thanks!
>> >
>> >
>> >    Regards,
>> >
>> >      Oliver
>> >
>> > On vr, 2013-08-23 at 13:55 -0700, Samuel Just wrote:
>> >> Ok, can you try setting filestore_op_threads to 1 on both cuttlefish
>> >> and wip-dumpling-perf (with and with wbthrottle, default wbthrottle
>> >> settings).  I suspect I created contention in the filestore op threads
>> >> (FileStore::lfn_open specifically), and if so setting it to only use 1
>> >> thread should even out the performance.
>> >> -Sam
>> >>
>> >> On Fri, Aug 23, 2013 at 1:44 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >> > Hey Samuel,
>> >> >
>> >> > I commented the earlier settings out, so it was with defaults.
>> >> >
>> >> >
>> >> >    Regards,
>> >> >
>> >> >       Oliver
>> >> >
>> >> > On vr, 2013-08-23 at 13:35 -0700, Samuel Just wrote:
>> >> >> When you were running with the wbthrottle on, did you have the
>> >> >> settings I gave you earlier set, or was it using the defaults?
>> >> >> -Sam
>> >> >>
>> >> >> On Fri, Aug 23, 2013 at 12:48 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >> >> > Hey Samuel,
>> >> >> >
>> >> >> > That changed something, for the better. :-)
>> >> >> >
>> >> >> > Your test-version, with wbthrottle off:
>> >> >> > # ceph-osd --version
>> >> >> > ceph version 0.67.1-18-g3fe3368
>> >> >> > (3fe3368ac7178dcd312e89d264d8d81307e582d8)
>> >> >> > # ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep
>> >> >> > wbthrottle_enable
>> >> >> >   "filestore_wbthrottle_enable": "false",
>> >> >> > # rbd bench-write test --io-pattern=rand
>> >> >> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern rand
>> >> >> >   SEC       OPS   OPS/SEC   BYTES/SEC
>> >> >> >     1       838    837.36  2283825.44
>> >> >> >     2      1679    827.66  2309860.21
>> >> >> >     3      2488    829.30  2320967.64
>> >> >> >     4      2812    655.55  1847713.58
>> >> >> >     5      3301    659.64  1849006.69
>> >> >> > <...>
>> >> >> >   120     54113    450.83  1271297.14
>> >> >> >   121     54847    453.13  1277957.21
>> >> >> >   122     55502    454.93  1281769.31
>> >> >> >   123     55768    451.09  1271512.03
>> >> >> >   124     55853    450.42  1269452.15
>> >> >> >   126     55889    442.22  1246494.89
>> >> >> >   127     56351    443.71  1250308.66
>> >> >> >   128     57135    446.36  1257746.91
>> >> >> >   129     57445    444.03  1251566.31
>> >> >> >   130     57944    445.72  1256426.99
>> >> >> >
>> >> >> > Now with the wbthrottle-option enabled:
>> >> >> > # ceph-osd --version
>> >> >> > ceph version 0.67.1-18-g3fe3368
>> >> >> > (3fe3368ac7178dcd312e89d264d8d81307e582d8)
>> >> >> > # ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep
>> >> >> > wbthrottle_enable
>> >> >> >   "filestore_wbthrottle_enable": "true",
>> >> >> > # rbd bench-write test --io-pattern=randbench-write  io_size 4096
>> >> >> > io_threads 16 bytes 1073741824 pattern rand
>> >> >> >   SEC       OPS   OPS/SEC   BYTES/SEC
>> >> >> >     1       822    821.23  2303875.45
>> >> >> >     2      1510    738.54  2021390.37
>> >> >> >     3      2224    740.29  1989216.54
>> >> >> >     4      2843    708.01  1912598.24
>> >> >> >     5      3294    652.35  1774048.26
>> >> >> > <...>
>> >> >> >   120     59470    495.58  1388776.41
>> >> >> >   122     60067    489.29  1370954.96
>> >> >> >   123     60083    487.74  1366434.49
>> >> >> >   124     60449    487.49  1366477.77
>> >> >> >   125     61122    488.97  1370480.73
>> >> >> >   126     61679    489.52  1372290.03
>> >> >> >   127     62195    489.69  1372523.64
>> >> >> >   128     62608    489.02  1370226.98
>> >> >> >   129     62655    485.68  1360719.66
>> >> >> >   130     62688    482.01  1350560.76
>> >> >> >
>> >> >> > Slightly faster than with wbthrottle disabled..  For comparison, one
>> >> >> > more time with Cuttlefish-OSDs:
>> >> >> > # ceph-osd --version
>> >> >> > ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>> >> >> > # rbd bench-write test --io-pattern=rand
>> >> >> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern rand
>> >> >> >   SEC       OPS   OPS/SEC   BYTES/SEC
>> >> >> >     1      1074   1073.80  3079631.51
>> >> >> >     2      2121   1060.21  3001542.58
>> >> >> >     3      3111    845.59  2422586.94
>> >> >> >     4      3132    745.40  2132915.48
>> >> >> >     5      3336    665.33  1904205.15
>> >> >> > <...>
>> >> >> >   120     63700    530.81  1530767.77
>> >> >> >   121     64781    535.37  1544861.37
>> >> >> >   122     64944    532.10  1535398.54
>> >> >> >   123     65312    527.70  1522971.49
>> >> >> >   124     65526    528.42  1525134.71
>> >> >> >   125     66160    529.12  1527158.94
>> >> >> >   126     67214    533.44  1539955.47
>> >> >> >   127     67855    534.29  1541965.13
>> >> >> >   128     67939    528.79  1526270.55
>> >> >> >   129     68178    528.44  1525239.57
>> >> >> >   130     68811    529.24  1527681.84
>> >> >> >
>> >> >> > Cuttlefish still wins, but by a much smaller margin now.  Looks like
>> >> >> > we're onto something.  The fdatasync seems to be the key here, rather
>> >> >> > than disabling wbthrottle.
>> >> >> >
>> >> >> >
>> >> >> >    Regards,
>> >> >> >
>> >> >> >       Oliver
>> >> >> >
>> >> >> > On 23-08-13 19:53, Samuel Just wrote:
>> >> >> >> I pushed a branch, wip-dumpling-perf.  It does two things:
>> >> >> >> 1) adds a config filestore_wbthrottle_enable (defaults to true) to
>> >> >> >> allow you to disable the wbthrottle altogether
>> >> >> >> 2) causes the wbthrottle when enabled to fdatasync rather than fsync.
>> >> >> >>
>> >> >> >> Can you rerun the random workload with that branch with
>> >> >> >> filestore_wbthrottle_enable on and then off?
>> >> >> >> -Sam
>> >> >> >>
>> >> >> >> On Fri, Aug 23, 2013 at 10:26 AM, Oliver Daudey <oliver@xxxxxxxxx>
>> >> >> > wrote:
>> >> >> >>> Hey Sage,
>> >> >> >>>
>> >> >> >>> I'm all for it and will help testing.
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>    Regards,
>> >> >> >>>
>> >> >> >>>       Oliver
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux