RE: crimson-osd vs legacy-osd: should the perf difference be already noticeable?

> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx <ceph-devel-owner@xxxxxxxxxxxxxxx>
> On Behalf Of Roman Penyaev
> Sent: Friday, January 10, 2020 10:54 AM
> To: kefu chai <tchaikov@xxxxxxxxx>
> Cc: Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx>; Samuel Just
> <sjust@xxxxxxxxxx>; The Esoteric Order of the Squid Cybernetic <ceph-
> devel@xxxxxxxxxxxxxxx>
> Subject: Re: crimson-osd vs legacy-osd: should the perf difference be already
> noticeable?
> 
> On 2020-01-10 17:18, kefu chai wrote:
> 
> [skip]
> 
> >>
> >> The first thing that catches my eye is that for small blocks there is
> >> no big difference at all, but as the block size increases, crimson's
> >> IOPS start to
> >
> > That's also our finding, and it's expected: the async messenger uses
> > the same reactor model as Seastar does. Actually, its original
> > implementation was adapted from Seastar's socket stream
> > implementation.
> 
> Hm, regardless of the model, the messenger should not be a bottleneck.
> Take a look at the results of the fio_ceph_messenger load (which exercises
> the messenger alone): I can squeeze out IOPS=89.8k, BW=351MiB/s at a 4k
> block size with iodepth=32.
> (Another good example is https://github.com/ceph/ceph/pull/26932 , almost
> ~200k.)
> 
> With the PG layer (memstore_debug_omit_block_device_write=true) I can reach
> 40k IOPS at most.  Without the PG layer (immediate completion from the
> transport callback, osd_immediate_completions=true) I get almost 60k.
> 
> It seems that client-side costs come into play here, and those costs prevail.
> 
> >
> >> decline. Can it be a transport issue? That can be tested as well.
> >
> > Because Seastar's socket facility reads from the wire in 4K chunks,
> > while the classic OSD's async messenger reads the payload with the
> > size suggested by the header. So when it comes to larger block sizes,
> > it takes crimson-osd multiple syscalls and memcpy calls to read the
> > request from the wire; that's why the classic OSD wins in this case.
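To make that cost concrete, here is a tiny standalone sketch (not crimson or
messenger code, just the arithmetic behind the explanation above): with a fixed
4 KiB read size the number of read()/memcpy rounds grows linearly with the
block size, while a header-sized read stays at one round.

    // sketch.cc -- illustrative only; the 4096-byte chunk size is the one
    // mentioned above, everything else is made up for the example.
    #include <cstdio>
    #include <cstddef>

    constexpr std::size_t kChunk = 4096;

    // read()+memcpy rounds when the socket hands back fixed-size chunks
    std::size_t rounds_fixed_chunk(std::size_t payload) {
        return (payload + kChunk - 1) / kChunk;
    }

    int main() {
        for (std::size_t bs : {4096ul, 65536ul, 1ul << 20}) {
            std::printf("%8zu-byte payload: %4zu chunked reads vs 1 header-sized read\n",
                        bs, rounds_fixed_chunk(bs));
        }
    }
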
> 
> Do you plan to fix that?
> 
> > have you tried to use multiple fio clients to saturate CPU capacity of
> > OSD nodes?
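In case it helps with that suggestion, here is a minimal fio job sketch that
runs several client jobs against one RBD image (pool, image and client names
are placeholders, adjust to your setup):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio_test
    invalidate=0
    rw=randwrite
    bs=4k
    iodepth=32
    group_reporting

    [rbd-clients]
    numjobs=4
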
> 
> Not yet.  But regarding CPU I have these numbers:
> 
> output of pidstat while rbd.fio is running, 4k block only:
> 
> legacy-osd
> 
> [roman@dell ~]$ pidstat 1 -p 109930
> Linux 5.3.13-arch1-1 (dell)     01/09/2020      _x86_64_        (8 CPU)
> 
> 03:51:49 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
> 03:51:51 PM  1000    109930   14.00    8.00    0.00    0.00   22.00     1  ceph-osd
> 03:51:52 PM  1000    109930   40.00   19.00    0.00    0.00   59.00     1  ceph-osd
> 03:51:53 PM  1000    109930   44.00   17.00    0.00    0.00   61.00     1  ceph-osd
> 03:51:54 PM  1000    109930   40.00   20.00    0.00    0.00   60.00     1  ceph-osd
> 03:51:55 PM  1000    109930   39.00   18.00    0.00    0.00   57.00     1  ceph-osd
> 03:51:56 PM  1000    109930   41.00   20.00    0.00    0.00   61.00     1  ceph-osd
> 03:51:57 PM  1000    109930   41.00   15.00    0.00    0.00   56.00     1  ceph-osd
> 03:51:58 PM  1000    109930   42.00   16.00    0.00    0.00   58.00     1  ceph-osd
> 03:51:59 PM  1000    109930   42.00   15.00    0.00    0.00   57.00     1  ceph-osd
> 03:52:00 PM  1000    109930   43.00   15.00    0.00    0.00   58.00     1  ceph-osd
> 03:52:01 PM  1000    109930   24.00   12.00    0.00    0.00   36.00     1  ceph-osd
> 
> 
> crimson-osd
> 
> [roman@dell ~]$ pidstat 1  -p 108141
> Linux 5.3.13-arch1-1 (dell)     01/09/2020      _x86_64_        (8 CPU)
> 
> 03:47:50 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
> 03:47:55 PM  1000    108141   67.00   11.00    0.00    0.00   78.00     0  crimson-osd
> 03:47:56 PM  1000    108141   79.00   12.00    0.00    0.00   91.00     0  crimson-osd
> 03:47:57 PM  1000    108141   81.00    9.00    0.00    0.00   90.00     0  crimson-osd
> 03:47:58 PM  1000    108141   78.00   12.00    0.00    0.00   90.00     0  crimson-osd
> 03:47:59 PM  1000    108141   78.00   12.00    0.00    1.00   90.00     0  crimson-osd
> 03:48:00 PM  1000    108141   78.00   13.00    0.00    0.00   91.00     0  crimson-osd
> 03:48:01 PM  1000    108141   79.00   13.00    0.00    0.00   92.00     0  crimson-osd
> 03:48:02 PM  1000    108141   78.00   12.00    0.00    0.00   90.00     0  crimson-osd
> 03:48:03 PM  1000    108141   77.00   11.00    0.00    0.00   88.00     0  crimson-osd
> 03:48:04 PM  1000    108141   79.00   12.00    0.00    1.00   91.00     0  crimson-osd
> 
> 
> Seems quite saturated, almost twice as much as legacy-osd.  Did you see
> something similar?
crimson-osd (Seastar) uses epoll by default, which consumes more CPU capacity
(you can change the epoll/poll-mode settings to reduce it; see the reactor
options noted below). Adding Ma, Jianpeng to the thread since he has studied
this in more depth.
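For reference, the knobs on the Seastar side are its standard reactor options;
whether and how crimson-osd forwards them depends on your build, so treat this
as a sketch rather than a recipe:

    --reactor-backend epoll|linux-aio   # choose the reactor backend
    --idle-poll-time-us 0               # cut down idle busy-polling
    --overprovisioned                   # relax polling/affinity on shared cores
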
BTW, by default crimson-osd runs a single thread, while legacy ceph-osd runs
many: 3 threads for the async messenger, 2x8 threads for the OSD op queue
(SSD), a finisher thread, etc. So with the default settings it is 1 thread
compared to more than 10 threads of work, and it is expected that crimson-osd
does not show an obvious difference. You can lower the default thread counts
for legacy ceph-osd (for example, one thread per layer) to see more of a
difference; a config sketch follows below.
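A hedged ceph.conf sketch for that experiment (these option names exist in
current Ceph releases, but double-check the exact spelling and the _ssd/_hdd
variants for the version you are testing):

    [osd]
        ms_async_op_threads = 1            # async messenger worker threads
        osd_op_num_shards = 1              # op queue shards
        osd_op_num_threads_per_shard = 1   # threads per shard
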
BTW, please use a release build for these tests.
crimson-osd uses an async model; if the workload is very light, it cannot take
full advantage of that.
> 
> --
> Roman
