Re: how to judge the results? - rados bench comparison

On Wed, 17 Apr 2019 16:08:34 +0200 Lars Täuber wrote:

> Wed, 17 Apr 2019 20:01:28 +0900
> Christian Balzer <chibi@xxxxxxx> ==> Ceph Users <ceph-users@xxxxxxxxxxxxxx> :
> > On Wed, 17 Apr 2019 11:22:08 +0200 Lars Täuber wrote:
> >   
> > > Wed, 17 Apr 2019 10:47:32 +0200
> > > Paul Emmerich <paul.emmerich@xxxxxxxx> ==> Lars Täuber <taeuber@xxxxxxx> :    
> > > > The standard argument that it helps prevent recovery traffic from
> > > > clogging the network and impacting client traffic is misleading:
> > > 
> > > What do you mean by "it"? I don't know the standard argument.
> > > Do you mean separating the networks or do you mean having both together in one switched network?
> > >     
> > He means separated networks, obviously.
> >   
> > > > 
> > > > * write client traffic relies on the backend network for replication
> > > > operations: your client (write) traffic is impacted anyways if the
> > > > backend network is full      
> > > 
> > > I understand this as an argument for separating the networks, with the backend network being faster than the frontend network.
> > > Then, during recovery, there should still be some backend bandwidth left for the client I/O traffic.
> > >     
> > You need to run the numbers and look at the big picture.
> > As mentioned already, this is all moot in your case.
> > 
> > 6 HDDs at realistically 150MB/s each, if they were all doing sequential
> > I/O, which they aren't.
> > But for the sake of argument let's say that one of your nodes can read
> > (or write, not both at the same time) 900MB/s.
> > That's still less than half of a single 25Gb/s link.
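(For reference, a quick back-of-envelope sketch of that comparison; the 150MB/s
per HDD is the same optimistic assumption as above, and the link rate ignores
protocol overhead.)

# Back-of-envelope: aggregate HDD throughput of one node vs. a single 25Gb/s link.
# Assumes ~150 MB/s per HDD (optimistic, purely sequential) and raw line rate.
hdds_per_node = 6
hdd_mb_s = 150
node_mb_s = hdds_per_node * hdd_mb_s      # 900 MB/s
link_mb_s = 25 * 1000 / 8                 # ~3125 MB/s for a 25Gb/s port
print(node_mb_s / link_mb_s)              # ~0.29, i.e. well under half the link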
> 
> Is this also true with the WAL device (combined with the DB device), which in our setup is a (fast) SSD?
> reading: 			2150MB/s
> writing: 			2120MB/s
> IOPS 4K reading/writing 	440k/320k
> 
> If so, the HW requirements for the next version of the OSD hosts will be adjusted.
> 
Yes.
Read up on how WAL/DB is involved in client data writes (only small ones)
and reads (not at all). 

Small writes will incur CPU (Ceph) and latency penalties (Ceph and
network) and on top of that your WAL will run out of space quickly, too.
It's nice for small bursts, but nothing sustained.
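To put a number on "nothing sustained", a minimal sketch; the WAL size and the
sustained small-write rate below are made-up example values, not anything from
your setup:

# Rough illustration: how long a fixed-size WAL can absorb sustained small writes
# before flushing to the HDDs becomes the bottleneck.
# Both values are hypothetical example assumptions.
wal_size_gb = 2            # example WAL/DB space available for deferred writes
small_write_mb_s = 300     # example sustained rate of small client writes
burst_seconds = wal_size_gb * 1024 / small_write_mb_s
print(burst_seconds)       # ~7 seconds of burst absorption, nothing sustained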

> 
> > And that very hypothetical data rate (it's not sequential, you will have
> > concurrent operations and thus seeks) is all your node can handle. If it is
> > all going into recovery/rebalancing, your clients are starved because of
> > that, not by bandwidth exhaustion.
> 
> If it is like this even with our SSD WAL, the HW requirements for the next version of the OSD hosts will be adjusted.
> 
Most people trim down recovery/backfill settings so that they don't impact
client I/O, which again makes the network separation less useful.
And the WAL/DB is not involved in these activities at all, as they happen
on an object (4MB default) level.
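For example (a ceph.conf sketch; the exact values are just common starting
points, not a recommendation for your cluster):

[osd]
osd max backfills = 1
osd recovery max active = 1
osd recovery sleep hdd = 0.1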

If I had 25Gb/s and 10Gb/s ports and your nodes and were dead set on
separating networks (I'm not), I'd give the faster one to the clients so
they can benefit from cached reads while replication and recovery still
wouldn't be limited by the 10Gb/s network.


Christian

> Thanks
> Lars


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



