Anthony asked about the 'use case'. Well, I hadn't gone into detail because I worried it wouldn't help much. From a Ceph perspective, the sandbox layout goes like this: four pretty much identical old servers, each with six drives, and a smaller server just running a mon to break ties. Usual front-side LAN, separate back-side networking setup. Each of the servers is running a few VMs, all more or less identical for the test case. Each of the VMs is backed by an RBD via userspace libvirt (not kernel-mapped). Each RBD belongs to a pool that is entirely local to the chassis, presently replicated on 3 of the OSDs. One of the smaller VMs runs a mon+mgr per chassis.

Of course, there's also a pool that spans the chassis and does all the usual things userland Ceph is good at, but for these tests I just unplugged all of that. So: run any process that involves a bunch of little writes -- like installing a package or updating an initramfs -- and be ready to sit for a long time.

All the drives are 7200 RPM SATA spinners. CPUs are not overloaded (fewer VMs than cores), no swapping, memory left over. All write-back caching, virtio drives. Ceph Octopus latest, though it's no better than Nautilus performance-wise in this case. Ubuntu LTS/Focal/20.04, I think. Checked all the networking stats: no dropped packets, no overflowing buffers, and anyhow there shouldn't be any significant traffic on the front side, and only Ceph owns the back end. No Ceph problems reported, all PGs active, nothing misplaced, no erasure-coded pools.

So, there's a tiny novel -- thanks for sticking with it!

On 6/29/20 11:12 PM, Anthony D'Atri wrote:
>> Thanks for the thinking. By 'traffic' I mean: when a user space rbd
>> write has as a destination three replica osds in the same chassis
> eek.
>
>> does the whole write get shipped out to the mon and then back
> Mons are control-plane only.
>
>> All the 'usual suspects' like lossy ethernets and miswirings, etc. have
>> been checked.
>> It's actually painful to sit and wait while
>> 'update-initramfs' can take over a minute when the vm is chassis-local
>> to the osds getting the write info.
> You have shared almost none of your hardware or use-case. We know that
> you’re doing convergence, with unspecified CPU, memory, drives. We also
> don’t know how heavy your colocated compute workload is. Since you
> mention update-initramfs, I’ll guess that your workload is VMs with RBD
> volumes attached to libvirt/QEMU? With unspecified RBD cache
> configuration. We also know nothing of your network setup and
> saturation.
>
> I have to suspect that either you’re doing something fundamentally
> wrong, or should just set up a RAID6 volume and carve out LVMs.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
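P.S. For anyone wanting to reproduce the pattern: the pathological workload here is lots of tiny synchronous writes, which is roughly what dpkg and update-initramfs generate. A quick sketch like the one below, run inside a guest on a filesystem backed by the RBD, times that pattern directly. The file path, block size, and write count are made up for illustration, not taken from anything above.

```python
# Rough sketch of the "bunch of little writes" workload: time N small
# fsync'd writes. On replicated Ceph-backed storage, each fsync has to
# be acknowledged by the replica OSDs before the next write proceeds.
# Path, count, and size below are illustrative defaults only.
import os
import time

def time_small_syncs(path, count=256, size=4096):
    """Write `count` blocks of `size` bytes, fsync after each one,
    and return (total_seconds, per_write_milliseconds)."""
    buf = b"\0" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.monotonic()
        for _ in range(count):
            os.write(fd, buf)
            os.fsync(fd)  # force the write out before continuing
        total = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)  # clean up the scratch file
    return total, total / count * 1000.0

if __name__ == "__main__":
    total, per_ms = time_small_syncs("/tmp/smallwrite.test")
    print(f"{total:.2f}s total, {per_ms:.2f} ms per fsync'd write")
```

On a local spinner this should report a few milliseconds per fsync'd write; a much larger number inside the VM would put a figure on the latency being described.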