Re: Effect of tunables on client system load

Nathanial Byrnes <nate@xxxxxxxxx> · Wed, 14 Jun 2017 14:26:13 -0400

Thanks for the input, David. I'm not sold on xenserver per se, but it is what we've been using for the past 7 years... Proxmox has been coming up a lot recently, so I guess it is time to give it a look. I like the sound of directly using librbd.
   Regards,
   Nate

On Wed, Jun 14, 2017 at 10:30 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
I don't know if you're sold on Xen and only Xen, but I've been running a 3 node Ceph cluster hyper-converged on a 4 node Proxmox cluster for my home projects.  3 of the nodes are running on Proxmox with Ceph OSDs, Mon, and MDS daemons.  The fourth node is a much beefier system handling the majority of the virtualization.  This is a system that works pretty much right out of the box and utilizes librbd instead of dealing with fuse or kernel drivers to access the Ceph disks.  Proxmox also has drivers for Gluster built-in if you want to compare to that as well.
In my setup I have all of my VMs primarily on the 4th dedicated VM host, but if it goes down for any reason, the important VMs will distribute themselves onto the other Proxmox nodes, and when the primary VM host is back up, they will move back.  Live migrations between nodes take less than a minute because they only need to send over the current system state and RAM contents.

I would not suggest using Ceph for VM disks on any Hypervisor that does not utilize librbd to access the disks as RBDs.  That sort of VM usage is just not how Ceph was designed to host VM disks.  I know this doesn't answer your question, but I feel like you should have been asking a different question.

On Tue, Jun 13, 2017 at 9:43 PM Nathanial Byrnes <nate@xxxxxxxxx> wrote:
Thanks very much for the insights Greg!
My most recent suspicion around the resource consumption is that, with my current configuration, xen is provisioning rbd-nbd storage for guests, rather than just using the kernel module like I was last time around. And (while I'm unsure of how this works) it seems there is a tapdisk process for each guest on each xenserver, along with the rbd-nbd processes. Perhaps, due to this use of NBD, xenserver is taking a scenic route through userspace that it wasn't before... That said, gluster is attached via fuse ... I apparently need to dig more into how Xen is attaching to Ceph vs gluster....

   Anyway, thanks again!

   Nate

On Tue, Jun 13, 2017 at 5:30 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

On Thu, Jun 8, 2017 at 11:11 PM Nathanial Byrnes <nate@xxxxxxxxx> wrote:
Hi All,   First, some background: 
       I have been running a small (4 compute nodes) xen server cluster backed by both a small ceph cluster (4 other nodes with a total of 18x 1-spindle OSDs) and a small gluster cluster (2 nodes, each with a 14 spindle RAID array). I started with gluster 3-4 years ago, at first using NFS to access gluster, then upgraded to gluster FUSE. However, I had been fascinated with ceph since I first read about it, and probably added ceph as soon as XCP released a kernel with RBD support, possibly approaching 2 years ago.
       With Ceph, since I started out with the kernel RBD, I believe it locked me to Bobtail tunables. I connected to XCP via a project that tricks XCP into running LVM on the RBDs, managing all this through the iSCSI mgmt infrastructure somehow... Only recently I've switched to a newer project that uses the RBD-NBD mapping instead. This should let me use whatever tunables my client SW supports, AFAIK. I have not yet changed my tunables as the data re-org will probably take a day or two (only 1Gb networking...).

   Over this time period, I've observed that my gluster backed guests tend not to consume as much of domain-0's (the Xen VM management host) resources as do my Ceph backed guests. To me, this is somewhat intuitive, as the ceph client has to do more "thinking" than the gluster client. However, it seems to me that the gap in IO performance of the VM guests is well outside what the difference in spindle count would suggest. I am open to the notion that there are probably quite a few sub-optimal design choices/constraints within the environment. However, I haven't the resources to conduct all that many experiments and benchmarks.... So, over time I've ended up treating ceph as my resilient storage, and gluster as my more performant storage (3x vs 2x replication, and, as mentioned above, my gluster guests had quicker guest IO and lower dom-0 load).

    So, on to my questions:

   Would setting my tunables to jewel (my present release), or anything newer than bobtail (which is what I think I am set to if I read the ceph status warning correctly) reduce my dom-0 load and/or improve any aspects of the client IO performance?

Unfortunately no. The tunables are entirely about how CRUSH works, and while it's possible to construct pessimal CRUSH maps that are impossible to satisfy and take a long time to churn through calculations, it's hard and you clearly haven't done that here. I think you're just seeing that the basic CPU cost of a Ceph IO is higher than in Gluster, or else there is something unusual about the Xen configuration you have here compared to more common deployments.

   Will adding nodes to the ceph cluster reduce load on dom-0, and/or improve client IO performance (I doubt the former and would expect the latter...)?

In general adding nodes will increase parallel throughput (ie, async IO on one client or the performance of multiple clients), but won't reduce latencies. It shouldn't have much (any?) impact on client CPU usage (other than if the client is pushing through more IO, it will use proportionally more CPU), nor on the CPU usage of existing daemons.
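
To put rough numbers on the throughput-versus-latency point, here is a toy model (a sketch only, not a Ceph measurement). It assumes each OSD serves one op at a time, every op takes the same fixed service time, and ops spread evenly across OSDs. The function name, the 10 ms service time, and the queue depths are all made up for illustration.

```python
def throughput_iops(num_osds: int, queue_depth: int, op_latency_ms: int) -> float:
    """Toy model (assumption): one op per OSD at a time, fixed service time per op."""
    # Ops in flight are capped by both the client's queue depth and the OSD count.
    in_flight = min(queue_depth, num_osds)
    return in_flight * 1000 / op_latency_ms


if __name__ == "__main__":
    latency_ms = 10  # hypothetical per-op service time
    for osds in (18, 36):
        sync_iops = throughput_iops(osds, queue_depth=1, op_latency_ms=latency_ms)
        async_iops = throughput_iops(osds, queue_depth=64, op_latency_ms=latency_ms)
        print(f"{osds} OSDs: qd=1 -> {sync_iops} IOPS, qd=64 -> {async_iops} IOPS")
```

In this model, doubling the OSD count doubles the IOPS at queue depth 64 but changes nothing at queue depth 1, because a single synchronous op still waits the full service time. Real clusters are messier (network hops, replication, uneven placement), so treat this only as an illustration of the shape of the effect.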

   So, why did I bring up gluster at all? In an ideal world, I would like to have just one storage environment that would satisfy all my organization's needs. If forced to choose with the knowledge I have today, I would have to select gluster. I am hoping to come up with some actionable data points that might help me discover some of my mistakes, which might explain my experience to date and maybe even help remedy said mistakes. As I mentioned earlier, I like ceph, more so than gluster, and would like to employ it more within my environment. But, given budgetary constraints, I need to do what's best for my organization.

Yeah. I'm a little surprised you noticed it in the environment you described, but there aren't many people running Xen on Ceph so perhaps there's something odd happening with the setup it has there which I and others aren't picking up on. :/

Good luck!
-Greg 

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
