Re: Poor Windows performance on ceph RBD.

Hi,

I think you are hit by two different problems at the same time. The second problem might be the same one we also experience, namely that Windows VMs show very strange performance characteristics with libvirt, the virtio (vd) driver and RBD. With copy operations on very large files (>2GB) we see a sharp drop in bandwidth after ca. 1 to 1.5GB down to a measly 25MB/s, for as yet unknown reasons. We cannot reproduce this behaviour with Linux VMs, so chances are high that this is a Windows rather than a ceph problem.

The first problem, however, has to do with how ceph uses disks. Bare spinning disks have very poor performance characteristics, and a lot of development since their invention has gone into smart controllers (internal and external) with volatile and persistent caches, and into OS file buffers, all of which try to translate typical user workloads into something that works reasonably well with spinning drives. The main ideas are to re-order and merge I/O, cache hot data and absorb I/O bursts for constant write-back. The SANs you are used to are almost certainly high-end products with all the magic money can currently buy.
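
To see what a bare spindle is left with once all of that machinery is out of the picture, a direct, synchronous fio run against the raw device is instructive (sketch only; /dev/sdX is a placeholder and the run destroys data on that disk, so only use an empty one):

    # 4k random writes, O_DIRECT + O_SYNC, queue depth 1 -
    # roughly the kind of request pattern an OSD ends up putting on the drive
    fio --name=raw-spindle --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
        --runtime=60 --time_based --group_reporting

On a 7.2k or 10k drive, expect something in the low hundreds of IOPS at best; that number, not the datasheet transfer rate, is the budget each OSD on that disk has to work with.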

Ceph deliberately bypasses all of this logic, and a rule of thumb I follow is that with ceph and current hardware, current-generation drives will give you previous-generation drive performance. With NVMes you can achieve SSD performance, with SSDs you get good spinning SAS drive performance, and with SAS drives you get, well, floppy or zip drive performance. I'm afraid that's what you are seeing, with 15 VMs saturating the available aggregate performance of the spindles.
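
To put a very rough number on it (all figures assumed for illustration, not taken from your cluster): a 10k spindle delivers on the order of 100-200 random IOPS, and with replication or erasure coding every client write has to land on several OSDs before it is acknowledged, so as a crude estimate

    usable client write IOPS ≈ (number of spindles × IOPS per spindle) / write amplification
    e.g. 20 spindles × 150 IOPS / 5 (EC k=3, m=2)  ≈  600 IOPS for the whole pool

A handful of busy RDS/Exchange VMs can demand that much on their own.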

If you want to stick with spindles as a data store, what you need is a fast, reliable persistent cache. Reliable here means that the firmware is free of bugs with respect to power outages, which is quite a requirement in itself. Some expensive disk controllers claim to offer that in the form of a persistent NVMe cache; how much you want to trust the firmware is a different story. Alternatively, you could consider a few TB of NVMe drives for a ceph cache pool. People report that they are happy with that. As long as the cache pool can hold all hot data plus write bursts, I would expect this to work fine as well.
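
If you try the cache pool route, the usual construction is a write-back cache tier on an NVMe-backed pool placed in front of the HDD pool, roughly along these lines (pool names, PG counts and thresholds are placeholders, and cache tiering needs careful sizing and testing before you rely on it):

    # replicated pool restricted to the nvme device class, to act as cache
    ceph osd crush rule create-replicated nvme-rule default host nvme
    ceph osd pool create hdd-cache 128 128 replicated nvme-rule
    # put it in front of the slow pool as a write-back tier
    ceph osd tier add hdd-data hdd-cache
    ceph osd tier cache-mode hdd-cache writeback
    ceph osd tier set-overlay hdd-data hdd-cache
    # make sure it flushes and evicts before it runs full
    ceph osd pool set hdd-cache hit_set_type bloom
    ceph osd pool set hdd-cache target_max_bytes 2000000000000   # ~2 TB, adjust
    ceph osd pool set hdd-cache cache_target_dirty_ratio 0.4
    ceph osd pool set hdd-cache cache_target_full_ratio 0.8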

Instead of caching we decided to go for a split. We use low-cost datacenter-grade SSDs for a small all-flash pool for the OS RBD disks and a large HDD-only pool for data storage. This works quite well, since the main annoying simultaneous I/O workload of Windows VMs happens on the OS disks. For ordinary data access an EC HDD pool is perfectly fine, and we provision machines with a second large data disk on HDD. Our users are quite happy with that model.
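
On the Ceph side the split looks roughly like this (names, PG counts and the EC profile are illustrative, not our exact settings):

    # replicated SSD pool for the OS disks
    ceph osd crush rule create-replicated ssd-rule default host ssd
    ceph osd pool create rbd-os 128 128 replicated ssd-rule
    rbd pool init rbd-os
    # EC HDD pool for bulk data; RBD needs overwrites enabled on it
    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host crush-device-class=hdd
    ceph osd pool create rbd-data 256 256 erasure ec42
    ceph osd pool set rbd-data allow_ec_overwrites true
    # data disks keep their metadata in a replicated pool and their objects in the EC pool
    rbd create rbd-os/vm01-data --size 1T --data-pool rbd-data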

In any case, we are still stuck with the strange performance drop on Windows machines that you also seem to observe, and we are still looking for help with that. If you manage to figure out what is going on, I would like to hear about it. So far, we haven't found a clue.
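
In case it helps with comparing notes, this is roughly how we watch an image from the cluster side while a large copy runs (needs the mgr rbd_support module, Nautilus or newer; the pool name is a placeholder):

    # per-image throughput, IOPS and latency as the cluster sees them
    rbd perf image iostat rbd-os
    rbd perf image iotop
    # OSD commit/apply latency during the copy
    ceph osd perf

It at least narrows down whether the slowdown is visible cluster-side or only inside the guest.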

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: jcharles@xxxxxxxxxxxx <jcharles@xxxxxxxxxxxx>
Sent: 11 June 2020 12:38:32
To: ceph-users@xxxxxxx
Subject:  Re: Poor Windows performance on ceph RBD.

Hello,

we are using the same environment, OpenNebula + Ceph.
Our ceph cluster is composed of 5 OSD hosts with SSDs plus 10k rpm and 7.2k rpm spinning drives, on a 10Gb/s fibre network.
Each spinning OSD has its DB and WAL devices on SSD.

Nearly all our Windows VM RBD images are in a 10k rpm pool with erasure coding.
For the moment we are hosting about 15 VMs (RDS and Exchange).

What we are seeing:
   - VMs are far from responding as well as on our old 10k SAN (less than 30%)
   - average RBD latency oscillates between 50ms and 250ms, with peaks that can reach a second
   - some tests (CrystalDiskMark) from inside the VM show performance up to 700 MB/s read and 170 MB/s write, but a single file copy barely reaches 150 MB/s and stays at a poor 25 MB/s most of the time
   - 4K random tests show up to 4k IOPS read and 2k IOPS write, but seen from the RBD side, the image can barely go over 500 IOPS (read+write)

Since we have to migrate our VMs from the old SAN to Ceph, I am really worried: there are more than 150 VMs on it, and our Ceph cluster seems to have a hard time coping with 15.

I can't find accurate data or calculation templates that would let me estimate what I can expect.
All the documents I've read (and I've read a lot ;) ) only report empirical observations like "it's better" or "it's worse".
There are a lot of parameters we can tweak, like block size, striping, stripe unit, stripe count, ... but they are poorly documented, especially the relationships between them.
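
For example, the striping parameters I mean are the ones set at image creation time, something like (values purely illustrative, I don't know yet what is sensible):

    # 4 MB objects, striped in 64 KB units across 8 objects
    rbd create mypool/testimage --size 100G --object-size 4M --stripe-unit 64K --stripe-count 8

But how object size, stripe unit and stripe count interact with the guest's block size and the EC profile is exactly the part I can't find documented.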

I would be more than happy to work with people who are in the same situation to try to find solutions and methods that help us be confident in our designs, and to break with the "build the cluster, tweak it, and maybe it will be fine for you" approach. I feel that each of us (as I read in forums and mailing lists) is a bit alone here. Google is a real friend, but I feel it has reached its limits ;)

Maybe my call will reach some volunteers.

Best regards
JC Passard
CTO Provectio
France
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx