Dear all,

a few more results regarding virtio version, RAM size and ceph RBD caching.

I got some wrong information from our operators. We are actually using virtio-win-0.1.171 and found that this version might have a regression that affects performance: https://forum.proxmox.com/threads/big-discovery-on-virtio-performance.62728/. We are considering downgrading all machines to virtio-win-0.1.164-2 until virtio-win-0.1.185-1 is marked stable. Our tests show that with both of these versions, Windows Server 2016 and 2019 perform equally well.

We also experimented with the memory size for these machines. They used to have only 4GB. With 4GB, both versions eventually run into stalled I/O. After increasing this to 8GB, we don't see stalls any more.

Ceph RBD caching should have been set to writeback. Not sure why caching was disabled by default. It does not have much, if any, effect on write performance, although transfer rates seem steadier. I mainly want to enable caching to reduce read operations, which compete with writes at the OSD level. This should give a much better overall experience. We will change this setting during forthcoming service windows.

Looks like we more or less got it sorted. Hints in this thread helped pinpoint the issues.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 13 July 2020 15:38:58
To: André Gemünd; ceph-users
Subject: Re: Poor Windows performance on ceph RBD.

> If I may ask, which version of the virtio drivers do you use?

https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/latest-virtio/virtio-win.iso

Looks like virtio-win-0.1.185.*

> And do you use caching on libvirt driver level?

In the ONE interface, we use

  DISK = [ driver = "raw" , cache = "none"]

which translates to

  <disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>

in the XML. We have no qemu settings in the ceph.conf.
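For anyone who wants to check the same thing on their own VMs: the cache mode can be read back from the libvirt domain XML (for example, the output of virsh dumpxml). The following is only a minimal sketch, not something we run in production. The embedded XML is a hypothetical fragment modelled on the snippet above; the <source> and <target> elements, the pool/image name and the helper name disk_cache_modes are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML modelled on the fragment quoted above.
# The <source> and <target> elements and their values are illustrative assumptions.
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='rbd' name='pool/image'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def disk_cache_modes(domain_xml):
    """Return {target device: cache mode} for every <disk> in a domain XML string."""
    root = ET.fromstring(domain_xml)
    modes = {}
    for disk in root.iter("disk"):
        driver = disk.find("driver")
        target = disk.find("target")
        dev = target.get("dev") if target is not None else "unknown"
        # When no cache attribute is set, the hypervisor default applies.
        if driver is not None:
            modes[dev] = driver.get("cache", "default")
        else:
            modes[dev] = "default"
    return modes


if __name__ == "__main__":
    for dev, mode in disk_cache_modes(DOMAIN_XML).items():
        print(f"{dev}: cache={mode}")
```

A disk reported as cache=none here corresponds to the setting described above; after switching the ONE template to writeback, the same check should report cache=writeback.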
Looks like caching is disabled. Not sure if this is the recommended way, though, or why caching is disabled by default.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: André Gemünd <andre.gemuend@xxxxxxxxxxxxxxxxxx>
Sent: 13 July 2020 11:18
To: Frank Schilder
Subject: Re: Re: Poor Windows performance on ceph RBD.

If I may ask, which version of the virtio drivers do you use? And do you use caching on libvirt driver level?

Greetings
André

----- On 13 Jul 2020 at 10:43, Frank Schilder frans@xxxxxx wrote:

>> > To anyone who is following this thread, we found a possible explanation for
>> > (some of) our observations.
>
>> If someone is following this, they probably want the possible
>> explanation and not the knowledge of you having the possible
>> explanation.
>
>> So you are saying if you do e.g. a core installation (without GUI) of
>> 2016/2019 and disable all services, the fio test results are significantly
>> different from e.g. a CentOS 7 VM doing the same fio test? Are you sure
>> this is not related to other processes writing to disk?
>
> Right, it's not an explanation but rather a further observation. We don't really
> have an explanation yet.
>
> It's an identical installation of both server versions, with the same services configured.
> Our operators are not really into debugging Windows, which is why we were asking
> here. Their hypothesis is that the VD driver for accessing RBD images has
> problems with Windows servers newer than 2016. I'm not a Windows guy, so I can't
> really comment on this.
>
> The test we do is a simple copy test of a single 10G file, and we monitor the
> transfer speed. This info was cut out of this e-mail; the original report for
> reference is:
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/ANHJQZLJT474B457VVM4ZZZ6HBXW4OPO/
> .
>
> We are very sure that it is not related to other processes writing to disk; we
> monitor that too.
> There is also no competition on the RBD pool at the time of testing.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> Sent: 13 July 2020 10:24
> To: ceph-users; Frank Schilder
> Subject: RE: Re: Poor Windows performance on ceph RBD.
>
> >>> To anyone who is following this thread, we found a possible explanation for
> >>> (some of) our observations.
>
> If someone is following this, they probably want the possible
> explanation and not the knowledge of you having the possible
> explanation.
>
> So you are saying if you do e.g. a core installation (without GUI) of
> 2016/2019 and disable all services, the fio test results are significantly
> different from e.g. a CentOS 7 VM doing the same fio test? Are you sure
> this is not related to other processes writing to disk?
>
>
>
> -----Original Message-----
> From: Frank Schilder [mailto:frans@xxxxxx]
> Sent: Monday 13 July 2020 9:28
> To: ceph-users@xxxxxxx
> Subject: Re: Poor Windows performance on ceph RBD.
>
> To anyone who is following this thread, we found a possible explanation
> for (some of) our observations.
>
> We are running Windows Server 2016 and 2019 as storage servers
> exporting data on an rbd image/disk. We recently found that Windows
> Server 2016 runs fine. It is still not as fast as a Linux + SAMBA share on
> an rbd image (ca. 50%), but runs with a reasonable sustained bandwidth.
> With Windows Server 2019, however, we observe a near-complete stall of
> file transfers and time-outs using standard copy tools (robocopy). We
> don't have an explanation yet and are downgrading Windows servers where
> possible.
>
> If anyone has a hint what we can do, please let us know.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Dipl.-Inf. André Gemünd, Leiter IT / Head of IT
Fraunhofer-Institute for Algorithms and Scientific Computing
andre.gemuend@xxxxxxxxxxxxxxxxxx
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend