Re: Poor Windows performance on ceph RBD.

On 13/07/2020 10:43, Frank Schilder wrote:
>> To anyone who is following this thread, we found a possible explanation for
>> (some of) our observations.
> If someone is following this, they probably want the possible
> explanation and not the knowledge of you having the possible
> explanation.
> So you are saying if you do e.g. a core installation (without GUI) of
> 2016/2019 and disable all services, the fio test results are significantly
> different from e.g. a CentOS 7 VM doing the same fio test? Are you sure
> this is not related to other processes writing to disk?
Right, it's not an explanation but rather a further observation. We don't really have an explanation yet.

It's an identical installation of both server versions, with the same services configured. Our operators are not really into debugging Windows; that's why we were asking here. Their hypothesis is that the VD driver for accessing RBD images has problems with Windows Server versions newer than 2016. I'm not a Windows guy, so I can't really comment on this.

The test we do is a simple copy test of a single 10 GB file, and we monitor the transfer speed. This info was cut out of this e-mail; the original report for reference is: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/ANHJQZLJT474B457VVM4ZZZ6HBXW4OPO/ .
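
In essence the test is something like the following (paths and file name here are placeholders, not our exact setup; /J selects unbuffered I/O and /NP suppresses per-file progress output):

robocopy C:\testdata \\fileserver\share testfile_10g.bin /J /NP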

We are very sure that it is not related to other processes writing to disk; we monitor that too. There is also no competition on the RBD pool at the time of testing.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
Sent: 13 July 2020 10:24
To: ceph-users; Frank Schilder
Subject: RE:  Re: Poor Windows performance on ceph RBD.

> To anyone who is following this thread, we found a possible
> explanation for
> (some of) our observations.
If someone is following this, they probably want the possible
explanation and not the knowledge of you having the possible
explanation.

So you are saying if you do e.g. a core installation (without GUI) of
2016/2019 and disable all services, the fio test results are significantly
different from e.g. a CentOS 7 VM doing the same fio test? Are you sure
this is not related to other processes writing to disk?



-----Original Message-----
From: Frank Schilder [mailto:frans@xxxxxx]
Sent: Monday, 13 July 2020 9:28
To: ceph-users@xxxxxxx
Subject:  Re: Poor Windows performance on ceph RBD.

To anyone who is following this thread, we found a possible explanation
for (some of) our observations.

We are running Windows Server versions 2016 and 2019 as storage servers
exporting data on an RBD image/disk. We recently found that Windows
Server 2016 runs fine. It is still not as fast as a Linux + Samba share
on an RBD image (ca. 50%), but it runs with a reasonable sustained
bandwidth. With Windows Server 2019, however, we observe a near-complete
stall of file transfers and time-outs using standard copy tools
(robocopy). We don't have an explanation yet and are downgrading Windows
servers where possible.

If anyone has a hint about what we can do, please let us know.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

I am not sure exactly how you are testing the speed on Windows, but two possible factors are block size and caching.

Block size depends on the client application: a Windows file copy from the UI will use a 512k block size, which is different from what xcopy or robocopy use; the latter can change block size depending on flags / restart mode, etc. Similarly, the dd command on Linux will give different speeds depending on block size.
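
For example (file path is a placeholder), writing the same 1 GiB with dd at two block sizes, bypassing the page cache with oflag=direct, will typically show very different throughput:

dd if=/dev/zero of=/mnt/test/ddfile bs=4k count=262144 oflag=direct
dd if=/dev/zero of=/mnt/test/ddfile bs=512k count=2048 oflag=direct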

Caching: caching will make a big difference for sequential writes, as it merges smaller blocks. In some cases, however, it is not obvious whether caching is being used, since it can happen at different layers: in your Linux Samba export test, for example, there could be caching at the gateway, whereas a clustered high-availability setup may explicitly turn caching off. The initially high speed followed by a decrease that you report could indicate writes going to a cache buffer that then fills up.
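
For example, if your Windows VMs run under QEMU/KVM (an assumption on my part; pool/image names are placeholders), the host-side cache mode is set per drive, and cache=none versus cache=writeback can produce exactly this pattern:

-drive format=raw,file=rbd:POOL_NAME/IMAGE_NAME,cache=none       (no host-side caching)
-drive format=raw,file=rbd:POOL_NAME/IMAGE_NAME,cache=writeback  (host/librbd caching enabled)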

It will help to quantify/compare latency (IOPS at qd=1) via:

On Linux:
rbd bench --io-type write POOL_NAME/IMAGE_NAME --io-threads=1 --io-size 4K --io-pattern rand --rbd_cache=false
fio --name=xx --filename=FILE_NAME --iodepth=1 --rw=randwrite --bs=4k --direct=1 --runtime=30 --time_based

On the Windows VM:
diskspd -b4k -d30 -o1 -t1 -r -Su -w100 -c1G FILE_NAME
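
As a rule of thumb, at qd=1 the average latency in milliseconds is roughly 1000 / IOPS, so e.g. 500 IOPS from any of these tools corresponds to about 2 ms per write; comparing that figure between the Linux host and the Windows VM shows how much latency the virtualization layer adds.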


Measure/compare sequential writes with a 512k block size:

On Linux:
rbd bench --io-type write POOL_NAME/IMAGE_NAME --io-threads=1 --io-size 512K --io-pattern seq --rbd_cache=false
fio --name=xx --filename=FILE_NAME --iodepth=1 --rw=write --bs=512k --direct=1 --runtime=30 --time_based

On the Windows VM:
diskspd -b512k -d30 -o1 -t1 -Su -w100 -c1G FILE_NAME
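
fio also ships Windows binaries, so as a cross-check the same job can be run inside the VM (drive letter and file name are placeholders; the colon in a Windows path must be escaped, and --size is needed if the file does not exist yet):

fio --name=xx --filename=D\:\testfile --iodepth=1 --rw=write --bs=512k --direct=1 --runtime=30 --time_based --size=1G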

/Maged

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



