Hello,

we are on Debian Jessie with Hammer 0.94.9, and we recently upgraded our
kernel from 3.16 to 4.9 (jessie-backports). We see the same regression,
with a few bright spots.

-- ceph tell osd bench, average across the cluster --
3.16.39-1: 204 MB/s
4.9.0-0  : 158 MB/s

-- 1 rados bench client, 4K, 2048 threads, avg IOPS --
3.16.39-1: 1604
4.9.0-0  : 451

-- 1 rados bench client, 64K, 512 threads, avg BW (MB/s) --
3.16.39-1: 78
4.9.0-0  : 31

The bright spots are the following tests:

1 rados bench client, 4K, 512 threads, avg IOPS
1 rados bench client, 64K, 2048 threads, avg BW (MB/s)

where the machines with kernel 4.9 seem to perform slightly better. The
overall impression, though, is that there is a serious regression, or
something that needs to be tuned to get the same performance out of the
cluster.

Our demo cluster is 4 nodes x 12 OSDs, with separate journals on SSD,
firefly tunables, and everything else at the defaults for our Ceph
installation and Debian OS. Each rados bench was run 5 times to get an
average, and caches were dropped before each test.

Has anyone discovered the culprit so far? Any hints on where we should
focus our investigation?

Best regards,
Kostis

On 19 December 2016 at 17:17, Yoann Moulin <yoann.moulin@xxxxxxx> wrote:
> Hello,
>
> I finally found time to run some new benchmarks with the latest Jewel
> release (10.2.5) on 4 nodes. Each node has 10 OSDs.
>
> I ran "ceph tell osd.* bench" twice over the 40 OSDs; here is the
> average speed:
>
> 4.2.0-42-generic       97.45 MB/s
> 4.4.0-53-generic       55.73 MB/s
> 4.8.15-040815-generic  62.41 MB/s
> 4.9.0-040900-generic   60.88 MB/s
>
> I see the same behaviour: at least a 35-40% performance drop between
> kernel 4.2 and kernel 4.4.
>
> I can run further benches if needed.
>
> Yoann
>
> On 26/07/2016 at 09:09, Lomayani S. Laizer wrote:
>> Hello,
>>
>> do you have journal on disk too ?
>>
>> Yes, I have the journal on the same hard disk.
>>
>> ok, and could you do a bench with kernel 4.2? Just to see if you have
>> better throughput.
>> Thanks
>>
>> In Ubuntu 14 I was running the 4.2 kernel; the throughput was the same,
>> around 80-90 MB/s per OSD. I can't tell the difference because each test
>> gives speeds in the same range. I did not test kernel 4.4 on Ubuntu 14.
>>
>> --
>> Lomayani
>>
>> On Tue, Jul 26, 2016 at 9:39 AM, Yoann Moulin <yoann.moulin@xxxxxxx> wrote:
>>
>> Hello,
>>
>> > I am running Ubuntu 16 with kernel 4.4-0.31-generic and my speeds are
>> > similar.
>>
>> do you have journal on disk too ?
>>
>> > I did tests on Ubuntu 14 and Ubuntu 16 and the speed is similar. I get
>> > around 80-90 MB/s of OSD speed on both operating systems.
>>
>> ok, and could you do a bench with kernel 4.2? Just to see if you have
>> better throughput. Thanks
>>
>> > The only issue I am observing now with Ubuntu 16 is that sometimes the
>> > OSDs fail on reboot until I start them manually or add start commands
>> > to rc.local.
>>
>> in my case it's a test environment, so I haven't noticed those behaviours
>>
>> --
>> Yoann
>>
>> > On Mon, Jul 25, 2016 at 6:45 PM, Yoann Moulin <yoann.moulin@xxxxxxx> wrote:
>> >
>> > Hello,
>> >
>> > (this is a repost; my previous message seems to have slipped under the
>> > radar)
>> >
>> > Does anyone see behaviour similar to the one described below?
>> >
>> > I found a big performance drop between kernel 3.13.0-88 (the default
>> > kernel on Ubuntu Trusty 14.04) or kernel 4.2.0, and kernel 4.4.0.24.14
>> > (the default kernel on Ubuntu Xenial 16.04).
>> >
>> > - Ceph version is Jewel (10.2.2).
>> > - All tests have been done under Ubuntu 14.04.
>> > - Each cluster has 5 nodes, strictly identical.
>> > - Each node has 10 OSDs.
>> > - Journals are on the disk.
>> >
>> > Kernel 4.4 has a drop of more than 50% compared to 4.2.
>> > Kernel 4.4 has a drop of 40% compared to 3.13.
>> >
>> > Details below.
>> >
>> > With all 3 kernels I get the same performance on the disks:
>> >
>> > Raw benchmark:
>> > dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct => average ~230 MB/s
>> > dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct    => average ~220 MB/s
>> >
>> > Mounted-filesystem benchmark:
>> > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1              => average ~205 MB/s
>> > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average ~214 MB/s
>> > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => average ~190 MB/s
>> >
>> > Ceph OSD benchmark:
>> > Kernel 3.13.0-88-generic: ceph tell osd.ID bench => average  ~81 MB/s
>> > Kernel 4.2.0-38-generic : ceph tell osd.ID bench => average ~109 MB/s
>> > Kernel 4.4.0-24-generic : ceph tell osd.ID bench => average  ~50 MB/s
>> >
>> > I then ran new benchmarks on 3 fresh clusters.
>> >
>> > - Each cluster has 3 nodes, strictly identical.
>> > - Each node has 10 OSDs.
>> > - Journals are on the disk.
>> >
>> > bench5 : Ubuntu 14.04 / Ceph Infernalis
>> > bench6 : Ubuntu 14.04 / Ceph Jewel
>> > bench7 : Ubuntu 16.04 / Ceph Jewel
>> >
>> > This is the average of 2 runs of "ceph tell osd.* bench" on each cluster
>> > (2 x 30 OSDs):
>> >
>> > bench5 / 14.04 / Infernalis / kernel 3.13 :  54.35 MB/s
>> > bench6 / 14.04 / Jewel      / kernel 3.13 :  86.47 MB/s
>> >
>> > bench5 / 14.04 / Infernalis / kernel 4.2  :  63.38 MB/s
>> > bench6 / 14.04 / Jewel      / kernel 4.2  : 107.75 MB/s
>> > bench7 / 16.04 / Jewel      / kernel 4.2  : 101.54 MB/s
>> >
>> > bench5 / 14.04 / Infernalis / kernel 4.4  :  53.61 MB/s
>> > bench6 / 14.04 / Jewel      / kernel 4.4  :  65.82 MB/s
>> > bench7 / 16.04 / Jewel      / kernel 4.4  :  61.57 MB/s
>> >
>> > If needed, I have the raw output of "ceph tell osd.* bench".
>> >
>> > Best regards
>
> --
> Yoann Moulin
> EPFL IC-IT
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
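[Editor's note] The procedure Kostis describes at the top of the thread (drop caches, run each rados bench 5 times, average the results) can be sketched as a small script. The pool name, run length, and the exact `rados bench` output label are assumptions here, not taken from the thread, so check them against your Ceph version; the demo at the end uses made-up bandwidth numbers.

```shell
#!/bin/sh
# Sketch of the benchmark loop (hypothetical pool "bench", 60 s runs,
# 64K objects, 512 concurrent ops -- one of the thread's test shapes):
#
#   for i in 1 2 3 4 5; do
#       sync; echo 3 > /proc/sys/vm/drop_caches   # drop page/dentry/inode caches
#       rados bench -p bench 60 write -b 65536 -t 512 --no-cleanup
#   done > bench.log
#
# Averaging the "Bandwidth (MB/sec)" lines collected in bench.log:
avg_bw() {
    awk -F: '/Bandwidth \(MB\/sec\)/ { gsub(/[ \t]/, "", $2); s += $2; n++ }
             END { if (n) printf "%.2f\n", s / n }'
}

# Demo on canned (made-up) output from two runs:
printf 'Bandwidth (MB/sec):     78.40\nBandwidth (MB/sec):     77.60\n' | avg_bw
# -> 78.00
```

The cluster-facing part is left commented out so the parsing can be tried standalone; in a real run you would pipe `bench.log` into `avg_bw` instead of the `printf`.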