Can you set osd and filestore debugging to 20, restart the osds, run rados
bench as before, and post the logs?
-Sam Just

On Tue, Mar 20, 2012 at 1:37 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> rados bench 60 write -p data
> <skip>
> Total time run:        61.217676
> Total writes made:     989
> Write size:            4194304
> Bandwidth (MB/sec):    64.622
>
> Average Latency:       0.989608
> Max latency:           2.21701
> Min latency:           0.255315
>
> Here is a snippet from the osd log; the write size seems okay.
>
> 2012-03-21 00:00:39.397066 7fdda86a7700 osd.0 10 pg[0.58( v 10'83
> (0'0,10'83] n=50 ec=1 les/c 9/9 8/8/6) [0,1] r=0 lpr=8 mlcod 10'82
> active+clean] removing repgather(0x31b5360 applying 10'83 rep_tid=597
> wfack= wfdisk= op=osd_op(client.4599.0:2533 rb.0.2.000000000040 [write
> 1220608~4096] 0.17eb9fd8) v4)
> 2012-03-21 00:00:39.397086 7fdda86a7700 osd.0 10 pg[0.58( v 10'83
> (0'0,10'83] n=50 ec=1 les/c 9/9 8/8/6) [0,1] r=0 lpr=8 mlcod 10'82
> active+clean] q front is repgather(0x31b5360 applying 10'83
> rep_tid=597 wfack= wfdisk= op=osd_op(client.4599.0:2533
> rb.0.2.000000000040 [write 1220608~4096] 0.17eb9fd8) v4)
>
> Sorry for my previous question about rbd chunks, it was really stupid :)
>
> On Mon, Mar 19, 2012 at 10:40 PM, Josh Durgin <josh.durgin@xxxxxxxxxxxxx> wrote:
>> On 03/19/2012 11:13 AM, Andrey Korolyov wrote:
>>>
>>> Nope, I'm using KVM for the rbd guests. I did notice that Sage said the
>>> value was too small, and I changed it to 64M before posting the previous
>>> message, with no success - both 8M and this value cause a performance
>>> drop. When I tried to write a small amount of data, comparable to the
>>> writeback cache size (both on a raw device and on ext3 with the sync
>>> option), I got the following results:
>>
>>
>> I just want to clarify that the writeback window isn't a full writeback
>> cache - it doesn't affect reads, and does not help with request merging etc.
>> It simply allows a bunch of writes to be in flight while acking the write to
>> the guest immediately. We're working on a full-fledged writeback cache
>> to replace the writeback window.
>>
>>
>>> dd if=/dev/zero of=/var/img.1 bs=10M count=10 oflag=direct (almost
>>> the same without oflag, here and in the following samples)
>>> 10+0 records in
>>> 10+0 records out
>>> 104857600 bytes (105 MB) copied, 0.864404 s, 121 MB/s
>>> dd if=/dev/zero of=/var/img.1 bs=10M count=20 oflag=direct
>>> 20+0 records in
>>> 20+0 records out
>>> 209715200 bytes (210 MB) copied, 6.67271 s, 31.4 MB/s
>>> dd if=/dev/zero of=/var/img.1 bs=10M count=30 oflag=direct
>>> 30+0 records in
>>> 30+0 records out
>>> 314572800 bytes (315 MB) copied, 12.4806 s, 25.2 MB/s
>>>
>>> and so on. The reference test with bs=1M and count=2000 gives slightly
>>> worse results _with_ the writeback cache than without, as I mentioned
>>> before. Here are the bench results; they are almost equal on both nodes:
>>>
>>> bench: wrote 1024 MB in blocks of 4096 KB in 9.037468 sec at 113 MB/sec
>>
>>
>> One thing to check is the size of the writes that are actually being sent by
>> rbd. The guest is probably splitting them into relatively small (128 or
>> 256k) writes. Ideally it would be sending 4k writes, and this should be a
>> lot faster.
>>
>> You can see the writes being sent by adding debug_ms=1 to the client or osd.
>> The format is osd_op(.*[write OFFSET~LENGTH]).
>>
>>
>>> Also, since I have not mentioned it before: network performance is
>>> enough to sustain fair gigabit connectivity with MTU 1500. It does not
>>> seem to be an interrupt problem or anything like that - even with
>>> ceph-osd, the ethernet card queues, and the kvm instance pinned to
>>> different sets of cores, nothing changes.
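For reference, one quick way to tally the write sizes Josh describes once
debug_ms=1 (or the osd debugging Sam asked for) is enabled: grep the osd_op
lines out of the log and count the LENGTH part. This is only a sketch - the
log path is an example, so point it at whichever log you actually collect:

  # tally osd_op write sizes, using the "[write OFFSET~LENGTH]" format shown above
  grep -o 'write [0-9]*~[0-9]*' /var/log/ceph/osd.0.log \
      | awk -F'~' '{ n[$2]++ } END { for (s in n) print s, n[s] }' | sort -n

If the counts cluster on 131072 or 262144 rather than larger values, the
guest really is splitting requests into the 128k/256k writes Josh mentions.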
>>>
>>> On Mon, Mar 19, 2012 at 8:59 PM, Greg Farnum
>>> <gregory.farnum@xxxxxxxxxxxxx> wrote:
>>>>
>>>> It sounds like maybe you're using Xen? The "rbd writeback window" option
>>>> only works for userspace rbd implementations (eg, KVM).
>>>> If you are using KVM, you probably want 81920000 (~80MB) rather than
>>>> 8192000 (~8MB).
>>>>
>>>> What options are you running dd with? If you run a rados bench from both
>>>> machines, what do the results look like?
>>>> Also, can you do the ceph osd bench on each of your OSDs, please?
>>>> (http://ceph.newdream.net/wiki/Troubleshooting#OSD_performance)
>>>> -Greg
>>>>
>>>>
>>>> On Monday, March 19, 2012 at 6:46 AM, Andrey Korolyov wrote:
>>>>
>>>>> Stranger still, write speed drops by fifteen percent when this
>>>>> option is set in the vm's config (rather than the result reported in
>>>>> http://www.mail-archive.com/ceph-devel@xxxxxxxxxxxxxxx/msg03685.html).
>>>>> As I mentioned, I'm using 0.43, but due to crashed osds, ceph has been
>>>>> recompiled with e43546dee9246773ffd6877b4f9495f1ec61cd55 and
>>>>> 1468d95101adfad44247016a1399aab6b86708d2 - both cases caused crashes
>>>>> under heavy load.
>>>>>
>>>>> On Sun, Mar 18, 2012 at 10:22 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Sat, 17 Mar 2012, Andrey Korolyov wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've done some performance tests on the following configuration:
>>>>>>>
>>>>>>> mon0/osd0 and mon1/osd1 are two twelve-core r410s with 32G RAM; mon2
>>>>>>> is a dom0 with three dedicated cores and 1.5G, mostly idle. The first
>>>>>>> three disks on each r410 are arranged into a raid0 holding the osd
>>>>>>> data, while the fourth holds the OS and the osd journal partition;
>>>>>>> all ceph-related stuff is mounted on ext4 without barriers.
>>>>>>>
>>>>>>> First, I noticed a gap between benchmark performance and write speed
>>>>>>> through rbd from a small kvm instance running on one of the first two
>>>>>>> machines - where bench gave me about 110MB/s, writing zeros to the raw
>>>>>>> block device inside the vm with dd topped out at about 45MB/s, and on
>>>>>>> the vm's fs (ext4 with default options) performance drops to ~23MB/s.
>>>>>>> Things get worse when I start a second vm on the second host and run
>>>>>>> the same dd tests simultaneously - performance is split roughly in
>>>>>>> half between the instances :). Enabling jumbo frames, playing with cpu
>>>>>>> affinity for the ceph and vm instances, and trying different TCP
>>>>>>> congestion protocols had no effect at all - with DCTCP I get a
>>>>>>> slightly smoother network load graph, and that's all.
>>>>>>>
>>>>>>> Can the list please suggest anything to try to improve performance?
>>>>>>
>>>>>>
>>>>>> Can you try setting
>>>>>>
>>>>>> rbd writeback window = 8192000
>>>>>>
>>>>>> or similar, and see what kind of effect that has? I suspect it'll speed
>>>>>> up dd; I'm less sure about ext3.
>>>>>>
>>>>>> Thanks!
>>>>>> sage
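For anyone reproducing this, a minimal sketch of where that setting could
go - assuming librbd picks it up from the [client] section of ceph.conf on
the KVM host, which is the usual place for client options rather than
anything specified in this thread:

  [client]
      # ~80MB window as Greg suggests; Sage's example above used 8192000 (~8MB)
      rbd writeback window = 81920000

As Greg notes, this only affects userspace rbd (the qemu/KVM path); the
kernel rbd client does not use it.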
>>>>>>
>>>>>>> ceph-0.43, libvirt-0.9.8, qemu-1.0.0, kernel 3.2