On Sun, Dec 30, 2012 at 9:05 PM, Jens Kristian Søgaard <jens@xxxxxxxxxxxxxxxxxxxx> wrote:
> Hi guys,
>
> I'm testing Ceph as storage for KVM virtual machine images and found an
> inconvenience that I am hoping it is possible to find the cause of.
>
> I'm running a single KVM Linux guest on top of Ceph storage. In that guest I
> run rsync to download files from the internet. When rsync is running, the
> guest will seemingly stall and run very slowly.
>
> For example, if I log in via SSH to the guest and use the command prompt,
> nothing will happen for a long period (30+ seconds), then it processes a few
> typed characters, then it blocks for another long period of time, then
> processes a bit more, etc.
>
> I was hoping to be able to tweak the system so that it behaves more like it
> would on conventional storage - i.e. perhaps the rsync won't be super fast,
> but the machine will be equally responsive all the time.
>
> I'm hoping that you can provide some hints on how best to benchmark or test
> the system to find the cause of this?
>
> The Ceph OSDs periodically log these two messages, which I do not fully
> understand:
>
> 2012-12-30 17:07:12.894920 7fc8f3242700  1 heartbeat_map is_healthy
> 'OSD::op_tp thread 0x7fc8cbfff700' had timed out after 30
> 2012-12-30 17:07:13.599126 7fc8cbfff700  1 heartbeat_map reset_timeout
> 'OSD::op_tp thread 0x7fc8cbfff700' had timed out after 30
>
> Is this to be expected when the system is in use, or does it indicate that
> something is wrong?
>
> Ceph also logs messages such as this:
>
> 2012-12-30 17:07:36.932272 osd.0 10.0.0.1:6800/9157 286340 : [WRN] slow
> request 30.751940 seconds old, received at 2012-12-30 17:07:06.180236:
> osd_op(client.4705.0:16074961 rb.0.11b7.4a933baa.0000000c188f [write
> 532480~4096] 0.f2a63fe) v4 currently waiting for sub ops
>
>
> My setup:
>
> 3 servers running Fedora 17 with Ceph 0.55.1 from RPM.
> Each server runs one osd and one mon. One of the servers also runs an mds.
> The backing file system is btrfs stored on md-raid. The journal is stored on
> the same SATA disks as the rest of the data.
> Each server has 3 bonded gigabit/sec NICs.
>
> One server running Fedora 16 with qemu-kvm.
> It has a gigabit/sec NIC connected to the same network as the Ceph servers,
> and a gigabit/sec NIC connected to the Internet.
> The disk is attached with:
>
> -drive format=rbd,file=rbd:data/image1:rbd_cache=1,if=virtio
>
>
> iostat on the KVM guest gives:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0,00    0,00    0,00  100,00    0,00    0,00
>
> Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz   await   svctm  %util
> vda        0,00    1,40  0,10  0,30    0,80   13,60    36,00     1,66 2679,25 2499,75  99,99
>
> Top on the KVM host shows 90% CPU idle and 0.0% I/O waiting.
>
> iostat on an OSD gives:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0,13    0,00    1,50   15,79    0,00   82,58
>
> Device:  rrqm/s  wrqm/s   r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> sda      240,70  441,20 33,00  42,70  1122,40  1961,80    81,48    14,45 164,42  319,14   44,85   6,63  50,22
> sdb      299,10  393,10 33,90  38,40  1363,60  1720,60    85,32    13,55 171,32  316,21   43,41   6,55  47,39
> sdc      268,50  441,60 28,80  45,40  1191,60  1977,00    85,41    19,08 159,39  345,98   41,02   6,56  48,69
> sdd      255,50  445,50 30,20  45,00  1150,40  1975,80    83,14    18,18 155,97  338,90   33,20   6,95  52,23
> md0        0,00    0,00  1,20 132,70     4,80  4086,40    61,11     0,00   0,00    0,00    0,00   0,00   0,00
>
> The figures are similar on all three OSDs.
>
> I am thinking that one possible cause could be that the journal is stored on
> the same disks as the rest of the data, but I don't know how to benchmark
> whether this is actually the case (?)
>
> Thanks for any help or advice you can offer!

Hi Jens,

You may try playing with SCHED_RT. I have found it hard to use myself, but you
can achieve your goal by adding small RT slices to the vcpu/emulator threads
via the ``cpu'' cgroup; it dramatically improves overall VM responsiveness. I
eventually abandoned it because the RT scheduler is a very strange thing - it
may cause an endless lockup on disk operations under heavy load, or leave a
permanently stuck ``kworker'' on some cores if you kill a VM whose vcpu threads
had separate RT slices.

Of course, some Ceph tuning such as the RBD writeback cache and a larger
journal may help you too - here I am speaking primarily about the
responsiveness of the VM itself.
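
If you want to experiment with the cgroup route anyway, roughly the steps
below. This is only a sketch, not something I can promise works on your setup:
it assumes cgroup v1 mounted under /sys/fs/cgroup, a kernel built with
CONFIG_RT_GROUP_SCHED, libvirt's usual libvirt/qemu/<domain> cgroup layout, and
a domain named "guest1" - the name and the microsecond budgets are just
placeholders.

    # Leaf cgroup of the guest's vcpu0 thread (adjust to your layout):
    G=/sys/fs/cgroup/cpu/libvirt/qemu/guest1

    # Every level of the hierarchy needs a non-zero RT budget before the
    # leaf group holding the threads can get one (children default to 0).
    # Budgets are in microseconds per cpu.rt_period_us (1 s by default),
    # so these are deliberately small slices.
    echo 20000 > /sys/fs/cgroup/cpu/libvirt/cpu.rt_runtime_us
    echo 20000 > /sys/fs/cgroup/cpu/libvirt/qemu/cpu.rt_runtime_us
    echo 10000 > $G/cpu.rt_runtime_us
    echo  5000 > $G/vcpu0/cpu.rt_runtime_us

    # Move the vcpu threads into an RT class, e.g. SCHED_RR priority 1:
    for pid in $(cat $G/vcpu0/tasks); do
        chrt -r -p 1 "$pid"
    done

The same would be repeated for vcpu1, vcpu2, ... and for the emulator group.
Keep the slices small - the RT-sliced vcpu threads are exactly what left me
with the stuck kworkers after killing a VM.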
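
And for the Ceph side, the kind of settings I mean - again only a sketch:
double-check the option names against the 0.55 documentation, and the sizes
are purely illustrative.

    # ceph.conf fragment (illustrative values)
    [client]
        rbd cache = true                 # RBD writeback cache
        rbd cache size = 33554432        # 32 MB of cache per image
        rbd cache max dirty = 25165824   # allow up to 24 MB of dirty data

    [osd]
        osd journal size = 4096          # journal size in MB (4 GB here)
        # better still, move the journal off the data disks, e.g. per OSD:
        # osd journal = /dev/<fast-ssd-partition>

Your -drive line already passes rbd_cache=1, so the [client] entries above
mainly let you size the cache; enlarging or moving the journal means stopping
each OSD and recreating its journal, so plan a bit of downtime per node.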