I've just noticed that "MB used" is increasing by roughly 50-60 MB between pgmap updates, even though ceph reports only ~974 kB/s of writes:

63603 MB data, 39809 MB used, 2346 GB / 2389 GB avail; 974 kB/s wr, 1277 op/s
63649 MB data, 39863 MB used, 2346 GB / 2389 GB avail; 974 kB/s wr, 1369 op/s

On Thu, Oct 30, 2014 at 5:13 PM, Cristian Falcas <cristi.falcas@xxxxxxxxx> wrote:
> Hello,
>
> I have a one-node ceph installation, and when I import an image with
> qemu it works fine for a while; then the osd process starts using
> ~100% cpu, the number of op/s increases, and the write throughput
> drops dramatically. The osd process doesn't appear to be cpu bound,
> because no cpu is maxed out.
>
> How can I debug what is causing this?
>
> I'm reading from a sata disk and writing to a pool on ssd. The journal
> is in RAM, and I'm using ceph-0.80.7-0.el7.centos.x86_64.
>
> Output from top:
>
> 26344 root 20 0 825900 314516 18852 S 114.0 0.4 5:03.82 ceph-osd
> 27547 root 20 0 927364 153440 13044 S 51.7 0.2 2:50.83 qemu-img
>
> Writes in ceph when the OSD is using a lot of cpu:
>
> 2014-10-30 10:55:08.112259 mon.0 [INF] pgmap v135: 3264 pgs: 3264
> active+clean; 27595 MB data, 16116 MB used, 2385 GB / 2405 GB avail;
> 618 kB/s wr, 1236 op/s
> 2014-10-30 10:55:13.111438 mon.0 [INF] pgmap v136: 3264 pgs: 3264
> active+clean; 27646 MB data, 16174 MB used, 2385 GB / 2405 GB avail;
> 643 kB/s wr, 1286 op/s
> 2014-10-30 10:55:18.110992 mon.0 [INF] pgmap v137: 3264 pgs: 3264
> active+clean; 27693 MB data, 16195 MB used, 2385 GB / 2405 GB avail;
> 632 kB/s wr, 1264 op/s
> 2014-10-30 10:55:23.109454 mon.0 [INF] pgmap v138: 3264 pgs: 3264
> active+clean; 27747 MB data, 16195 MB used, 2385 GB / 2405 GB avail;
> 645 kB/s wr, 1291 op/s
>
> Writes in ceph when the OSD is using ~10% cpu:
>
> 2014-10-30 09:59:46.140964 mon.0 [INF] pgmap v80: 704 pgs: 704
> active+clean; 11935 MB data, 8413 MB used, 2392 GB / 2405 GB avail;
> 100536 kB/s wr, 98 op/s
> 2014-10-30 09:59:51.134338 mon.0 [INF] pgmap v81: 704 pgs: 704
> active+clean; 12575 MB data, 8775 MB used, 2392 GB / 2405 GB avail;
> 107 MB/s wr, 107 op/s
> 2014-10-30 09:59:56.157098 mon.0 [INF] pgmap v82: 704 pgs: 704
> active+clean; 12991 MB data, 8949 MB used, 2392 GB / 2405 GB avail;
> 105 MB/s wr, 105 op/s
> 2014-10-30 10:00:01.181859 mon.0 [INF] pgmap v83: 704 pgs: 704
> active+clean; 13631 MB data, 9114 MB used, 2392 GB / 2405 GB avail;
> 104 MB/s wr, 105 op/s
>
> Qemu command:
>
> time qemu-img convert -f raw -O rbd /home/user/backup_2014_10_27.raw rbd:instances_fast/backup_2014_10_27.21
>
> ceph config:
>
> [global]
> mon_initial_members = $(hostname -s)
> mon_host = $IP
> public_network = 10.100.0.0/16
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> osd pool default size = 1
> osd pool default min size = 1
> osd crush chooseleaf type = 0
>
> ## ssd, journal on same partition
> [osd]
> osd data = /var/lib/ceph/osd/ceph-\$id
> osd journal size = 7000
> osd journal = /var/lib/ceph/osd/ceph-\$id-journal/osd-\$id.journal
> osd crush update on start = false
>
> ## osd 1 is on ssd and we put journal on ram
> [osd.1]
> osd journal = /dev/shm/osd.1.journal
> journal dio = false
>
> Test performed with dd:
>
> sync
> dd bs=4M count=512 if=/home/user/backup_2014_10_27.raw of=/var/lib/ceph/osd/ceph-1/backup_2014_10_27.raw conv=fdatasync
> 512+0 records in
> 512+0 records out
> 2147483648 bytes (2.1 GB) copied, 16.3971 s, 131 MB/s
>
> Thank you,
> Cristian Falcas
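On the question of how to debug where the OSD is spending its time: a rough sketch of what I'd look at first, assuming the default admin socket path (/var/run/ceph/ceph-osd.1.asok) and osd.1 as configured above, and that these admin-socket commands are available in firefly (0.80.x):

# cluster-wide write throughput and op/s, same counters as the pgmap lines above
ceph -w

# per-OSD performance counters (journal, filestore and op queue latencies)
ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok perf dump

# slowest recent ops on this OSD, with per-stage timestamps
ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_historic_ops

# temporarily raise OSD/filestore logging while reproducing the import
ceph tell osd.1 injectargs '--debug-osd 10 --debug-filestore 10'

The journal and filestore latency counters in perf dump should show whether the time is going into the tmpfs journal or into filestore apply, and dump_historic_ops should show which stage the slow ops are stuck in.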