Hi,
I can see you're running mon, mds and osd on the same server. If you only have 16GB in that system, I'm guessing you're swapping by now (or close to it). How much memory does the machine actually have?
Also, how busy are the disks? Or is the load primarily CPU-bound? Are there many processes waiting for run time, or a high interrupt count?
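Something along these lines should show that quickly (assuming the sysstat package is installed for iostat):

  free -m          # total/used memory and how much swap is in use
  vmstat 1 5       # r = run queue, si/so = swap in/out, in = interrupts, wa = iowait
  iostat -x 1 5    # per-disk %util and await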
/Martin
On Mon, Mar 31, 2014 at 1:49 PM, Kenneth Waegeman <Kenneth.Waegeman@xxxxxxxx> wrote:
Hi all,
Before the weekend we started some copy tests over ceph-fuse. Initially this went fine, but then performance gradually started to drop. Things are going very slowly now:
2014-03-31 13:36:37.047423 mon.0 [INF] pgmap v265871: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 44747 kB/s rd, 216 kB/s wr, 10 op/s
2014-03-31 13:36:38.049286 mon.0 [INF] pgmap v265872: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 4069 B/s rd, 363 kB/s wr, 24 op/s
2014-03-31 13:36:39.057680 mon.0 [INF] pgmap v265873: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 5092 B/s rd, 151 kB/s wr, 22 op/s
2014-03-31 13:36:40.075718 mon.0 [INF] pgmap v265874: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 25961 B/s rd, 1527 B/s wr, 10 op/s
2014-03-31 13:36:41.087764 mon.0 [INF] pgmap v265875: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 71574 kB/s rd, 4564 B/s wr, 17 op/s
2014-03-31 13:36:42.109200 mon.0 [INF] pgmap v265876: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 71238 kB/s rd, 3534 B/s wr, 9 op/s
2014-03-31 13:36:43.128113 mon.0 [INF] pgmap v265877: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 4022 B/s rd, 116 kB/s wr, 24 op/s
2014-03-31 13:36:44.143382 mon.0 [INF] pgmap v265878: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 8030 B/s rd, 117 kB/s wr, 29 op/s
2014-03-31 13:36:45.160405 mon.0 [INF] pgmap v265879: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 7049 B/s rd, 4531 B/s wr, 9 op/s
ceph-mds seems very busy, and so does one single osd:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
54279 root 20 0 8561m 7.5g 4408 S 105.6 23.8 3202:05 ceph-mds
50242 root 20 0 1378m 373m 6452 S 0.7 1.2 523:38.77 ceph-osd
49446 root 18 -2 10644 356 320 S 0.0 0.0 0:00.00 udevd
49444 root 18 -2 10644 428 320 S 0.0 0.0 0:00.00 udevd
49319 root 20 0 1444m 405m 5684 S 0.0 1.3 513:41.13 ceph-osd
48452 root 20 0 1365m 364m 5636 S 0.0 1.1 551:52.31 ceph-osd
47641 root 20 0 1567m 388m 5880 S 0.0 1.2 754:50.60 ceph-osd
46811 root 20 0 1441m 393m 8256 S 0.0 1.2 603:11.26 ceph-osd
46028 root 20 0 1594m 398m 6156 S 0.0 1.2 657:22.16 ceph-osd
45275 root 20 0 1545m 510m 9920 S 18.9 1.6 943:11.99 ceph-osd
44532 root 20 0 1509m 395m 7380 S 0.0 1.2 665:30.66 ceph-osd
43835 root 20 0 1397m 384m 8292 S 0.0 1.2 466:35.47 ceph-osd
43146 root 20 0 1412m 393m 5884 S 0.0 1.2 506:42.07 ceph-osd
42496 root 20 0 1389m 364m 5292 S 0.0 1.1 522:37.70 ceph-osd
41863 root 20 0 1504m 393m 5864 S 0.0 1.2 462:58.11 ceph-osd
39035 root 20 0 918m 694m 3396 S 3.3 2.2 55:53.59 ceph-mon
Does this look familiar to anyone?
How can we debug this further?
I have already set the mds debug level to 5. There are a lot of 'lookup' entries in the log, but I can't see any warnings or errors reported.
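If it helps, something along these lines should let me raise the logging further and dump the mds counters (mds.<name> is just a placeholder for the actual daemon id):

  # raise mds logging at runtime, no restart needed
  ceph tell mds.<name> injectargs '--debug_mds 10'
  # or, via the admin socket on the mds host
  ceph daemon mds.<name> config set debug_mds 10
  # dump the mds internal counters to see where it is spending its time
  ceph daemon mds.<name> perf dump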
Thanks!
Kind regards,
Kenneth
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com