Re: MDS debugging

----- Message from Martin B Nielsen <martin@xxxxxxxxxxx> ---------
   Date: Mon, 31 Mar 2014 15:55:24 +0200
   From: Martin B Nielsen <martin@xxxxxxxxxxx>
Subject: Re: MDS debugging
     To: Kenneth Waegeman <Kenneth.Waegeman@xxxxxxxx>
     Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>


Hi,

I can see you're running mon, mds and osd on the same server.
That's true, we have 3 servers, each running a monitor and OSDs.

Also, from a quick glance you're using around 13GB resident memory.

If you only have 16GB in your system I'm guessing you'll be swapping about
now (or close). How much mem does the system hold?
32GB on each of the 3 hosts, no swap used.

Also, how busy are the disks? Or is it primarily CPU-bound? Do you have many processes waiting for run time, or a high interrupt count?

I already stopped the sync and rebooted the cluster; at first sight performance is OK again, so I am going to re-run the test and report back what happens in terms of interrupts and processes. I did look at the disks while we had the problem, and they were not busy at all at that moment. On the server with the active MDS there was a lot of CPU usage from ceph-mds; on the other hosts there were a lot of migration/x kernel threads using a lot of CPU.
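
When I re-run the test I plan to capture something along these lines while it degrades (just standard sysstat/procps tools; the 1-second interval is only an example):

  iostat -x 1    # per-disk utilization and await, to see whether the OSD disks are actually busy
  vmstat 1       # run queue length, context switches and interrupts per second
  pidstat -u 1   # per-process CPU, to catch ceph-mds and the migration/x threads

That should show whether we are disk-, CPU- or interrupt-bound.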

Thanks a lot!
Kenneth

/Martin


On Mon, Mar 31, 2014 at 1:49 PM, Kenneth Waegeman <Kenneth.Waegeman@xxxxxxxx> wrote:

Hi all,

Before the weekend we started some copy tests over ceph-fuse. Initially this went OK, but then performance started dropping gradually. Things are going very slowly now:

2014-03-31 13:36:37.047423 mon.0 [INF] pgmap v265871: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 44747 kB/s rd, 216 kB/s wr, 10 op/s
2014-03-31 13:36:38.049286 mon.0 [INF] pgmap v265872: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 4069 B/s rd, 363 kB/s wr, 24 op/s
2014-03-31 13:36:39.057680 mon.0 [INF] pgmap v265873: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 5092 B/s rd, 151 kB/s wr, 22 op/s
2014-03-31 13:36:40.075718 mon.0 [INF] pgmap v265874: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 25961 B/s rd, 1527 B/s wr, 10 op/s
2014-03-31 13:36:41.087764 mon.0 [INF] pgmap v265875: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 71574 kB/s rd, 4564 B/s wr, 17 op/s
2014-03-31 13:36:42.109200 mon.0 [INF] pgmap v265876: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 71238 kB/s rd, 3534 B/s wr, 9 op/s
2014-03-31 13:36:43.128113 mon.0 [INF] pgmap v265877: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 4022 B/s rd, 116 kB/s wr, 24 op/s
2014-03-31 13:36:44.143382 mon.0 [INF] pgmap v265878: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 8030 B/s rd, 117 kB/s wr, 29 op/s
2014-03-31 13:36:45.160405 mon.0 [INF] pgmap v265879: 1300 pgs: 1300 active+clean; 19872 GB data, 59953 GB used, 74117 GB / 130 TB avail; 7049 B/s rd, 4531 B/s wr, 9 op/s


ceph-mds seems very busy, and so does one of the OSDs!

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
54279 root      20   0 8561m 7.5g 4408 S 105.6 23.8   3202:05 ceph-mds
50242 root      20   0 1378m 373m 6452 S  0.7  1.2 523:38.77 ceph-osd
49446 root      18  -2 10644  356  320 S  0.0  0.0   0:00.00 udevd
49444 root      18  -2 10644  428  320 S  0.0  0.0   0:00.00 udevd
49319 root      20   0 1444m 405m 5684 S  0.0  1.3 513:41.13 ceph-osd
48452 root      20   0 1365m 364m 5636 S  0.0  1.1 551:52.31 ceph-osd
47641 root      20   0 1567m 388m 5880 S  0.0  1.2 754:50.60 ceph-osd
46811 root      20   0 1441m 393m 8256 S  0.0  1.2 603:11.26 ceph-osd
46028 root      20   0 1594m 398m 6156 S  0.0  1.2 657:22.16 ceph-osd
45275 root      20   0 1545m 510m 9920 S 18.9  1.6 943:11.99 ceph-osd
44532 root      20   0 1509m 395m 7380 S  0.0  1.2 665:30.66 ceph-osd
43835 root      20   0 1397m 384m 8292 S  0.0  1.2 466:35.47 ceph-osd
43146 root      20   0 1412m 393m 5884 S  0.0  1.2 506:42.07 ceph-osd
42496 root      20   0 1389m 364m 5292 S  0.0  1.1 522:37.70 ceph-osd
41863 root      20   0 1504m 393m 5864 S  0.0  1.2 462:58.11 ceph-osd
39035 root      20   0  918m 694m 3396 S  3.3  2.2  55:53.59 ceph-mon
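
To see what the busy ceph-mds and ceph-osd are actually doing, I could dump their perf counters over the admin socket (a sketch, assuming the default socket paths; <id> stands for the daemon id):

  ceph --admin-daemon /var/run/ceph/ceph-mds.<id>.asok perf dump   # MDS request/cache counters
  ceph --admin-daemon /var/run/ceph/ceph-osd.<id>.asok perf dump   # same for the one busy OSD
  ceph osd perf                                                    # commit/apply latency per OSD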

Does this look familiar to someone?

How can we debug this further?
I have already set the MDS debug level to 5. There are a lot of 'lookup' entries, but I can't see any warnings or errors being reported.
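If that is not enough, I can raise the logging at runtime over the admin socket; a sketch, assuming the default socket path (debug 10 or 20 gets very verbose):

  ceph --admin-daemon /var/run/ceph/ceph-mds.<id>.asok config set debug_mds 10
  ceph --admin-daemon /var/run/ceph/ceph-mds.<id>.asok config set debug_ms 1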

Thanks!

Kind regards,
Kenneth




----- End message from Martin B Nielsen <martin@xxxxxxxxxxx> -----

--

Kind regards,
Kenneth Waegeman


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



