Hi,

I think the high CPU usage was caused by the system time not being right. I
enabled ntp, and it had to make quite a big adjustment; after that the high
CPU usage was gone.

Anyway, I immediately ran into another issue. I ran a simple benchmark:

# rados bench --pool benchmark 300 write --no-cleanup

During the benchmark, one of my osd's went down. I checked the logs and
apparently there was no hardware failure (the disk is still nicely mounted
and the osd is still running), but the logfile fills up rapidly with these
messages:

2013-08-02 00:03:40.014982 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >> 192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36884 s=2 pgs=86874 cs=173547 l=0).fault, initiating reconnect
2013-08-02 00:03:40.016682 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >> 192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36885 s=2 pgs=86875 cs=173549 l=0).fault, initiating reconnect
2013-08-02 00:03:40.019241 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >> 192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36886 s=2 pgs=86876 cs=173551 l=0).fault, initiating reconnect

What could be wrong here?

Kind regards,

Erik.


On 08/01/2013 08:00 AM, Dan Mick wrote:
> Logging might well help.
>
> http://ceph.com/docs/master/rados/troubleshooting/log-and-debug/
>
>
>
> On 07/31/2013 03:51 PM, Erik Logtenberg wrote:
>> Hi,
>>
>> I just added a second node to my ceph test platform. The first node has
>> a mon and three osd's, the second node only has three osd's. Adding the
>> osd's was pretty painless, and ceph distributed the data from the first
>> node evenly over both nodes so everything seems to be fine.
>> The monitor also thinks everything is fine:
>>
>> 2013-08-01 00:41:12.719640 mon.0 [INF] pgmap v1283: 292 pgs: 292
>> active+clean; 9264 MB data, 24826 MB used, 5541 GB / 5578 GB avail
>>
>> Unfortunately, the three osd's on the second node keep eating a lot of
>> cpu, while there is no activity whatsoever:
>>
>>   PID USER   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
>> 21272 root 441440 34632 7848 S 61.8  0.9 4:08.62 ceph-osd
>> 21145 root 439852 29316 8360 S 60.4  0.7 4:04.31 ceph-osd
>> 21036 root 443828 31324 8336 S 60.1  0.8 4:07.55 ceph-osd
>>
>> Any idea why that is and how I can even ask an osd what it's doing?
>> There is no corresponding hdd activity, it's only cpu and hardly any
>> memory usage.
>>
>> Also the monitor on the first node is doing the same thing:
>>
>>   PID USER   VIRT   RES  SHR S  %CPU  %MEM   TIME+ COMMAND
>> 12825 root 186900 23492 5540 S 141.1 0.590 9:47.64 ceph-mon
>>
>> I tried stopping the three osd's: that makes the monitor calm down, but
>> after restarting the osd's, the monitor resumes its cpu usage. I also
>> tried stopping the monitor, which makes the three osd's calm down, but
>> once again they will start eating cpu again as soon as the monitor is
>> back online.
>>
>> In the mean time, the first three osd's, the ones on the same machine as
>> the monitor, don't behave like this at all. Currently as there is no
>> activity, they are just idling on low cpu usage, as expected.
>>
>> Kind regards,
>>
>> Erik.
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
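
P.S. To see how quickly the reconnects in the new log messages repeat, and
between which two addresses, the fault lines can be tallied per connection.
This is only a minimal Python sketch. The regular expression is fitted to
the three sample lines in this mail and is an assumption about the log
format, not something taken from the Ceph documentation. The name
summarize_faults is made up for illustration.

```python
import re
from datetime import datetime

# Assumption: pattern derived only from the three sample lines in this
# mail; other Ceph versions may format these messenger lines differently.
FAULT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+\S+\s+\d+\s+--\s+"
    r"(?P<local>\S+) >> (?P<peer>\S+) pipe\(.*\)\.fault, initiating reconnect"
)

SAMPLE = """\
2013-08-02 00:03:40.014982 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >> 192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36884 s=2 pgs=86874 cs=173547 l=0).fault, initiating reconnect
2013-08-02 00:03:40.016682 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >> 192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36885 s=2 pgs=86875 cs=173549 l=0).fault, initiating reconnect
2013-08-02 00:03:40.019241 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >> 192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36886 s=2 pgs=86876 cs=173551 l=0).fault, initiating reconnect
"""


def summarize_faults(lines):
    """Return {(local, peer): (count, first_ts, last_ts)} for fault lines.

    Hypothetical helper: lines that do not match FAULT_RE are skipped.
    """
    stats = {}
    for line in lines:
        m = FAULT_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        key = (m["local"], m["peer"])
        count, first, last = stats.get(key, (0, ts, ts))
        stats[key] = (count + 1, min(first, ts), max(last, ts))
    return stats


stats = summarize_faults(SAMPLE.splitlines())
for (local, peer), (count, first, last) in stats.items():
    # Time between the first and last fault seen for this connection.
    span = (last - first).total_seconds()
    print(f"{local} >> {peer}: {count} faults within {span:.6f} s")
```

To run it against a real log, pass the lines of the OSD log file to
summarize_faults instead of SAMPLE.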