I've got several OSDs that are spinning at 100%. I've retained some
professional services to have a look. It's out of my newbie reach.

/Christopher

On Tue, Sep 16, 2014 at 11:23 PM, Craig Lewis <clewis at centraldesktop.com>
wrote:

> Is it using any CPU or Disk I/O during the 15 minutes?
>
> On Sun, Sep 14, 2014 at 11:34 AM, Christopher Thorjussen <
> christopher.thorjussen at onlinebackupcompany.com> wrote:
>
>> I'm waiting for my cluster to recover from a crashed disk and a second
>> osd that has been taken out (crushmap, rm, stopped).
>>
>> Now I'm stuck looking at this output ('ceph -w') while my osd.58 goes
>> down every 15 minutes.
>>
>> 2014-09-14 20:08:56.535688 mon.0 [INF] pgmap v31056972: 24192 pgs: 1
>> active, 23888 active+clean, 2 active+remapped+backfilling, 301
>> active+degraded; 36677 GB data, 93360 GB used, 250 TB / 341 TB avail;
>> 148288/25473878 degraded (0.582%)
>> 2014-09-14 20:08:57.549302 mon.0 [INF] pgmap v31056973: 24192 pgs: 1
>> active, 23888 active+clean, 2 active+remapped+backfilling, 301
>> active+degraded; 36677 GB data, 93360 GB used, 250 TB / 341 TB avail;
>> 148288/25473878 degraded (0.582%)
>> 2014-09-14 20:08:58.562771 mon.0 [INF] pgmap v31056974: 24192 pgs: 1
>> active, 23888 active+clean, 2 active+remapped+backfilling, 301
>> active+degraded; 36677 GB data, 93360 GB used, 250 TB / 341 TB avail;
>> 148288/25473878 degraded (0.582%)
>> 2014-09-14 20:08:59.569851 mon.0 [INF] pgmap v31056975: 24192 pgs: 1
>> active, 23888 active+clean, 2 active+remapped+backfilling, 301
>> active+degraded; 36677 GB data, 93360 GB used, 250 TB / 341 TB avail;
>> 148288/25473878 degraded (0.582%)
>>
>> Here is a log from when I restarted osd.58 and through the next reboot 15
>> minutes later: http://pastebin.com/rt64vx9M
>> In short, it just waits for 15 minutes without doing anything and then
>> goes down, putting lots of lines like this in the log for that osd:
>>
>> 2014-09-14 20:02:08.517727 7fbd3909a700 0 -- 10.47.18.33:6812/27234 >>
>> 10.47.18.32:6824/21269 pipe(0x35c12280 sd=117 :38289 s=2 pgs=159 cs=1
>> l=0 c=0x35bcf1e0).fault with nothing to send, going to standby
>> 2014-09-14 20:02:08.519312 7fbd37b85700 0 -- 10.47.18.33:6812/27234 >>
>> 10.47.18.34:6808/5278 pipe(0x36c64500 sd=130 :44909 s=2 pgs=16370 cs=1
>> l=0 c=0x36cc4f20).fault with nothing to send, going to standby
>>
>> Then I have to restart it. And it repeats.
>>
>> What should/can I do? Take it out?
>>
>> I've got 4 servers with 24 disks each. Details about servers:
>> http://pastebin.com/XQeSh8gJ
>> Running dumpling - 0.67.10
>>
>> Cheers,
>> Christopher Thorjussen
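
For anyone following the thread who wants to track recovery without reading 'ceph -w' by eye, here is a minimal Python sketch that parses one of the pgmap lines quoted above and recomputes the degraded percentage. The regular expression, the `parse_pgmap` helper and its field names are my own assumptions based only on the line format shown in this thread; they are not part of Ceph, and other Ceph versions may print the line differently.

```python
import re

# One pgmap line exactly as quoted in this thread (joined back onto one line).
SAMPLE = (
    "2014-09-14 20:08:56.535688 mon.0 [INF] pgmap v31056972: 24192 pgs: 1 "
    "active, 23888 active+clean, 2 active+remapped+backfilling, 301 "
    "active+degraded; 36677 GB data, 93360 GB used, 250 TB / 341 TB avail; "
    "148288/25473878 degraded (0.582%)"
)

# Assumed layout: "pgmap vN: TOTAL pgs: <count state>, ...; ...; X/Y degraded".
PGMAP_RE = re.compile(
    r"pgmap v(?P<version>\d+): (?P<total>\d+) pgs: (?P<states>.*?); "
    r".*?(?P<degraded>\d+)/(?P<objects>\d+) degraded"
)


def parse_pgmap(line):
    """Return a dict of PG counts and the degraded percentage, or None
    if the line does not look like the pgmap lines quoted above."""
    match = PGMAP_RE.search(line)
    if match is None:
        return None

    # Split "1 active, 23888 active+clean, ..." into {state: count}.
    states = {}
    for part in match.group("states").split(", "):
        count, name = part.split(" ", 1)
        states[name] = int(count)

    degraded = int(match.group("degraded"))
    objects = int(match.group("objects"))
    return {
        "version": int(match.group("version")),
        "total_pgs": int(match.group("total")),
        "states": states,
        "degraded_pct": 100.0 * degraded / objects,
    }


if __name__ == "__main__":
    info = parse_pgmap(SAMPLE)
    print(info["states"])
    print("degraded: %.3f%%" % info["degraded_pct"])
```

Feeding successive lines through this and watching whether `degraded_pct` and the `active+degraded` count change is one way to confirm what the quoted output suggests: across the four samples the numbers do not move, so recovery appears stalled rather than slow.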