Is it using any CPU or Disk I/O during the 15 minutes?

On Sun, Sep 14, 2014 at 11:34 AM, Christopher Thorjussen <
christopher.thorjussen at onlinebackupcompany.com> wrote:

> I'm waiting for my cluster to recover from a crashed disk and a second osd
> that has been taken out (crushmap, rm, stopped).
>
> Now I'm stuck looking at this output ('ceph -w') while my osd.58 goes
> down every 15 minutes.
>
> 2014-09-14 20:08:56.535688 mon.0 [INF] pgmap v31056972: 24192 pgs: 1 active, 23888 active+clean, 2 active+remapped+backfilling, 301 active+degraded; 36677 GB data, 93360 GB used, 250 TB / 341 TB avail; 148288/25473878 degraded (0.582%)
> 2014-09-14 20:08:57.549302 mon.0 [INF] pgmap v31056973: 24192 pgs: 1 active, 23888 active+clean, 2 active+remapped+backfilling, 301 active+degraded; 36677 GB data, 93360 GB used, 250 TB / 341 TB avail; 148288/25473878 degraded (0.582%)
> 2014-09-14 20:08:58.562771 mon.0 [INF] pgmap v31056974: 24192 pgs: 1 active, 23888 active+clean, 2 active+remapped+backfilling, 301 active+degraded; 36677 GB data, 93360 GB used, 250 TB / 341 TB avail; 148288/25473878 degraded (0.582%)
> 2014-09-14 20:08:59.569851 mon.0 [INF] pgmap v31056975: 24192 pgs: 1 active, 23888 active+clean, 2 active+remapped+backfilling, 301 active+degraded; 36677 GB data, 93360 GB used, 250 TB / 341 TB avail; 148288/25473878 degraded (0.582%)
>
> Here is a log from when I restarted osd.58 through the next restart 15
> minutes later: http://pastebin.com/rt64vx9M
> In short, it just waits for 15 minutes without doing anything and then
> goes down, putting lots of lines like this in the log for that osd:
>
> 2014-09-14 20:02:08.517727 7fbd3909a700 0 -- 10.47.18.33:6812/27234 >> 10.47.18.32:6824/21269 pipe(0x35c12280 sd=117 :38289 s=2 pgs=159 cs=1 l=0 c=0x35bcf1e0).fault with nothing to send, going to standby
> 2014-09-14 20:02:08.519312 7fbd37b85700 0 -- 10.47.18.33:6812/27234 >> 10.47.18.34:6808/5278 pipe(0x36c64500 sd=130 :44909 s=2 pgs=16370 cs=1 l=0 c=0x36cc4f20).fault with nothing to send, going to standby
>
> Then I have to restart it. And it repeats.
>
> What should/can I do? Take it out?
>
> I've got 4 servers with 24 disks each. Details about servers:
> http://pastebin.com/XQeSh8gJ
> Running dumpling - 0.67.10
>
> Cheers,
> Christopher Thorjussen
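
The four pgmap lines quoted above all report the same 148288/25473878 degraded, so recovery does not appear to move during that window. To watch that figure across a longer stretch of 'ceph -w' output, something like the following may help. It is only a rough sketch: the regular expression is modelled on the pgmap lines quoted in this thread (dumpling-era output), not on any documented format, so other Ceph releases will likely need a different pattern. The names parse_pgmap and SAMPLE are made up for illustration.

```python
import re

# Pattern modelled only on the pgmap lines quoted in this thread;
# this is an assumption, not a stable or documented log format.
PGMAP_RE = re.compile(
    r"pgmap v(?P<version>\d+): (?P<total>\d+) pgs: (?P<states>.*?); "
    r".*?(?P<degraded>\d+)/(?P<objects>\d+) degraded"
)

def parse_pgmap(line):
    """Return PG state counts and the degraded ratio from one pgmap line.

    Returns None when the line does not look like the quoted format.
    """
    m = PGMAP_RE.search(line)
    if m is None:
        return None
    states = {}
    for part in m.group("states").split(", "):
        count, name = part.split(" ", 1)
        states[name] = int(count)
    degraded = int(m.group("degraded"))
    objects = int(m.group("objects"))
    return {
        "version": int(m.group("version")),
        "total_pgs": int(m.group("total")),
        "states": states,
        "degraded": degraded,
        "objects": objects,
        "degraded_pct": 100.0 * degraded / objects,
    }

# One of the lines from the original message, rejoined onto a single line.
SAMPLE = (
    "2014-09-14 20:08:56.535688 mon.0 [INF] pgmap v31056972: 24192 pgs: "
    "1 active, 23888 active+clean, 2 active+remapped+backfilling, "
    "301 active+degraded; 36677 GB data, 93360 GB used, "
    "250 TB / 341 TB avail; 148288/25473878 degraded (0.582%)"
)

info = parse_pgmap(SAMPLE)
print(info["states"])
print("degraded: %.3f%%" % info["degraded_pct"])
```

Feeding successive lines through parse_pgmap and comparing the "degraded" count between them would show whether the backfill is making any progress before osd.58 drops out again.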