Attached is the log from when I boot an mds up (cmds -i a -c /etc/ceph/ceph.conf):

2011-07-14 10:07:22.080571 mon0 192.168.10.1:6789/0 75 : [INF] mds? 192.168.10.4:6800/20883 up:boot
2011-07-14 10:14:59.182202 mon0 192.168.10.1:6789/0 76 : [INF] osd7 out (down for 300.007145)
2011-07-14 10:15:04.182906 mon0 192.168.10.1:6789/0 77 : [INF] osd15 out (down for 300.007564)
2011-07-14 10:15:09.183510 mon0 192.168.10.1:6789/0 78 : [INF] osd1 out (down for 300.007286)
2011-07-14 10:15:09.183553 mon0 192.168.10.1:6789/0 79 : [INF] osd3 out (down for 300.007285)
2011-07-14 10:15:09.183576 mon0 192.168.10.1:6789/0 80 : [INF] osd4 out (down for 300.007285)
2011-07-14 10:15:09.183593 mon0 192.168.10.1:6789/0 81 : [INF] osd5 out (down for 300.007284)
2011-07-14 10:15:09.183609 mon0 192.168.10.1:6789/0 82 : [INF] osd8 out (down for 300.007284)
2011-07-14 10:15:09.183627 mon0 192.168.10.1:6789/0 83 : [INF] osd12 out (down for 300.007284)
2011-07-14 10:15:09.183644 mon0 192.168.10.1:6789/0 84 : [INF] osd13 out (down for 300.007283)
2011-07-14 10:15:09.183660 mon0 192.168.10.1:6789/0 85 : [INF] osd19 out (down for 300.007283)
2011-07-14 10:15:09.183675 mon0 192.168.10.1:6789/0 86 : [INF] osd23 out (down for 300.007282)
2011-07-14 10:15:14.184369 mon0 192.168.10.1:6789/0 87 : [INF] osd9 out (down for 300.007294)
2011-07-14 10:15:14.184410 mon0 192.168.10.1:6789/0 88 : [INF] osd10 out (down for 300.007294)
2011-07-14 10:15:14.184431 mon0 192.168.10.1:6789/0 89 : [INF] osd11 out (down for 300.007293)
2011-07-14 10:15:14.184446 mon0 192.168.10.1:6789/0 90 : [INF] osd18 out (down for 300.007293)
2011-07-14 10:15:14.184465 mon0 192.168.10.1:6789/0 91 : [INF] osd21 out (down for 300.007292)
2011-07-14 10:16:09.188393 mon0 192.168.10.1:6789/0 92 : [INF] osd6 out (down for 300.009465)

Did I do something wrong when starting up the mds? Does anyone know why, after I start an mds, most of the osds end up down and out? :(
Thank you.
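One observation from the log above: every osd is marked out almost exactly 300 seconds after it went down, which looks like the monitors' down-out timer rather than anything the mds did; the open question is why the osds went down at all. A minimal ceph.conf sketch for tuning that timer, assuming the option is still spelled "mon osd down out interval" in v0.30 and using 600 purely as an example value:

    [mon]
        # seconds a down osd is kept "in" before the monitors mark it out
        # (default 300, matching the "down for 300.00..." lines above)
        mon osd down out interval = 600

Raising it would only delay the out-marking; it does not keep osds from going down.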
2011/7/14 AnnyRen <annyren6@xxxxxxxxx>:
> Hi, developers:
>
> My environment is 3 mons, 2 mds, 25 osds, and the ceph version is v0.30.
> This morning I found that one mds (the standby one) was missing when I ran "ceph -w".
>
> The mds info should be {mds e42: 1/1/1 up {0=b=up:active}, 1 up:standby},
> but the standby one was gone.
>
> So I ssh'd to mds1, ran "cmds -i a -c /etc/ceph/ceph.conf", and checked
> that the mds daemon was running correctly.
>
> After that, I found 17 osds suddenly down and out:
> ---------------------------------------------------------------------------------------------------------
> osd0 up in weight 1 up_from 139 up_thru 229 down_at 138 last_clean_interval 120-137 192.168.10.10:6800/11191 192.168.10.10:6801/11191 192.168.10.10:6802/11191
> osd1 down out up_from 141 up_thru 160 down_at 166 last_clean_interval 117-139
> osd2 up in weight 1 up_from 147 up_thru 222 down_at 146 last_clean_interval 125-145 192.168.10.12:6800/10173 192.168.10.12:6801/10173 192.168.10.12:6802/10173
> osd3 down out up_from 154 up_thru 160 down_at 166 last_clean_interval 130-152
> osd4 down out up_from 155 up_thru 160 down_at 166 last_clean_interval 130-153
> osd5 down out up_from 153 up_thru 160 down_at 166 last_clean_interval 134-151
> osd6 down out up_from 153 up_thru 160 down_at 170 last_clean_interval 135-151
> osd7 down out up_from 157 up_thru 160 down_at 162 last_clean_interval 133-155
> osd8 down out up_from 154 up_thru 160 down_at 166 last_clean_interval 134-152
> osd9 down out up_from 155 up_thru 160 down_at 168 last_clean_interval 133-153
> osd10 down out up_from 141 up_thru 160 down_at 168 last_clean_interval 118-139
> osd11 down out up_from 145 up_thru 160 down_at 168 last_clean_interval 119-143
> osd12 down out up_from 142 up_thru 160 down_at 166 last_clean_interval 123-140
> osd13 down out up_from 147 up_thru 160 down_at 166 last_clean_interval 123-145
> osd14 up in weight 1 up_from 143 up_thru 223 down_at 142 last_clean_interval 124-141 192.168.10.24:6800/10122 192.168.10.24:6801/10122 192.168.10.24:6802/10122
> osd15 down out up_from 147 up_thru 160 down_at 164 last_clean_interval 121-145
> osd16 up in weight 1 up_from 148 up_thru 222 down_at 147 last_clean_interval 124-146 192.168.10.26:6800/9881 192.168.10.26:6801/9881 192.168.10.26:6802/9881
> osd17 up in weight 1 up_from 148 up_thru 223 down_at 147 last_clean_interval 122-146 192.168.10.27:6800/9986 192.168.10.27:6801/9986 192.168.10.27:6802/9986
> osd18 down out up_from 146 up_thru 160 down_at 168 last_clean_interval 124-144
> osd19 down out up_from 147 up_thru 160 down_at 166 last_clean_interval 125-145
> osd20 up in weight 1 up_from 148 up_thru 222 down_at 147 last_clean_interval 126-146 192.168.10.30:6800/9816 192.168.10.30:6801/9816 192.168.10.30:6802/9816
> osd21 down out up_from 148 up_thru 160 down_at 168 last_clean_interval 126-146
> osd22 up in weight 1 up_from 149 up_thru 220 down_at 148 last_clean_interval 127-147 192.168.10.32:6800/9640 192.168.10.32:6801/9640 192.168.10.32:6802/9640
> osd23 down out up_from 153 up_thru 160 down_at 166 last_clean_interval 128-151
> osd24 up in weight 1 up_from 150 up_thru 225 down_at 149 last_clean_interval 132-148 192.168.10.34:6800/10581 192.168.10.34:6801/10581 192.168.10.34:6802/10581
> ---------------------------------------------------------------------------------------------------------
>
> Many pgs are degraded. I ssh'd to every down-and-out osd host to look at its log
> (/var/log/ceph/osd.x.log), but nothing had been recorded there...
> Why did the logs stop recording anything?
>
> So I ran "cosd -i x -c /etc/ceph/ceph.conf" on every down osd
> individually to bring them back up and in, and after that I could read/write files
> from ceph again.
>
> Could anyone tell me what's going on in my environment?
> 1. An OSD stability problem?
> 2. Ceph unexpectedly stopped writing logs.
>
> Thanks a lot! :)
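A small sketch of how the per-osd restarts described above could be scripted instead of logging in to each host by hand. This is only illustrative: the osd-id-to-host mapping (192.168.10.(10+N)) is inferred from the addresses of the osds that stayed up in the dump above, so treat it as an assumption, and the list of ids is just the 17 osds shown there as down.

    #!/bin/sh
    # Restart cosd on each osd currently shown as down in the osd dump,
    # using the same invocation quoted above, once per osd.
    CONF=/etc/ceph/ceph.conf
    for id in 1 3 4 5 6 7 8 9 10 11 12 13 15 18 19 21 23; do
        host=192.168.10.$((10 + id))   # assumed layout: osdN lives on 192.168.10.(10+N)
        ssh root@$host "cosd -i $id -c $CONF"
    done

The loop only automates the manual recovery step; it does not address why the osds went down or why their logs stopped.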