I do notice that (unless mds.a is configured to be a standby in the
config file) you're starting up another MDS and claiming it's the same
as the already-running one. This shouldn't cause the OSDs to crash, but
it might be revealing a bug.

On Thu, Jul 14, 2011 at 10:11 AM, Samuel Just <samuelj@xxxxxxxxxxxxxxx> wrote:
> Not sure about the lack of logs; is something rotating the logs? There
> could have been a bug that caused the osds to crash, but I'll need the
> logs to hazard a guess as to what caused it. Starting the mds that way
> should not have killed the osds. Do the running osds produce logs? The
> logging should default to /var/log/ceph/.
> -Sam
>
> On 07/13/2011 08:54 PM, AnnyRen wrote:
>>
>> Hi, developers:
>>
>> My environment is 3 mons, 2 mds, and 25 osds, and the ceph version is v0.30.
>> This morning I found one mds (the standby one) missing when I ran "ceph -w".
>>
>> The mds info should be {mds e42: 1/1/1 up {0=b=up:active}, 1 up:standby},
>> but the standby one was gone.
>>
>> So I ssh'd to mds1, ran "cmds -i a -c /etc/ceph/ceph.conf", and checked
>> that the mds daemon was running correctly.
>>
>> After that, I found 17 osds suddenly down and out:
>>
>> ---------------------------------------------------------------------------------------------------------
>> osd0 up in weight 1 up_from 139 up_thru 229 down_at 138 last_clean_interval 120-137 192.168.10.10:6800/11191 192.168.10.10:6801/11191 192.168.10.10:6802/11191
>> osd1 down out up_from 141 up_thru 160 down_at 166 last_clean_interval 117-139
>> osd2 up in weight 1 up_from 147 up_thru 222 down_at 146 last_clean_interval 125-145 192.168.10.12:6800/10173 192.168.10.12:6801/10173 192.168.10.12:6802/10173
>> osd3 down out up_from 154 up_thru 160 down_at 166 last_clean_interval 130-152
>> osd4 down out up_from 155 up_thru 160 down_at 166 last_clean_interval 130-153
>> osd5 down out up_from 153 up_thru 160 down_at 166 last_clean_interval 134-151
>> osd6 down out up_from 153 up_thru 160 down_at 170 last_clean_interval 135-151
>> osd7 down out up_from 157 up_thru 160 down_at 162 last_clean_interval 133-155
>> osd8 down out up_from 154 up_thru 160 down_at 166 last_clean_interval 134-152
>> osd9 down out up_from 155 up_thru 160 down_at 168 last_clean_interval 133-153
>> osd10 down out up_from 141 up_thru 160 down_at 168 last_clean_interval 118-139
>> osd11 down out up_from 145 up_thru 160 down_at 168 last_clean_interval 119-143
>> osd12 down out up_from 142 up_thru 160 down_at 166 last_clean_interval 123-140
>> osd13 down out up_from 147 up_thru 160 down_at 166 last_clean_interval 123-145
>> osd14 up in weight 1 up_from 143 up_thru 223 down_at 142 last_clean_interval 124-141 192.168.10.24:6800/10122 192.168.10.24:6801/10122 192.168.10.24:6802/10122
>> osd15 down out up_from 147 up_thru 160 down_at 164 last_clean_interval 121-145
>> osd16 up in weight 1 up_from 148 up_thru 222 down_at 147 last_clean_interval 124-146 192.168.10.26:6800/9881 192.168.10.26:6801/9881 192.168.10.26:6802/9881
>> osd17 up in weight 1 up_from 148 up_thru 223 down_at 147 last_clean_interval 122-146 192.168.10.27:6800/9986 192.168.10.27:6801/9986 192.168.10.27:6802/9986
>> osd18 down out up_from 146 up_thru 160 down_at 168 last_clean_interval 124-144
>> osd19 down out up_from 147 up_thru 160 down_at 166 last_clean_interval 125-145
>> osd20 up in weight 1 up_from 148 up_thru 222 down_at 147 last_clean_interval 126-146 192.168.10.30:6800/9816 192.168.10.30:6801/9816 192.168.10.30:6802/9816
>> osd21 down out up_from 148 up_thru 160 down_at 168 last_clean_interval 126-146
>> osd22 up in weight 1 up_from 149 up_thru 220 down_at 148 last_clean_interval 127-147 192.168.10.32:6800/9640 192.168.10.32:6801/9640 192.168.10.32:6802/9640
>> osd23 down out up_from 153 up_thru 160 down_at 166 last_clean_interval 128-151
>> osd24 up in weight 1 up_from 150 up_thru 225 down_at 149 last_clean_interval 132-148 192.168.10.34:6800/10581 192.168.10.34:6801/10581 192.168.10.34:6802/10581
>> ---------------------------------------------------------------------------------------------------------
>>
>> Many pgs are degraded. I ssh'd to every down-and-out osd host to check
>> the log (/var/log/ceph/osd.x.log), but there is nothing recorded in the
>> logs... Why did the logs stop recording anything?
>>
>> So I ran "cosd -i x -c /etc/ceph/ceph.conf" on every down osd
>> individually to bring them back up and in, and after that I could
>> read/write files from ceph....
>
> At this point, did the osds go down again?
>
>> Could anyone tell me what's going on in my environment?
>> 1. Is this an OSD stability problem?
>> 2. Why did ceph unexpectedly stop writing logs?
>>
>> Thanks a lot! :)
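
For what it's worth, the cleaner way to bring that standby back is to make
sure mds.a is declared in ceph.conf and start it the same way the rest of
the cluster was started, so it registers itself as a standby instead of
being hand-started alongside the active daemon. A minimal sketch of the
relevant fragment (mds1 is the host from your mail, mds0 is a made-up name
for the active one, and the exact option spelling may differ on v0.30):

    [mds]
        ; options shared by both mds daemons go here
    [mds.b]
        host = mds0      ; the currently active mds (hypothetical hostname)
    [mds.a]
        host = mds1      ; the daemon you restarted by hand; with a single
                         ; active rank it should come up as a standby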
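
On the recovery side, restarting each down osd with cosd is what I'd have
done too; here's a throwaway sketch that just loops the same command over
the osds your dump shows as down out. The host mapping is only a guess
based on the addresses of the osds that stayed up (osdN at
192.168.10.(10+N)), so adjust it if your hosts don't follow that pattern:

    #!/bin/sh
    # Restart every osd listed as "down out" in the dump above.
    # Assumption: osdN lives on 192.168.10.(10+N), matching the osds
    # that stayed up in the dump.
    for id in 1 3 4 5 6 7 8 9 10 11 12 13 15 18 19 21 23; do
        host=192.168.10.$((10 + id))
        ssh root@$host "cosd -i $id -c /etc/ceph/ceph.conf"
    done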
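
And since the real mystery is the silent logs: it would be worth
double-checking that the osd sections in ceph.conf actually point logging
at /var/log/ceph, and turning the debug levels up, so that if the osds die
again there is something for Sam to look at. A sketch, assuming I have the
v0.30 option names right; the levels are only suggestions:

    [osd]
        log dir = /var/log/ceph     ; should produce /var/log/ceph/osd.<id>.log
        debug osd = 20
        debug ms = 1
        debug filestore = 20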