On Mon, Jan 11, 2016 at 11:33 AM, Zhi Zhang <zhang.david2011@xxxxxxxxx> wrote: > Hi Guys, > > Last week one of our clusters became unavailable because MDSes always > stuck on replay state. > > From the log, we found some clues: > > MDS wants to replay log from 31885100783 to 31935430656. The last > prefetch expects 4194304 bytes from one mds log. At this point, the > requested_pos is increased to write_pos which is 31935430656, so there > will be no prefetch any more. > > But this mds log (already the latest one) is only 4192831 bytes, so > finally read_pos can only reach 31935429183, which is less than > expected write_pos. MDS wants to read more for replay, but there is no > mds log can be read, hence MDS is stuck forever. > > See below mds logs: > ---------------------------------------------------------------------------------------- > 2016-01-08 13:26:34.468773 7f16a78a7700 10 mds.0.log replay start, > from 31885100783 to 31935430656 > > 2016-01-08 13:26:35.164194 7f16a50a2700 10 mds.0.journaler(ro) > _issue_read reading 31931236352~4194304, read pointers > 31889294570/31931236352/31935430656 > 2016-01-08 13:26:35.164219 7f16a50a2700 10 mds.0.objecter _op_submit > op 0x536ba80 > 2016-01-08 13:26:35.164235 7f16a50a2700 10 mds.0.objecter _calc_target > pgid 2.ccfa8253 acting [0,6] > 2016-01-08 13:26:35.164241 7f16a50a2700 20 mds.0.objecter _get_session > s=0x53b0f00 osd=0 2 > 2016-01-08 13:26:35.164243 7f16a50a2700 20 mds.0.objecter note: not > requesting commit > 2016-01-08 13:26:35.164256 7f16a50a2700 10 mds.0.objecter _op_submit > oid 200.00001dbd @2 @2 [read 0~4194304] tid 0 osd.0 > > 2016-01-08 13:26:35.172434 7f16a98ab700 10 mds.0.objecter ms_dispatch > 0x5370000 osd_op_reply(22 200.00001dbd [read 0~4192831] v0'0 uv42142 > ondisk = 0) v6 > 2016-01-08 13:26:35.172493 7f16a98ab700 10 mds.0.objecter in handle_osd_op_reply > > 2016-01-08 13:26:35.172832 7f16a78a7700 10 mds.0.journaler(ro) > _finish_read got 31931236352~4192831 > 2016-01-08 13:26:35.172852 7f16a78a7700 10 mds.0.journaler(ro) > _assimilate_prefetch 31931236352~4192831 > 2016-01-08 13:26:35.172896 7f16a78a7700 10 mds.0.journaler(ro) > _assimilate_prefetch read_buf now 31889339375~46089808, read pointers > 31889339375/31935429183/31935430656 > > 2016-01-08 13:26:42.541242 7f16a50a2700 10 mds.0.journaler(ro) > _is_readable read_buf.length() == 0, but need 12 for next entry; > fetch_len is 41943040 > 2016-01-08 13:26:42.541251 7f16a50a2700 10 mds.0.journaler(ro) > _is_readable: not readable, returning false > 2016-01-08 13:26:42.541261 7f16a50a2700 10 mds.0.journaler(ro) _prefetch > 2016-01-08 13:26:42.541272 7f16a50a2700 10 mds.0.journaler(ro) > _prefetch fetch_len=41943040 read_pos=31935429183 adjustment=1473 > target=31977373696 write_pos=31935430656 > 2016-01-08 13:26:42.541282 7f16a50a2700 10 mds.0.journaler(ro) > _is_readable read_buf.length() == 0, but need 12 for next entry; > fetch_len is 41943040 > 2016-01-08 13:26:42.541292 7f16a50a2700 10 mds.0.journaler(ro) > _is_readable: not readable, returning false > 2016-01-08 13:26:42.541302 7f16a50a2700 10 mds.0.journaler(ro) > wait_for_readable at 31935429183 onreadable 0x7f16a50a1d40 > ---------------------------------------------------------------------------------------- > this is like http://tracker.ceph.com/issues/13167. which version ceph are you using? Regards Yan, Zheng > > From mds log object, 200.00001dbd is the latest one whose size is > 4192831 bytes. The write_pos in 200.00000000 object is 31935430656. > > The strange thing is the modified time of this two objects. > 200.00000000 object's modified time is older than 200.00001dbd > object's. Does this mean mds update log meta info first, then write > log data? In our case, writing log data to 200.00001dbd fails somehow? > > ---------------------------------------------------------------------------------------- > [ceph@xxx /data/lib/ceph/osd/ceph-0/current/2.94_head/DIR_4/DIR_9/DIR_4]$ ls -lh > total 4.0K > -rw-r--r-- 1 root root 90 Jan 8 02:06 200.00000000__head_844F3494__2 > > [ceph@xxx /data/lib/ceph/osd/ceph-0/current/2.53_head/DIR_3/DIR_5/DIR_2]$ ls -l > total 4096 > -rw-r--r-- 1 root root 4192831 Jan 8 02:09 200.00001dbd__head_CCFA8253__2 > ---------------------------------------------------------------------------------------- > > > Regards, > Zhi Zhang (David) > Contact: zhang.david2011@xxxxxxxxx > zhangz.david@xxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html