Re: Ceph mds stuck on replay forever

On Mon, Jan 11, 2016 at 11:33 AM, Zhi Zhang <zhang.david2011@xxxxxxxxx> wrote:
> Hi Guys,
>
> Last week one of our clusters became unavailable because the MDSes
> were always stuck in the replay state.
>
> From the log, we found some clues:
>
> The MDS wants to replay the log from 31885100783 to 31935430656. The
> last prefetch expects 4194304 bytes from one mds log object. At this
> point, requested_pos is advanced to write_pos, which is 31935430656,
> so no further prefetch will be issued.
>
> But this mds log object (already the latest one) is only 4192831
> bytes, so read_pos can only reach 31935429183, which is less than the
> expected write_pos. The MDS wants to read more for replay, but there
> is no more mds log to read, so the MDS is stuck forever. (A quick
> arithmetic check of these positions follows the log excerpt below.)
>
> See below mds logs:
> ----------------------------------------------------------------------------------------
> 2016-01-08 13:26:34.468773 7f16a78a7700 10 mds.0.log replay start,
> from 31885100783 to 31935430656
>
> 2016-01-08 13:26:35.164194 7f16a50a2700 10 mds.0.journaler(ro)
> _issue_read reading 31931236352~4194304, read pointers
> 31889294570/31931236352/31935430656
> 2016-01-08 13:26:35.164219 7f16a50a2700 10 mds.0.objecter _op_submit
> op 0x536ba80
> 2016-01-08 13:26:35.164235 7f16a50a2700 10 mds.0.objecter _calc_target
>  pgid 2.ccfa8253 acting [0,6]
> 2016-01-08 13:26:35.164241 7f16a50a2700 20 mds.0.objecter _get_session
> s=0x53b0f00 osd=0 2
> 2016-01-08 13:26:35.164243 7f16a50a2700 20 mds.0.objecter  note: not
> requesting commit
> 2016-01-08 13:26:35.164256 7f16a50a2700 10 mds.0.objecter _op_submit
> oid 200.00001dbd @2 @2 [read 0~4194304] tid 0 osd.0
>
> 2016-01-08 13:26:35.172434 7f16a98ab700 10 mds.0.objecter ms_dispatch
> 0x5370000 osd_op_reply(22 200.00001dbd [read 0~4192831] v0'0 uv42142
> ondisk = 0) v6
> 2016-01-08 13:26:35.172493 7f16a98ab700 10 mds.0.objecter in handle_osd_op_reply
>
> 2016-01-08 13:26:35.172832 7f16a78a7700 10 mds.0.journaler(ro)
> _finish_read got 31931236352~4192831
> 2016-01-08 13:26:35.172852 7f16a78a7700 10 mds.0.journaler(ro)
> _assimilate_prefetch 31931236352~4192831
> 2016-01-08 13:26:35.172896 7f16a78a7700 10 mds.0.journaler(ro)
> _assimilate_prefetch read_buf now 31889339375~46089808, read pointers
> 31889339375/31935429183/31935430656
>
> 2016-01-08 13:26:42.541242 7f16a50a2700 10 mds.0.journaler(ro)
> _is_readable read_buf.length() == 0, but need 12 for next entry;
> fetch_len is 41943040
> 2016-01-08 13:26:42.541251 7f16a50a2700 10 mds.0.journaler(ro)
> _is_readable: not readable, returning false
> 2016-01-08 13:26:42.541261 7f16a50a2700 10 mds.0.journaler(ro) _prefetch
> 2016-01-08 13:26:42.541272 7f16a50a2700 10 mds.0.journaler(ro)
> _prefetch fetch_len=41943040 read_pos=31935429183 adjustment=1473
> target=31977373696 write_pos=31935430656
> 2016-01-08 13:26:42.541282 7f16a50a2700 10 mds.0.journaler(ro)
> _is_readable read_buf.length() == 0, but need 12 for next entry;
> fetch_len is 41943040
> 2016-01-08 13:26:42.541292 7f16a50a2700 10 mds.0.journaler(ro)
> _is_readable: not readable, returning false
> 2016-01-08 13:26:42.541302 7f16a50a2700 10 mds.0.journaler(ro)
> wait_for_readable at 31935429183 onreadable 0x7f16a50a1d40
> ----------------------------------------------------------------------------------------
>
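> To double-check the gap, here is a quick sanity check of the positions
> above (Python, just for the arithmetic; the variable names are made up
> for this check and are not from the Ceph code):
> ----------------------------------------------------------------------------------------
> read_start   = 31931236352   # offset of the last prefetch (_issue_read)
> expected_len = 4194304       # bytes the prefetch asked for
> actual_len   = 4192831       # bytes the OSD actually returned
> write_pos    = 31935430656   # write_pos expected by the journaler
>
> print(read_start + expected_len)              # 31935430656 == write_pos
> print(read_start + actual_len)                # 31935429183 == final read_pos
> print(write_pos - (read_start + actual_len))  # 1473 bytes short
> ----------------------------------------------------------------------------------------
> So read_pos stops 1473 bytes short of write_pos, and since
> requested_pos already equals write_pos, no further prefetch is issued
> and replay cannot make progress.
>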

This looks like http://tracker.ceph.com/issues/13167. Which version of
Ceph are you using?

Regards
Yan, Zheng

>
> Among the mds log objects, 200.00001dbd is the latest one and its size
> is 4192831 bytes. The write_pos stored in the 200.00000000 object (the
> journal header) is 31935430656.
>
> The strange thing is the modification time of these two objects: the
> 200.00000000 object's mtime is older than the 200.00001dbd object's.
> Does this mean the MDS updates the log meta info first and then writes
> the log data? And in our case, did writing the log data to
> 200.00001dbd fail somehow? (A quick check of the object boundaries
> follows the listing below.)
>
> ----------------------------------------------------------------------------------------
> [ceph@xxx /data/lib/ceph/osd/ceph-0/current/2.94_head/DIR_4/DIR_9/DIR_4]$ ls -lh
> total 4.0K
> -rw-r--r-- 1 root root 90 Jan  8 02:06 200.00000000__head_844F3494__2
>
> [ceph@xxx /data/lib/ceph/osd/ceph-0/current/2.53_head/DIR_3/DIR_5/DIR_2]$ ls -l
> total 4096
> -rw-r--r-- 1 root root 4192831 Jan  8 02:09 200.00001dbd__head_CCFA8253__2
> ----------------------------------------------------------------------------------------
>
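> And checking the object boundaries (Python again; the 4 MiB object
> size is an assumption based on the default journal layout, which
> matches the [read 0~4194304] request above):
> ----------------------------------------------------------------------------------------
> object_size = 4194304        # assumed default 4 MiB journal object size
> write_pos   = 31935430656    # write_pos from the 200.00000000 object
> on_disk_len = 4192831        # actual size of 200.00001dbd on the OSD
>
> print(hex(31931236352 // object_size))  # 0x1dbd -> last prefetch starts in 200.00001dbd
> print(write_pos % object_size)          # 0 -> write_pos is exactly at the end of 0x1dbd
> print(object_size - on_disk_len)        # 1473 bytes missing from 200.00001dbd
> ----------------------------------------------------------------------------------------
> So if write_pos is correct, 200.00001dbd should be a full 4194304
> bytes, but on disk it is only 4192831 bytes.
>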
>
> Regards,
> Zhi Zhang (David)
> Contact: zhang.david2011@xxxxxxxxx
>               zhangz.david@xxxxxxxxxxx