Re: FileStore::read meta/pginfo_** error

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



2011/9/20 Samuel Just <samuelj@xxxxxxxxxxxxxxx>:
> The truncate followed by the write is acceptable since they are part of a
> transaction which *should* guarantee atomicity.  Are you using btrfs?
> -Sam
yes,we use the btrfs as underlayer fs,but the debug log showed the
write was interrupted by  stopping cosd process., but not atomicity.
> On 09/19/2011 02:54 AM, huang jun wrote:
>>
>> hi,all
>> we build cluster with 40 OSDs, and do nothing to it.
>> after we restart the whole cluster use "/etc/init.d/ceph -a restart",
>> recently we got the error like Bug #1486.
>> But it provides more detail debug log.
>> before we stop the cosd process of osd23:
>>
>> 2011-09-19 11:13:33.422941 7f98c3221700 -- 192.168.0.112:6804/6800<==
>> osd38 192.168.0.120:6801/5899 474 ==== pg_info(1 pgs e52) v1
>> ==== 316+0+0 (3413838934 0 0) 0x2f73e00 con 0x2c29640
>> 2011-09-19 11:13:33.422947 7f98c3221700 osd23 64 _dispatch 0x2f73e00
>> pg_info(1 pgs e52) v1
>> 2011-09-19 11:13:33.422969 7f98c3221700 osd23 64 handle_pg_info
>> pg_info(1 pgs e52) v1 from osd38
>> 2011-09-19 11:13:33.422981 7f98c3221700 osd23 64
>> require_same_or_newer_map 52 (i am 64) 0x2f73e00
>> 2011-09-19 11:13:33.422997 7f98c3221700 osd23 64 pg[2.456( empty n=0
>> ec=2 les/c 24/51 17/17/17) [38,23] r=1 active] handle_info 2.45
>> 6( empty n=0 ec=2 les/c 24/52 17/17/17) from osd38
>> 2011-09-19 11:13:33.423014 7f98c3221700 osd23 64 pg[2.456( empty n=0
>> ec=2 les/c 24/52 17/17/17) [38,23] r=1 active] write_info info
>> 309
>> 2011-09-19 11:13:33.423024 7f98c3221700 osd23 64 pg[2.456( empty n=0
>> ec=2 les/c 24/52 17/17/17) [38,23] r=1 active] write_info bigbl
>>  8
>> 2011-09-19 11:13:33.423030 7f98c3221700 filestore(/data/osd.23)
>> queue_transactions existing osr 0x3060c30/0x33c3b70
>> 2011-09-19 11:13:33.423035 7f98c3221700 filestore(/data/osd.23)
>> queue_transactions (trailing journal) 10200 0x2f986e0
>> 2011-09-19 11:13:33.423039 7f98c3221700 filestore(/data/osd.23)
>> _do_transaction on 0x2f986e0
>> 2011-09-19 11:13:33.423045 7f98c3221700 filestore(/data/osd.23)
>> collection_setattr /data/osd.23/current/2.456_head 'info' len 309
>> 2011-09-19 11:13:33.423071 7f98c3221700 filestore(/data/osd.23)
>> collection_setattr /data/osd.23/current/2.456_head 'info' len 309 =
>> 309
>> 2011-09-19 11:13:33.423079 7f98c3221700 filestore(/data/osd.23)
>> truncate meta/pginfo_2.456/0 size 0
>> 2011-09-19 11:13:33.423085 7f98c3221700 filestore(/data/osd.23)
>> lfn_get cid=meta oid=pginfo_2.456/0 pathname=/data/osd.23/current/me
>> ta/pginfo_2.456_0 lfn=pginfo_2.456_0 is_lfn=0
>> 2011-09-19 11:13:33.423106 7f98c3221700 filestore(/data/osd.23)
>> truncate meta/pginfo_2.456/0 size 0 = 0
>> 2011-09-19 11:13:33.423114 7f98c3221700 filestore(/data/osd.23) write
>> meta/pginfo_2.456/0 0~8
>> 2011-09-19 11:13:33.423120 7f98c3221700 filestore(/data/osd.23)
>> lfn_get cid=meta oid=pginfo_2.456/0 pathname=/data/osd.23/current/me
>> ta/pginfo_2.456_0 lfn=pginfo_2.456_0 is_lfn=0
>> 2011-09-19 11:13:33.423142 7f98c3221700 filestore(/data/osd.23)
>> queue_flusher ep 208 fd 21 0~8 qlen 1
>> 2011-09-19 11:13:33.423150 7f98c5a26700 filestore(/data/osd.23)
>> flusher_entry awoke
>> 2011-09-19 11:13:33.423156 7f98c5a26700 filestore(/data/osd.23)
>> flusher_entry flushing+closing 21 ep 208
>> 2011-09-19 11:13:33.423172 7f98c5a26700 filestore(/data/osd.23)
>> flusher_entry sleeping
>> 2011-09-19 11:13:33.423179 7f98c3221700 filestore(/data/osd.23) write
>> meta/pginfo_2.456/0 0~8 = 8
>> 2011-09-19 11:13:33.423201 7f98c3221700 -- 192.168.0.112:6804/6800<==
>> osd34 192.168.0.118:6801/5752 304 ==== pg_info(1 pgs e55) v1
>> ==== 316+0+0 (362667609 0 0) 0x36d0e00 con 0x2b96780
>> 2011-09-19 11:13:33.423207 7f98c3221700 osd23 64 _dispatch 0x36d0e00
>> pg_info(1 pgs e55) v1
>> 2011-09-19 11:13:33.423212 7f98c3221700 osd23 64 handle_pg_info
>> pg_info(1 pgs e55) v1 from osd34
>> 2011-09-19 11:13:33.423216 7f98c3221700 osd23 64
>> require_same_or_newer_map 55 (i am 64) 0x36d0e00
>> 2011-09-19 11:13:33.423231 7f98c3221700 osd23 64 pg[1.177( empty n=0
>> ec=2 les/c 24/54 14/14/14) [34,23] r=1 active] handle_info 1.17
>> 7( empty n=0 ec=2 les/c 24/55 14/14/14) from osd34
>> 2011-09-19 11:13:33.423245 7f98c3221700 osd23 64 pg[1.177( empty n=0
>> ec=2 les/c 24/55 14/14/14) [34,23] r=1 active] write_info info
>> 309
>> 2011-09-19 11:13:33.423255 7f98c3221700 osd23 64 pg[1.177( empty n=0
>> ec=2 les/c 24/55 14/14/14) [34,23] r=1 active] write_info bigbl
>>  8
>> 2011-09-19 11:13:33.423261 7f98c3221700 filestore(/data/osd.23)
>> queue_transactions existing osr 0x3443960/0x340db70
>> 2011-09-19 11:13:33.423266 7f98c3221700 filestore(/data/osd.23)
>> queue_transactions (trailing journal) 10201 0x2f98f00
>> 2011-09-19 11:13:33.423270 7f98c3221700 filestore(/data/osd.23)
>> _do_transaction on 0x2f98f00
>> 2011-09-19 11:13:33.423276 7f98c3221700 filestore(/data/osd.23)
>> collection_setattr /data/osd.23/current/1.177_head 'info' len 309
>> 2011-09-19 11:13:33.423296 7f98c3221700 filestore(/data/osd.23)
>> collection_setattr /data/osd.23/current/1.177_head 'info' len 309 =
>> 309
>> 2011-09-19 11:13:33.423303 7f98c3221700 filestore(/data/osd.23)
>> truncate meta/pginfo_1.177/0 size 0
>> 2011-09-19 11:13:33.423309 7f98c3221700 filestore(/data/osd.23)
>> lfn_get cid=meta oid=pginfo_1.177/0 pathname=/data/osd.23/current/me
>> ta/pginfo_1.177_0 lfn=pginfo_1.177_0 is_lfn=0
>> 2011-09-19 11:13:33.423331 7f98c3221700 filestore(/data/osd.23)
>> truncate meta/pginfo_1.177/0 size 0 = 0
>> 2011-09-19 11:13:33.423339 7f98c3221700 filestore(/data/osd.23) write
>> meta/pginfo_1.177/0 0~8
>> 2011-09-19 11:13:33.423344 7f98c3221700 filestore(/data/osd.23)
>> lfn_get cid=meta oid=pginfo_1.177/0 pathname=/data/osd.23/current/me
>> ta/pginfo_1.177_0 lfn=pginfo_1.177_0 is_lfn=0
>> *** Caught signal (Terminated) **
>>  in thread 0x7f98cdee0720. Shutting down.
>>
>> then start the cosd process,can not read the pginfo_1.177,so Terminated:
>>
>> 2011-09-19 11:13:35.836387 7fc8470a1720 osd23 64 _open_lock_pg 1.177
>> 2011-09-19 11:13:35.836391 7fc8470a1720 osd23 64 _get_pool 1 93 ->  94
>> 2011-09-19 11:13:35.836406 7fc8470a1720 osd23 64 pg[1.177( DNE empty
>> n=0 ec=0 les/c 0/0 0/0/0) [] r=0 mlcod 0'0 inactive] enter Initial
>> 2011-09-19 11:13:35.836412 7fc8470a1720 filestore(/data/osd.23)
>> collection_getattr /data/osd.23/current/1.177_head 'info'
>> 2011-09-19 11:13:35.836435 7fc8470a1720 filestore(/data/osd.23)
>> collection_getattr /data/osd.23/current/1.177_head 'info' = 309
>> 2011-09-19 11:13:35.836443 7fc8470a1720 filestore(/data/osd.23) read
>> meta/pginfo_1.177/0 0~0
>> 2011-09-19 11:13:35.836449 7fc8470a1720 filestore(/data/osd.23)
>> lfn_get cid=meta oid=pginfo_1.177/0
>> pathname=/data/osd.23/current/meta/pginfo_1.177_0 lfn=pginfo_1.177_0
>> is_lfn=0
>> 2011-09-19 11:13:35.836464 7fc8470a1720 filestore(/data/osd.23)
>> FileStore::read meta/pginfo_1.177/0 0~0/0
>> *** Caught signal (Aborted) ***
>>  in thread 0x7fc8470a1720
>> ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
>>  1: /bsd/bin/cosd() [0x63e102]
>>  2: (()+0xef60) [0x7fc846a80f60]
>>  3: (gsignal()+0x35) [0x7fc84563f165]
>>  4: (abort()+0x180) [0x7fc845641f70]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fc845ed3dc5]
>>  6: (()+0xcb166) [0x7fc845ed2166]
>>  7: (()+0xcb193) [0x7fc845ed2193]
>>  8: (()+0xcb28e) [0x7fc845ed228e]
>>  9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x132)
>> [0x491f62]
>>  10: (void decode<unsigned int, PG::Interval>(std::map<unsigned int,
>> PG::Interval, std::less<unsigned int>,
>> std::allocator<std::pair<unsigned int const, PG::Interval>  >  >&,
>> ceph::buffer::list::iterator&)+0x31) [0x5a0b11]
>>  11: (PG::read_state(ObjectStore*)+0x2d3) [0x575213]
>>  12: (OSD::load_pgs()+0x1f5) [0x514f05]
>>  13: (OSD::init()+0x659) [0x51fd39]
>>  14: (main()+0x2401) [0x490321]
>>  15: (__libc_start_main()+0xfd) [0x7fc84562bc4d]
>>  16: /bsd/bin/cosd() [0x48cac9]
>>
>> it's a problem when we restart the osd between the truncate and write,
>> so the meta/pginfo size is zero.
>> why we should truncate before write in PG::write_info?
>> if we got the wrong data in PG::read_state,is it reationable to
>> terminate the cosd process? or can we get the data from peer?
>>
>> thanks!
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux