On 05/27/2011 06:16 PM, Gregory Farnum wrote:
This is an interesting one -- the invariant that assert is checking
isn't too complicated (that the object lives on the RecoveryWQ's
queue) and seems to hold everywhere the RecoveryWQ is called. And the
functions modifying the queue are always called under the workqueue
lock, and do maintenance if the xlist::item is on a different list.
Which makes me think that the problem must be from conflating the
RecoveryWQ lock and the PG lock in the few places that modify the
PG::recovery_item directly, rather than via RecoveryWQ functions.
Anybody more familiar than me with this have ideas?
Fyodor, based on the time stamps and output you've given us, I assume
you don't have more detailed logs?
-Greg
Greg, i got this crash again.
Let me tell you the configuration and what is happening:
Configuration:
6 osd servers. 4G RAM, 4*1T hdd (mdadmed to raid0), 2*1G etherchannel
ethernet, Ubuntu server 11.04/64 with kernel 2.6.39 (hand compiled)
mon+mds server 24G RAM, the same os.
On each OSD Journal placed on 1G tempfs. OSD data - on xfs in this case.
Configuration file:
[global]
max open files = 131072
log file = /var/log/ceph/$name.log
pid file = /var/run/ceph/$name.pid
[mon]
mon data = /mfs/mon$id
[mon.0]
mon addr = 10.5.51.230:6789
[mds]
keyring = /mfs/mds/keyring.$name
[mds.0]
host = mds0
[osd]
osd data = /$name
osd journal = /journal/$name
osd journal size = 950
journal dio = false
[osd.0]
host = osd0
cluster addr = 10.5.51.10
public addr = 10.5.51.140
[osd.1]
host = osd1
cluster addr = 10.5.51.11
public addr = 10.5.51.141
[osd.2]
host = osd2
cluster addr = 10.5.51.12
public addr = 10.5.51.142
[osd.3]
host = osd3
cluster addr = 10.5.51.13
public addr = 10.5.51.143
[osd.4]
host = osd4
cluster addr = 10.5.51.14
public addr = 10.5.51.144
[osd.5]
host = osd5
cluster addr = 10.5.51.15
public addr = 10.5.51.145
What happening:
osd2 was crashed, rebooted, osd data and journal created from scratch by
"cosd --mkfs -i 2 --monmap /tmp/monmap" and server started.
Additional - on osd2 enables "writeahaed", but I think it's not
principal in this case.
Well, server start rebalancing:
2011-05-27 15:12:49.323558 7f3b69de5740 ceph version 0.28.1.commit:
d66c6ca19bbde3c363b135b66072de44e67c6632. process: cosd. pid: 1694
2011-05-27 15:12:49.325331 7f3b69de5740 filestore(/osd.2) mount FIEMAP
ioctl is NOT supported
2011-05-27 15:12:49.325378 7f3b69de5740 filestore(/osd.2) mount did NOT
detect btrfs
2011-05-27 15:12:49.325467 7f3b69de5740 filestore(/osd.2) mount found
snaps <>
2011-05-27 15:12:49.325512 7f3b69de5740 filestore(/osd.2) mount:
WRITEAHEAD journal mode explicitly enabled in conf
2011-05-27 15:12:49.325526 7f3b69de5740 filestore(/osd.2) mount WARNING:
not btrfs or ext3; data may be lost
2011-05-27 15:12:49.325606 7f3b69de5740 journal _open /journal/osd.2 fd
11: 996147200 bytes, block size 4096 bytes, directio = 0
2011-05-27 15:12:49.325641 7f3b69de5740 journal read_entry 4096 : seq 1
203 bytes
2011-05-27 15:12:49.325698 7f3b69de5740 journal _open /journal/osd.2 fd
11: 996147200 bytes, block size 4096 bytes, directio = 0
2011-05-27 15:12:49.544716 7f3b59656700 -- 10.5.51.12:6801/1694 >>
10.5.51.14:6801/5070 pipe(0x1239d20 sd=27 pgs=0 cs=0 l=0).accept we
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:12:49.544798 7f3b59c5c700 -- 10.5.51.12:6801/1694 >>
10.5.51.13:6801/5165 pipe(0x104b950 sd=14 pgs=0 cs=0 l=0).accept we
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:12:49.544864 7f3b59757700 -- 10.5.51.12:6801/1694 >>
10.5.51.15:6801/1574 pipe(0x11e7cd0 sd=16 pgs=0 cs=0 l=0).accept we
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:12:49.544909 7f3b59959700 -- 10.5.51.12:6801/1694 >>
10.5.51.10:6801/6148 pipe(0x11d7d30 sd=15 pgs=0 cs=0 l=0).accept we
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:13:23.015637 7f3b64579700 journal check_for_full at
66404352 : JOURNAL FULL 66404352 >= 851967 (max_size 996147200 start
67256320)
2011-05-27 15:13:25.586081 7f3b5dc6b700 journal throttle: waited for bytes
2011-05-27 15:13:25.601789 7f3b5d46a700 journal throttle: waited for bytes
[...] and after 2 hours:
2011-05-27 17:30:21.355034 7f3b64579700 journal check_for_full at
415199232 : JOURNAL FULL 415199232 >= 778239 (max_size 996147200 start
415977472)
2011-05-27 17:30:23.441445 7f3b5d46a700 journal throttle: waited for bytes
2011-05-27 17:30:36.362877 7f3b64579700 journal check_for_full at
414326784 : JOURNAL FULL 414326784 >= 872447 (max_size 996147200 start
415199232)
2011-05-27 17:30:38.391372 7f3b5d46a700 journal throttle: waited for bytes
2011-05-27 17:30:50.373936 7f3b64579700 journal check_for_full at
414314496 : JOURNAL FULL 414314496 >= 12287 (max_size 996147200 start
414326784)
./include/xlist.h: In function 'void xlist<T>::remove(xlist<T>::item*)
[with T = PG*]', in thread '0x7f3b5cc69700'
./include/xlist.h: 107: FAILED assert(i->_list == this)
ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
3: (ThreadPool::worker()+0x10a) [0x65799a]
4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
5: (()+0x6d8c) [0x7f3b697b5d8c]
6: (clone()+0x6d) [0x7f3b6866804d]
ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
3: (ThreadPool::worker()+0x10a) [0x65799a]
4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
5: (()+0x6d8c) [0x7f3b697b5d8c]
6: (clone()+0x6d) [0x7f3b6866804d]
*** Caught signal (Aborted) **
in thread 0x7f3b5cc69700
ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
1: /usr/bin/cosd() [0x6729f9]
2: (()+0xfc60) [0x7f3b697bec60]
3: (gsignal()+0x35) [0x7f3b685b5d05]
4: (abort()+0x186) [0x7f3b685b9ab6]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f3b68e6c6dd]
6: (()+0xb9926) [0x7f3b68e6a926]
7: (()+0xb9953) [0x7f3b68e6a953]
8: (()+0xb9a5e) [0x7f3b68e6aa5e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x362) [0x655e32]
10: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
11: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
12: (ThreadPool::worker()+0x10a) [0x65799a]
13: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
14: (()+0x6d8c) [0x7f3b697b5d8c]
15: (clone()+0x6d) [0x7f3b6866804d]
I.e. it's not "easy reproduced" bug. While I had less data in the
cluster - I not seen this error.
I think that I do not have enough space for "full" log for 2-3 hours. Sorry.
WBR,
Fyodor.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html