Re: [ceph-users] data corruption with hammer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Cherry-picking that commit onto v0.94.6 wasn't clean so I'm just
building your branch. I'm not sure what the difference between your
branch and 0.94.6 is, I don't see any commits against
osd/ReplicatedPG.cc in the last 5 months other than the one you did
today.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Mar 17, 2016 at 11:38 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> Yep, let me pull and build that branch. I tried installing the dbg
> packages and running it in gdb, but it didn't load the symbols.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Thu, Mar 17, 2016 at 11:36 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> On Thu, 17 Mar 2016, Robert LeBlanc wrote:
>>> Also, is this ceph_test_rados rewriting objects quickly? I think that
>>> the issue is with rewriting objects so if we can tailor the
>>> ceph_test_rados to do that, it might be easier to reproduce.
>>
>> It's doing lots of overwrites, yeah.
>>
>> I was albe to reproduce--thanks!  It looks like it's specific to
>> hammer.  The code was rewritten for jewel so it doesn't affect the
>> latest.  The problem is that maybe_handle_cache may proxy the read and
>> also still try to handle the same request locally (if it doesn't trigger a
>> promote).
>>
>> Here's my proposed fix:
>>
>>         https://github.com/ceph/ceph/pull/8187
>>
>> Do you mind testing this branch?
>>
>> It doesn't appear to be directly related to flipping between writeback and
>> forward, although it may be that we are seeing two unrelated issues.  I
>> seemed to be able to trigger it more easily when I flipped modes, but the
>> bug itself was a simple issue in the writeback mode logic.  :/
>>
>> Anyway, please see if this fixes it for you (esp with the RBD workload).
>>
>> Thanks!
>> sage
>>
>>
>>
>>
>>> ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>
>>>
>>> On Thu, Mar 17, 2016 at 11:05 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>>> > I'll  miss the Ceph community as well. There was a few things I really
>>> > wanted to work in with Ceph.
>>> >
>>> > I got this:
>>> >
>>> > update_object_version oid 13 v 1166 (ObjNum 1028 snap 0 seq_num 1028)
>>> > dirty exists
>>> > 1038:  left oid 13 (ObjNum 1028 snap 0 seq_num 1028)
>>> > 1040:  finishing write tid 1 to nodez23350-256
>>> > 1040:  finishing write tid 2 to nodez23350-256
>>> > 1040:  finishing write tid 3 to nodez23350-256
>>> > 1040:  finishing write tid 4 to nodez23350-256
>>> > 1040:  finishing write tid 6 to nodez23350-256
>>> > 1035: done (4 left)
>>> > 1037: done (3 left)
>>> > 1038: done (2 left)
>>> > 1043: read oid 430 snap -1
>>> > 1043:  expect (ObjNum 429 snap 0 seq_num 429)
>>> > 1040:  finishing write tid 7 to nodez23350-256
>>> > update_object_version oid 256 v 661 (ObjNum 1029 snap 0 seq_num 1029)
>>> > dirty exists
>>> > 1040:  left oid 256 (ObjNum 1029 snap 0 seq_num 1029)
>>> > 1042:  expect (ObjNum 664 snap 0 seq_num 664)
>>> > 1043: Error: oid 430 read returned error code -2
>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fa1bf7fe700 time
>>> > 2016-03-17 10:47:19.085414
>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> > const*)+0x76) [0x4db956]
>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>> > 3: (()+0x9791d) [0x7fa1d472191d]
>>> > 4: (()+0x72519) [0x7fa1d46fc519]
>>> > 5: (()+0x13c178) [0x7fa1d47c6178]
>>> > 6: (()+0x80a4) [0x7fa1d425a0a4]
>>> > 7: (clone()+0x6d) [0x7fa1d2bd504d]
>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> > needed to interpret this.
>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>> > Aborted
>>> >
>>> > I had to toggle writeback/forward and min_read_recency_for_promote a
>>> > few times to get it, but I don't know if it is because I only have one
>>> > job running. Even with six jobs running, it is not easy to trigger
>>> > with ceph_test_rados, but it is very instant in the RBD VMs.
>>> >
>>> > Here are the six run crashes (I have about the last 2000 lines of each
>>> > if needed):
>>> >
>>> > nodev:
>>> > update_object_version oid 1015 v 1255 (ObjNum 1014 snap 0 seq_num
>>> > 1014) dirty exists
>>> > 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
>>> > 1016:  finishing write tid 1 to nodev21799-1016
>>> > 1016:  finishing write tid 2 to nodev21799-1016
>>> > 1016:  finishing write tid 3 to nodev21799-1016
>>> > 1016:  finishing write tid 4 to nodev21799-1016
>>> > 1016:  finishing write tid 6 to nodev21799-1016
>>> > 1016:  finishing write tid 7 to nodev21799-1016
>>> > update_object_version oid 1016 v 1957 (ObjNum 1015 snap 0 seq_num
>>> > 1015) dirty exists
>>> > 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
>>> > 1017:  finishing write tid 1 to nodev21799-1017
>>> > 1017:  finishing write tid 2 to nodev21799-1017
>>> > 1017:  finishing write tid 3 to nodev21799-1017
>>> > 1017:  finishing write tid 5 to nodev21799-1017
>>> > 1017:  finishing write tid 6 to nodev21799-1017
>>> > update_object_version oid 1017 v 1010 (ObjNum 1016 snap 0 seq_num
>>> > 1016) dirty exists
>>> > 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
>>> > 1018:  finishing write tid 1 to nodev21799-1018
>>> > 1018:  finishing write tid 2 to nodev21799-1018
>>> > 1018:  finishing write tid 3 to nodev21799-1018
>>> > 1018:  finishing write tid 4 to nodev21799-1018
>>> > 1018:  finishing write tid 6 to nodev21799-1018
>>> > 1018:  finishing write tid 7 to nodev21799-1018
>>> > update_object_version oid 1018 v 1093 (ObjNum 1017 snap 0 seq_num
>>> > 1017) dirty exists
>>> > 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
>>> > 1019:  finishing write tid 1 to nodev21799-1019
>>> > 1019:  finishing write tid 2 to nodev21799-1019
>>> > 1019:  finishing write tid 3 to nodev21799-1019
>>> > 1019:  finishing write tid 5 to nodev21799-1019
>>> > 1019:  finishing write tid 6 to nodev21799-1019
>>> > update_object_version oid 1019 v 462 (ObjNum 1018 snap 0 seq_num 1018)
>>> > dirty exists
>>> > 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
>>> > 1021:  finishing write tid 1 to nodev21799-1021
>>> > 1020:  finishing write tid 1 to nodev21799-1020
>>> > 1020:  finishing write tid 2 to nodev21799-1020
>>> > 1020:  finishing write tid 3 to nodev21799-1020
>>> > 1020:  finishing write tid 5 to nodev21799-1020
>>> > 1020:  finishing write tid 6 to nodev21799-1020
>>> > update_object_version oid 1020 v 1287 (ObjNum 1019 snap 0 seq_num
>>> > 1019) dirty exists
>>> > 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
>>> > 1021:  finishing write tid 2 to nodev21799-1021
>>> > 1021:  finishing write tid 3 to nodev21799-1021
>>> > 1021:  finishing write tid 5 to nodev21799-1021
>>> > 1021:  finishing write tid 6 to nodev21799-1021
>>> > update_object_version oid 1021 v 1077 (ObjNum 1020 snap 0 seq_num
>>> > 1020) dirty exists
>>> > 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
>>> > 1022:  finishing write tid 1 to nodev21799-1022
>>> > 1022:  finishing write tid 2 to nodev21799-1022
>>> > 1022:  finishing write tid 3 to nodev21799-1022
>>> > 1022:  finishing write tid 5 to nodev21799-1022
>>> > 1022:  finishing write tid 6 to nodev21799-1022
>>> > update_object_version oid 1022 v 1213 (ObjNum 1021 snap 0 seq_num
>>> > 1021) dirty exists
>>> > 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
>>> > 1023:  finishing write tid 1 to nodev21799-1023
>>> > 1023:  finishing write tid 2 to nodev21799-1023
>>> > 1023:  finishing write tid 3 to nodev21799-1023
>>> > 1023:  finishing write tid 5 to nodev21799-1023
>>> > 1023:  finishing write tid 6 to nodev21799-1023
>>> > update_object_version oid 1023 v 2612 (ObjNum 1022 snap 0 seq_num
>>> > 1022) dirty exists
>>> > 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
>>> > 1024:  finishing write tid 1 to nodev21799-1024
>>> > 1025: Error: oid 219 read returned error code -2
>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f0df8a16700 time
>>> > 2016-03-17 10:53:43.493575
>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> > const*)+0x76) [0x4db956]
>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>> > 3: (()+0x9791d) [0x7f0e015dd91d]
>>> > 4: (()+0x72519) [0x7f0e015b8519]
>>> > 5: (()+0x13c178) [0x7f0e01682178]
>>> > 6: (()+0x80a4) [0x7f0e011160a4]
>>> > 7: (clone()+0x6d) [0x7f0dffa9104d]
>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> > needed to interpret this.
>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>> > Aborted
>>> >
>>> > nodew:
>>> > 1117:  expect (ObjNum 8 snap 0 seq_num 8)
>>> > 1120:  expect (ObjNum 95 snap 0 seq_num 95)
>>> > 1121:  expect (ObjNum 994 snap 0 seq_num 994)
>>> > 1113:  expect (ObjNum 362 snap 0 seq_num 362)
>>> > 1118:  expect (ObjNum 179 snap 0 seq_num 179)
>>> > 1115:  expect (ObjNum 943 snap 0 seq_num 943)
>>> > 1119:  expect (ObjNum 250 snap 0 seq_num 250)
>>> > 1124:  finishing write tid 1 to nodew21820-361
>>> > 1124:  finishing write tid 2 to nodew21820-361
>>> > 1124:  finishing write tid 3 to nodew21820-361
>>> > 1124:  finishing write tid 4 to nodew21820-361
>>> > 1124:  finishing write tid 6 to nodew21820-361
>>> > 1124:  finishing write tid 7 to nodew21820-361
>>> > update_object_version oid 361 v 892 (ObjNum 1061 snap 0 seq_num 1061)
>>> > dirty exists
>>> > 1124:  left oid 361 (ObjNum 1061 snap 0 seq_num 1061)
>>> > 1125:  finishing write tid 1 to nodew21820-486
>>> > 1125:  finishing write tid 2 to nodew21820-486
>>> > 1125:  finishing write tid 3 to nodew21820-486
>>> > 1125:  finishing write tid 5 to nodew21820-486
>>> > 1125:  finishing write tid 6 to nodew21820-486
>>> > update_object_version oid 486 v 1317 (ObjNum 1062 snap 0 seq_num 1062)
>>> > dirty exists
>>> > 1125:  left oid 486 (ObjNum 1062 snap 0 seq_num 1062)
>>> > 1126:  expect (ObjNum 289 snap 0 seq_num 289)
>>> > 1127:  finishing write tid 1 to nodew21820-765
>>> > 1127:  finishing write tid 2 to nodew21820-765
>>> > 1127:  finishing write tid 3 to nodew21820-765
>>> > 1127:  finishing write tid 5 to nodew21820-765
>>> > 1127:  finishing write tid 6 to nodew21820-765
>>> > update_object_version oid 765 v 1156 (ObjNum 1063 snap 0 seq_num 1063)
>>> > dirty exists
>>> > 1127:  left oid 765 (ObjNum 1063 snap 0 seq_num 1063)
>>> > 1128:  finishing write tid 1 to nodew21820-40
>>> > 1128:  finishing write tid 2 to nodew21820-40
>>> > 1128:  finishing write tid 3 to nodew21820-40
>>> > 1128:  finishing write tid 5 to nodew21820-40
>>> > 1128:  finishing write tid 6 to nodew21820-40
>>> > update_object_version oid 40 v 876 (ObjNum 1064 snap 0 seq_num 1064)
>>> > dirty exists
>>> > 1128:  left oid 40 (ObjNum 1064 snap 0 seq_num 1064)
>>> > 1129:  expect (ObjNum 616 snap 0 seq_num 616)
>>> > 1110: done (14 left)
>>> > 1113: done (13 left)
>>> > 1115: done (12 left)
>>> > 1117: done (11 left)
>>> > 1118: done (10 left)
>>> > 1119: done (9 left)
>>> > 1120: done (8 left)
>>> > 1121: done (7 left)
>>> > 1124: done (6 left)
>>> > 1125: done (5 left)
>>> > 1126: done (4 left)
>>> > 1127: done (3 left)
>>> > 1128: done (2 left)
>>> > 1129: done (1 left)
>>> > 1131: read oid 29 snap -1
>>> > 1131:  expect (ObjNum 28 snap 0 seq_num 28)
>>> > 1132: read oid 764 snap -1
>>> > 1132:  expect (ObjNum 763 snap 0 seq_num 763)
>>> > 1133: read oid 469 snap -1
>>> > 1133:  expect (ObjNum 468 snap 0 seq_num 468)
>>> > 1134: write oid 243 current snap is 0
>>> > 1134:  seq_num 1065 ranges
>>> > {483354=596553,1514502=531232,2509844=632287,3283353=1}
>>> > 1134:  writing nodew21820-243 from 483354 to 1079907 tid 1
>>> > 1134:  writing nodew21820-243 from 1514502 to 2045734 tid 2
>>> > 1134:  writing nodew21820-243 from 2509844 to 3142131 tid 3
>>> > 1134:  writing nodew21820-243 from 3283353 to 3283354 tid 4
>>> > 1135: read oid 569 snap -1
>>> > 1135:  expect (ObjNum 568 snap 0 seq_num 568)
>>> > 1133: Error: oid 469 read returned error code -2
>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fae71d03700 time
>>> > 2016-03-17 11:00:02.124951
>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> > const*)+0x76) [0x4db956]
>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>> > 3: (()+0x9791d) [0x7fae7a8ca91d]
>>> > 4: (()+0x72519) [0x7fae7a8a5519]
>>> > 5: (()+0x13c178) [0x7fae7a96f178]
>>> > 6: (()+0x80a4) [0x7fae7a4030a4]
>>> > 7: (clone()+0x6d) [0x7fae78d7e04d]
>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> > needed to interpret this.
>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>> > Aborted
>>> >
>>> > nodex:
>>> >
>>> > 1024:  finishing write tid 1 to nodex22014-1024
>>> > 1025:  expect (ObjNum 75 snap 0 seq_num 75)
>>> > 1024:  finishing write tid 2 to nodex22014-1024
>>> > 1024:  finishing write tid 3 to nodex22014-1024
>>> > 1024:  finishing write tid 5 to nodex22014-1024
>>> > 1024:  finishing write tid 6 to nodex22014-1024
>>> > update_object_version oid 1024 v 753 (ObjNum 1023 snap 0 seq_num 1023)
>>> > dirty exists
>>> > 1024:  left oid 1024 (ObjNum 1023 snap 0 seq_num 1023)
>>> > 982: done (44 left)
>>> > 983: done (43 left)
>>> > 984: done (42 left)
>>> > 985: done (41 left)
>>> > 986: done (40 left)
>>> > 987: done (39 left)
>>> > 988: done (38 left)
>>> > 989: done (37 left)
>>> > 990: done (36 left)
>>> > 991: done (35 left)
>>> > 992: done (34 left)
>>> > 993: done (33 left)
>>> > 994: done (32 left)
>>> > 995: done (31 left)
>>> > 996: done (30 left)
>>> > 997: done (29 left)
>>> > 998: done (28 left)
>>> > 999: done (27 left)
>>> > 1000: done (26 left)
>>> > 1001: done (25 left)
>>> > 1002: done (24 left)
>>> > 1003: done (23 left)
>>> > 1004: done (22 left)
>>> > 1005: done (21 left)
>>> > 1006: done (20 left)
>>> > 1007: done (19 left)
>>> > 1008: done (18 left)
>>> > 1009: done (17 left)
>>> > 1010: done (16 left)
>>> > 1011: done (15 left)
>>> > 1012: done (14 left)
>>> > 1013: done (13 left)
>>> > 1014: done (12 left)
>>> > 1015: done (11 left)
>>> > 1016: done (10 left)
>>> > 1017: done (9 left)
>>> > 1018: done (8 left)
>>> > 1019: done (7 left)
>>> > 1020: done (6 left)
>>> > 1021: done (5 left)
>>> > 1022: done (4 left)
>>> > 1023: done (3 left)
>>> > 1024: done (2 left)
>>> > 1025: done (1 left)
>>> > 1026: done (0 left)
>>> > 1027: delete oid 101 current snap is 0
>>> > 1027: done (0 left)
>>> > 1028: read oid 156 snap -1
>>> > 1028:  expect (ObjNum 155 snap 0 seq_num 155)
>>> > 1029: read oid 691 snap -1
>>> > 1029:  expect (ObjNum 690 snap 0 seq_num 690)
>>> > 1030: read oid 282 snap -1
>>> > 1030:  expect (ObjNum 281 snap 0 seq_num 281)
>>> > 1031: read oid 979 snap -1
>>> > 1031:  expect (ObjNum 978 snap 0 seq_num 978)
>>> > 1032: read oid 203 snap -1
>>> > 1032:  expect (ObjNum 202 snap 0 seq_num 202)
>>> > 1033: setattr oid 464 current snap is 0
>>> > 1032: Error: oid 203 read returned error code -2
>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fafee64a700 time
>>> > 2016-03-17 10:53:44.291343
>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> > const*)+0x76) [0x4db956]
>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>> > 3: (()+0x9791d) [0x7faff721191d]
>>> > 4: (()+0x72519) [0x7faff71ec519]
>>> > 5: (()+0x13c178) [0x7faff72b6178]
>>> > 6: (()+0x80a4) [0x7faff6d4a0a4]
>>> > 7: (clone()+0x6d) [0x7faff56c504d]
>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> > needed to interpret this.
>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>> > Aborted
>>> >
>>> > nodey:
>>> >
>>> > 974: done (52 left)
>>> > 975: done (51 left)
>>> > 976: done (50 left)
>>> > 977: done (49 left)
>>> > 978: done (48 left)
>>> > 979: done (47 left)
>>> > 980: done (46 left)
>>> > 981: done (45 left)
>>> > 982: done (44 left)
>>> > 983: done (43 left)
>>> > 984: done (42 left)
>>> > 985: done (41 left)
>>> > 986: done (40 left)
>>> > 987: done (39 left)
>>> > 988: done (38 left)
>>> > 989: done (37 left)
>>> > 990: done (36 left)
>>> > 991: done (35 left)
>>> > 992: done (34 left)
>>> > 993: done (33 left)
>>> > 994: done (32 left)
>>> > 995: done (31 left)
>>> > 996: done (30 left)
>>> > 997: done (29 left)
>>> > 998: done (28 left)
>>> > 999: done (27 left)
>>> > 1000: done (26 left)
>>> > 1001: done (25 left)
>>> > 1002: done (24 left)
>>> > 1003: done (23 left)
>>> > 1004: done (22 left)
>>> > 1005: done (21 left)
>>> > 1006: done (20 left)
>>> > 1007: done (19 left)
>>> > 1008: done (18 left)
>>> > 1009: done (17 left)
>>> > 1010: done (16 left)
>>> > 1011: done (15 left)
>>> > 1012: done (14 left)
>>> > 1013: done (13 left)
>>> > 1014: done (12 left)
>>> > 1015: done (11 left)
>>> > 1016: done (10 left)
>>> > 1017: done (9 left)
>>> > 1018: done (8 left)
>>> > 1019: done (7 left)
>>> > 1020: done (6 left)
>>> > 1021: done (5 left)
>>> > 1022: done (4 left)
>>> > 1023: done (3 left)
>>> > 1024: done (2 left)
>>> > 1025: done (1 left)
>>> > 1026: done (0 left)
>>> > 1027: delete oid 101 current snap is 0
>>> > 1027: done (0 left)
>>> > 1028: read oid 156 snap -1
>>> > 1028:  expect (ObjNum 155 snap 0 seq_num 155)
>>> > 1029: read oid 691 snap -1
>>> > 1029:  expect (ObjNum 690 snap 0 seq_num 690)
>>> > 1030: read oid 282 snap -1
>>> > 1030:  expect (ObjNum 281 snap 0 seq_num 281)
>>> > 1031: read oid 979 snap -1
>>> > 1031:  expect (ObjNum 978 snap 0 seq_num 978)
>>> > 1032: read oid 203 snap -1
>>> > 1032:  expect (ObjNum 202 snap 0 seq_num 202)
>>> > 1033: setattr oid 464 current snap is 0
>>> > 1028: Error: oid 156 read returned error code -2
>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f55fa3c6700 time
>>> > 2016-03-17 10:53:57.082571
>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> > const*)+0x76) [0x4db956]
>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>> > 3: (()+0x9791d) [0x7f5602f8d91d]
>>> > 4: (()+0x72519) [0x7f5602f68519]
>>> > 5: (()+0x13c178) [0x7f5603032178]
>>> > 6: (()+0x80a4) [0x7f5602ac60a4]
>>> > 7: (clone()+0x6d) [0x7f560144104d]
>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> > needed to interpret this.
>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>> > Aborted
>>> >
>>> > nodez:
>>> >
>>> > 1014: done (11 left)
>>> > 1026: delete oid 717 current snap is 0
>>> > 1015:  finishing write tid 2 to nodez24249-1015
>>> > 1015:  finishing write tid 4 to nodez24249-1015
>>> > 1015:  finishing write tid 5 to nodez24249-1015
>>> > update_object_version oid 1015 v 3003 (ObjNum 1014 snap 0 seq_num
>>> > 1014) dirty exists
>>> > 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
>>> > 1016:  finishing write tid 1 to nodez24249-1016
>>> > 1016:  finishing write tid 2 to nodez24249-1016
>>> > 1016:  finishing write tid 3 to nodez24249-1016
>>> > 1016:  finishing write tid 4 to nodez24249-1016
>>> > 1016:  finishing write tid 6 to nodez24249-1016
>>> > 1016:  finishing write tid 7 to nodez24249-1016
>>> > update_object_version oid 1016 v 1201 (ObjNum 1015 snap 0 seq_num
>>> > 1015) dirty exists
>>> > 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
>>> > 1017:  finishing write tid 1 to nodez24249-1017
>>> > 1017:  finishing write tid 2 to nodez24249-1017
>>> > 1017:  finishing write tid 3 to nodez24249-1017
>>> > 1017:  finishing write tid 5 to nodez24249-1017
>>> > 1017:  finishing write tid 6 to nodez24249-1017
>>> > update_object_version oid 1017 v 3007 (ObjNum 1016 snap 0 seq_num
>>> > 1016) dirty exists
>>> > 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
>>> > 1018:  finishing write tid 1 to nodez24249-1018
>>> > 1018:  finishing write tid 2 to nodez24249-1018
>>> > 1018:  finishing write tid 3 to nodez24249-1018
>>> > 1018:  finishing write tid 4 to nodez24249-1018
>>> > 1018:  finishing write tid 6 to nodez24249-1018
>>> > 1018:  finishing write tid 7 to nodez24249-1018
>>> > update_object_version oid 1018 v 1283 (ObjNum 1017 snap 0 seq_num
>>> > 1017) dirty exists
>>> > 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
>>> > 1019:  finishing write tid 1 to nodez24249-1019
>>> > 1019:  finishing write tid 2 to nodez24249-1019
>>> > 1019:  finishing write tid 3 to nodez24249-1019
>>> > 1019:  finishing write tid 5 to nodez24249-1019
>>> > 1019:  finishing write tid 6 to nodez24249-1019
>>> > update_object_version oid 1019 v 999 (ObjNum 1018 snap 0 seq_num 1018)
>>> > dirty exists
>>> > 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
>>> > 1020:  finishing write tid 1 to nodez24249-1020
>>> > 1020:  finishing write tid 2 to nodez24249-1020
>>> > 1020:  finishing write tid 3 to nodez24249-1020
>>> > 1020:  finishing write tid 5 to nodez24249-1020
>>> > 1020:  finishing write tid 6 to nodez24249-1020
>>> > update_object_version oid 1020 v 813 (ObjNum 1019 snap 0 seq_num 1019)
>>> > dirty exists
>>> > 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
>>> > 1021:  finishing write tid 1 to nodez24249-1021
>>> > 1021:  finishing write tid 2 to nodez24249-1021
>>> > 1021:  finishing write tid 3 to nodez24249-1021
>>> > 1021:  finishing write tid 5 to nodez24249-1021
>>> > 1021:  finishing write tid 6 to nodez24249-1021
>>> > update_object_version oid 1021 v 1038 (ObjNum 1020 snap 0 seq_num
>>> > 1020) dirty exists
>>> > 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
>>> > 1022:  finishing write tid 1 to nodez24249-1022
>>> > 1022:  finishing write tid 2 to nodez24249-1022
>>> > 1022:  finishing write tid 3 to nodez24249-1022
>>> > 1022:  finishing write tid 5 to nodez24249-1022
>>> > 1022:  finishing write tid 6 to nodez24249-1022
>>> > update_object_version oid 1022 v 781 (ObjNum 1021 snap 0 seq_num 1021)
>>> > dirty exists
>>> > 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
>>> > 1023:  finishing write tid 1 to nodez24249-1023
>>> > 1023:  finishing write tid 2 to nodez24249-1023
>>> > 1023:  finishing write tid 3 to nodez24249-1023
>>> > 1023:  finishing write tid 5 to nodez24249-1023
>>> > 1023:  finishing write tid 6 to nodez24249-1023
>>> > update_object_version oid 1023 v 1537 (ObjNum 1022 snap 0 seq_num
>>> > 1022) dirty exists
>>> > 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
>>> > 1024:  finishing write tid 1 to nodez24249-1024
>>> > 1025: Error: oid 230 read returned error code -2
>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fd9bb7fe700 time
>>> > 2016-03-17 10:53:41.757921
>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>> >  ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> > const*)+0x76) [0x4db956]
>>> >  2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>> >  3: (()+0x9791d) [0x7fd9d088d91d]
>>> >  4: (()+0x72519) [0x7fd9d0868519]
>>> >  5: (()+0x13c178) [0x7fd9d0932178]
>>> >  6: (()+0x80a4) [0x7fd9d03c60a4]
>>> >  7: (clone()+0x6d) [0x7fd9ced4104d]
>>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> > needed to interpret this.
>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>> > Aborted
>>> >
>>> > nodezz:
>>> >
>>> > 1015:  finishing write tid 1 to nodezz25161-1015
>>> > 1015:  finishing write tid 2 to nodezz25161-1015
>>> > 1015:  finishing write tid 4 to nodezz25161-1015
>>> > 1015:  finishing write tid 5 to nodezz25161-1015
>>> > update_object_version oid 1015 v 900 (ObjNum 1014 snap 0 seq_num 1014)
>>> > dirty exists
>>> > 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
>>> > 1016:  finishing write tid 1 to nodezz25161-1016
>>> > 1016:  finishing write tid 2 to nodezz25161-1016
>>> > 1016:  finishing write tid 3 to nodezz25161-1016
>>> > 1016:  finishing write tid 4 to nodezz25161-1016
>>> > 1016:  finishing write tid 6 to nodezz25161-1016
>>> > 1016:  finishing write tid 7 to nodezz25161-1016
>>> > update_object_version oid 1016 v 1021 (ObjNum 1015 snap 0 seq_num
>>> > 1015) dirty exists
>>> > 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
>>> > 1017:  finishing write tid 1 to nodezz25161-1017
>>> > 1017:  finishing write tid 2 to nodezz25161-1017
>>> > 1017:  finishing write tid 3 to nodezz25161-1017
>>> > 1017:  finishing write tid 5 to nodezz25161-1017
>>> > 1017:  finishing write tid 6 to nodezz25161-1017
>>> > update_object_version oid 1017 v 3011 (ObjNum 1016 snap 0 seq_num
>>> > 1016) dirty exists
>>> > 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
>>> > 1018:  finishing write tid 1 to nodezz25161-1018
>>> > 1018:  finishing write tid 2 to nodezz25161-1018
>>> > 1018:  finishing write tid 3 to nodezz25161-1018
>>> > 1018:  finishing write tid 4 to nodezz25161-1018
>>> > 1018:  finishing write tid 6 to nodezz25161-1018
>>> > 1018:  finishing write tid 7 to nodezz25161-1018
>>> > update_object_version oid 1018 v 1099 (ObjNum 1017 snap 0 seq_num
>>> > 1017) dirty exists
>>> > 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
>>> > 1019:  finishing write tid 1 to nodezz25161-1019
>>> > 1019:  finishing write tid 2 to nodezz25161-1019
>>> > 1019:  finishing write tid 3 to nodezz25161-1019
>>> > 1019:  finishing write tid 5 to nodezz25161-1019
>>> > 1019:  finishing write tid 6 to nodezz25161-1019
>>> > update_object_version oid 1019 v 1300 (ObjNum 1018 snap 0 seq_num
>>> > 1018) dirty exists
>>> > 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
>>> > 1020:  finishing write tid 1 to nodezz25161-1020
>>> > 1020:  finishing write tid 2 to nodezz25161-1020
>>> > 1020:  finishing write tid 3 to nodezz25161-1020
>>> > 1020:  finishing write tid 5 to nodezz25161-1020
>>> > 1020:  finishing write tid 6 to nodezz25161-1020
>>> > update_object_version oid 1020 v 1324 (ObjNum 1019 snap 0 seq_num
>>> > 1019) dirty exists
>>> > 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
>>> > 1021:  finishing write tid 1 to nodezz25161-1021
>>> > 1021:  finishing write tid 2 to nodezz25161-1021
>>> > 1021:  finishing write tid 3 to nodezz25161-1021
>>> > 1021:  finishing write tid 5 to nodezz25161-1021
>>> > 1021:  finishing write tid 6 to nodezz25161-1021
>>> > update_object_version oid 1021 v 890 (ObjNum 1020 snap 0 seq_num 1020)
>>> > dirty exists
>>> > 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
>>> > 1022:  finishing write tid 1 to nodezz25161-1022
>>> > 1022:  finishing write tid 2 to nodezz25161-1022
>>> > 1022:  finishing write tid 3 to nodezz25161-1022
>>> > 1022:  finishing write tid 5 to nodezz25161-1022
>>> > 1022:  finishing write tid 6 to nodezz25161-1022
>>> > update_object_version oid 1022 v 464 (ObjNum 1021 snap 0 seq_num 1021)
>>> > dirty exists
>>> > 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
>>> > 1023:  finishing write tid 1 to nodezz25161-1023
>>> > 1023:  finishing write tid 2 to nodezz25161-1023
>>> > 1023:  finishing write tid 3 to nodezz25161-1023
>>> > 1023:  finishing write tid 5 to nodezz25161-1023
>>> > 1023:  finishing write tid 6 to nodezz25161-1023
>>> > update_object_version oid 1023 v 1516 (ObjNum 1022 snap 0 seq_num
>>> > 1022) dirty exists
>>> > 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
>>> > 1024:  finishing write tid 1 to nodezz25161-1024
>>> > 1024:  finishing write tid 2 to nodezz25161-1024
>>> > 1025: Error: oid 219 read returned error code -2
>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fbb1bfff700 time
>>> > 2016-03-17 10:53:53.071338
>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> > const*)+0x76) [0x4db956]
>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>> > 3: (()+0x9791d) [0x7fbb30ff191d]
>>> > 4: (()+0x72519) [0x7fbb30fcc519]
>>> > 5: (()+0x13c178) [0x7fbb31096178]
>>> > 6: (()+0x80a4) [0x7fbb30b2a0a4]
>>> > 7: (clone()+0x6d) [0x7fbb2f4a504d]
>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> > needed to interpret this.
>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>> > Aborted
>>> > ----------------
>>> > Robert LeBlanc
>>> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>> >
>>> >
>>> > On Thu, Mar 17, 2016 at 10:39 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>>> >> On Thu, 17 Mar 2016, Robert LeBlanc wrote:
>>> >>> -----BEGIN PGP SIGNED MESSAGE-----
>>> >>> Hash: SHA256
>>> >>>
>>> >>> I'm having trouble finding documentation about using ceph_test_rados. Can I
>>> >>> run this on the existing cluster and will that provide useful info? It seems
>>> >>>  running it in the build will not have the caching set up (vstart.sh).
>>> >>>
>>> >>> I have accepted a job with another company and only have until Wednesday to
>>> >>> help with getting information about this bug. My new job will not be using C
>>> >>> eph, so I won't be able to provide any additional info after Tuesday. I want
>>> >>>  to leave the company on a good trajectory for upgrading, so any input you c
>>> >>> an provide will be helpful.
>>> >>
>>> >> I'm sorry to hear it!  You'll be missed.  :)
>>> >>
>>> >>> I've found:
>>> >>>
>>> >>> ./ceph_test_rados --op read 100 --op write 100 --op delete 50
>>> >>> - --max-ops 400000 --objects 1024 --max-in-flight 64 --size 4000000
>>> >>> - --min-stride-size 400000 --max-stride-size 800000 --max-seconds 600
>>> >>> - --op copy_from 50 --op snap_create 50 --op snap_remove 50 --op
>>> >>> rollback 50 --op setattr 25 --op rmattr 25 --pool unique_pool_0
>>> >>>
>>> >>> Is that enough if I change --pool to the cached pool and do the toggling whi
>>> >>> le ceph_test_rados is running? I think this will run for 10 minutes.
>>> >>
>>> >> Precisely.  You can probably drop copy_from and snap ops from the list
>>> >> since your workload wasn't exercising those.
>>> >>
>>> >> Thanks!
>>> >> sage
>>> >>
>>> >>
>>>
>>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux