Re: data corruption with hammer

Yep, let me pull and build that branch. I tried installing the dbg
packages and running it in gdb, but it didn't load the symbols.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Mar 17, 2016 at 11:36 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Thu, 17 Mar 2016, Robert LeBlanc wrote:
>> Also, is this ceph_test_rados rewriting objects quickly? I think that
>> the issue is with rewriting objects so if we can tailor the
>> ceph_test_rados to do that, it might be easier to reproduce.
>
> It's doing lots of overwrites, yeah.
>
> I was able to reproduce--thanks!  It looks like it's specific to
> hammer.  The code was rewritten for jewel so it doesn't affect the
> latest.  The problem is that maybe_handle_cache may proxy the read and
> also still try to handle the same request locally (if it doesn't trigger a
> promote).
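
The double handling described above can be pictured with a toy model. This is only an illustrative sketch, not the Ceph source: the class ToyOSD, its methods, and the `buggy` switch are hypothetical names invented here, and the real logic in hammer is considerably more involved. It assumes the failure mode is that the cache check starts a proxy read yet still reports the op as unhandled, so the caller also processes it locally.

```python
# Toy model only: names echo the discussion, not the actual Ceph code.
from dataclasses import dataclass, field


@dataclass
class ToyOSD:
    replies: list = field(default_factory=list)

    def proxy_read(self, op):
        # Forward the read to the base tier; the reply comes from there.
        self.replies.append((op, "proxied"))

    def handle_locally(self, op):
        # Serve the op from the cache tier, where the object may be absent.
        self.replies.append((op, "local"))

    def maybe_handle_cache(self, op, promote, buggy):
        """Return True if the cache logic has taken ownership of the op."""
        self.proxy_read(op)
        if promote:
            return True
        # Buggy variant: claims "not handled" although a proxy read is in flight.
        return not buggy

    def do_op(self, op, promote=False, buggy=False):
        if not self.maybe_handle_cache(op, promote, buggy):
            self.handle_locally(op)
        return [r for r in self.replies if r[0] == op]
```

In this model the buggy path produces two replies for one read (one proxied, one local), while the corrected path produces exactly one.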
>
> Here's my proposed fix:
>
>         https://github.com/ceph/ceph/pull/8187
>
> Do you mind testing this branch?
>
> It doesn't appear to be directly related to flipping between writeback and
> forward, although it may be that we are seeing two unrelated issues.  I
> seemed to be able to trigger it more easily when I flipped modes, but the
> bug itself was a simple issue in the writeback mode logic.  :/
>
> Anyway, please see if this fixes it for you (esp with the RBD workload).
>
> Thanks!
> sage
>
>
>
>
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Thu, Mar 17, 2016 at 11:05 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>> > I'll miss the Ceph community as well. There were a few things I really
>> > wanted to work on with Ceph.
>> >
>> > I got this:
>> >
>> > update_object_version oid 13 v 1166 (ObjNum 1028 snap 0 seq_num 1028)
>> > dirty exists
>> > 1038:  left oid 13 (ObjNum 1028 snap 0 seq_num 1028)
>> > 1040:  finishing write tid 1 to nodez23350-256
>> > 1040:  finishing write tid 2 to nodez23350-256
>> > 1040:  finishing write tid 3 to nodez23350-256
>> > 1040:  finishing write tid 4 to nodez23350-256
>> > 1040:  finishing write tid 6 to nodez23350-256
>> > 1035: done (4 left)
>> > 1037: done (3 left)
>> > 1038: done (2 left)
>> > 1043: read oid 430 snap -1
>> > 1043:  expect (ObjNum 429 snap 0 seq_num 429)
>> > 1040:  finishing write tid 7 to nodez23350-256
>> > update_object_version oid 256 v 661 (ObjNum 1029 snap 0 seq_num 1029)
>> > dirty exists
>> > 1040:  left oid 256 (ObjNum 1029 snap 0 seq_num 1029)
>> > 1042:  expect (ObjNum 664 snap 0 seq_num 664)
>> > 1043: Error: oid 430 read returned error code -2
>> > ./test/osd/RadosModel.h: In function 'virtual void
>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fa1bf7fe700 time
>> > 2016-03-17 10:47:19.085414
>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x76) [0x4db956]
>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>> > 3: (()+0x9791d) [0x7fa1d472191d]
>> > 4: (()+0x72519) [0x7fa1d46fc519]
>> > 5: (()+0x13c178) [0x7fa1d47c6178]
>> > 6: (()+0x80a4) [0x7fa1d425a0a4]
>> > 7: (clone()+0x6d) [0x7fa1d2bd504d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> > needed to interpret this.
>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>> > Aborted
>> >
>> > I had to toggle writeback/forward and min_read_recency_for_promote a
>> > few times to get it, but I don't know if that is because I only had one
>> > job running. Even with six jobs running, it is not easy to trigger
>> > with ceph_test_rados, but it triggers almost instantly in the RBD VMs.
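
The toggling described above could be scripted roughly as follows. This is a sketch under assumptions: the pool name, the recency values, and the delay are placeholders, and the helper names are hypothetical. It builds `ceph osd tier cache-mode` and `ceph osd pool set ... min_read_recency_for_promote` command lines and defaults to a dry run that only prints them.

```python
import itertools
import subprocess
import time


def toggle_commands(pool, recencies=(0, 1), modes=("writeback", "forward")):
    """Yield ceph CLI argv lists, flipping the cache mode on every step."""
    # product() varies the mode fastest, so the mode alternates each round.
    for recency, mode in itertools.cycle(itertools.product(recencies, modes)):
        yield ["ceph", "osd", "tier", "cache-mode", pool, mode]
        yield ["ceph", "osd", "pool", "set", pool,
               "min_read_recency_for_promote", str(recency)]


def run_toggles(pool, rounds, delay_s=30.0, dry_run=True):
    """Run (or just print) `rounds` mode/recency pairs and return them."""
    cmds = list(itertools.islice(toggle_commands(pool), 2 * rounds))
    for argv in cmds:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
            time.sleep(delay_s)  # give the workload time to hit the new mode
    return cmds
```

Calling run_toggles("hypothetical-cache", rounds=4) prints the commands without touching a cluster; pass dry_run=False only on a test cluster while the workload is running.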
>> >
>> > Here are the six run crashes (I have about the last 2000 lines of each
>> > if needed):
>> >
>> > nodev:
>> > update_object_version oid 1015 v 1255 (ObjNum 1014 snap 0 seq_num
>> > 1014) dirty exists
>> > 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
>> > 1016:  finishing write tid 1 to nodev21799-1016
>> > 1016:  finishing write tid 2 to nodev21799-1016
>> > 1016:  finishing write tid 3 to nodev21799-1016
>> > 1016:  finishing write tid 4 to nodev21799-1016
>> > 1016:  finishing write tid 6 to nodev21799-1016
>> > 1016:  finishing write tid 7 to nodev21799-1016
>> > update_object_version oid 1016 v 1957 (ObjNum 1015 snap 0 seq_num
>> > 1015) dirty exists
>> > 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
>> > 1017:  finishing write tid 1 to nodev21799-1017
>> > 1017:  finishing write tid 2 to nodev21799-1017
>> > 1017:  finishing write tid 3 to nodev21799-1017
>> > 1017:  finishing write tid 5 to nodev21799-1017
>> > 1017:  finishing write tid 6 to nodev21799-1017
>> > update_object_version oid 1017 v 1010 (ObjNum 1016 snap 0 seq_num
>> > 1016) dirty exists
>> > 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
>> > 1018:  finishing write tid 1 to nodev21799-1018
>> > 1018:  finishing write tid 2 to nodev21799-1018
>> > 1018:  finishing write tid 3 to nodev21799-1018
>> > 1018:  finishing write tid 4 to nodev21799-1018
>> > 1018:  finishing write tid 6 to nodev21799-1018
>> > 1018:  finishing write tid 7 to nodev21799-1018
>> > update_object_version oid 1018 v 1093 (ObjNum 1017 snap 0 seq_num
>> > 1017) dirty exists
>> > 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
>> > 1019:  finishing write tid 1 to nodev21799-1019
>> > 1019:  finishing write tid 2 to nodev21799-1019
>> > 1019:  finishing write tid 3 to nodev21799-1019
>> > 1019:  finishing write tid 5 to nodev21799-1019
>> > 1019:  finishing write tid 6 to nodev21799-1019
>> > update_object_version oid 1019 v 462 (ObjNum 1018 snap 0 seq_num 1018)
>> > dirty exists
>> > 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
>> > 1021:  finishing write tid 1 to nodev21799-1021
>> > 1020:  finishing write tid 1 to nodev21799-1020
>> > 1020:  finishing write tid 2 to nodev21799-1020
>> > 1020:  finishing write tid 3 to nodev21799-1020
>> > 1020:  finishing write tid 5 to nodev21799-1020
>> > 1020:  finishing write tid 6 to nodev21799-1020
>> > update_object_version oid 1020 v 1287 (ObjNum 1019 snap 0 seq_num
>> > 1019) dirty exists
>> > 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
>> > 1021:  finishing write tid 2 to nodev21799-1021
>> > 1021:  finishing write tid 3 to nodev21799-1021
>> > 1021:  finishing write tid 5 to nodev21799-1021
>> > 1021:  finishing write tid 6 to nodev21799-1021
>> > update_object_version oid 1021 v 1077 (ObjNum 1020 snap 0 seq_num
>> > 1020) dirty exists
>> > 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
>> > 1022:  finishing write tid 1 to nodev21799-1022
>> > 1022:  finishing write tid 2 to nodev21799-1022
>> > 1022:  finishing write tid 3 to nodev21799-1022
>> > 1022:  finishing write tid 5 to nodev21799-1022
>> > 1022:  finishing write tid 6 to nodev21799-1022
>> > update_object_version oid 1022 v 1213 (ObjNum 1021 snap 0 seq_num
>> > 1021) dirty exists
>> > 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
>> > 1023:  finishing write tid 1 to nodev21799-1023
>> > 1023:  finishing write tid 2 to nodev21799-1023
>> > 1023:  finishing write tid 3 to nodev21799-1023
>> > 1023:  finishing write tid 5 to nodev21799-1023
>> > 1023:  finishing write tid 6 to nodev21799-1023
>> > update_object_version oid 1023 v 2612 (ObjNum 1022 snap 0 seq_num
>> > 1022) dirty exists
>> > 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
>> > 1024:  finishing write tid 1 to nodev21799-1024
>> > 1025: Error: oid 219 read returned error code -2
>> > ./test/osd/RadosModel.h: In function 'virtual void
>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f0df8a16700 time
>> > 2016-03-17 10:53:43.493575
>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x76) [0x4db956]
>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>> > 3: (()+0x9791d) [0x7f0e015dd91d]
>> > 4: (()+0x72519) [0x7f0e015b8519]
>> > 5: (()+0x13c178) [0x7f0e01682178]
>> > 6: (()+0x80a4) [0x7f0e011160a4]
>> > 7: (clone()+0x6d) [0x7f0dffa9104d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> > needed to interpret this.
>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>> > Aborted
>> >
>> > nodew:
>> > 1117:  expect (ObjNum 8 snap 0 seq_num 8)
>> > 1120:  expect (ObjNum 95 snap 0 seq_num 95)
>> > 1121:  expect (ObjNum 994 snap 0 seq_num 994)
>> > 1113:  expect (ObjNum 362 snap 0 seq_num 362)
>> > 1118:  expect (ObjNum 179 snap 0 seq_num 179)
>> > 1115:  expect (ObjNum 943 snap 0 seq_num 943)
>> > 1119:  expect (ObjNum 250 snap 0 seq_num 250)
>> > 1124:  finishing write tid 1 to nodew21820-361
>> > 1124:  finishing write tid 2 to nodew21820-361
>> > 1124:  finishing write tid 3 to nodew21820-361
>> > 1124:  finishing write tid 4 to nodew21820-361
>> > 1124:  finishing write tid 6 to nodew21820-361
>> > 1124:  finishing write tid 7 to nodew21820-361
>> > update_object_version oid 361 v 892 (ObjNum 1061 snap 0 seq_num 1061)
>> > dirty exists
>> > 1124:  left oid 361 (ObjNum 1061 snap 0 seq_num 1061)
>> > 1125:  finishing write tid 1 to nodew21820-486
>> > 1125:  finishing write tid 2 to nodew21820-486
>> > 1125:  finishing write tid 3 to nodew21820-486
>> > 1125:  finishing write tid 5 to nodew21820-486
>> > 1125:  finishing write tid 6 to nodew21820-486
>> > update_object_version oid 486 v 1317 (ObjNum 1062 snap 0 seq_num 1062)
>> > dirty exists
>> > 1125:  left oid 486 (ObjNum 1062 snap 0 seq_num 1062)
>> > 1126:  expect (ObjNum 289 snap 0 seq_num 289)
>> > 1127:  finishing write tid 1 to nodew21820-765
>> > 1127:  finishing write tid 2 to nodew21820-765
>> > 1127:  finishing write tid 3 to nodew21820-765
>> > 1127:  finishing write tid 5 to nodew21820-765
>> > 1127:  finishing write tid 6 to nodew21820-765
>> > update_object_version oid 765 v 1156 (ObjNum 1063 snap 0 seq_num 1063)
>> > dirty exists
>> > 1127:  left oid 765 (ObjNum 1063 snap 0 seq_num 1063)
>> > 1128:  finishing write tid 1 to nodew21820-40
>> > 1128:  finishing write tid 2 to nodew21820-40
>> > 1128:  finishing write tid 3 to nodew21820-40
>> > 1128:  finishing write tid 5 to nodew21820-40
>> > 1128:  finishing write tid 6 to nodew21820-40
>> > update_object_version oid 40 v 876 (ObjNum 1064 snap 0 seq_num 1064)
>> > dirty exists
>> > 1128:  left oid 40 (ObjNum 1064 snap 0 seq_num 1064)
>> > 1129:  expect (ObjNum 616 snap 0 seq_num 616)
>> > 1110: done (14 left)
>> > 1113: done (13 left)
>> > 1115: done (12 left)
>> > 1117: done (11 left)
>> > 1118: done (10 left)
>> > 1119: done (9 left)
>> > 1120: done (8 left)
>> > 1121: done (7 left)
>> > 1124: done (6 left)
>> > 1125: done (5 left)
>> > 1126: done (4 left)
>> > 1127: done (3 left)
>> > 1128: done (2 left)
>> > 1129: done (1 left)
>> > 1131: read oid 29 snap -1
>> > 1131:  expect (ObjNum 28 snap 0 seq_num 28)
>> > 1132: read oid 764 snap -1
>> > 1132:  expect (ObjNum 763 snap 0 seq_num 763)
>> > 1133: read oid 469 snap -1
>> > 1133:  expect (ObjNum 468 snap 0 seq_num 468)
>> > 1134: write oid 243 current snap is 0
>> > 1134:  seq_num 1065 ranges
>> > {483354=596553,1514502=531232,2509844=632287,3283353=1}
>> > 1134:  writing nodew21820-243 from 483354 to 1079907 tid 1
>> > 1134:  writing nodew21820-243 from 1514502 to 2045734 tid 2
>> > 1134:  writing nodew21820-243 from 2509844 to 3142131 tid 3
>> > 1134:  writing nodew21820-243 from 3283353 to 3283354 tid 4
>> > 1135: read oid 569 snap -1
>> > 1135:  expect (ObjNum 568 snap 0 seq_num 568)
>> > 1133: Error: oid 469 read returned error code -2
>> > ./test/osd/RadosModel.h: In function 'virtual void
>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fae71d03700 time
>> > 2016-03-17 11:00:02.124951
>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x76) [0x4db956]
>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>> > 3: (()+0x9791d) [0x7fae7a8ca91d]
>> > 4: (()+0x72519) [0x7fae7a8a5519]
>> > 5: (()+0x13c178) [0x7fae7a96f178]
>> > 6: (()+0x80a4) [0x7fae7a4030a4]
>> > 7: (clone()+0x6d) [0x7fae78d7e04d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> > needed to interpret this.
>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>> > Aborted
>> >
>> > nodex:
>> >
>> > 1024:  finishing write tid 1 to nodex22014-1024
>> > 1025:  expect (ObjNum 75 snap 0 seq_num 75)
>> > 1024:  finishing write tid 2 to nodex22014-1024
>> > 1024:  finishing write tid 3 to nodex22014-1024
>> > 1024:  finishing write tid 5 to nodex22014-1024
>> > 1024:  finishing write tid 6 to nodex22014-1024
>> > update_object_version oid 1024 v 753 (ObjNum 1023 snap 0 seq_num 1023)
>> > dirty exists
>> > 1024:  left oid 1024 (ObjNum 1023 snap 0 seq_num 1023)
>> > 982: done (44 left)
>> > 983: done (43 left)
>> > 984: done (42 left)
>> > 985: done (41 left)
>> > 986: done (40 left)
>> > 987: done (39 left)
>> > 988: done (38 left)
>> > 989: done (37 left)
>> > 990: done (36 left)
>> > 991: done (35 left)
>> > 992: done (34 left)
>> > 993: done (33 left)
>> > 994: done (32 left)
>> > 995: done (31 left)
>> > 996: done (30 left)
>> > 997: done (29 left)
>> > 998: done (28 left)
>> > 999: done (27 left)
>> > 1000: done (26 left)
>> > 1001: done (25 left)
>> > 1002: done (24 left)
>> > 1003: done (23 left)
>> > 1004: done (22 left)
>> > 1005: done (21 left)
>> > 1006: done (20 left)
>> > 1007: done (19 left)
>> > 1008: done (18 left)
>> > 1009: done (17 left)
>> > 1010: done (16 left)
>> > 1011: done (15 left)
>> > 1012: done (14 left)
>> > 1013: done (13 left)
>> > 1014: done (12 left)
>> > 1015: done (11 left)
>> > 1016: done (10 left)
>> > 1017: done (9 left)
>> > 1018: done (8 left)
>> > 1019: done (7 left)
>> > 1020: done (6 left)
>> > 1021: done (5 left)
>> > 1022: done (4 left)
>> > 1023: done (3 left)
>> > 1024: done (2 left)
>> > 1025: done (1 left)
>> > 1026: done (0 left)
>> > 1027: delete oid 101 current snap is 0
>> > 1027: done (0 left)
>> > 1028: read oid 156 snap -1
>> > 1028:  expect (ObjNum 155 snap 0 seq_num 155)
>> > 1029: read oid 691 snap -1
>> > 1029:  expect (ObjNum 690 snap 0 seq_num 690)
>> > 1030: read oid 282 snap -1
>> > 1030:  expect (ObjNum 281 snap 0 seq_num 281)
>> > 1031: read oid 979 snap -1
>> > 1031:  expect (ObjNum 978 snap 0 seq_num 978)
>> > 1032: read oid 203 snap -1
>> > 1032:  expect (ObjNum 202 snap 0 seq_num 202)
>> > 1033: setattr oid 464 current snap is 0
>> > 1032: Error: oid 203 read returned error code -2
>> > ./test/osd/RadosModel.h: In function 'virtual void
>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fafee64a700 time
>> > 2016-03-17 10:53:44.291343
>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x76) [0x4db956]
>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>> > 3: (()+0x9791d) [0x7faff721191d]
>> > 4: (()+0x72519) [0x7faff71ec519]
>> > 5: (()+0x13c178) [0x7faff72b6178]
>> > 6: (()+0x80a4) [0x7faff6d4a0a4]
>> > 7: (clone()+0x6d) [0x7faff56c504d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> > needed to interpret this.
>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>> > Aborted
>> >
>> > nodey:
>> >
>> > 974: done (52 left)
>> > 975: done (51 left)
>> > 976: done (50 left)
>> > 977: done (49 left)
>> > 978: done (48 left)
>> > 979: done (47 left)
>> > 980: done (46 left)
>> > 981: done (45 left)
>> > 982: done (44 left)
>> > 983: done (43 left)
>> > 984: done (42 left)
>> > 985: done (41 left)
>> > 986: done (40 left)
>> > 987: done (39 left)
>> > 988: done (38 left)
>> > 989: done (37 left)
>> > 990: done (36 left)
>> > 991: done (35 left)
>> > 992: done (34 left)
>> > 993: done (33 left)
>> > 994: done (32 left)
>> > 995: done (31 left)
>> > 996: done (30 left)
>> > 997: done (29 left)
>> > 998: done (28 left)
>> > 999: done (27 left)
>> > 1000: done (26 left)
>> > 1001: done (25 left)
>> > 1002: done (24 left)
>> > 1003: done (23 left)
>> > 1004: done (22 left)
>> > 1005: done (21 left)
>> > 1006: done (20 left)
>> > 1007: done (19 left)
>> > 1008: done (18 left)
>> > 1009: done (17 left)
>> > 1010: done (16 left)
>> > 1011: done (15 left)
>> > 1012: done (14 left)
>> > 1013: done (13 left)
>> > 1014: done (12 left)
>> > 1015: done (11 left)
>> > 1016: done (10 left)
>> > 1017: done (9 left)
>> > 1018: done (8 left)
>> > 1019: done (7 left)
>> > 1020: done (6 left)
>> > 1021: done (5 left)
>> > 1022: done (4 left)
>> > 1023: done (3 left)
>> > 1024: done (2 left)
>> > 1025: done (1 left)
>> > 1026: done (0 left)
>> > 1027: delete oid 101 current snap is 0
>> > 1027: done (0 left)
>> > 1028: read oid 156 snap -1
>> > 1028:  expect (ObjNum 155 snap 0 seq_num 155)
>> > 1029: read oid 691 snap -1
>> > 1029:  expect (ObjNum 690 snap 0 seq_num 690)
>> > 1030: read oid 282 snap -1
>> > 1030:  expect (ObjNum 281 snap 0 seq_num 281)
>> > 1031: read oid 979 snap -1
>> > 1031:  expect (ObjNum 978 snap 0 seq_num 978)
>> > 1032: read oid 203 snap -1
>> > 1032:  expect (ObjNum 202 snap 0 seq_num 202)
>> > 1033: setattr oid 464 current snap is 0
>> > 1028: Error: oid 156 read returned error code -2
>> > ./test/osd/RadosModel.h: In function 'virtual void
>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f55fa3c6700 time
>> > 2016-03-17 10:53:57.082571
>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x76) [0x4db956]
>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>> > 3: (()+0x9791d) [0x7f5602f8d91d]
>> > 4: (()+0x72519) [0x7f5602f68519]
>> > 5: (()+0x13c178) [0x7f5603032178]
>> > 6: (()+0x80a4) [0x7f5602ac60a4]
>> > 7: (clone()+0x6d) [0x7f560144104d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> > needed to interpret this.
>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>> > Aborted
>> >
>> > nodez:
>> >
>> > 1014: done (11 left)
>> > 1026: delete oid 717 current snap is 0
>> > 1015:  finishing write tid 2 to nodez24249-1015
>> > 1015:  finishing write tid 4 to nodez24249-1015
>> > 1015:  finishing write tid 5 to nodez24249-1015
>> > update_object_version oid 1015 v 3003 (ObjNum 1014 snap 0 seq_num
>> > 1014) dirty exists
>> > 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
>> > 1016:  finishing write tid 1 to nodez24249-1016
>> > 1016:  finishing write tid 2 to nodez24249-1016
>> > 1016:  finishing write tid 3 to nodez24249-1016
>> > 1016:  finishing write tid 4 to nodez24249-1016
>> > 1016:  finishing write tid 6 to nodez24249-1016
>> > 1016:  finishing write tid 7 to nodez24249-1016
>> > update_object_version oid 1016 v 1201 (ObjNum 1015 snap 0 seq_num
>> > 1015) dirty exists
>> > 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
>> > 1017:  finishing write tid 1 to nodez24249-1017
>> > 1017:  finishing write tid 2 to nodez24249-1017
>> > 1017:  finishing write tid 3 to nodez24249-1017
>> > 1017:  finishing write tid 5 to nodez24249-1017
>> > 1017:  finishing write tid 6 to nodez24249-1017
>> > update_object_version oid 1017 v 3007 (ObjNum 1016 snap 0 seq_num
>> > 1016) dirty exists
>> > 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
>> > 1018:  finishing write tid 1 to nodez24249-1018
>> > 1018:  finishing write tid 2 to nodez24249-1018
>> > 1018:  finishing write tid 3 to nodez24249-1018
>> > 1018:  finishing write tid 4 to nodez24249-1018
>> > 1018:  finishing write tid 6 to nodez24249-1018
>> > 1018:  finishing write tid 7 to nodez24249-1018
>> > update_object_version oid 1018 v 1283 (ObjNum 1017 snap 0 seq_num
>> > 1017) dirty exists
>> > 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
>> > 1019:  finishing write tid 1 to nodez24249-1019
>> > 1019:  finishing write tid 2 to nodez24249-1019
>> > 1019:  finishing write tid 3 to nodez24249-1019
>> > 1019:  finishing write tid 5 to nodez24249-1019
>> > 1019:  finishing write tid 6 to nodez24249-1019
>> > update_object_version oid 1019 v 999 (ObjNum 1018 snap 0 seq_num 1018)
>> > dirty exists
>> > 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
>> > 1020:  finishing write tid 1 to nodez24249-1020
>> > 1020:  finishing write tid 2 to nodez24249-1020
>> > 1020:  finishing write tid 3 to nodez24249-1020
>> > 1020:  finishing write tid 5 to nodez24249-1020
>> > 1020:  finishing write tid 6 to nodez24249-1020
>> > update_object_version oid 1020 v 813 (ObjNum 1019 snap 0 seq_num 1019)
>> > dirty exists
>> > 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
>> > 1021:  finishing write tid 1 to nodez24249-1021
>> > 1021:  finishing write tid 2 to nodez24249-1021
>> > 1021:  finishing write tid 3 to nodez24249-1021
>> > 1021:  finishing write tid 5 to nodez24249-1021
>> > 1021:  finishing write tid 6 to nodez24249-1021
>> > update_object_version oid 1021 v 1038 (ObjNum 1020 snap 0 seq_num
>> > 1020) dirty exists
>> > 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
>> > 1022:  finishing write tid 1 to nodez24249-1022
>> > 1022:  finishing write tid 2 to nodez24249-1022
>> > 1022:  finishing write tid 3 to nodez24249-1022
>> > 1022:  finishing write tid 5 to nodez24249-1022
>> > 1022:  finishing write tid 6 to nodez24249-1022
>> > update_object_version oid 1022 v 781 (ObjNum 1021 snap 0 seq_num 1021)
>> > dirty exists
>> > 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
>> > 1023:  finishing write tid 1 to nodez24249-1023
>> > 1023:  finishing write tid 2 to nodez24249-1023
>> > 1023:  finishing write tid 3 to nodez24249-1023
>> > 1023:  finishing write tid 5 to nodez24249-1023
>> > 1023:  finishing write tid 6 to nodez24249-1023
>> > update_object_version oid 1023 v 1537 (ObjNum 1022 snap 0 seq_num
>> > 1022) dirty exists
>> > 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
>> > 1024:  finishing write tid 1 to nodez24249-1024
>> > 1025: Error: oid 230 read returned error code -2
>> > ./test/osd/RadosModel.h: In function 'virtual void
>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fd9bb7fe700 time
>> > 2016-03-17 10:53:41.757921
>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>> >  ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x76) [0x4db956]
>> >  2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>> >  3: (()+0x9791d) [0x7fd9d088d91d]
>> >  4: (()+0x72519) [0x7fd9d0868519]
>> >  5: (()+0x13c178) [0x7fd9d0932178]
>> >  6: (()+0x80a4) [0x7fd9d03c60a4]
>> >  7: (clone()+0x6d) [0x7fd9ced4104d]
>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> > needed to interpret this.
>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>> > Aborted
>> >
>> > nodezz:
>> >
>> > 1015:  finishing write tid 1 to nodezz25161-1015
>> > 1015:  finishing write tid 2 to nodezz25161-1015
>> > 1015:  finishing write tid 4 to nodezz25161-1015
>> > 1015:  finishing write tid 5 to nodezz25161-1015
>> > update_object_version oid 1015 v 900 (ObjNum 1014 snap 0 seq_num 1014)
>> > dirty exists
>> > 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
>> > 1016:  finishing write tid 1 to nodezz25161-1016
>> > 1016:  finishing write tid 2 to nodezz25161-1016
>> > 1016:  finishing write tid 3 to nodezz25161-1016
>> > 1016:  finishing write tid 4 to nodezz25161-1016
>> > 1016:  finishing write tid 6 to nodezz25161-1016
>> > 1016:  finishing write tid 7 to nodezz25161-1016
>> > update_object_version oid 1016 v 1021 (ObjNum 1015 snap 0 seq_num
>> > 1015) dirty exists
>> > 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
>> > 1017:  finishing write tid 1 to nodezz25161-1017
>> > 1017:  finishing write tid 2 to nodezz25161-1017
>> > 1017:  finishing write tid 3 to nodezz25161-1017
>> > 1017:  finishing write tid 5 to nodezz25161-1017
>> > 1017:  finishing write tid 6 to nodezz25161-1017
>> > update_object_version oid 1017 v 3011 (ObjNum 1016 snap 0 seq_num
>> > 1016) dirty exists
>> > 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
>> > 1018:  finishing write tid 1 to nodezz25161-1018
>> > 1018:  finishing write tid 2 to nodezz25161-1018
>> > 1018:  finishing write tid 3 to nodezz25161-1018
>> > 1018:  finishing write tid 4 to nodezz25161-1018
>> > 1018:  finishing write tid 6 to nodezz25161-1018
>> > 1018:  finishing write tid 7 to nodezz25161-1018
>> > update_object_version oid 1018 v 1099 (ObjNum 1017 snap 0 seq_num
>> > 1017) dirty exists
>> > 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
>> > 1019:  finishing write tid 1 to nodezz25161-1019
>> > 1019:  finishing write tid 2 to nodezz25161-1019
>> > 1019:  finishing write tid 3 to nodezz25161-1019
>> > 1019:  finishing write tid 5 to nodezz25161-1019
>> > 1019:  finishing write tid 6 to nodezz25161-1019
>> > update_object_version oid 1019 v 1300 (ObjNum 1018 snap 0 seq_num
>> > 1018) dirty exists
>> > 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
>> > 1020:  finishing write tid 1 to nodezz25161-1020
>> > 1020:  finishing write tid 2 to nodezz25161-1020
>> > 1020:  finishing write tid 3 to nodezz25161-1020
>> > 1020:  finishing write tid 5 to nodezz25161-1020
>> > 1020:  finishing write tid 6 to nodezz25161-1020
>> > update_object_version oid 1020 v 1324 (ObjNum 1019 snap 0 seq_num
>> > 1019) dirty exists
>> > 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
>> > 1021:  finishing write tid 1 to nodezz25161-1021
>> > 1021:  finishing write tid 2 to nodezz25161-1021
>> > 1021:  finishing write tid 3 to nodezz25161-1021
>> > 1021:  finishing write tid 5 to nodezz25161-1021
>> > 1021:  finishing write tid 6 to nodezz25161-1021
>> > update_object_version oid 1021 v 890 (ObjNum 1020 snap 0 seq_num 1020)
>> > dirty exists
>> > 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
>> > 1022:  finishing write tid 1 to nodezz25161-1022
>> > 1022:  finishing write tid 2 to nodezz25161-1022
>> > 1022:  finishing write tid 3 to nodezz25161-1022
>> > 1022:  finishing write tid 5 to nodezz25161-1022
>> > 1022:  finishing write tid 6 to nodezz25161-1022
>> > update_object_version oid 1022 v 464 (ObjNum 1021 snap 0 seq_num 1021)
>> > dirty exists
>> > 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
>> > 1023:  finishing write tid 1 to nodezz25161-1023
>> > 1023:  finishing write tid 2 to nodezz25161-1023
>> > 1023:  finishing write tid 3 to nodezz25161-1023
>> > 1023:  finishing write tid 5 to nodezz25161-1023
>> > 1023:  finishing write tid 6 to nodezz25161-1023
>> > update_object_version oid 1023 v 1516 (ObjNum 1022 snap 0 seq_num
>> > 1022) dirty exists
>> > 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
>> > 1024:  finishing write tid 1 to nodezz25161-1024
>> > 1024:  finishing write tid 2 to nodezz25161-1024
>> > 1025: Error: oid 219 read returned error code -2
>> > ./test/osd/RadosModel.h: In function 'virtual void
>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fbb1bfff700 time
>> > 2016-03-17 10:53:53.071338
>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x76) [0x4db956]
>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>> > 3: (()+0x9791d) [0x7fbb30ff191d]
>> > 4: (()+0x72519) [0x7fbb30fcc519]
>> > 5: (()+0x13c178) [0x7fbb31096178]
>> > 6: (()+0x80a4) [0x7fbb30b2a0a4]
>> > 7: (clone()+0x6d) [0x7fbb2f4a504d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> > needed to interpret this.
>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>> > Aborted
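
All six crashes above end the same way: a read returns error code -2 and RadosModel.h asserts. A small helper along these lines could pull those lines out of the saved logs. It is a sketch, and the function name and the returned dictionary layout are assumptions; it relies only on the "Error: oid N read returned error code X" format visible in the logs and on Python's errno table to name the code.

```python
import errno
import re

# Matches lines such as "1043: Error: oid 430 read returned error code -2".
ERROR_RE = re.compile(r"^(\d+): Error: oid (\d+) read returned error code (-?\d+)$")


def find_read_errors(lines):
    """Return one dict per failed read found in ceph_test_rados output."""
    hits = []
    for line in lines:
        # Strip mail quote markers such as ">> > " before matching.
        text = re.sub(r"^[>\s]+", "", line).strip()
        m = ERROR_RE.match(text)
        if m:
            op, oid, code = (int(g) for g in m.groups())
            hits.append({
                "op": op,
                "oid": oid,
                "code": code,
                # Negative return codes are negated errno values.
                "errno": errno.errorcode.get(-code, "unknown"),
            })
    return hits
```

For the logs in this message the code -2 maps to ENOENT, i.e. the read reported that the object does not exist even though the test model expected it to.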
>> > ----------------
>> > Robert LeBlanc
>> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>> >
>> >
>> > On Thu, Mar 17, 2016 at 10:39 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> >> On Thu, 17 Mar 2016, Robert LeBlanc wrote:
>> >>> I'm having trouble finding documentation about using ceph_test_rados. Can I
>> >>> run this on the existing cluster and will that provide useful info? It seems
>> >>> running it in the build will not have the caching set up (vstart.sh).
>> >>>
>> >>> I have accepted a job with another company and only have until Wednesday to
>> >>> help with getting information about this bug. My new job will not be using
>> >>> Ceph, so I won't be able to provide any additional info after Tuesday. I want
>> >>> to leave the company on a good trajectory for upgrading, so any input you
>> >>> can provide will be helpful.
>> >>
>> >> I'm sorry to hear it!  You'll be missed.  :)
>> >>
>> >>> I've found:
>> >>>
>> >>> ./ceph_test_rados --op read 100 --op write 100 --op delete 50
>> >>> --max-ops 400000 --objects 1024 --max-in-flight 64 --size 4000000
>> >>> --min-stride-size 400000 --max-stride-size 800000 --max-seconds 600
>> >>> --op copy_from 50 --op snap_create 50 --op snap_remove 50 --op
>> >>> rollback 50 --op setattr 25 --op rmattr 25 --pool unique_pool_0
>> >>>
>> >>> Is that enough if I change --pool to the cached pool and do the toggling
>> >>> while ceph_test_rados is running? I think this will run for 10 minutes.
>> >>
>> >> Precisely.  You can probably drop copy_from and snap ops from the list
>> >> since your workload wasn't exercising those.
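
The trimmed invocation suggested above might be assembled like this. It is a sketch: the weights and limits are copied from the quoted command, the pool name is a placeholder, and treating rollback as one of the "snap ops" to drop is an assumption on my part. The helper name build_argv is hypothetical.

```python
# Op weights copied from the quoted ceph_test_rados command.
OP_WEIGHTS = {
    "read": 100, "write": 100, "delete": 50,
    "copy_from": 50, "snap_create": 50, "snap_remove": 50,
    "rollback": 50, "setattr": 25, "rmattr": 25,
}

# Assumption: "snap ops" covers snap_create, snap_remove, and rollback.
DROPPED = frozenset({"copy_from", "snap_create", "snap_remove", "rollback"})


def build_argv(pool, binary="./ceph_test_rados", dropped=DROPPED):
    """Return the argv list for the quoted run, minus the dropped ops."""
    argv = [binary]
    for op, weight in OP_WEIGHTS.items():
        if op not in dropped:
            argv += ["--op", op, str(weight)]
    argv += [
        "--max-ops", "400000", "--objects", "1024", "--max-in-flight", "64",
        "--size", "4000000", "--min-stride-size", "400000",
        "--max-stride-size", "800000", "--max-seconds", "600",
        "--pool", pool,
    ]
    return argv
```

The result can be passed straight to subprocess.run, or joined with shlex.join to get a line that can be pasted into a shell.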
>> >>
>> >> Thanks!
>> >> sage
>> >>
>> >>
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


