Re: [ceph-users] data corruption with hammer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Also, is this ceph_test_rados rewriting objects quickly? I think that
the issue is with rewriting objects so if we can tailor the
ceph_test_rados to do that, it might be easier to reproduce.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Mar 17, 2016 at 11:05 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> I'll  miss the Ceph community as well. There was a few things I really
> wanted to work in with Ceph.
>
> I got this:
>
> update_object_version oid 13 v 1166 (ObjNum 1028 snap 0 seq_num 1028)
> dirty exists
> 1038:  left oid 13 (ObjNum 1028 snap 0 seq_num 1028)
> 1040:  finishing write tid 1 to nodez23350-256
> 1040:  finishing write tid 2 to nodez23350-256
> 1040:  finishing write tid 3 to nodez23350-256
> 1040:  finishing write tid 4 to nodez23350-256
> 1040:  finishing write tid 6 to nodez23350-256
> 1035: done (4 left)
> 1037: done (3 left)
> 1038: done (2 left)
> 1043: read oid 430 snap -1
> 1043:  expect (ObjNum 429 snap 0 seq_num 429)
> 1040:  finishing write tid 7 to nodez23350-256
> update_object_version oid 256 v 661 (ObjNum 1029 snap 0 seq_num 1029)
> dirty exists
> 1040:  left oid 256 (ObjNum 1029 snap 0 seq_num 1029)
> 1042:  expect (ObjNum 664 snap 0 seq_num 664)
> 1043: Error: oid 430 read returned error code -2
> ./test/osd/RadosModel.h: In function 'virtual void
> ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fa1bf7fe700 time
> 2016-03-17 10:47:19.085414
> ./test/osd/RadosModel.h: 1109: FAILED assert(0)
> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x76) [0x4db956]
> 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
> 3: (()+0x9791d) [0x7fa1d472191d]
> 4: (()+0x72519) [0x7fa1d46fc519]
> 5: (()+0x13c178) [0x7fa1d47c6178]
> 6: (()+0x80a4) [0x7fa1d425a0a4]
> 7: (clone()+0x6d) [0x7fa1d2bd504d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> Aborted
>
> I had to toggle writeback/forward and min_read_recency_for_promote a
> few times to get it, but I don't know if it is because I only have one
> job running. Even with six jobs running, it is not easy to trigger
> with ceph_test_rados, but it is very instant in the RBD VMs.
>
> Here are the six run crashes (I have about the last 2000 lines of each
> if needed):
>
> nodev:
> update_object_version oid 1015 v 1255 (ObjNum 1014 snap 0 seq_num
> 1014) dirty exists
> 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
> 1016:  finishing write tid 1 to nodev21799-1016
> 1016:  finishing write tid 2 to nodev21799-1016
> 1016:  finishing write tid 3 to nodev21799-1016
> 1016:  finishing write tid 4 to nodev21799-1016
> 1016:  finishing write tid 6 to nodev21799-1016
> 1016:  finishing write tid 7 to nodev21799-1016
> update_object_version oid 1016 v 1957 (ObjNum 1015 snap 0 seq_num
> 1015) dirty exists
> 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
> 1017:  finishing write tid 1 to nodev21799-1017
> 1017:  finishing write tid 2 to nodev21799-1017
> 1017:  finishing write tid 3 to nodev21799-1017
> 1017:  finishing write tid 5 to nodev21799-1017
> 1017:  finishing write tid 6 to nodev21799-1017
> update_object_version oid 1017 v 1010 (ObjNum 1016 snap 0 seq_num
> 1016) dirty exists
> 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
> 1018:  finishing write tid 1 to nodev21799-1018
> 1018:  finishing write tid 2 to nodev21799-1018
> 1018:  finishing write tid 3 to nodev21799-1018
> 1018:  finishing write tid 4 to nodev21799-1018
> 1018:  finishing write tid 6 to nodev21799-1018
> 1018:  finishing write tid 7 to nodev21799-1018
> update_object_version oid 1018 v 1093 (ObjNum 1017 snap 0 seq_num
> 1017) dirty exists
> 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
> 1019:  finishing write tid 1 to nodev21799-1019
> 1019:  finishing write tid 2 to nodev21799-1019
> 1019:  finishing write tid 3 to nodev21799-1019
> 1019:  finishing write tid 5 to nodev21799-1019
> 1019:  finishing write tid 6 to nodev21799-1019
> update_object_version oid 1019 v 462 (ObjNum 1018 snap 0 seq_num 1018)
> dirty exists
> 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
> 1021:  finishing write tid 1 to nodev21799-1021
> 1020:  finishing write tid 1 to nodev21799-1020
> 1020:  finishing write tid 2 to nodev21799-1020
> 1020:  finishing write tid 3 to nodev21799-1020
> 1020:  finishing write tid 5 to nodev21799-1020
> 1020:  finishing write tid 6 to nodev21799-1020
> update_object_version oid 1020 v 1287 (ObjNum 1019 snap 0 seq_num
> 1019) dirty exists
> 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
> 1021:  finishing write tid 2 to nodev21799-1021
> 1021:  finishing write tid 3 to nodev21799-1021
> 1021:  finishing write tid 5 to nodev21799-1021
> 1021:  finishing write tid 6 to nodev21799-1021
> update_object_version oid 1021 v 1077 (ObjNum 1020 snap 0 seq_num
> 1020) dirty exists
> 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
> 1022:  finishing write tid 1 to nodev21799-1022
> 1022:  finishing write tid 2 to nodev21799-1022
> 1022:  finishing write tid 3 to nodev21799-1022
> 1022:  finishing write tid 5 to nodev21799-1022
> 1022:  finishing write tid 6 to nodev21799-1022
> update_object_version oid 1022 v 1213 (ObjNum 1021 snap 0 seq_num
> 1021) dirty exists
> 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
> 1023:  finishing write tid 1 to nodev21799-1023
> 1023:  finishing write tid 2 to nodev21799-1023
> 1023:  finishing write tid 3 to nodev21799-1023
> 1023:  finishing write tid 5 to nodev21799-1023
> 1023:  finishing write tid 6 to nodev21799-1023
> update_object_version oid 1023 v 2612 (ObjNum 1022 snap 0 seq_num
> 1022) dirty exists
> 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
> 1024:  finishing write tid 1 to nodev21799-1024
> 1025: Error: oid 219 read returned error code -2
> ./test/osd/RadosModel.h: In function 'virtual void
> ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f0df8a16700 time
> 2016-03-17 10:53:43.493575
> ./test/osd/RadosModel.h: 1109: FAILED assert(0)
> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x76) [0x4db956]
> 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
> 3: (()+0x9791d) [0x7f0e015dd91d]
> 4: (()+0x72519) [0x7f0e015b8519]
> 5: (()+0x13c178) [0x7f0e01682178]
> 6: (()+0x80a4) [0x7f0e011160a4]
> 7: (clone()+0x6d) [0x7f0dffa9104d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> Aborted
>
> nodew:
> 1117:  expect (ObjNum 8 snap 0 seq_num 8)
> 1120:  expect (ObjNum 95 snap 0 seq_num 95)
> 1121:  expect (ObjNum 994 snap 0 seq_num 994)
> 1113:  expect (ObjNum 362 snap 0 seq_num 362)
> 1118:  expect (ObjNum 179 snap 0 seq_num 179)
> 1115:  expect (ObjNum 943 snap 0 seq_num 943)
> 1119:  expect (ObjNum 250 snap 0 seq_num 250)
> 1124:  finishing write tid 1 to nodew21820-361
> 1124:  finishing write tid 2 to nodew21820-361
> 1124:  finishing write tid 3 to nodew21820-361
> 1124:  finishing write tid 4 to nodew21820-361
> 1124:  finishing write tid 6 to nodew21820-361
> 1124:  finishing write tid 7 to nodew21820-361
> update_object_version oid 361 v 892 (ObjNum 1061 snap 0 seq_num 1061)
> dirty exists
> 1124:  left oid 361 (ObjNum 1061 snap 0 seq_num 1061)
> 1125:  finishing write tid 1 to nodew21820-486
> 1125:  finishing write tid 2 to nodew21820-486
> 1125:  finishing write tid 3 to nodew21820-486
> 1125:  finishing write tid 5 to nodew21820-486
> 1125:  finishing write tid 6 to nodew21820-486
> update_object_version oid 486 v 1317 (ObjNum 1062 snap 0 seq_num 1062)
> dirty exists
> 1125:  left oid 486 (ObjNum 1062 snap 0 seq_num 1062)
> 1126:  expect (ObjNum 289 snap 0 seq_num 289)
> 1127:  finishing write tid 1 to nodew21820-765
> 1127:  finishing write tid 2 to nodew21820-765
> 1127:  finishing write tid 3 to nodew21820-765
> 1127:  finishing write tid 5 to nodew21820-765
> 1127:  finishing write tid 6 to nodew21820-765
> update_object_version oid 765 v 1156 (ObjNum 1063 snap 0 seq_num 1063)
> dirty exists
> 1127:  left oid 765 (ObjNum 1063 snap 0 seq_num 1063)
> 1128:  finishing write tid 1 to nodew21820-40
> 1128:  finishing write tid 2 to nodew21820-40
> 1128:  finishing write tid 3 to nodew21820-40
> 1128:  finishing write tid 5 to nodew21820-40
> 1128:  finishing write tid 6 to nodew21820-40
> update_object_version oid 40 v 876 (ObjNum 1064 snap 0 seq_num 1064)
> dirty exists
> 1128:  left oid 40 (ObjNum 1064 snap 0 seq_num 1064)
> 1129:  expect (ObjNum 616 snap 0 seq_num 616)
> 1110: done (14 left)
> 1113: done (13 left)
> 1115: done (12 left)
> 1117: done (11 left)
> 1118: done (10 left)
> 1119: done (9 left)
> 1120: done (8 left)
> 1121: done (7 left)
> 1124: done (6 left)
> 1125: done (5 left)
> 1126: done (4 left)
> 1127: done (3 left)
> 1128: done (2 left)
> 1129: done (1 left)
> 1131: read oid 29 snap -1
> 1131:  expect (ObjNum 28 snap 0 seq_num 28)
> 1132: read oid 764 snap -1
> 1132:  expect (ObjNum 763 snap 0 seq_num 763)
> 1133: read oid 469 snap -1
> 1133:  expect (ObjNum 468 snap 0 seq_num 468)
> 1134: write oid 243 current snap is 0
> 1134:  seq_num 1065 ranges
> {483354=596553,1514502=531232,2509844=632287,3283353=1}
> 1134:  writing nodew21820-243 from 483354 to 1079907 tid 1
> 1134:  writing nodew21820-243 from 1514502 to 2045734 tid 2
> 1134:  writing nodew21820-243 from 2509844 to 3142131 tid 3
> 1134:  writing nodew21820-243 from 3283353 to 3283354 tid 4
> 1135: read oid 569 snap -1
> 1135:  expect (ObjNum 568 snap 0 seq_num 568)
> 1133: Error: oid 469 read returned error code -2
> ./test/osd/RadosModel.h: In function 'virtual void
> ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fae71d03700 time
> 2016-03-17 11:00:02.124951
> ./test/osd/RadosModel.h: 1109: FAILED assert(0)
> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x76) [0x4db956]
> 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
> 3: (()+0x9791d) [0x7fae7a8ca91d]
> 4: (()+0x72519) [0x7fae7a8a5519]
> 5: (()+0x13c178) [0x7fae7a96f178]
> 6: (()+0x80a4) [0x7fae7a4030a4]
> 7: (clone()+0x6d) [0x7fae78d7e04d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> Aborted
>
> nodex:
>
> 1024:  finishing write tid 1 to nodex22014-1024
> 1025:  expect (ObjNum 75 snap 0 seq_num 75)
> 1024:  finishing write tid 2 to nodex22014-1024
> 1024:  finishing write tid 3 to nodex22014-1024
> 1024:  finishing write tid 5 to nodex22014-1024
> 1024:  finishing write tid 6 to nodex22014-1024
> update_object_version oid 1024 v 753 (ObjNum 1023 snap 0 seq_num 1023)
> dirty exists
> 1024:  left oid 1024 (ObjNum 1023 snap 0 seq_num 1023)
> 982: done (44 left)
> 983: done (43 left)
> 984: done (42 left)
> 985: done (41 left)
> 986: done (40 left)
> 987: done (39 left)
> 988: done (38 left)
> 989: done (37 left)
> 990: done (36 left)
> 991: done (35 left)
> 992: done (34 left)
> 993: done (33 left)
> 994: done (32 left)
> 995: done (31 left)
> 996: done (30 left)
> 997: done (29 left)
> 998: done (28 left)
> 999: done (27 left)
> 1000: done (26 left)
> 1001: done (25 left)
> 1002: done (24 left)
> 1003: done (23 left)
> 1004: done (22 left)
> 1005: done (21 left)
> 1006: done (20 left)
> 1007: done (19 left)
> 1008: done (18 left)
> 1009: done (17 left)
> 1010: done (16 left)
> 1011: done (15 left)
> 1012: done (14 left)
> 1013: done (13 left)
> 1014: done (12 left)
> 1015: done (11 left)
> 1016: done (10 left)
> 1017: done (9 left)
> 1018: done (8 left)
> 1019: done (7 left)
> 1020: done (6 left)
> 1021: done (5 left)
> 1022: done (4 left)
> 1023: done (3 left)
> 1024: done (2 left)
> 1025: done (1 left)
> 1026: done (0 left)
> 1027: delete oid 101 current snap is 0
> 1027: done (0 left)
> 1028: read oid 156 snap -1
> 1028:  expect (ObjNum 155 snap 0 seq_num 155)
> 1029: read oid 691 snap -1
> 1029:  expect (ObjNum 690 snap 0 seq_num 690)
> 1030: read oid 282 snap -1
> 1030:  expect (ObjNum 281 snap 0 seq_num 281)
> 1031: read oid 979 snap -1
> 1031:  expect (ObjNum 978 snap 0 seq_num 978)
> 1032: read oid 203 snap -1
> 1032:  expect (ObjNum 202 snap 0 seq_num 202)
> 1033: setattr oid 464 current snap is 0
> 1032: Error: oid 203 read returned error code -2
> ./test/osd/RadosModel.h: In function 'virtual void
> ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fafee64a700 time
> 2016-03-17 10:53:44.291343
> ./test/osd/RadosModel.h: 1109: FAILED assert(0)
> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x76) [0x4db956]
> 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
> 3: (()+0x9791d) [0x7faff721191d]
> 4: (()+0x72519) [0x7faff71ec519]
> 5: (()+0x13c178) [0x7faff72b6178]
> 6: (()+0x80a4) [0x7faff6d4a0a4]
> 7: (clone()+0x6d) [0x7faff56c504d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> Aborted
>
> nodey:
>
> 974: done (52 left)
> 975: done (51 left)
> 976: done (50 left)
> 977: done (49 left)
> 978: done (48 left)
> 979: done (47 left)
> 980: done (46 left)
> 981: done (45 left)
> 982: done (44 left)
> 983: done (43 left)
> 984: done (42 left)
> 985: done (41 left)
> 986: done (40 left)
> 987: done (39 left)
> 988: done (38 left)
> 989: done (37 left)
> 990: done (36 left)
> 991: done (35 left)
> 992: done (34 left)
> 993: done (33 left)
> 994: done (32 left)
> 995: done (31 left)
> 996: done (30 left)
> 997: done (29 left)
> 998: done (28 left)
> 999: done (27 left)
> 1000: done (26 left)
> 1001: done (25 left)
> 1002: done (24 left)
> 1003: done (23 left)
> 1004: done (22 left)
> 1005: done (21 left)
> 1006: done (20 left)
> 1007: done (19 left)
> 1008: done (18 left)
> 1009: done (17 left)
> 1010: done (16 left)
> 1011: done (15 left)
> 1012: done (14 left)
> 1013: done (13 left)
> 1014: done (12 left)
> 1015: done (11 left)
> 1016: done (10 left)
> 1017: done (9 left)
> 1018: done (8 left)
> 1019: done (7 left)
> 1020: done (6 left)
> 1021: done (5 left)
> 1022: done (4 left)
> 1023: done (3 left)
> 1024: done (2 left)
> 1025: done (1 left)
> 1026: done (0 left)
> 1027: delete oid 101 current snap is 0
> 1027: done (0 left)
> 1028: read oid 156 snap -1
> 1028:  expect (ObjNum 155 snap 0 seq_num 155)
> 1029: read oid 691 snap -1
> 1029:  expect (ObjNum 690 snap 0 seq_num 690)
> 1030: read oid 282 snap -1
> 1030:  expect (ObjNum 281 snap 0 seq_num 281)
> 1031: read oid 979 snap -1
> 1031:  expect (ObjNum 978 snap 0 seq_num 978)
> 1032: read oid 203 snap -1
> 1032:  expect (ObjNum 202 snap 0 seq_num 202)
> 1033: setattr oid 464 current snap is 0
> 1028: Error: oid 156 read returned error code -2
> ./test/osd/RadosModel.h: In function 'virtual void
> ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f55fa3c6700 time
> 2016-03-17 10:53:57.082571
> ./test/osd/RadosModel.h: 1109: FAILED assert(0)
> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x76) [0x4db956]
> 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
> 3: (()+0x9791d) [0x7f5602f8d91d]
> 4: (()+0x72519) [0x7f5602f68519]
> 5: (()+0x13c178) [0x7f5603032178]
> 6: (()+0x80a4) [0x7f5602ac60a4]
> 7: (clone()+0x6d) [0x7f560144104d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> Aborted
>
> nodez:
>
> 1014: done (11 left)
> 1026: delete oid 717 current snap is 0
> 1015:  finishing write tid 2 to nodez24249-1015
> 1015:  finishing write tid 4 to nodez24249-1015
> 1015:  finishing write tid 5 to nodez24249-1015
> update_object_version oid 1015 v 3003 (ObjNum 1014 snap 0 seq_num
> 1014) dirty exists
> 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
> 1016:  finishing write tid 1 to nodez24249-1016
> 1016:  finishing write tid 2 to nodez24249-1016
> 1016:  finishing write tid 3 to nodez24249-1016
> 1016:  finishing write tid 4 to nodez24249-1016
> 1016:  finishing write tid 6 to nodez24249-1016
> 1016:  finishing write tid 7 to nodez24249-1016
> update_object_version oid 1016 v 1201 (ObjNum 1015 snap 0 seq_num
> 1015) dirty exists
> 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
> 1017:  finishing write tid 1 to nodez24249-1017
> 1017:  finishing write tid 2 to nodez24249-1017
> 1017:  finishing write tid 3 to nodez24249-1017
> 1017:  finishing write tid 5 to nodez24249-1017
> 1017:  finishing write tid 6 to nodez24249-1017
> update_object_version oid 1017 v 3007 (ObjNum 1016 snap 0 seq_num
> 1016) dirty exists
> 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
> 1018:  finishing write tid 1 to nodez24249-1018
> 1018:  finishing write tid 2 to nodez24249-1018
> 1018:  finishing write tid 3 to nodez24249-1018
> 1018:  finishing write tid 4 to nodez24249-1018
> 1018:  finishing write tid 6 to nodez24249-1018
> 1018:  finishing write tid 7 to nodez24249-1018
> update_object_version oid 1018 v 1283 (ObjNum 1017 snap 0 seq_num
> 1017) dirty exists
> 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
> 1019:  finishing write tid 1 to nodez24249-1019
> 1019:  finishing write tid 2 to nodez24249-1019
> 1019:  finishing write tid 3 to nodez24249-1019
> 1019:  finishing write tid 5 to nodez24249-1019
> 1019:  finishing write tid 6 to nodez24249-1019
> update_object_version oid 1019 v 999 (ObjNum 1018 snap 0 seq_num 1018)
> dirty exists
> 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
> 1020:  finishing write tid 1 to nodez24249-1020
> 1020:  finishing write tid 2 to nodez24249-1020
> 1020:  finishing write tid 3 to nodez24249-1020
> 1020:  finishing write tid 5 to nodez24249-1020
> 1020:  finishing write tid 6 to nodez24249-1020
> update_object_version oid 1020 v 813 (ObjNum 1019 snap 0 seq_num 1019)
> dirty exists
> 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
> 1021:  finishing write tid 1 to nodez24249-1021
> 1021:  finishing write tid 2 to nodez24249-1021
> 1021:  finishing write tid 3 to nodez24249-1021
> 1021:  finishing write tid 5 to nodez24249-1021
> 1021:  finishing write tid 6 to nodez24249-1021
> update_object_version oid 1021 v 1038 (ObjNum 1020 snap 0 seq_num
> 1020) dirty exists
> 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
> 1022:  finishing write tid 1 to nodez24249-1022
> 1022:  finishing write tid 2 to nodez24249-1022
> 1022:  finishing write tid 3 to nodez24249-1022
> 1022:  finishing write tid 5 to nodez24249-1022
> 1022:  finishing write tid 6 to nodez24249-1022
> update_object_version oid 1022 v 781 (ObjNum 1021 snap 0 seq_num 1021)
> dirty exists
> 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
> 1023:  finishing write tid 1 to nodez24249-1023
> 1023:  finishing write tid 2 to nodez24249-1023
> 1023:  finishing write tid 3 to nodez24249-1023
> 1023:  finishing write tid 5 to nodez24249-1023
> 1023:  finishing write tid 6 to nodez24249-1023
> update_object_version oid 1023 v 1537 (ObjNum 1022 snap 0 seq_num
> 1022) dirty exists
> 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
> 1024:  finishing write tid 1 to nodez24249-1024
> 1025: Error: oid 230 read returned error code -2
> ./test/osd/RadosModel.h: In function 'virtual void
> ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fd9bb7fe700 time
> 2016-03-17 10:53:41.757921
> ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>  ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x76) [0x4db956]
>  2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>  3: (()+0x9791d) [0x7fd9d088d91d]
>  4: (()+0x72519) [0x7fd9d0868519]
>  5: (()+0x13c178) [0x7fd9d0932178]
>  6: (()+0x80a4) [0x7fd9d03c60a4]
>  7: (clone()+0x6d) [0x7fd9ced4104d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> Aborted
>
> nodezz:
>
> 1015:  finishing write tid 1 to nodezz25161-1015
> 1015:  finishing write tid 2 to nodezz25161-1015
> 1015:  finishing write tid 4 to nodezz25161-1015
> 1015:  finishing write tid 5 to nodezz25161-1015
> update_object_version oid 1015 v 900 (ObjNum 1014 snap 0 seq_num 1014)
> dirty exists
> 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
> 1016:  finishing write tid 1 to nodezz25161-1016
> 1016:  finishing write tid 2 to nodezz25161-1016
> 1016:  finishing write tid 3 to nodezz25161-1016
> 1016:  finishing write tid 4 to nodezz25161-1016
> 1016:  finishing write tid 6 to nodezz25161-1016
> 1016:  finishing write tid 7 to nodezz25161-1016
> update_object_version oid 1016 v 1021 (ObjNum 1015 snap 0 seq_num
> 1015) dirty exists
> 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
> 1017:  finishing write tid 1 to nodezz25161-1017
> 1017:  finishing write tid 2 to nodezz25161-1017
> 1017:  finishing write tid 3 to nodezz25161-1017
> 1017:  finishing write tid 5 to nodezz25161-1017
> 1017:  finishing write tid 6 to nodezz25161-1017
> update_object_version oid 1017 v 3011 (ObjNum 1016 snap 0 seq_num
> 1016) dirty exists
> 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
> 1018:  finishing write tid 1 to nodezz25161-1018
> 1018:  finishing write tid 2 to nodezz25161-1018
> 1018:  finishing write tid 3 to nodezz25161-1018
> 1018:  finishing write tid 4 to nodezz25161-1018
> 1018:  finishing write tid 6 to nodezz25161-1018
> 1018:  finishing write tid 7 to nodezz25161-1018
> update_object_version oid 1018 v 1099 (ObjNum 1017 snap 0 seq_num
> 1017) dirty exists
> 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
> 1019:  finishing write tid 1 to nodezz25161-1019
> 1019:  finishing write tid 2 to nodezz25161-1019
> 1019:  finishing write tid 3 to nodezz25161-1019
> 1019:  finishing write tid 5 to nodezz25161-1019
> 1019:  finishing write tid 6 to nodezz25161-1019
> update_object_version oid 1019 v 1300 (ObjNum 1018 snap 0 seq_num
> 1018) dirty exists
> 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
> 1020:  finishing write tid 1 to nodezz25161-1020
> 1020:  finishing write tid 2 to nodezz25161-1020
> 1020:  finishing write tid 3 to nodezz25161-1020
> 1020:  finishing write tid 5 to nodezz25161-1020
> 1020:  finishing write tid 6 to nodezz25161-1020
> update_object_version oid 1020 v 1324 (ObjNum 1019 snap 0 seq_num
> 1019) dirty exists
> 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
> 1021:  finishing write tid 1 to nodezz25161-1021
> 1021:  finishing write tid 2 to nodezz25161-1021
> 1021:  finishing write tid 3 to nodezz25161-1021
> 1021:  finishing write tid 5 to nodezz25161-1021
> 1021:  finishing write tid 6 to nodezz25161-1021
> update_object_version oid 1021 v 890 (ObjNum 1020 snap 0 seq_num 1020)
> dirty exists
> 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
> 1022:  finishing write tid 1 to nodezz25161-1022
> 1022:  finishing write tid 2 to nodezz25161-1022
> 1022:  finishing write tid 3 to nodezz25161-1022
> 1022:  finishing write tid 5 to nodezz25161-1022
> 1022:  finishing write tid 6 to nodezz25161-1022
> update_object_version oid 1022 v 464 (ObjNum 1021 snap 0 seq_num 1021)
> dirty exists
> 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
> 1023:  finishing write tid 1 to nodezz25161-1023
> 1023:  finishing write tid 2 to nodezz25161-1023
> 1023:  finishing write tid 3 to nodezz25161-1023
> 1023:  finishing write tid 5 to nodezz25161-1023
> 1023:  finishing write tid 6 to nodezz25161-1023
> update_object_version oid 1023 v 1516 (ObjNum 1022 snap 0 seq_num
> 1022) dirty exists
> 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
> 1024:  finishing write tid 1 to nodezz25161-1024
> 1024:  finishing write tid 2 to nodezz25161-1024
> 1025: Error: oid 219 read returned error code -2
> ./test/osd/RadosModel.h: In function 'virtual void
> ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fbb1bfff700 time
> 2016-03-17 10:53:53.071338
> ./test/osd/RadosModel.h: 1109: FAILED assert(0)
> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x76) [0x4db956]
> 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
> 3: (()+0x9791d) [0x7fbb30ff191d]
> 4: (()+0x72519) [0x7fbb30fcc519]
> 5: (()+0x13c178) [0x7fbb31096178]
> 6: (()+0x80a4) [0x7fbb30b2a0a4]
> 7: (clone()+0x6d) [0x7fbb2f4a504d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> Aborted
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Thu, Mar 17, 2016 at 10:39 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> On Thu, 17 Mar 2016, Robert LeBlanc wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA256
>>>
>>> I'm having trouble finding documentation about using ceph_test_rados. Can I
>>> run this on the existing cluster and will that provide useful info? It seems
>>>  running it in the build will not have the caching set up (vstart.sh).
>>>
>>> I have accepted a job with another company and only have until Wednesday to
>>> help with getting information about this bug. My new job will not be using C
>>> eph, so I won't be able to provide any additional info after Tuesday. I want
>>>  to leave the company on a good trajectory for upgrading, so any input you c
>>> an provide will be helpful.
>>
>> I'm sorry to hear it!  You'll be missed.  :)
>>
>>> I've found:
>>>
>>> ./ceph_test_rados --op read 100 --op write 100 --op delete 50
>>> - --max-ops 400000 --objects 1024 --max-in-flight 64 --size 4000000
>>> - --min-stride-size 400000 --max-stride-size 800000 --max-seconds 600
>>> - --op copy_from 50 --op snap_create 50 --op snap_remove 50 --op
>>> rollback 50 --op setattr 25 --op rmattr 25 --pool unique_pool_0
>>>
>>> Is that enough if I change --pool to the cached pool and do the toggling whi
>>> le ceph_test_rados is running? I think this will run for 10 minutes.
>>
>> Precisely.  You can probably drop copy_from and snap ops from the list
>> since your workload wasn't exercising those.
>>
>> Thanks!
>> sage
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux