Cherry-picking that commit onto v0.94.6 wasn't clean so I'm just building your branch. I'm not sure what the difference between your branch and 0.94.6 is, I don't see any commits against osd/ReplicatedPG.cc in the last 5 months other than the one you did today. ---------------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Mar 17, 2016 at 11:38 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote: > Yep, let me pull and build that branch. I tried installing the dbg > packages and running it in gdb, but it didn't load the symbols. > ---------------- > Robert LeBlanc > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 > > > On Thu, Mar 17, 2016 at 11:36 AM, Sage Weil <sweil@xxxxxxxxxx> wrote: >> On Thu, 17 Mar 2016, Robert LeBlanc wrote: >>> Also, is this ceph_test_rados rewriting objects quickly? I think that >>> the issue is with rewriting objects so if we can tailor the >>> ceph_test_rados to do that, it might be easier to reproduce. >> >> It's doing lots of overwrites, yeah. >> >> I was albe to reproduce--thanks! It looks like it's specific to >> hammer. The code was rewritten for jewel so it doesn't affect the >> latest. The problem is that maybe_handle_cache may proxy the read and >> also still try to handle the same request locally (if it doesn't trigger a >> promote). >> >> Here's my proposed fix: >> >> https://github.com/ceph/ceph/pull/8187 >> >> Do you mind testing this branch? >> >> It doesn't appear to be directly related to flipping between writeback and >> forward, although it may be that we are seeing two unrelated issues. I >> seemed to be able to trigger it more easily when I flipped modes, but the >> bug itself was a simple issue in the writeback mode logic. :/ >> >> Anyway, please see if this fixes it for you (esp with the RBD workload). >> >> Thanks! >> sage >> >> >> >> >>> ---------------- >>> Robert LeBlanc >>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >>> >>> >>> On Thu, Mar 17, 2016 at 11:05 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote: >>> > I'll miss the Ceph community as well. There was a few things I really >>> > wanted to work in with Ceph. >>> > >>> > I got this: >>> > >>> > update_object_version oid 13 v 1166 (ObjNum 1028 snap 0 seq_num 1028) >>> > dirty exists >>> > 1038: left oid 13 (ObjNum 1028 snap 0 seq_num 1028) >>> > 1040: finishing write tid 1 to nodez23350-256 >>> > 1040: finishing write tid 2 to nodez23350-256 >>> > 1040: finishing write tid 3 to nodez23350-256 >>> > 1040: finishing write tid 4 to nodez23350-256 >>> > 1040: finishing write tid 6 to nodez23350-256 >>> > 1035: done (4 left) >>> > 1037: done (3 left) >>> > 1038: done (2 left) >>> > 1043: read oid 430 snap -1 >>> > 1043: expect (ObjNum 429 snap 0 seq_num 429) >>> > 1040: finishing write tid 7 to nodez23350-256 >>> > update_object_version oid 256 v 661 (ObjNum 1029 snap 0 seq_num 1029) >>> > dirty exists >>> > 1040: left oid 256 (ObjNum 1029 snap 0 seq_num 1029) >>> > 1042: expect (ObjNum 664 snap 0 seq_num 664) >>> > 1043: Error: oid 430 read returned error code -2 >>> > ./test/osd/RadosModel.h: In function 'virtual void >>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fa1bf7fe700 time >>> > 2016-03-17 10:47:19.085414 >>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>> > const*)+0x76) [0x4db956] >>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>> > 3: (()+0x9791d) [0x7fa1d472191d] >>> > 4: (()+0x72519) [0x7fa1d46fc519] >>> > 5: (()+0x13c178) [0x7fa1d47c6178] >>> > 6: (()+0x80a4) [0x7fa1d425a0a4] >>> > 7: (clone()+0x6d) [0x7fa1d2bd504d] >>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>> > needed to interpret this. >>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>> > Aborted >>> > >>> > I had to toggle writeback/forward and min_read_recency_for_promote a >>> > few times to get it, but I don't know if it is because I only have one >>> > job running. Even with six jobs running, it is not easy to trigger >>> > with ceph_test_rados, but it is very instant in the RBD VMs. >>> > >>> > Here are the six run crashes (I have about the last 2000 lines of each >>> > if needed): >>> > >>> > nodev: >>> > update_object_version oid 1015 v 1255 (ObjNum 1014 snap 0 seq_num >>> > 1014) dirty exists >>> > 1015: left oid 1015 (ObjNum 1014 snap 0 seq_num 1014) >>> > 1016: finishing write tid 1 to nodev21799-1016 >>> > 1016: finishing write tid 2 to nodev21799-1016 >>> > 1016: finishing write tid 3 to nodev21799-1016 >>> > 1016: finishing write tid 4 to nodev21799-1016 >>> > 1016: finishing write tid 6 to nodev21799-1016 >>> > 1016: finishing write tid 7 to nodev21799-1016 >>> > update_object_version oid 1016 v 1957 (ObjNum 1015 snap 0 seq_num >>> > 1015) dirty exists >>> > 1016: left oid 1016 (ObjNum 1015 snap 0 seq_num 1015) >>> > 1017: finishing write tid 1 to nodev21799-1017 >>> > 1017: finishing write tid 2 to nodev21799-1017 >>> > 1017: finishing write tid 3 to nodev21799-1017 >>> > 1017: finishing write tid 5 to nodev21799-1017 >>> > 1017: finishing write tid 6 to nodev21799-1017 >>> > update_object_version oid 1017 v 1010 (ObjNum 1016 snap 0 seq_num >>> > 1016) dirty exists >>> > 1017: left oid 1017 (ObjNum 1016 snap 0 seq_num 1016) >>> > 1018: finishing write tid 1 to nodev21799-1018 >>> > 1018: finishing write tid 2 to nodev21799-1018 >>> > 1018: finishing write tid 3 to nodev21799-1018 >>> > 1018: finishing write tid 4 to nodev21799-1018 >>> > 1018: finishing write tid 6 to nodev21799-1018 >>> > 1018: finishing write tid 7 to nodev21799-1018 >>> > update_object_version oid 1018 v 1093 (ObjNum 1017 snap 0 seq_num >>> > 1017) dirty exists >>> > 1018: left oid 1018 (ObjNum 1017 snap 0 seq_num 1017) >>> > 1019: finishing write tid 1 to nodev21799-1019 >>> > 1019: finishing write tid 2 to nodev21799-1019 >>> > 1019: finishing write tid 3 to nodev21799-1019 >>> > 1019: finishing write tid 5 to nodev21799-1019 >>> > 1019: finishing write tid 6 to nodev21799-1019 >>> > update_object_version oid 1019 v 462 (ObjNum 1018 snap 0 seq_num 1018) >>> > dirty exists >>> > 1019: left oid 1019 (ObjNum 1018 snap 0 seq_num 1018) >>> > 1021: finishing write tid 1 to nodev21799-1021 >>> > 1020: finishing write tid 1 to nodev21799-1020 >>> > 1020: finishing write tid 2 to nodev21799-1020 >>> > 1020: finishing write tid 3 to nodev21799-1020 >>> > 1020: finishing write tid 5 to nodev21799-1020 >>> > 1020: finishing write tid 6 to nodev21799-1020 >>> > update_object_version oid 1020 v 1287 (ObjNum 1019 snap 0 seq_num >>> > 1019) dirty exists >>> > 1020: left oid 1020 (ObjNum 1019 snap 0 seq_num 1019) >>> > 1021: finishing write tid 2 to nodev21799-1021 >>> > 1021: finishing write tid 3 to nodev21799-1021 >>> > 1021: finishing write tid 5 to nodev21799-1021 >>> > 1021: finishing write tid 6 to nodev21799-1021 >>> > update_object_version oid 1021 v 1077 (ObjNum 1020 snap 0 seq_num >>> > 1020) dirty exists >>> > 1021: left oid 1021 (ObjNum 1020 snap 0 seq_num 1020) >>> > 1022: finishing write tid 1 to nodev21799-1022 >>> > 1022: finishing write tid 2 to nodev21799-1022 >>> > 1022: finishing write tid 3 to nodev21799-1022 >>> > 1022: finishing write tid 5 to nodev21799-1022 >>> > 1022: finishing write tid 6 to nodev21799-1022 >>> > update_object_version oid 1022 v 1213 (ObjNum 1021 snap 0 seq_num >>> > 1021) dirty exists >>> > 1022: left oid 1022 (ObjNum 1021 snap 0 seq_num 1021) >>> > 1023: finishing write tid 1 to nodev21799-1023 >>> > 1023: finishing write tid 2 to nodev21799-1023 >>> > 1023: finishing write tid 3 to nodev21799-1023 >>> > 1023: finishing write tid 5 to nodev21799-1023 >>> > 1023: finishing write tid 6 to nodev21799-1023 >>> > update_object_version oid 1023 v 2612 (ObjNum 1022 snap 0 seq_num >>> > 1022) dirty exists >>> > 1023: left oid 1023 (ObjNum 1022 snap 0 seq_num 1022) >>> > 1024: finishing write tid 1 to nodev21799-1024 >>> > 1025: Error: oid 219 read returned error code -2 >>> > ./test/osd/RadosModel.h: In function 'virtual void >>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f0df8a16700 time >>> > 2016-03-17 10:53:43.493575 >>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>> > const*)+0x76) [0x4db956] >>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>> > 3: (()+0x9791d) [0x7f0e015dd91d] >>> > 4: (()+0x72519) [0x7f0e015b8519] >>> > 5: (()+0x13c178) [0x7f0e01682178] >>> > 6: (()+0x80a4) [0x7f0e011160a4] >>> > 7: (clone()+0x6d) [0x7f0dffa9104d] >>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>> > needed to interpret this. >>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>> > Aborted >>> > >>> > nodew: >>> > 1117: expect (ObjNum 8 snap 0 seq_num 8) >>> > 1120: expect (ObjNum 95 snap 0 seq_num 95) >>> > 1121: expect (ObjNum 994 snap 0 seq_num 994) >>> > 1113: expect (ObjNum 362 snap 0 seq_num 362) >>> > 1118: expect (ObjNum 179 snap 0 seq_num 179) >>> > 1115: expect (ObjNum 943 snap 0 seq_num 943) >>> > 1119: expect (ObjNum 250 snap 0 seq_num 250) >>> > 1124: finishing write tid 1 to nodew21820-361 >>> > 1124: finishing write tid 2 to nodew21820-361 >>> > 1124: finishing write tid 3 to nodew21820-361 >>> > 1124: finishing write tid 4 to nodew21820-361 >>> > 1124: finishing write tid 6 to nodew21820-361 >>> > 1124: finishing write tid 7 to nodew21820-361 >>> > update_object_version oid 361 v 892 (ObjNum 1061 snap 0 seq_num 1061) >>> > dirty exists >>> > 1124: left oid 361 (ObjNum 1061 snap 0 seq_num 1061) >>> > 1125: finishing write tid 1 to nodew21820-486 >>> > 1125: finishing write tid 2 to nodew21820-486 >>> > 1125: finishing write tid 3 to nodew21820-486 >>> > 1125: finishing write tid 5 to nodew21820-486 >>> > 1125: finishing write tid 6 to nodew21820-486 >>> > update_object_version oid 486 v 1317 (ObjNum 1062 snap 0 seq_num 1062) >>> > dirty exists >>> > 1125: left oid 486 (ObjNum 1062 snap 0 seq_num 1062) >>> > 1126: expect (ObjNum 289 snap 0 seq_num 289) >>> > 1127: finishing write tid 1 to nodew21820-765 >>> > 1127: finishing write tid 2 to nodew21820-765 >>> > 1127: finishing write tid 3 to nodew21820-765 >>> > 1127: finishing write tid 5 to nodew21820-765 >>> > 1127: finishing write tid 6 to nodew21820-765 >>> > update_object_version oid 765 v 1156 (ObjNum 1063 snap 0 seq_num 1063) >>> > dirty exists >>> > 1127: left oid 765 (ObjNum 1063 snap 0 seq_num 1063) >>> > 1128: finishing write tid 1 to nodew21820-40 >>> > 1128: finishing write tid 2 to nodew21820-40 >>> > 1128: finishing write tid 3 to nodew21820-40 >>> > 1128: finishing write tid 5 to nodew21820-40 >>> > 1128: finishing write tid 6 to nodew21820-40 >>> > update_object_version oid 40 v 876 (ObjNum 1064 snap 0 seq_num 1064) >>> > dirty exists >>> > 1128: left oid 40 (ObjNum 1064 snap 0 seq_num 1064) >>> > 1129: expect (ObjNum 616 snap 0 seq_num 616) >>> > 1110: done (14 left) >>> > 1113: done (13 left) >>> > 1115: done (12 left) >>> > 1117: done (11 left) >>> > 1118: done (10 left) >>> > 1119: done (9 left) >>> > 1120: done (8 left) >>> > 1121: done (7 left) >>> > 1124: done (6 left) >>> > 1125: done (5 left) >>> > 1126: done (4 left) >>> > 1127: done (3 left) >>> > 1128: done (2 left) >>> > 1129: done (1 left) >>> > 1131: read oid 29 snap -1 >>> > 1131: expect (ObjNum 28 snap 0 seq_num 28) >>> > 1132: read oid 764 snap -1 >>> > 1132: expect (ObjNum 763 snap 0 seq_num 763) >>> > 1133: read oid 469 snap -1 >>> > 1133: expect (ObjNum 468 snap 0 seq_num 468) >>> > 1134: write oid 243 current snap is 0 >>> > 1134: seq_num 1065 ranges >>> > {483354=596553,1514502=531232,2509844=632287,3283353=1} >>> > 1134: writing nodew21820-243 from 483354 to 1079907 tid 1 >>> > 1134: writing nodew21820-243 from 1514502 to 2045734 tid 2 >>> > 1134: writing nodew21820-243 from 2509844 to 3142131 tid 3 >>> > 1134: writing nodew21820-243 from 3283353 to 3283354 tid 4 >>> > 1135: read oid 569 snap -1 >>> > 1135: expect (ObjNum 568 snap 0 seq_num 568) >>> > 1133: Error: oid 469 read returned error code -2 >>> > ./test/osd/RadosModel.h: In function 'virtual void >>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fae71d03700 time >>> > 2016-03-17 11:00:02.124951 >>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>> > const*)+0x76) [0x4db956] >>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>> > 3: (()+0x9791d) [0x7fae7a8ca91d] >>> > 4: (()+0x72519) [0x7fae7a8a5519] >>> > 5: (()+0x13c178) [0x7fae7a96f178] >>> > 6: (()+0x80a4) [0x7fae7a4030a4] >>> > 7: (clone()+0x6d) [0x7fae78d7e04d] >>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>> > needed to interpret this. >>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>> > Aborted >>> > >>> > nodex: >>> > >>> > 1024: finishing write tid 1 to nodex22014-1024 >>> > 1025: expect (ObjNum 75 snap 0 seq_num 75) >>> > 1024: finishing write tid 2 to nodex22014-1024 >>> > 1024: finishing write tid 3 to nodex22014-1024 >>> > 1024: finishing write tid 5 to nodex22014-1024 >>> > 1024: finishing write tid 6 to nodex22014-1024 >>> > update_object_version oid 1024 v 753 (ObjNum 1023 snap 0 seq_num 1023) >>> > dirty exists >>> > 1024: left oid 1024 (ObjNum 1023 snap 0 seq_num 1023) >>> > 982: done (44 left) >>> > 983: done (43 left) >>> > 984: done (42 left) >>> > 985: done (41 left) >>> > 986: done (40 left) >>> > 987: done (39 left) >>> > 988: done (38 left) >>> > 989: done (37 left) >>> > 990: done (36 left) >>> > 991: done (35 left) >>> > 992: done (34 left) >>> > 993: done (33 left) >>> > 994: done (32 left) >>> > 995: done (31 left) >>> > 996: done (30 left) >>> > 997: done (29 left) >>> > 998: done (28 left) >>> > 999: done (27 left) >>> > 1000: done (26 left) >>> > 1001: done (25 left) >>> > 1002: done (24 left) >>> > 1003: done (23 left) >>> > 1004: done (22 left) >>> > 1005: done (21 left) >>> > 1006: done (20 left) >>> > 1007: done (19 left) >>> > 1008: done (18 left) >>> > 1009: done (17 left) >>> > 1010: done (16 left) >>> > 1011: done (15 left) >>> > 1012: done (14 left) >>> > 1013: done (13 left) >>> > 1014: done (12 left) >>> > 1015: done (11 left) >>> > 1016: done (10 left) >>> > 1017: done (9 left) >>> > 1018: done (8 left) >>> > 1019: done (7 left) >>> > 1020: done (6 left) >>> > 1021: done (5 left) >>> > 1022: done (4 left) >>> > 1023: done (3 left) >>> > 1024: done (2 left) >>> > 1025: done (1 left) >>> > 1026: done (0 left) >>> > 1027: delete oid 101 current snap is 0 >>> > 1027: done (0 left) >>> > 1028: read oid 156 snap -1 >>> > 1028: expect (ObjNum 155 snap 0 seq_num 155) >>> > 1029: read oid 691 snap -1 >>> > 1029: expect (ObjNum 690 snap 0 seq_num 690) >>> > 1030: read oid 282 snap -1 >>> > 1030: expect (ObjNum 281 snap 0 seq_num 281) >>> > 1031: read oid 979 snap -1 >>> > 1031: expect (ObjNum 978 snap 0 seq_num 978) >>> > 1032: read oid 203 snap -1 >>> > 1032: expect (ObjNum 202 snap 0 seq_num 202) >>> > 1033: setattr oid 464 current snap is 0 >>> > 1032: Error: oid 203 read returned error code -2 >>> > ./test/osd/RadosModel.h: In function 'virtual void >>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fafee64a700 time >>> > 2016-03-17 10:53:44.291343 >>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>> > const*)+0x76) [0x4db956] >>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>> > 3: (()+0x9791d) [0x7faff721191d] >>> > 4: (()+0x72519) [0x7faff71ec519] >>> > 5: (()+0x13c178) [0x7faff72b6178] >>> > 6: (()+0x80a4) [0x7faff6d4a0a4] >>> > 7: (clone()+0x6d) [0x7faff56c504d] >>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>> > needed to interpret this. >>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>> > Aborted >>> > >>> > nodey: >>> > >>> > 974: done (52 left) >>> > 975: done (51 left) >>> > 976: done (50 left) >>> > 977: done (49 left) >>> > 978: done (48 left) >>> > 979: done (47 left) >>> > 980: done (46 left) >>> > 981: done (45 left) >>> > 982: done (44 left) >>> > 983: done (43 left) >>> > 984: done (42 left) >>> > 985: done (41 left) >>> > 986: done (40 left) >>> > 987: done (39 left) >>> > 988: done (38 left) >>> > 989: done (37 left) >>> > 990: done (36 left) >>> > 991: done (35 left) >>> > 992: done (34 left) >>> > 993: done (33 left) >>> > 994: done (32 left) >>> > 995: done (31 left) >>> > 996: done (30 left) >>> > 997: done (29 left) >>> > 998: done (28 left) >>> > 999: done (27 left) >>> > 1000: done (26 left) >>> > 1001: done (25 left) >>> > 1002: done (24 left) >>> > 1003: done (23 left) >>> > 1004: done (22 left) >>> > 1005: done (21 left) >>> > 1006: done (20 left) >>> > 1007: done (19 left) >>> > 1008: done (18 left) >>> > 1009: done (17 left) >>> > 1010: done (16 left) >>> > 1011: done (15 left) >>> > 1012: done (14 left) >>> > 1013: done (13 left) >>> > 1014: done (12 left) >>> > 1015: done (11 left) >>> > 1016: done (10 left) >>> > 1017: done (9 left) >>> > 1018: done (8 left) >>> > 1019: done (7 left) >>> > 1020: done (6 left) >>> > 1021: done (5 left) >>> > 1022: done (4 left) >>> > 1023: done (3 left) >>> > 1024: done (2 left) >>> > 1025: done (1 left) >>> > 1026: done (0 left) >>> > 1027: delete oid 101 current snap is 0 >>> > 1027: done (0 left) >>> > 1028: read oid 156 snap -1 >>> > 1028: expect (ObjNum 155 snap 0 seq_num 155) >>> > 1029: read oid 691 snap -1 >>> > 1029: expect (ObjNum 690 snap 0 seq_num 690) >>> > 1030: read oid 282 snap -1 >>> > 1030: expect (ObjNum 281 snap 0 seq_num 281) >>> > 1031: read oid 979 snap -1 >>> > 1031: expect (ObjNum 978 snap 0 seq_num 978) >>> > 1032: read oid 203 snap -1 >>> > 1032: expect (ObjNum 202 snap 0 seq_num 202) >>> > 1033: setattr oid 464 current snap is 0 >>> > 1028: Error: oid 156 read returned error code -2 >>> > ./test/osd/RadosModel.h: In function 'virtual void >>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f55fa3c6700 time >>> > 2016-03-17 10:53:57.082571 >>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>> > const*)+0x76) [0x4db956] >>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>> > 3: (()+0x9791d) [0x7f5602f8d91d] >>> > 4: (()+0x72519) [0x7f5602f68519] >>> > 5: (()+0x13c178) [0x7f5603032178] >>> > 6: (()+0x80a4) [0x7f5602ac60a4] >>> > 7: (clone()+0x6d) [0x7f560144104d] >>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>> > needed to interpret this. >>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>> > Aborted >>> > >>> > nodez: >>> > >>> > 1014: done (11 left) >>> > 1026: delete oid 717 current snap is 0 >>> > 1015: finishing write tid 2 to nodez24249-1015 >>> > 1015: finishing write tid 4 to nodez24249-1015 >>> > 1015: finishing write tid 5 to nodez24249-1015 >>> > update_object_version oid 1015 v 3003 (ObjNum 1014 snap 0 seq_num >>> > 1014) dirty exists >>> > 1015: left oid 1015 (ObjNum 1014 snap 0 seq_num 1014) >>> > 1016: finishing write tid 1 to nodez24249-1016 >>> > 1016: finishing write tid 2 to nodez24249-1016 >>> > 1016: finishing write tid 3 to nodez24249-1016 >>> > 1016: finishing write tid 4 to nodez24249-1016 >>> > 1016: finishing write tid 6 to nodez24249-1016 >>> > 1016: finishing write tid 7 to nodez24249-1016 >>> > update_object_version oid 1016 v 1201 (ObjNum 1015 snap 0 seq_num >>> > 1015) dirty exists >>> > 1016: left oid 1016 (ObjNum 1015 snap 0 seq_num 1015) >>> > 1017: finishing write tid 1 to nodez24249-1017 >>> > 1017: finishing write tid 2 to nodez24249-1017 >>> > 1017: finishing write tid 3 to nodez24249-1017 >>> > 1017: finishing write tid 5 to nodez24249-1017 >>> > 1017: finishing write tid 6 to nodez24249-1017 >>> > update_object_version oid 1017 v 3007 (ObjNum 1016 snap 0 seq_num >>> > 1016) dirty exists >>> > 1017: left oid 1017 (ObjNum 1016 snap 0 seq_num 1016) >>> > 1018: finishing write tid 1 to nodez24249-1018 >>> > 1018: finishing write tid 2 to nodez24249-1018 >>> > 1018: finishing write tid 3 to nodez24249-1018 >>> > 1018: finishing write tid 4 to nodez24249-1018 >>> > 1018: finishing write tid 6 to nodez24249-1018 >>> > 1018: finishing write tid 7 to nodez24249-1018 >>> > update_object_version oid 1018 v 1283 (ObjNum 1017 snap 0 seq_num >>> > 1017) dirty exists >>> > 1018: left oid 1018 (ObjNum 1017 snap 0 seq_num 1017) >>> > 1019: finishing write tid 1 to nodez24249-1019 >>> > 1019: finishing write tid 2 to nodez24249-1019 >>> > 1019: finishing write tid 3 to nodez24249-1019 >>> > 1019: finishing write tid 5 to nodez24249-1019 >>> > 1019: finishing write tid 6 to nodez24249-1019 >>> > update_object_version oid 1019 v 999 (ObjNum 1018 snap 0 seq_num 1018) >>> > dirty exists >>> > 1019: left oid 1019 (ObjNum 1018 snap 0 seq_num 1018) >>> > 1020: finishing write tid 1 to nodez24249-1020 >>> > 1020: finishing write tid 2 to nodez24249-1020 >>> > 1020: finishing write tid 3 to nodez24249-1020 >>> > 1020: finishing write tid 5 to nodez24249-1020 >>> > 1020: finishing write tid 6 to nodez24249-1020 >>> > update_object_version oid 1020 v 813 (ObjNum 1019 snap 0 seq_num 1019) >>> > dirty exists >>> > 1020: left oid 1020 (ObjNum 1019 snap 0 seq_num 1019) >>> > 1021: finishing write tid 1 to nodez24249-1021 >>> > 1021: finishing write tid 2 to nodez24249-1021 >>> > 1021: finishing write tid 3 to nodez24249-1021 >>> > 1021: finishing write tid 5 to nodez24249-1021 >>> > 1021: finishing write tid 6 to nodez24249-1021 >>> > update_object_version oid 1021 v 1038 (ObjNum 1020 snap 0 seq_num >>> > 1020) dirty exists >>> > 1021: left oid 1021 (ObjNum 1020 snap 0 seq_num 1020) >>> > 1022: finishing write tid 1 to nodez24249-1022 >>> > 1022: finishing write tid 2 to nodez24249-1022 >>> > 1022: finishing write tid 3 to nodez24249-1022 >>> > 1022: finishing write tid 5 to nodez24249-1022 >>> > 1022: finishing write tid 6 to nodez24249-1022 >>> > update_object_version oid 1022 v 781 (ObjNum 1021 snap 0 seq_num 1021) >>> > dirty exists >>> > 1022: left oid 1022 (ObjNum 1021 snap 0 seq_num 1021) >>> > 1023: finishing write tid 1 to nodez24249-1023 >>> > 1023: finishing write tid 2 to nodez24249-1023 >>> > 1023: finishing write tid 3 to nodez24249-1023 >>> > 1023: finishing write tid 5 to nodez24249-1023 >>> > 1023: finishing write tid 6 to nodez24249-1023 >>> > update_object_version oid 1023 v 1537 (ObjNum 1022 snap 0 seq_num >>> > 1022) dirty exists >>> > 1023: left oid 1023 (ObjNum 1022 snap 0 seq_num 1022) >>> > 1024: finishing write tid 1 to nodez24249-1024 >>> > 1025: Error: oid 230 read returned error code -2 >>> > ./test/osd/RadosModel.h: In function 'virtual void >>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fd9bb7fe700 time >>> > 2016-03-17 10:53:41.757921 >>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>> > const*)+0x76) [0x4db956] >>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>> > 3: (()+0x9791d) [0x7fd9d088d91d] >>> > 4: (()+0x72519) [0x7fd9d0868519] >>> > 5: (()+0x13c178) [0x7fd9d0932178] >>> > 6: (()+0x80a4) [0x7fd9d03c60a4] >>> > 7: (clone()+0x6d) [0x7fd9ced4104d] >>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>> > needed to interpret this. >>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>> > Aborted >>> > >>> > nodezz: >>> > >>> > 1015: finishing write tid 1 to nodezz25161-1015 >>> > 1015: finishing write tid 2 to nodezz25161-1015 >>> > 1015: finishing write tid 4 to nodezz25161-1015 >>> > 1015: finishing write tid 5 to nodezz25161-1015 >>> > update_object_version oid 1015 v 900 (ObjNum 1014 snap 0 seq_num 1014) >>> > dirty exists >>> > 1015: left oid 1015 (ObjNum 1014 snap 0 seq_num 1014) >>> > 1016: finishing write tid 1 to nodezz25161-1016 >>> > 1016: finishing write tid 2 to nodezz25161-1016 >>> > 1016: finishing write tid 3 to nodezz25161-1016 >>> > 1016: finishing write tid 4 to nodezz25161-1016 >>> > 1016: finishing write tid 6 to nodezz25161-1016 >>> > 1016: finishing write tid 7 to nodezz25161-1016 >>> > update_object_version oid 1016 v 1021 (ObjNum 1015 snap 0 seq_num >>> > 1015) dirty exists >>> > 1016: left oid 1016 (ObjNum 1015 snap 0 seq_num 1015) >>> > 1017: finishing write tid 1 to nodezz25161-1017 >>> > 1017: finishing write tid 2 to nodezz25161-1017 >>> > 1017: finishing write tid 3 to nodezz25161-1017 >>> > 1017: finishing write tid 5 to nodezz25161-1017 >>> > 1017: finishing write tid 6 to nodezz25161-1017 >>> > update_object_version oid 1017 v 3011 (ObjNum 1016 snap 0 seq_num >>> > 1016) dirty exists >>> > 1017: left oid 1017 (ObjNum 1016 snap 0 seq_num 1016) >>> > 1018: finishing write tid 1 to nodezz25161-1018 >>> > 1018: finishing write tid 2 to nodezz25161-1018 >>> > 1018: finishing write tid 3 to nodezz25161-1018 >>> > 1018: finishing write tid 4 to nodezz25161-1018 >>> > 1018: finishing write tid 6 to nodezz25161-1018 >>> > 1018: finishing write tid 7 to nodezz25161-1018 >>> > update_object_version oid 1018 v 1099 (ObjNum 1017 snap 0 seq_num >>> > 1017) dirty exists >>> > 1018: left oid 1018 (ObjNum 1017 snap 0 seq_num 1017) >>> > 1019: finishing write tid 1 to nodezz25161-1019 >>> > 1019: finishing write tid 2 to nodezz25161-1019 >>> > 1019: finishing write tid 3 to nodezz25161-1019 >>> > 1019: finishing write tid 5 to nodezz25161-1019 >>> > 1019: finishing write tid 6 to nodezz25161-1019 >>> > update_object_version oid 1019 v 1300 (ObjNum 1018 snap 0 seq_num >>> > 1018) dirty exists >>> > 1019: left oid 1019 (ObjNum 1018 snap 0 seq_num 1018) >>> > 1020: finishing write tid 1 to nodezz25161-1020 >>> > 1020: finishing write tid 2 to nodezz25161-1020 >>> > 1020: finishing write tid 3 to nodezz25161-1020 >>> > 1020: finishing write tid 5 to nodezz25161-1020 >>> > 1020: finishing write tid 6 to nodezz25161-1020 >>> > update_object_version oid 1020 v 1324 (ObjNum 1019 snap 0 seq_num >>> > 1019) dirty exists >>> > 1020: left oid 1020 (ObjNum 1019 snap 0 seq_num 1019) >>> > 1021: finishing write tid 1 to nodezz25161-1021 >>> > 1021: finishing write tid 2 to nodezz25161-1021 >>> > 1021: finishing write tid 3 to nodezz25161-1021 >>> > 1021: finishing write tid 5 to nodezz25161-1021 >>> > 1021: finishing write tid 6 to nodezz25161-1021 >>> > update_object_version oid 1021 v 890 (ObjNum 1020 snap 0 seq_num 1020) >>> > dirty exists >>> > 1021: left oid 1021 (ObjNum 1020 snap 0 seq_num 1020) >>> > 1022: finishing write tid 1 to nodezz25161-1022 >>> > 1022: finishing write tid 2 to nodezz25161-1022 >>> > 1022: finishing write tid 3 to nodezz25161-1022 >>> > 1022: finishing write tid 5 to nodezz25161-1022 >>> > 1022: finishing write tid 6 to nodezz25161-1022 >>> > update_object_version oid 1022 v 464 (ObjNum 1021 snap 0 seq_num 1021) >>> > dirty exists >>> > 1022: left oid 1022 (ObjNum 1021 snap 0 seq_num 1021) >>> > 1023: finishing write tid 1 to nodezz25161-1023 >>> > 1023: finishing write tid 2 to nodezz25161-1023 >>> > 1023: finishing write tid 3 to nodezz25161-1023 >>> > 1023: finishing write tid 5 to nodezz25161-1023 >>> > 1023: finishing write tid 6 to nodezz25161-1023 >>> > update_object_version oid 1023 v 1516 (ObjNum 1022 snap 0 seq_num >>> > 1022) dirty exists >>> > 1023: left oid 1023 (ObjNum 1022 snap 0 seq_num 1022) >>> > 1024: finishing write tid 1 to nodezz25161-1024 >>> > 1024: finishing write tid 2 to nodezz25161-1024 >>> > 1025: Error: oid 219 read returned error code -2 >>> > ./test/osd/RadosModel.h: In function 'virtual void >>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fbb1bfff700 time >>> > 2016-03-17 10:53:53.071338 >>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>> > const*)+0x76) [0x4db956] >>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>> > 3: (()+0x9791d) [0x7fbb30ff191d] >>> > 4: (()+0x72519) [0x7fbb30fcc519] >>> > 5: (()+0x13c178) [0x7fbb31096178] >>> > 6: (()+0x80a4) [0x7fbb30b2a0a4] >>> > 7: (clone()+0x6d) [0x7fbb2f4a504d] >>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>> > needed to interpret this. >>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>> > Aborted >>> > ---------------- >>> > Robert LeBlanc >>> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >>> > >>> > >>> > On Thu, Mar 17, 2016 at 10:39 AM, Sage Weil <sweil@xxxxxxxxxx> wrote: >>> >> On Thu, 17 Mar 2016, Robert LeBlanc wrote: >>> >>> -----BEGIN PGP SIGNED MESSAGE----- >>> >>> Hash: SHA256 >>> >>> >>> >>> I'm having trouble finding documentation about using ceph_test_rados. Can I >>> >>> run this on the existing cluster and will that provide useful info? It seems >>> >>> running it in the build will not have the caching set up (vstart.sh). >>> >>> >>> >>> I have accepted a job with another company and only have until Wednesday to >>> >>> help with getting information about this bug. My new job will not be using C >>> >>> eph, so I won't be able to provide any additional info after Tuesday. I want >>> >>> to leave the company on a good trajectory for upgrading, so any input you c >>> >>> an provide will be helpful. >>> >> >>> >> I'm sorry to hear it! You'll be missed. :) >>> >> >>> >>> I've found: >>> >>> >>> >>> ./ceph_test_rados --op read 100 --op write 100 --op delete 50 >>> >>> - --max-ops 400000 --objects 1024 --max-in-flight 64 --size 4000000 >>> >>> - --min-stride-size 400000 --max-stride-size 800000 --max-seconds 600 >>> >>> - --op copy_from 50 --op snap_create 50 --op snap_remove 50 --op >>> >>> rollback 50 --op setattr 25 --op rmattr 25 --pool unique_pool_0 >>> >>> >>> >>> Is that enough if I change --pool to the cached pool and do the toggling whi >>> >>> le ceph_test_rados is running? I think this will run for 10 minutes. >>> >> >>> >> Precisely. You can probably drop copy_from and snap ops from the list >>> >> since your workload wasn't exercising those. >>> >> >>> >> Thanks! >>> >> sage >>> >> >>> >> >>> >>> _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com