-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Sage, You patch seems to have resolved the issue for us. We can't reproduce the problem with ceph_test_rados or our VM test. I also figured out that those are all backports that were cherry-picked so it was showing the original commit date. There was quite a bit of work on ReplicatedPG.cc since 0.94.6 so it probably only makes sense to wait for 0.94.7 for this fix. Thanks for looking into this so quick! As a work around for 0.94.6, our testing shows that min_read_recency_for_promote 1 does not have the corruption as it keeps the original behavior. Something for people to be aware of with 0.94.6 and using cache tiers. Hopefully there is a way to detect this in a unittest. -----BEGIN PGP SIGNATURE----- Version: Mailvelope v1.3.6 Comment: https://www.mailvelope.com wsFcBAEBCAAQBQJW6wILCRDmVDuy+mK58QAAcVQP/0t8jGZuwmwg2RIwkgjQ Kb3mIxvsmnA9BQ4dICJB3Wu6FPT1/V34t0ThASehWyVSJyiUkdf+pxhXbDaQ vOr4OOyTwCB2Ly6jaLEgAiyGTL45uOnMYcSttXPG95lilTb+oGUcqBdQzRbw yJHG18UiEgMvKnttFjTLbd1FjICIY7xkkP7lrdHvaqe200aqQmb+g8CHTVj/ HqzYm/gTs84c2vK+x/nV8OFxY9Yf5WAV+O7uozeWC3SAc2VMlQgi8rdng51N B+andt/SXgGq9VCDqdmEzcEpBN+2wK6usZQCZJmMXRmW4BXYVK4yAdfgKJOB MEUN2cDA1i7bMIUcDrh1hnqwEfizkbqOWXpgrgAkQYhtlbp/gvEucl5nYMUy kv9jNYg/KFQn9tzZqKWmvHj3sjl6DmOlN+A9XA2fGppOiiKk0s4dVKRDFwSJ LNxUIZm4CtAekaQ4KymE/hK6RhRU2REQl7qSMF+wtw73nhA9gzqP32Ag46yd WoeGpOngWRnMaejQfkuTSjiDSLvbCd7X5LM/WXH4dJHtHNSSA2qK3c4Nvvqp yDhvFLdvybtJvWj0+hHczpcP0VlFZH9s7uGWz0+cNabkRnm41EC2+XD6sJ5+ kinZO+CgjbC2AQPdoEKMuvRwBgnftH0YuZJFl0sQPkgBg23r+eCfIxfW/9v/ iLgk =6It+ -----END PGP SIGNATURE----- ---------------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Mar 17, 2016 at 11:55 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote: > Cherry-picking that commit onto v0.94.6 wasn't clean so I'm just > building your branch. I'm not sure what the difference between your > branch and 0.94.6 is, I don't see any commits against > osd/ReplicatedPG.cc in the last 5 months other than the one you did > today. > ---------------- > Robert LeBlanc > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 > > > On Thu, Mar 17, 2016 at 11:38 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote: >> Yep, let me pull and build that branch. I tried installing the dbg >> packages and running it in gdb, but it didn't load the symbols. >> ---------------- >> Robert LeBlanc >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >> >> >> On Thu, Mar 17, 2016 at 11:36 AM, Sage Weil <sweil@xxxxxxxxxx> wrote: >>> On Thu, 17 Mar 2016, Robert LeBlanc wrote: >>>> Also, is this ceph_test_rados rewriting objects quickly? I think that >>>> the issue is with rewriting objects so if we can tailor the >>>> ceph_test_rados to do that, it might be easier to reproduce. >>> >>> It's doing lots of overwrites, yeah. >>> >>> I was albe to reproduce--thanks! It looks like it's specific to >>> hammer. The code was rewritten for jewel so it doesn't affect the >>> latest. The problem is that maybe_handle_cache may proxy the read and >>> also still try to handle the same request locally (if it doesn't trigger a >>> promote). >>> >>> Here's my proposed fix: >>> >>> https://github.com/ceph/ceph/pull/8187 >>> >>> Do you mind testing this branch? >>> >>> It doesn't appear to be directly related to flipping between writeback and >>> forward, although it may be that we are seeing two unrelated issues. I >>> seemed to be able to trigger it more easily when I flipped modes, but the >>> bug itself was a simple issue in the writeback mode logic. :/ >>> >>> Anyway, please see if this fixes it for you (esp with the RBD workload). >>> >>> Thanks! >>> sage >>> >>> >>> >>> >>>> ---------------- >>>> Robert LeBlanc >>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >>>> >>>> >>>> On Thu, Mar 17, 2016 at 11:05 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote: >>>> > I'll miss the Ceph community as well. There was a few things I really >>>> > wanted to work in with Ceph. >>>> > >>>> > I got this: >>>> > >>>> > update_object_version oid 13 v 1166 (ObjNum 1028 snap 0 seq_num 1028) >>>> > dirty exists >>>> > 1038: left oid 13 (ObjNum 1028 snap 0 seq_num 1028) >>>> > 1040: finishing write tid 1 to nodez23350-256 >>>> > 1040: finishing write tid 2 to nodez23350-256 >>>> > 1040: finishing write tid 3 to nodez23350-256 >>>> > 1040: finishing write tid 4 to nodez23350-256 >>>> > 1040: finishing write tid 6 to nodez23350-256 >>>> > 1035: done (4 left) >>>> > 1037: done (3 left) >>>> > 1038: done (2 left) >>>> > 1043: read oid 430 snap -1 >>>> > 1043: expect (ObjNum 429 snap 0 seq_num 429) >>>> > 1040: finishing write tid 7 to nodez23350-256 >>>> > update_object_version oid 256 v 661 (ObjNum 1029 snap 0 seq_num 1029) >>>> > dirty exists >>>> > 1040: left oid 256 (ObjNum 1029 snap 0 seq_num 1029) >>>> > 1042: expect (ObjNum 664 snap 0 seq_num 664) >>>> > 1043: Error: oid 430 read returned error code -2 >>>> > ./test/osd/RadosModel.h: In function 'virtual void >>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fa1bf7fe700 time >>>> > 2016-03-17 10:47:19.085414 >>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> > const*)+0x76) [0x4db956] >>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>>> > 3: (()+0x9791d) [0x7fa1d472191d] >>>> > 4: (()+0x72519) [0x7fa1d46fc519] >>>> > 5: (()+0x13c178) [0x7fa1d47c6178] >>>> > 6: (()+0x80a4) [0x7fa1d425a0a4] >>>> > 7: (clone()+0x6d) [0x7fa1d2bd504d] >>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> > needed to interpret this. >>>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>>> > Aborted >>>> > >>>> > I had to toggle writeback/forward and min_read_recency_for_promote a >>>> > few times to get it, but I don't know if it is because I only have one >>>> > job running. Even with six jobs running, it is not easy to trigger >>>> > with ceph_test_rados, but it is very instant in the RBD VMs. >>>> > >>>> > Here are the six run crashes (I have about the last 2000 lines of each >>>> > if needed): >>>> > >>>> > nodev: >>>> > update_object_version oid 1015 v 1255 (ObjNum 1014 snap 0 seq_num >>>> > 1014) dirty exists >>>> > 1015: left oid 1015 (ObjNum 1014 snap 0 seq_num 1014) >>>> > 1016: finishing write tid 1 to nodev21799-1016 >>>> > 1016: finishing write tid 2 to nodev21799-1016 >>>> > 1016: finishing write tid 3 to nodev21799-1016 >>>> > 1016: finishing write tid 4 to nodev21799-1016 >>>> > 1016: finishing write tid 6 to nodev21799-1016 >>>> > 1016: finishing write tid 7 to nodev21799-1016 >>>> > update_object_version oid 1016 v 1957 (ObjNum 1015 snap 0 seq_num >>>> > 1015) dirty exists >>>> > 1016: left oid 1016 (ObjNum 1015 snap 0 seq_num 1015) >>>> > 1017: finishing write tid 1 to nodev21799-1017 >>>> > 1017: finishing write tid 2 to nodev21799-1017 >>>> > 1017: finishing write tid 3 to nodev21799-1017 >>>> > 1017: finishing write tid 5 to nodev21799-1017 >>>> > 1017: finishing write tid 6 to nodev21799-1017 >>>> > update_object_version oid 1017 v 1010 (ObjNum 1016 snap 0 seq_num >>>> > 1016) dirty exists >>>> > 1017: left oid 1017 (ObjNum 1016 snap 0 seq_num 1016) >>>> > 1018: finishing write tid 1 to nodev21799-1018 >>>> > 1018: finishing write tid 2 to nodev21799-1018 >>>> > 1018: finishing write tid 3 to nodev21799-1018 >>>> > 1018: finishing write tid 4 to nodev21799-1018 >>>> > 1018: finishing write tid 6 to nodev21799-1018 >>>> > 1018: finishing write tid 7 to nodev21799-1018 >>>> > update_object_version oid 1018 v 1093 (ObjNum 1017 snap 0 seq_num >>>> > 1017) dirty exists >>>> > 1018: left oid 1018 (ObjNum 1017 snap 0 seq_num 1017) >>>> > 1019: finishing write tid 1 to nodev21799-1019 >>>> > 1019: finishing write tid 2 to nodev21799-1019 >>>> > 1019: finishing write tid 3 to nodev21799-1019 >>>> > 1019: finishing write tid 5 to nodev21799-1019 >>>> > 1019: finishing write tid 6 to nodev21799-1019 >>>> > update_object_version oid 1019 v 462 (ObjNum 1018 snap 0 seq_num 1018) >>>> > dirty exists >>>> > 1019: left oid 1019 (ObjNum 1018 snap 0 seq_num 1018) >>>> > 1021: finishing write tid 1 to nodev21799-1021 >>>> > 1020: finishing write tid 1 to nodev21799-1020 >>>> > 1020: finishing write tid 2 to nodev21799-1020 >>>> > 1020: finishing write tid 3 to nodev21799-1020 >>>> > 1020: finishing write tid 5 to nodev21799-1020 >>>> > 1020: finishing write tid 6 to nodev21799-1020 >>>> > update_object_version oid 1020 v 1287 (ObjNum 1019 snap 0 seq_num >>>> > 1019) dirty exists >>>> > 1020: left oid 1020 (ObjNum 1019 snap 0 seq_num 1019) >>>> > 1021: finishing write tid 2 to nodev21799-1021 >>>> > 1021: finishing write tid 3 to nodev21799-1021 >>>> > 1021: finishing write tid 5 to nodev21799-1021 >>>> > 1021: finishing write tid 6 to nodev21799-1021 >>>> > update_object_version oid 1021 v 1077 (ObjNum 1020 snap 0 seq_num >>>> > 1020) dirty exists >>>> > 1021: left oid 1021 (ObjNum 1020 snap 0 seq_num 1020) >>>> > 1022: finishing write tid 1 to nodev21799-1022 >>>> > 1022: finishing write tid 2 to nodev21799-1022 >>>> > 1022: finishing write tid 3 to nodev21799-1022 >>>> > 1022: finishing write tid 5 to nodev21799-1022 >>>> > 1022: finishing write tid 6 to nodev21799-1022 >>>> > update_object_version oid 1022 v 1213 (ObjNum 1021 snap 0 seq_num >>>> > 1021) dirty exists >>>> > 1022: left oid 1022 (ObjNum 1021 snap 0 seq_num 1021) >>>> > 1023: finishing write tid 1 to nodev21799-1023 >>>> > 1023: finishing write tid 2 to nodev21799-1023 >>>> > 1023: finishing write tid 3 to nodev21799-1023 >>>> > 1023: finishing write tid 5 to nodev21799-1023 >>>> > 1023: finishing write tid 6 to nodev21799-1023 >>>> > update_object_version oid 1023 v 2612 (ObjNum 1022 snap 0 seq_num >>>> > 1022) dirty exists >>>> > 1023: left oid 1023 (ObjNum 1022 snap 0 seq_num 1022) >>>> > 1024: finishing write tid 1 to nodev21799-1024 >>>> > 1025: Error: oid 219 read returned error code -2 >>>> > ./test/osd/RadosModel.h: In function 'virtual void >>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f0df8a16700 time >>>> > 2016-03-17 10:53:43.493575 >>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> > const*)+0x76) [0x4db956] >>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>>> > 3: (()+0x9791d) [0x7f0e015dd91d] >>>> > 4: (()+0x72519) [0x7f0e015b8519] >>>> > 5: (()+0x13c178) [0x7f0e01682178] >>>> > 6: (()+0x80a4) [0x7f0e011160a4] >>>> > 7: (clone()+0x6d) [0x7f0dffa9104d] >>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> > needed to interpret this. >>>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>>> > Aborted >>>> > >>>> > nodew: >>>> > 1117: expect (ObjNum 8 snap 0 seq_num 8) >>>> > 1120: expect (ObjNum 95 snap 0 seq_num 95) >>>> > 1121: expect (ObjNum 994 snap 0 seq_num 994) >>>> > 1113: expect (ObjNum 362 snap 0 seq_num 362) >>>> > 1118: expect (ObjNum 179 snap 0 seq_num 179) >>>> > 1115: expect (ObjNum 943 snap 0 seq_num 943) >>>> > 1119: expect (ObjNum 250 snap 0 seq_num 250) >>>> > 1124: finishing write tid 1 to nodew21820-361 >>>> > 1124: finishing write tid 2 to nodew21820-361 >>>> > 1124: finishing write tid 3 to nodew21820-361 >>>> > 1124: finishing write tid 4 to nodew21820-361 >>>> > 1124: finishing write tid 6 to nodew21820-361 >>>> > 1124: finishing write tid 7 to nodew21820-361 >>>> > update_object_version oid 361 v 892 (ObjNum 1061 snap 0 seq_num 1061) >>>> > dirty exists >>>> > 1124: left oid 361 (ObjNum 1061 snap 0 seq_num 1061) >>>> > 1125: finishing write tid 1 to nodew21820-486 >>>> > 1125: finishing write tid 2 to nodew21820-486 >>>> > 1125: finishing write tid 3 to nodew21820-486 >>>> > 1125: finishing write tid 5 to nodew21820-486 >>>> > 1125: finishing write tid 6 to nodew21820-486 >>>> > update_object_version oid 486 v 1317 (ObjNum 1062 snap 0 seq_num 1062) >>>> > dirty exists >>>> > 1125: left oid 486 (ObjNum 1062 snap 0 seq_num 1062) >>>> > 1126: expect (ObjNum 289 snap 0 seq_num 289) >>>> > 1127: finishing write tid 1 to nodew21820-765 >>>> > 1127: finishing write tid 2 to nodew21820-765 >>>> > 1127: finishing write tid 3 to nodew21820-765 >>>> > 1127: finishing write tid 5 to nodew21820-765 >>>> > 1127: finishing write tid 6 to nodew21820-765 >>>> > update_object_version oid 765 v 1156 (ObjNum 1063 snap 0 seq_num 1063) >>>> > dirty exists >>>> > 1127: left oid 765 (ObjNum 1063 snap 0 seq_num 1063) >>>> > 1128: finishing write tid 1 to nodew21820-40 >>>> > 1128: finishing write tid 2 to nodew21820-40 >>>> > 1128: finishing write tid 3 to nodew21820-40 >>>> > 1128: finishing write tid 5 to nodew21820-40 >>>> > 1128: finishing write tid 6 to nodew21820-40 >>>> > update_object_version oid 40 v 876 (ObjNum 1064 snap 0 seq_num 1064) >>>> > dirty exists >>>> > 1128: left oid 40 (ObjNum 1064 snap 0 seq_num 1064) >>>> > 1129: expect (ObjNum 616 snap 0 seq_num 616) >>>> > 1110: done (14 left) >>>> > 1113: done (13 left) >>>> > 1115: done (12 left) >>>> > 1117: done (11 left) >>>> > 1118: done (10 left) >>>> > 1119: done (9 left) >>>> > 1120: done (8 left) >>>> > 1121: done (7 left) >>>> > 1124: done (6 left) >>>> > 1125: done (5 left) >>>> > 1126: done (4 left) >>>> > 1127: done (3 left) >>>> > 1128: done (2 left) >>>> > 1129: done (1 left) >>>> > 1131: read oid 29 snap -1 >>>> > 1131: expect (ObjNum 28 snap 0 seq_num 28) >>>> > 1132: read oid 764 snap -1 >>>> > 1132: expect (ObjNum 763 snap 0 seq_num 763) >>>> > 1133: read oid 469 snap -1 >>>> > 1133: expect (ObjNum 468 snap 0 seq_num 468) >>>> > 1134: write oid 243 current snap is 0 >>>> > 1134: seq_num 1065 ranges >>>> > {483354=596553,1514502=531232,2509844=632287,3283353=1} >>>> > 1134: writing nodew21820-243 from 483354 to 1079907 tid 1 >>>> > 1134: writing nodew21820-243 from 1514502 to 2045734 tid 2 >>>> > 1134: writing nodew21820-243 from 2509844 to 3142131 tid 3 >>>> > 1134: writing nodew21820-243 from 3283353 to 3283354 tid 4 >>>> > 1135: read oid 569 snap -1 >>>> > 1135: expect (ObjNum 568 snap 0 seq_num 568) >>>> > 1133: Error: oid 469 read returned error code -2 >>>> > ./test/osd/RadosModel.h: In function 'virtual void >>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fae71d03700 time >>>> > 2016-03-17 11:00:02.124951 >>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> > const*)+0x76) [0x4db956] >>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>>> > 3: (()+0x9791d) [0x7fae7a8ca91d] >>>> > 4: (()+0x72519) [0x7fae7a8a5519] >>>> > 5: (()+0x13c178) [0x7fae7a96f178] >>>> > 6: (()+0x80a4) [0x7fae7a4030a4] >>>> > 7: (clone()+0x6d) [0x7fae78d7e04d] >>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> > needed to interpret this. >>>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>>> > Aborted >>>> > >>>> > nodex: >>>> > >>>> > 1024: finishing write tid 1 to nodex22014-1024 >>>> > 1025: expect (ObjNum 75 snap 0 seq_num 75) >>>> > 1024: finishing write tid 2 to nodex22014-1024 >>>> > 1024: finishing write tid 3 to nodex22014-1024 >>>> > 1024: finishing write tid 5 to nodex22014-1024 >>>> > 1024: finishing write tid 6 to nodex22014-1024 >>>> > update_object_version oid 1024 v 753 (ObjNum 1023 snap 0 seq_num 1023) >>>> > dirty exists >>>> > 1024: left oid 1024 (ObjNum 1023 snap 0 seq_num 1023) >>>> > 982: done (44 left) >>>> > 983: done (43 left) >>>> > 984: done (42 left) >>>> > 985: done (41 left) >>>> > 986: done (40 left) >>>> > 987: done (39 left) >>>> > 988: done (38 left) >>>> > 989: done (37 left) >>>> > 990: done (36 left) >>>> > 991: done (35 left) >>>> > 992: done (34 left) >>>> > 993: done (33 left) >>>> > 994: done (32 left) >>>> > 995: done (31 left) >>>> > 996: done (30 left) >>>> > 997: done (29 left) >>>> > 998: done (28 left) >>>> > 999: done (27 left) >>>> > 1000: done (26 left) >>>> > 1001: done (25 left) >>>> > 1002: done (24 left) >>>> > 1003: done (23 left) >>>> > 1004: done (22 left) >>>> > 1005: done (21 left) >>>> > 1006: done (20 left) >>>> > 1007: done (19 left) >>>> > 1008: done (18 left) >>>> > 1009: done (17 left) >>>> > 1010: done (16 left) >>>> > 1011: done (15 left) >>>> > 1012: done (14 left) >>>> > 1013: done (13 left) >>>> > 1014: done (12 left) >>>> > 1015: done (11 left) >>>> > 1016: done (10 left) >>>> > 1017: done (9 left) >>>> > 1018: done (8 left) >>>> > 1019: done (7 left) >>>> > 1020: done (6 left) >>>> > 1021: done (5 left) >>>> > 1022: done (4 left) >>>> > 1023: done (3 left) >>>> > 1024: done (2 left) >>>> > 1025: done (1 left) >>>> > 1026: done (0 left) >>>> > 1027: delete oid 101 current snap is 0 >>>> > 1027: done (0 left) >>>> > 1028: read oid 156 snap -1 >>>> > 1028: expect (ObjNum 155 snap 0 seq_num 155) >>>> > 1029: read oid 691 snap -1 >>>> > 1029: expect (ObjNum 690 snap 0 seq_num 690) >>>> > 1030: read oid 282 snap -1 >>>> > 1030: expect (ObjNum 281 snap 0 seq_num 281) >>>> > 1031: read oid 979 snap -1 >>>> > 1031: expect (ObjNum 978 snap 0 seq_num 978) >>>> > 1032: read oid 203 snap -1 >>>> > 1032: expect (ObjNum 202 snap 0 seq_num 202) >>>> > 1033: setattr oid 464 current snap is 0 >>>> > 1032: Error: oid 203 read returned error code -2 >>>> > ./test/osd/RadosModel.h: In function 'virtual void >>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fafee64a700 time >>>> > 2016-03-17 10:53:44.291343 >>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> > const*)+0x76) [0x4db956] >>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>>> > 3: (()+0x9791d) [0x7faff721191d] >>>> > 4: (()+0x72519) [0x7faff71ec519] >>>> > 5: (()+0x13c178) [0x7faff72b6178] >>>> > 6: (()+0x80a4) [0x7faff6d4a0a4] >>>> > 7: (clone()+0x6d) [0x7faff56c504d] >>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> > needed to interpret this. >>>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>>> > Aborted >>>> > >>>> > nodey: >>>> > >>>> > 974: done (52 left) >>>> > 975: done (51 left) >>>> > 976: done (50 left) >>>> > 977: done (49 left) >>>> > 978: done (48 left) >>>> > 979: done (47 left) >>>> > 980: done (46 left) >>>> > 981: done (45 left) >>>> > 982: done (44 left) >>>> > 983: done (43 left) >>>> > 984: done (42 left) >>>> > 985: done (41 left) >>>> > 986: done (40 left) >>>> > 987: done (39 left) >>>> > 988: done (38 left) >>>> > 989: done (37 left) >>>> > 990: done (36 left) >>>> > 991: done (35 left) >>>> > 992: done (34 left) >>>> > 993: done (33 left) >>>> > 994: done (32 left) >>>> > 995: done (31 left) >>>> > 996: done (30 left) >>>> > 997: done (29 left) >>>> > 998: done (28 left) >>>> > 999: done (27 left) >>>> > 1000: done (26 left) >>>> > 1001: done (25 left) >>>> > 1002: done (24 left) >>>> > 1003: done (23 left) >>>> > 1004: done (22 left) >>>> > 1005: done (21 left) >>>> > 1006: done (20 left) >>>> > 1007: done (19 left) >>>> > 1008: done (18 left) >>>> > 1009: done (17 left) >>>> > 1010: done (16 left) >>>> > 1011: done (15 left) >>>> > 1012: done (14 left) >>>> > 1013: done (13 left) >>>> > 1014: done (12 left) >>>> > 1015: done (11 left) >>>> > 1016: done (10 left) >>>> > 1017: done (9 left) >>>> > 1018: done (8 left) >>>> > 1019: done (7 left) >>>> > 1020: done (6 left) >>>> > 1021: done (5 left) >>>> > 1022: done (4 left) >>>> > 1023: done (3 left) >>>> > 1024: done (2 left) >>>> > 1025: done (1 left) >>>> > 1026: done (0 left) >>>> > 1027: delete oid 101 current snap is 0 >>>> > 1027: done (0 left) >>>> > 1028: read oid 156 snap -1 >>>> > 1028: expect (ObjNum 155 snap 0 seq_num 155) >>>> > 1029: read oid 691 snap -1 >>>> > 1029: expect (ObjNum 690 snap 0 seq_num 690) >>>> > 1030: read oid 282 snap -1 >>>> > 1030: expect (ObjNum 281 snap 0 seq_num 281) >>>> > 1031: read oid 979 snap -1 >>>> > 1031: expect (ObjNum 978 snap 0 seq_num 978) >>>> > 1032: read oid 203 snap -1 >>>> > 1032: expect (ObjNum 202 snap 0 seq_num 202) >>>> > 1033: setattr oid 464 current snap is 0 >>>> > 1028: Error: oid 156 read returned error code -2 >>>> > ./test/osd/RadosModel.h: In function 'virtual void >>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f55fa3c6700 time >>>> > 2016-03-17 10:53:57.082571 >>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> > const*)+0x76) [0x4db956] >>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>>> > 3: (()+0x9791d) [0x7f5602f8d91d] >>>> > 4: (()+0x72519) [0x7f5602f68519] >>>> > 5: (()+0x13c178) [0x7f5603032178] >>>> > 6: (()+0x80a4) [0x7f5602ac60a4] >>>> > 7: (clone()+0x6d) [0x7f560144104d] >>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> > needed to interpret this. >>>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>>> > Aborted >>>> > >>>> > nodez: >>>> > >>>> > 1014: done (11 left) >>>> > 1026: delete oid 717 current snap is 0 >>>> > 1015: finishing write tid 2 to nodez24249-1015 >>>> > 1015: finishing write tid 4 to nodez24249-1015 >>>> > 1015: finishing write tid 5 to nodez24249-1015 >>>> > update_object_version oid 1015 v 3003 (ObjNum 1014 snap 0 seq_num >>>> > 1014) dirty exists >>>> > 1015: left oid 1015 (ObjNum 1014 snap 0 seq_num 1014) >>>> > 1016: finishing write tid 1 to nodez24249-1016 >>>> > 1016: finishing write tid 2 to nodez24249-1016 >>>> > 1016: finishing write tid 3 to nodez24249-1016 >>>> > 1016: finishing write tid 4 to nodez24249-1016 >>>> > 1016: finishing write tid 6 to nodez24249-1016 >>>> > 1016: finishing write tid 7 to nodez24249-1016 >>>> > update_object_version oid 1016 v 1201 (ObjNum 1015 snap 0 seq_num >>>> > 1015) dirty exists >>>> > 1016: left oid 1016 (ObjNum 1015 snap 0 seq_num 1015) >>>> > 1017: finishing write tid 1 to nodez24249-1017 >>>> > 1017: finishing write tid 2 to nodez24249-1017 >>>> > 1017: finishing write tid 3 to nodez24249-1017 >>>> > 1017: finishing write tid 5 to nodez24249-1017 >>>> > 1017: finishing write tid 6 to nodez24249-1017 >>>> > update_object_version oid 1017 v 3007 (ObjNum 1016 snap 0 seq_num >>>> > 1016) dirty exists >>>> > 1017: left oid 1017 (ObjNum 1016 snap 0 seq_num 1016) >>>> > 1018: finishing write tid 1 to nodez24249-1018 >>>> > 1018: finishing write tid 2 to nodez24249-1018 >>>> > 1018: finishing write tid 3 to nodez24249-1018 >>>> > 1018: finishing write tid 4 to nodez24249-1018 >>>> > 1018: finishing write tid 6 to nodez24249-1018 >>>> > 1018: finishing write tid 7 to nodez24249-1018 >>>> > update_object_version oid 1018 v 1283 (ObjNum 1017 snap 0 seq_num >>>> > 1017) dirty exists >>>> > 1018: left oid 1018 (ObjNum 1017 snap 0 seq_num 1017) >>>> > 1019: finishing write tid 1 to nodez24249-1019 >>>> > 1019: finishing write tid 2 to nodez24249-1019 >>>> > 1019: finishing write tid 3 to nodez24249-1019 >>>> > 1019: finishing write tid 5 to nodez24249-1019 >>>> > 1019: finishing write tid 6 to nodez24249-1019 >>>> > update_object_version oid 1019 v 999 (ObjNum 1018 snap 0 seq_num 1018) >>>> > dirty exists >>>> > 1019: left oid 1019 (ObjNum 1018 snap 0 seq_num 1018) >>>> > 1020: finishing write tid 1 to nodez24249-1020 >>>> > 1020: finishing write tid 2 to nodez24249-1020 >>>> > 1020: finishing write tid 3 to nodez24249-1020 >>>> > 1020: finishing write tid 5 to nodez24249-1020 >>>> > 1020: finishing write tid 6 to nodez24249-1020 >>>> > update_object_version oid 1020 v 813 (ObjNum 1019 snap 0 seq_num 1019) >>>> > dirty exists >>>> > 1020: left oid 1020 (ObjNum 1019 snap 0 seq_num 1019) >>>> > 1021: finishing write tid 1 to nodez24249-1021 >>>> > 1021: finishing write tid 2 to nodez24249-1021 >>>> > 1021: finishing write tid 3 to nodez24249-1021 >>>> > 1021: finishing write tid 5 to nodez24249-1021 >>>> > 1021: finishing write tid 6 to nodez24249-1021 >>>> > update_object_version oid 1021 v 1038 (ObjNum 1020 snap 0 seq_num >>>> > 1020) dirty exists >>>> > 1021: left oid 1021 (ObjNum 1020 snap 0 seq_num 1020) >>>> > 1022: finishing write tid 1 to nodez24249-1022 >>>> > 1022: finishing write tid 2 to nodez24249-1022 >>>> > 1022: finishing write tid 3 to nodez24249-1022 >>>> > 1022: finishing write tid 5 to nodez24249-1022 >>>> > 1022: finishing write tid 6 to nodez24249-1022 >>>> > update_object_version oid 1022 v 781 (ObjNum 1021 snap 0 seq_num 1021) >>>> > dirty exists >>>> > 1022: left oid 1022 (ObjNum 1021 snap 0 seq_num 1021) >>>> > 1023: finishing write tid 1 to nodez24249-1023 >>>> > 1023: finishing write tid 2 to nodez24249-1023 >>>> > 1023: finishing write tid 3 to nodez24249-1023 >>>> > 1023: finishing write tid 5 to nodez24249-1023 >>>> > 1023: finishing write tid 6 to nodez24249-1023 >>>> > update_object_version oid 1023 v 1537 (ObjNum 1022 snap 0 seq_num >>>> > 1022) dirty exists >>>> > 1023: left oid 1023 (ObjNum 1022 snap 0 seq_num 1022) >>>> > 1024: finishing write tid 1 to nodez24249-1024 >>>> > 1025: Error: oid 230 read returned error code -2 >>>> > ./test/osd/RadosModel.h: In function 'virtual void >>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fd9bb7fe700 time >>>> > 2016-03-17 10:53:41.757921 >>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> > const*)+0x76) [0x4db956] >>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>>> > 3: (()+0x9791d) [0x7fd9d088d91d] >>>> > 4: (()+0x72519) [0x7fd9d0868519] >>>> > 5: (()+0x13c178) [0x7fd9d0932178] >>>> > 6: (()+0x80a4) [0x7fd9d03c60a4] >>>> > 7: (clone()+0x6d) [0x7fd9ced4104d] >>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> > needed to interpret this. >>>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>>> > Aborted >>>> > >>>> > nodezz: >>>> > >>>> > 1015: finishing write tid 1 to nodezz25161-1015 >>>> > 1015: finishing write tid 2 to nodezz25161-1015 >>>> > 1015: finishing write tid 4 to nodezz25161-1015 >>>> > 1015: finishing write tid 5 to nodezz25161-1015 >>>> > update_object_version oid 1015 v 900 (ObjNum 1014 snap 0 seq_num 1014) >>>> > dirty exists >>>> > 1015: left oid 1015 (ObjNum 1014 snap 0 seq_num 1014) >>>> > 1016: finishing write tid 1 to nodezz25161-1016 >>>> > 1016: finishing write tid 2 to nodezz25161-1016 >>>> > 1016: finishing write tid 3 to nodezz25161-1016 >>>> > 1016: finishing write tid 4 to nodezz25161-1016 >>>> > 1016: finishing write tid 6 to nodezz25161-1016 >>>> > 1016: finishing write tid 7 to nodezz25161-1016 >>>> > update_object_version oid 1016 v 1021 (ObjNum 1015 snap 0 seq_num >>>> > 1015) dirty exists >>>> > 1016: left oid 1016 (ObjNum 1015 snap 0 seq_num 1015) >>>> > 1017: finishing write tid 1 to nodezz25161-1017 >>>> > 1017: finishing write tid 2 to nodezz25161-1017 >>>> > 1017: finishing write tid 3 to nodezz25161-1017 >>>> > 1017: finishing write tid 5 to nodezz25161-1017 >>>> > 1017: finishing write tid 6 to nodezz25161-1017 >>>> > update_object_version oid 1017 v 3011 (ObjNum 1016 snap 0 seq_num >>>> > 1016) dirty exists >>>> > 1017: left oid 1017 (ObjNum 1016 snap 0 seq_num 1016) >>>> > 1018: finishing write tid 1 to nodezz25161-1018 >>>> > 1018: finishing write tid 2 to nodezz25161-1018 >>>> > 1018: finishing write tid 3 to nodezz25161-1018 >>>> > 1018: finishing write tid 4 to nodezz25161-1018 >>>> > 1018: finishing write tid 6 to nodezz25161-1018 >>>> > 1018: finishing write tid 7 to nodezz25161-1018 >>>> > update_object_version oid 1018 v 1099 (ObjNum 1017 snap 0 seq_num >>>> > 1017) dirty exists >>>> > 1018: left oid 1018 (ObjNum 1017 snap 0 seq_num 1017) >>>> > 1019: finishing write tid 1 to nodezz25161-1019 >>>> > 1019: finishing write tid 2 to nodezz25161-1019 >>>> > 1019: finishing write tid 3 to nodezz25161-1019 >>>> > 1019: finishing write tid 5 to nodezz25161-1019 >>>> > 1019: finishing write tid 6 to nodezz25161-1019 >>>> > update_object_version oid 1019 v 1300 (ObjNum 1018 snap 0 seq_num >>>> > 1018) dirty exists >>>> > 1019: left oid 1019 (ObjNum 1018 snap 0 seq_num 1018) >>>> > 1020: finishing write tid 1 to nodezz25161-1020 >>>> > 1020: finishing write tid 2 to nodezz25161-1020 >>>> > 1020: finishing write tid 3 to nodezz25161-1020 >>>> > 1020: finishing write tid 5 to nodezz25161-1020 >>>> > 1020: finishing write tid 6 to nodezz25161-1020 >>>> > update_object_version oid 1020 v 1324 (ObjNum 1019 snap 0 seq_num >>>> > 1019) dirty exists >>>> > 1020: left oid 1020 (ObjNum 1019 snap 0 seq_num 1019) >>>> > 1021: finishing write tid 1 to nodezz25161-1021 >>>> > 1021: finishing write tid 2 to nodezz25161-1021 >>>> > 1021: finishing write tid 3 to nodezz25161-1021 >>>> > 1021: finishing write tid 5 to nodezz25161-1021 >>>> > 1021: finishing write tid 6 to nodezz25161-1021 >>>> > update_object_version oid 1021 v 890 (ObjNum 1020 snap 0 seq_num 1020) >>>> > dirty exists >>>> > 1021: left oid 1021 (ObjNum 1020 snap 0 seq_num 1020) >>>> > 1022: finishing write tid 1 to nodezz25161-1022 >>>> > 1022: finishing write tid 2 to nodezz25161-1022 >>>> > 1022: finishing write tid 3 to nodezz25161-1022 >>>> > 1022: finishing write tid 5 to nodezz25161-1022 >>>> > 1022: finishing write tid 6 to nodezz25161-1022 >>>> > update_object_version oid 1022 v 464 (ObjNum 1021 snap 0 seq_num 1021) >>>> > dirty exists >>>> > 1022: left oid 1022 (ObjNum 1021 snap 0 seq_num 1021) >>>> > 1023: finishing write tid 1 to nodezz25161-1023 >>>> > 1023: finishing write tid 2 to nodezz25161-1023 >>>> > 1023: finishing write tid 3 to nodezz25161-1023 >>>> > 1023: finishing write tid 5 to nodezz25161-1023 >>>> > 1023: finishing write tid 6 to nodezz25161-1023 >>>> > update_object_version oid 1023 v 1516 (ObjNum 1022 snap 0 seq_num >>>> > 1022) dirty exists >>>> > 1023: left oid 1023 (ObjNum 1022 snap 0 seq_num 1022) >>>> > 1024: finishing write tid 1 to nodezz25161-1024 >>>> > 1024: finishing write tid 2 to nodezz25161-1024 >>>> > 1025: Error: oid 219 read returned error code -2 >>>> > ./test/osd/RadosModel.h: In function 'virtual void >>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fbb1bfff700 time >>>> > 2016-03-17 10:53:53.071338 >>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0) >>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) >>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> > const*)+0x76) [0x4db956] >>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] >>>> > 3: (()+0x9791d) [0x7fbb30ff191d] >>>> > 4: (()+0x72519) [0x7fbb30fcc519] >>>> > 5: (()+0x13c178) [0x7fbb31096178] >>>> > 6: (()+0x80a4) [0x7fbb30b2a0a4] >>>> > 7: (clone()+0x6d) [0x7fbb2f4a504d] >>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> > needed to interpret this. >>>> > terminate called after throwing an instance of 'ceph::FailedAssertion' >>>> > Aborted >>>> > ---------------- >>>> > Robert LeBlanc >>>> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >>>> > >>>> > >>>> > On Thu, Mar 17, 2016 at 10:39 AM, Sage Weil <sweil@xxxxxxxxxx> wrote: >>>> >> On Thu, 17 Mar 2016, Robert LeBlanc wrote: >>>> >>> -----BEGIN PGP SIGNED MESSAGE----- >>>> >>> Hash: SHA256 >>>> >>> >>>> >>> I'm having trouble finding documentation about using ceph_test_rados. Can I >>>> >>> run this on the existing cluster and will that provide useful info? It seems >>>> >>> running it in the build will not have the caching set up (vstart.sh). >>>> >>> >>>> >>> I have accepted a job with another company and only have until Wednesday to >>>> >>> help with getting information about this bug. My new job will not be using C >>>> >>> eph, so I won't be able to provide any additional info after Tuesday. I want >>>> >>> to leave the company on a good trajectory for upgrading, so any input you c >>>> >>> an provide will be helpful. >>>> >> >>>> >> I'm sorry to hear it! You'll be missed. :) >>>> >> >>>> >>> I've found: >>>> >>> >>>> >>> ./ceph_test_rados --op read 100 --op write 100 --op delete 50 >>>> >>> - --max-ops 400000 --objects 1024 --max-in-flight 64 --size 4000000 >>>> >>> - --min-stride-size 400000 --max-stride-size 800000 --max-seconds 600 >>>> >>> - --op copy_from 50 --op snap_create 50 --op snap_remove 50 --op >>>> >>> rollback 50 --op setattr 25 --op rmattr 25 --pool unique_pool_0 >>>> >>> >>>> >>> Is that enough if I change --pool to the cached pool and do the toggling whi >>>> >>> le ceph_test_rados is running? I think this will run for 10 minutes. >>>> >> >>>> >> Precisely. You can probably drop copy_from and snap ops from the list >>>> >> since your workload wasn't exercising those. >>>> >> >>>> >> Thanks! >>>> >> sage >>>> >> >>>> >> >>>> >>>> _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com