Re: [ceph-users] data corruption with hammer


Sage,

Your patch seems to have resolved the issue for us. We can't reproduce
the problem with ceph_test_rados or our VM test. I also figured out
that those are all backports that were cherry-picked, so it was showing
the original commit date. There has been quite a bit of work on
ReplicatedPG.cc since 0.94.6, so it probably only makes sense to wait
for 0.94.7 for this fix.

Thanks for looking into this so quickly!

As a workaround on 0.94.6, our testing shows that setting
min_read_recency_for_promote to 1 avoids the corruption, since it
keeps the original behavior. That is something to be aware of when
running 0.94.6 with cache tiers.

Hopefully there is a way to detect this in a unit test.
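
For anyone applying the workaround, it can be set on the cache tier pool with something like the following (the pool name "hot-cache" is just an example; substitute your own cache pool):

```shell
# Restore the pre-0.94.6 promotion behavior on the cache tier pool.
ceph osd pool set hot-cache min_read_recency_for_promote 1

# Verify the setting took effect.
ceph osd pool get hot-cache min_read_recency_for_promote
```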
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Mar 17, 2016 at 11:55 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> Cherry-picking that commit onto v0.94.6 wasn't clean so I'm just
> building your branch. I'm not sure what the difference between your
> branch and 0.94.6 is, I don't see any commits against
> osd/ReplicatedPG.cc in the last 5 months other than the one you did
> today.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Thu, Mar 17, 2016 at 11:38 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>> Yep, let me pull and build that branch. I tried installing the dbg
>> packages and running it in gdb, but it didn't load the symbols.
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Thu, Mar 17, 2016 at 11:36 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>>> On Thu, 17 Mar 2016, Robert LeBlanc wrote:
>>>> Also, is this ceph_test_rados rewriting objects quickly? I think that
>>>> the issue is with rewriting objects so if we can tailor the
>>>> ceph_test_rados to do that, it might be easier to reproduce.
>>>
>>> It's doing lots of overwrites, yeah.
>>>
>>> I was able to reproduce--thanks!  It looks like it's specific to
>>> hammer.  The code was rewritten for jewel so it doesn't affect the
>>> latest.  The problem is that maybe_handle_cache may proxy the read and
>>> also still try to handle the same request locally (if it doesn't trigger a
>>> promote).
>>>
>>> Here's my proposed fix:
>>>
>>>         https://github.com/ceph/ceph/pull/8187
>>>
>>> Do you mind testing this branch?
>>>
>>> It doesn't appear to be directly related to flipping between writeback and
>>> forward, although it may be that we are seeing two unrelated issues.  I
>>> seemed to be able to trigger it more easily when I flipped modes, but the
>>> bug itself was a simple issue in the writeback mode logic.  :/
>>>
>>> Anyway, please see if this fixes it for you (esp with the RBD workload).
>>>
>>> Thanks!
>>> sage
>>>
>>>
>>>
>>>
>>>> ----------------
>>>> Robert LeBlanc
>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>
>>>>
>>>> On Thu, Mar 17, 2016 at 11:05 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>>>> > I'll miss the Ceph community as well. There were a few things I really
>>>> > wanted to work on in Ceph.
>>>> >
>>>> > I got this:
>>>> >
>>>> > update_object_version oid 13 v 1166 (ObjNum 1028 snap 0 seq_num 1028)
>>>> > dirty exists
>>>> > 1038:  left oid 13 (ObjNum 1028 snap 0 seq_num 1028)
>>>> > 1040:  finishing write tid 1 to nodez23350-256
>>>> > 1040:  finishing write tid 2 to nodez23350-256
>>>> > 1040:  finishing write tid 3 to nodez23350-256
>>>> > 1040:  finishing write tid 4 to nodez23350-256
>>>> > 1040:  finishing write tid 6 to nodez23350-256
>>>> > 1035: done (4 left)
>>>> > 1037: done (3 left)
>>>> > 1038: done (2 left)
>>>> > 1043: read oid 430 snap -1
>>>> > 1043:  expect (ObjNum 429 snap 0 seq_num 429)
>>>> > 1040:  finishing write tid 7 to nodez23350-256
>>>> > update_object_version oid 256 v 661 (ObjNum 1029 snap 0 seq_num 1029)
>>>> > dirty exists
>>>> > 1040:  left oid 256 (ObjNum 1029 snap 0 seq_num 1029)
>>>> > 1042:  expect (ObjNum 664 snap 0 seq_num 664)
>>>> > 1043: Error: oid 430 read returned error code -2
>>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fa1bf7fe700 time
>>>> > 2016-03-17 10:47:19.085414
>>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> > const*)+0x76) [0x4db956]
>>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>>> > 3: (()+0x9791d) [0x7fa1d472191d]
>>>> > 4: (()+0x72519) [0x7fa1d46fc519]
>>>> > 5: (()+0x13c178) [0x7fa1d47c6178]
>>>> > 6: (()+0x80a4) [0x7fa1d425a0a4]
>>>> > 7: (clone()+0x6d) [0x7fa1d2bd504d]
>>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> > needed to interpret this.
>>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>>> > Aborted
>>>> >
>>>> > I had to toggle writeback/forward and min_read_recency_for_promote a
>>>> > few times to get it, but I don't know if it is because I only have one
>>>> > job running. Even with six jobs running, it is not easy to trigger
>>>> > with ceph_test_rados, but it is very instant in the RBD VMs.
>>>> >
>>>> > Here are the six run crashes (I have about the last 2000 lines of each
>>>> > if needed):
>>>> >
>>>> > nodev:
>>>> > update_object_version oid 1015 v 1255 (ObjNum 1014 snap 0 seq_num
>>>> > 1014) dirty exists
>>>> > 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
>>>> > 1016:  finishing write tid 1 to nodev21799-1016
>>>> > 1016:  finishing write tid 2 to nodev21799-1016
>>>> > 1016:  finishing write tid 3 to nodev21799-1016
>>>> > 1016:  finishing write tid 4 to nodev21799-1016
>>>> > 1016:  finishing write tid 6 to nodev21799-1016
>>>> > 1016:  finishing write tid 7 to nodev21799-1016
>>>> > update_object_version oid 1016 v 1957 (ObjNum 1015 snap 0 seq_num
>>>> > 1015) dirty exists
>>>> > 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
>>>> > 1017:  finishing write tid 1 to nodev21799-1017
>>>> > 1017:  finishing write tid 2 to nodev21799-1017
>>>> > 1017:  finishing write tid 3 to nodev21799-1017
>>>> > 1017:  finishing write tid 5 to nodev21799-1017
>>>> > 1017:  finishing write tid 6 to nodev21799-1017
>>>> > update_object_version oid 1017 v 1010 (ObjNum 1016 snap 0 seq_num
>>>> > 1016) dirty exists
>>>> > 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
>>>> > 1018:  finishing write tid 1 to nodev21799-1018
>>>> > 1018:  finishing write tid 2 to nodev21799-1018
>>>> > 1018:  finishing write tid 3 to nodev21799-1018
>>>> > 1018:  finishing write tid 4 to nodev21799-1018
>>>> > 1018:  finishing write tid 6 to nodev21799-1018
>>>> > 1018:  finishing write tid 7 to nodev21799-1018
>>>> > update_object_version oid 1018 v 1093 (ObjNum 1017 snap 0 seq_num
>>>> > 1017) dirty exists
>>>> > 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
>>>> > 1019:  finishing write tid 1 to nodev21799-1019
>>>> > 1019:  finishing write tid 2 to nodev21799-1019
>>>> > 1019:  finishing write tid 3 to nodev21799-1019
>>>> > 1019:  finishing write tid 5 to nodev21799-1019
>>>> > 1019:  finishing write tid 6 to nodev21799-1019
>>>> > update_object_version oid 1019 v 462 (ObjNum 1018 snap 0 seq_num 1018)
>>>> > dirty exists
>>>> > 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
>>>> > 1021:  finishing write tid 1 to nodev21799-1021
>>>> > 1020:  finishing write tid 1 to nodev21799-1020
>>>> > 1020:  finishing write tid 2 to nodev21799-1020
>>>> > 1020:  finishing write tid 3 to nodev21799-1020
>>>> > 1020:  finishing write tid 5 to nodev21799-1020
>>>> > 1020:  finishing write tid 6 to nodev21799-1020
>>>> > update_object_version oid 1020 v 1287 (ObjNum 1019 snap 0 seq_num
>>>> > 1019) dirty exists
>>>> > 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
>>>> > 1021:  finishing write tid 2 to nodev21799-1021
>>>> > 1021:  finishing write tid 3 to nodev21799-1021
>>>> > 1021:  finishing write tid 5 to nodev21799-1021
>>>> > 1021:  finishing write tid 6 to nodev21799-1021
>>>> > update_object_version oid 1021 v 1077 (ObjNum 1020 snap 0 seq_num
>>>> > 1020) dirty exists
>>>> > 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
>>>> > 1022:  finishing write tid 1 to nodev21799-1022
>>>> > 1022:  finishing write tid 2 to nodev21799-1022
>>>> > 1022:  finishing write tid 3 to nodev21799-1022
>>>> > 1022:  finishing write tid 5 to nodev21799-1022
>>>> > 1022:  finishing write tid 6 to nodev21799-1022
>>>> > update_object_version oid 1022 v 1213 (ObjNum 1021 snap 0 seq_num
>>>> > 1021) dirty exists
>>>> > 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
>>>> > 1023:  finishing write tid 1 to nodev21799-1023
>>>> > 1023:  finishing write tid 2 to nodev21799-1023
>>>> > 1023:  finishing write tid 3 to nodev21799-1023
>>>> > 1023:  finishing write tid 5 to nodev21799-1023
>>>> > 1023:  finishing write tid 6 to nodev21799-1023
>>>> > update_object_version oid 1023 v 2612 (ObjNum 1022 snap 0 seq_num
>>>> > 1022) dirty exists
>>>> > 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
>>>> > 1024:  finishing write tid 1 to nodev21799-1024
>>>> > 1025: Error: oid 219 read returned error code -2
>>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f0df8a16700 time
>>>> > 2016-03-17 10:53:43.493575
>>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> > const*)+0x76) [0x4db956]
>>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>>> > 3: (()+0x9791d) [0x7f0e015dd91d]
>>>> > 4: (()+0x72519) [0x7f0e015b8519]
>>>> > 5: (()+0x13c178) [0x7f0e01682178]
>>>> > 6: (()+0x80a4) [0x7f0e011160a4]
>>>> > 7: (clone()+0x6d) [0x7f0dffa9104d]
>>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> > needed to interpret this.
>>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>>> > Aborted
>>>> >
>>>> > nodew:
>>>> > 1117:  expect (ObjNum 8 snap 0 seq_num 8)
>>>> > 1120:  expect (ObjNum 95 snap 0 seq_num 95)
>>>> > 1121:  expect (ObjNum 994 snap 0 seq_num 994)
>>>> > 1113:  expect (ObjNum 362 snap 0 seq_num 362)
>>>> > 1118:  expect (ObjNum 179 snap 0 seq_num 179)
>>>> > 1115:  expect (ObjNum 943 snap 0 seq_num 943)
>>>> > 1119:  expect (ObjNum 250 snap 0 seq_num 250)
>>>> > 1124:  finishing write tid 1 to nodew21820-361
>>>> > 1124:  finishing write tid 2 to nodew21820-361
>>>> > 1124:  finishing write tid 3 to nodew21820-361
>>>> > 1124:  finishing write tid 4 to nodew21820-361
>>>> > 1124:  finishing write tid 6 to nodew21820-361
>>>> > 1124:  finishing write tid 7 to nodew21820-361
>>>> > update_object_version oid 361 v 892 (ObjNum 1061 snap 0 seq_num 1061)
>>>> > dirty exists
>>>> > 1124:  left oid 361 (ObjNum 1061 snap 0 seq_num 1061)
>>>> > 1125:  finishing write tid 1 to nodew21820-486
>>>> > 1125:  finishing write tid 2 to nodew21820-486
>>>> > 1125:  finishing write tid 3 to nodew21820-486
>>>> > 1125:  finishing write tid 5 to nodew21820-486
>>>> > 1125:  finishing write tid 6 to nodew21820-486
>>>> > update_object_version oid 486 v 1317 (ObjNum 1062 snap 0 seq_num 1062)
>>>> > dirty exists
>>>> > 1125:  left oid 486 (ObjNum 1062 snap 0 seq_num 1062)
>>>> > 1126:  expect (ObjNum 289 snap 0 seq_num 289)
>>>> > 1127:  finishing write tid 1 to nodew21820-765
>>>> > 1127:  finishing write tid 2 to nodew21820-765
>>>> > 1127:  finishing write tid 3 to nodew21820-765
>>>> > 1127:  finishing write tid 5 to nodew21820-765
>>>> > 1127:  finishing write tid 6 to nodew21820-765
>>>> > update_object_version oid 765 v 1156 (ObjNum 1063 snap 0 seq_num 1063)
>>>> > dirty exists
>>>> > 1127:  left oid 765 (ObjNum 1063 snap 0 seq_num 1063)
>>>> > 1128:  finishing write tid 1 to nodew21820-40
>>>> > 1128:  finishing write tid 2 to nodew21820-40
>>>> > 1128:  finishing write tid 3 to nodew21820-40
>>>> > 1128:  finishing write tid 5 to nodew21820-40
>>>> > 1128:  finishing write tid 6 to nodew21820-40
>>>> > update_object_version oid 40 v 876 (ObjNum 1064 snap 0 seq_num 1064)
>>>> > dirty exists
>>>> > 1128:  left oid 40 (ObjNum 1064 snap 0 seq_num 1064)
>>>> > 1129:  expect (ObjNum 616 snap 0 seq_num 616)
>>>> > 1110: done (14 left)
>>>> > 1113: done (13 left)
>>>> > 1115: done (12 left)
>>>> > 1117: done (11 left)
>>>> > 1118: done (10 left)
>>>> > 1119: done (9 left)
>>>> > 1120: done (8 left)
>>>> > 1121: done (7 left)
>>>> > 1124: done (6 left)
>>>> > 1125: done (5 left)
>>>> > 1126: done (4 left)
>>>> > 1127: done (3 left)
>>>> > 1128: done (2 left)
>>>> > 1129: done (1 left)
>>>> > 1131: read oid 29 snap -1
>>>> > 1131:  expect (ObjNum 28 snap 0 seq_num 28)
>>>> > 1132: read oid 764 snap -1
>>>> > 1132:  expect (ObjNum 763 snap 0 seq_num 763)
>>>> > 1133: read oid 469 snap -1
>>>> > 1133:  expect (ObjNum 468 snap 0 seq_num 468)
>>>> > 1134: write oid 243 current snap is 0
>>>> > 1134:  seq_num 1065 ranges
>>>> > {483354=596553,1514502=531232,2509844=632287,3283353=1}
>>>> > 1134:  writing nodew21820-243 from 483354 to 1079907 tid 1
>>>> > 1134:  writing nodew21820-243 from 1514502 to 2045734 tid 2
>>>> > 1134:  writing nodew21820-243 from 2509844 to 3142131 tid 3
>>>> > 1134:  writing nodew21820-243 from 3283353 to 3283354 tid 4
>>>> > 1135: read oid 569 snap -1
>>>> > 1135:  expect (ObjNum 568 snap 0 seq_num 568)
>>>> > 1133: Error: oid 469 read returned error code -2
>>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fae71d03700 time
>>>> > 2016-03-17 11:00:02.124951
>>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> > const*)+0x76) [0x4db956]
>>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>>> > 3: (()+0x9791d) [0x7fae7a8ca91d]
>>>> > 4: (()+0x72519) [0x7fae7a8a5519]
>>>> > 5: (()+0x13c178) [0x7fae7a96f178]
>>>> > 6: (()+0x80a4) [0x7fae7a4030a4]
>>>> > 7: (clone()+0x6d) [0x7fae78d7e04d]
>>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> > needed to interpret this.
>>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>>> > Aborted
>>>> >
>>>> > nodex:
>>>> >
>>>> > 1024:  finishing write tid 1 to nodex22014-1024
>>>> > 1025:  expect (ObjNum 75 snap 0 seq_num 75)
>>>> > 1024:  finishing write tid 2 to nodex22014-1024
>>>> > 1024:  finishing write tid 3 to nodex22014-1024
>>>> > 1024:  finishing write tid 5 to nodex22014-1024
>>>> > 1024:  finishing write tid 6 to nodex22014-1024
>>>> > update_object_version oid 1024 v 753 (ObjNum 1023 snap 0 seq_num 1023)
>>>> > dirty exists
>>>> > 1024:  left oid 1024 (ObjNum 1023 snap 0 seq_num 1023)
>>>> > 982: done (44 left)
>>>> > 983: done (43 left)
>>>> > 984: done (42 left)
>>>> > 985: done (41 left)
>>>> > 986: done (40 left)
>>>> > 987: done (39 left)
>>>> > 988: done (38 left)
>>>> > 989: done (37 left)
>>>> > 990: done (36 left)
>>>> > 991: done (35 left)
>>>> > 992: done (34 left)
>>>> > 993: done (33 left)
>>>> > 994: done (32 left)
>>>> > 995: done (31 left)
>>>> > 996: done (30 left)
>>>> > 997: done (29 left)
>>>> > 998: done (28 left)
>>>> > 999: done (27 left)
>>>> > 1000: done (26 left)
>>>> > 1001: done (25 left)
>>>> > 1002: done (24 left)
>>>> > 1003: done (23 left)
>>>> > 1004: done (22 left)
>>>> > 1005: done (21 left)
>>>> > 1006: done (20 left)
>>>> > 1007: done (19 left)
>>>> > 1008: done (18 left)
>>>> > 1009: done (17 left)
>>>> > 1010: done (16 left)
>>>> > 1011: done (15 left)
>>>> > 1012: done (14 left)
>>>> > 1013: done (13 left)
>>>> > 1014: done (12 left)
>>>> > 1015: done (11 left)
>>>> > 1016: done (10 left)
>>>> > 1017: done (9 left)
>>>> > 1018: done (8 left)
>>>> > 1019: done (7 left)
>>>> > 1020: done (6 left)
>>>> > 1021: done (5 left)
>>>> > 1022: done (4 left)
>>>> > 1023: done (3 left)
>>>> > 1024: done (2 left)
>>>> > 1025: done (1 left)
>>>> > 1026: done (0 left)
>>>> > 1027: delete oid 101 current snap is 0
>>>> > 1027: done (0 left)
>>>> > 1028: read oid 156 snap -1
>>>> > 1028:  expect (ObjNum 155 snap 0 seq_num 155)
>>>> > 1029: read oid 691 snap -1
>>>> > 1029:  expect (ObjNum 690 snap 0 seq_num 690)
>>>> > 1030: read oid 282 snap -1
>>>> > 1030:  expect (ObjNum 281 snap 0 seq_num 281)
>>>> > 1031: read oid 979 snap -1
>>>> > 1031:  expect (ObjNum 978 snap 0 seq_num 978)
>>>> > 1032: read oid 203 snap -1
>>>> > 1032:  expect (ObjNum 202 snap 0 seq_num 202)
>>>> > 1033: setattr oid 464 current snap is 0
>>>> > 1032: Error: oid 203 read returned error code -2
>>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fafee64a700 time
>>>> > 2016-03-17 10:53:44.291343
>>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> > const*)+0x76) [0x4db956]
>>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>>> > 3: (()+0x9791d) [0x7faff721191d]
>>>> > 4: (()+0x72519) [0x7faff71ec519]
>>>> > 5: (()+0x13c178) [0x7faff72b6178]
>>>> > 6: (()+0x80a4) [0x7faff6d4a0a4]
>>>> > 7: (clone()+0x6d) [0x7faff56c504d]
>>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> > needed to interpret this.
>>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>>> > Aborted
>>>> >
>>>> > nodey:
>>>> >
>>>> > 974: done (52 left)
>>>> > 975: done (51 left)
>>>> > 976: done (50 left)
>>>> > 977: done (49 left)
>>>> > 978: done (48 left)
>>>> > 979: done (47 left)
>>>> > 980: done (46 left)
>>>> > 981: done (45 left)
>>>> > 982: done (44 left)
>>>> > 983: done (43 left)
>>>> > 984: done (42 left)
>>>> > 985: done (41 left)
>>>> > 986: done (40 left)
>>>> > 987: done (39 left)
>>>> > 988: done (38 left)
>>>> > 989: done (37 left)
>>>> > 990: done (36 left)
>>>> > 991: done (35 left)
>>>> > 992: done (34 left)
>>>> > 993: done (33 left)
>>>> > 994: done (32 left)
>>>> > 995: done (31 left)
>>>> > 996: done (30 left)
>>>> > 997: done (29 left)
>>>> > 998: done (28 left)
>>>> > 999: done (27 left)
>>>> > 1000: done (26 left)
>>>> > 1001: done (25 left)
>>>> > 1002: done (24 left)
>>>> > 1003: done (23 left)
>>>> > 1004: done (22 left)
>>>> > 1005: done (21 left)
>>>> > 1006: done (20 left)
>>>> > 1007: done (19 left)
>>>> > 1008: done (18 left)
>>>> > 1009: done (17 left)
>>>> > 1010: done (16 left)
>>>> > 1011: done (15 left)
>>>> > 1012: done (14 left)
>>>> > 1013: done (13 left)
>>>> > 1014: done (12 left)
>>>> > 1015: done (11 left)
>>>> > 1016: done (10 left)
>>>> > 1017: done (9 left)
>>>> > 1018: done (8 left)
>>>> > 1019: done (7 left)
>>>> > 1020: done (6 left)
>>>> > 1021: done (5 left)
>>>> > 1022: done (4 left)
>>>> > 1023: done (3 left)
>>>> > 1024: done (2 left)
>>>> > 1025: done (1 left)
>>>> > 1026: done (0 left)
>>>> > 1027: delete oid 101 current snap is 0
>>>> > 1027: done (0 left)
>>>> > 1028: read oid 156 snap -1
>>>> > 1028:  expect (ObjNum 155 snap 0 seq_num 155)
>>>> > 1029: read oid 691 snap -1
>>>> > 1029:  expect (ObjNum 690 snap 0 seq_num 690)
>>>> > 1030: read oid 282 snap -1
>>>> > 1030:  expect (ObjNum 281 snap 0 seq_num 281)
>>>> > 1031: read oid 979 snap -1
>>>> > 1031:  expect (ObjNum 978 snap 0 seq_num 978)
>>>> > 1032: read oid 203 snap -1
>>>> > 1032:  expect (ObjNum 202 snap 0 seq_num 202)
>>>> > 1033: setattr oid 464 current snap is 0
>>>> > 1028: Error: oid 156 read returned error code -2
>>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f55fa3c6700 time
>>>> > 2016-03-17 10:53:57.082571
>>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> > const*)+0x76) [0x4db956]
>>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>>> > 3: (()+0x9791d) [0x7f5602f8d91d]
>>>> > 4: (()+0x72519) [0x7f5602f68519]
>>>> > 5: (()+0x13c178) [0x7f5603032178]
>>>> > 6: (()+0x80a4) [0x7f5602ac60a4]
>>>> > 7: (clone()+0x6d) [0x7f560144104d]
>>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> > needed to interpret this.
>>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>>> > Aborted
>>>> >
>>>> > nodez:
>>>> >
>>>> > 1014: done (11 left)
>>>> > 1026: delete oid 717 current snap is 0
>>>> > 1015:  finishing write tid 2 to nodez24249-1015
>>>> > 1015:  finishing write tid 4 to nodez24249-1015
>>>> > 1015:  finishing write tid 5 to nodez24249-1015
>>>> > update_object_version oid 1015 v 3003 (ObjNum 1014 snap 0 seq_num
>>>> > 1014) dirty exists
>>>> > 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
>>>> > 1016:  finishing write tid 1 to nodez24249-1016
>>>> > 1016:  finishing write tid 2 to nodez24249-1016
>>>> > 1016:  finishing write tid 3 to nodez24249-1016
>>>> > 1016:  finishing write tid 4 to nodez24249-1016
>>>> > 1016:  finishing write tid 6 to nodez24249-1016
>>>> > 1016:  finishing write tid 7 to nodez24249-1016
>>>> > update_object_version oid 1016 v 1201 (ObjNum 1015 snap 0 seq_num
>>>> > 1015) dirty exists
>>>> > 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
>>>> > 1017:  finishing write tid 1 to nodez24249-1017
>>>> > 1017:  finishing write tid 2 to nodez24249-1017
>>>> > 1017:  finishing write tid 3 to nodez24249-1017
>>>> > 1017:  finishing write tid 5 to nodez24249-1017
>>>> > 1017:  finishing write tid 6 to nodez24249-1017
>>>> > update_object_version oid 1017 v 3007 (ObjNum 1016 snap 0 seq_num
>>>> > 1016) dirty exists
>>>> > 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
>>>> > 1018:  finishing write tid 1 to nodez24249-1018
>>>> > 1018:  finishing write tid 2 to nodez24249-1018
>>>> > 1018:  finishing write tid 3 to nodez24249-1018
>>>> > 1018:  finishing write tid 4 to nodez24249-1018
>>>> > 1018:  finishing write tid 6 to nodez24249-1018
>>>> > 1018:  finishing write tid 7 to nodez24249-1018
>>>> > update_object_version oid 1018 v 1283 (ObjNum 1017 snap 0 seq_num
>>>> > 1017) dirty exists
>>>> > 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
>>>> > 1019:  finishing write tid 1 to nodez24249-1019
>>>> > 1019:  finishing write tid 2 to nodez24249-1019
>>>> > 1019:  finishing write tid 3 to nodez24249-1019
>>>> > 1019:  finishing write tid 5 to nodez24249-1019
>>>> > 1019:  finishing write tid 6 to nodez24249-1019
>>>> > update_object_version oid 1019 v 999 (ObjNum 1018 snap 0 seq_num 1018)
>>>> > dirty exists
>>>> > 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
>>>> > 1020:  finishing write tid 1 to nodez24249-1020
>>>> > 1020:  finishing write tid 2 to nodez24249-1020
>>>> > 1020:  finishing write tid 3 to nodez24249-1020
>>>> > 1020:  finishing write tid 5 to nodez24249-1020
>>>> > 1020:  finishing write tid 6 to nodez24249-1020
>>>> > update_object_version oid 1020 v 813 (ObjNum 1019 snap 0 seq_num 1019)
>>>> > dirty exists
>>>> > 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
>>>> > 1021:  finishing write tid 1 to nodez24249-1021
>>>> > 1021:  finishing write tid 2 to nodez24249-1021
>>>> > 1021:  finishing write tid 3 to nodez24249-1021
>>>> > 1021:  finishing write tid 5 to nodez24249-1021
>>>> > 1021:  finishing write tid 6 to nodez24249-1021
>>>> > update_object_version oid 1021 v 1038 (ObjNum 1020 snap 0 seq_num
>>>> > 1020) dirty exists
>>>> > 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
>>>> > 1022:  finishing write tid 1 to nodez24249-1022
>>>> > 1022:  finishing write tid 2 to nodez24249-1022
>>>> > 1022:  finishing write tid 3 to nodez24249-1022
>>>> > 1022:  finishing write tid 5 to nodez24249-1022
>>>> > 1022:  finishing write tid 6 to nodez24249-1022
>>>> > update_object_version oid 1022 v 781 (ObjNum 1021 snap 0 seq_num 1021)
>>>> > dirty exists
>>>> > 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
>>>> > 1023:  finishing write tid 1 to nodez24249-1023
>>>> > 1023:  finishing write tid 2 to nodez24249-1023
>>>> > 1023:  finishing write tid 3 to nodez24249-1023
>>>> > 1023:  finishing write tid 5 to nodez24249-1023
>>>> > 1023:  finishing write tid 6 to nodez24249-1023
>>>> > update_object_version oid 1023 v 1537 (ObjNum 1022 snap 0 seq_num
>>>> > 1022) dirty exists
>>>> > 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
>>>> > 1024:  finishing write tid 1 to nodez24249-1024
>>>> > 1025: Error: oid 230 read returned error code -2
>>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fd9bb7fe700 time
>>>> > 2016-03-17 10:53:41.757921
>>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>>> >  ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>>> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> > const*)+0x76) [0x4db956]
>>>> >  2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>>> >  3: (()+0x9791d) [0x7fd9d088d91d]
>>>> >  4: (()+0x72519) [0x7fd9d0868519]
>>>> >  5: (()+0x13c178) [0x7fd9d0932178]
>>>> >  6: (()+0x80a4) [0x7fd9d03c60a4]
>>>> >  7: (clone()+0x6d) [0x7fd9ced4104d]
>>>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> > needed to interpret this.
>>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>>> > Aborted
>>>> >
>>>> > nodezz:
>>>> >
>>>> > 1015:  finishing write tid 1 to nodezz25161-1015
>>>> > 1015:  finishing write tid 2 to nodezz25161-1015
>>>> > 1015:  finishing write tid 4 to nodezz25161-1015
>>>> > 1015:  finishing write tid 5 to nodezz25161-1015
>>>> > update_object_version oid 1015 v 900 (ObjNum 1014 snap 0 seq_num 1014)
>>>> > dirty exists
>>>> > 1015:  left oid 1015 (ObjNum 1014 snap 0 seq_num 1014)
>>>> > 1016:  finishing write tid 1 to nodezz25161-1016
>>>> > 1016:  finishing write tid 2 to nodezz25161-1016
>>>> > 1016:  finishing write tid 3 to nodezz25161-1016
>>>> > 1016:  finishing write tid 4 to nodezz25161-1016
>>>> > 1016:  finishing write tid 6 to nodezz25161-1016
>>>> > 1016:  finishing write tid 7 to nodezz25161-1016
>>>> > update_object_version oid 1016 v 1021 (ObjNum 1015 snap 0 seq_num
>>>> > 1015) dirty exists
>>>> > 1016:  left oid 1016 (ObjNum 1015 snap 0 seq_num 1015)
>>>> > 1017:  finishing write tid 1 to nodezz25161-1017
>>>> > 1017:  finishing write tid 2 to nodezz25161-1017
>>>> > 1017:  finishing write tid 3 to nodezz25161-1017
>>>> > 1017:  finishing write tid 5 to nodezz25161-1017
>>>> > 1017:  finishing write tid 6 to nodezz25161-1017
>>>> > update_object_version oid 1017 v 3011 (ObjNum 1016 snap 0 seq_num
>>>> > 1016) dirty exists
>>>> > 1017:  left oid 1017 (ObjNum 1016 snap 0 seq_num 1016)
>>>> > 1018:  finishing write tid 1 to nodezz25161-1018
>>>> > 1018:  finishing write tid 2 to nodezz25161-1018
>>>> > 1018:  finishing write tid 3 to nodezz25161-1018
>>>> > 1018:  finishing write tid 4 to nodezz25161-1018
>>>> > 1018:  finishing write tid 6 to nodezz25161-1018
>>>> > 1018:  finishing write tid 7 to nodezz25161-1018
>>>> > update_object_version oid 1018 v 1099 (ObjNum 1017 snap 0 seq_num
>>>> > 1017) dirty exists
>>>> > 1018:  left oid 1018 (ObjNum 1017 snap 0 seq_num 1017)
>>>> > 1019:  finishing write tid 1 to nodezz25161-1019
>>>> > 1019:  finishing write tid 2 to nodezz25161-1019
>>>> > 1019:  finishing write tid 3 to nodezz25161-1019
>>>> > 1019:  finishing write tid 5 to nodezz25161-1019
>>>> > 1019:  finishing write tid 6 to nodezz25161-1019
>>>> > update_object_version oid 1019 v 1300 (ObjNum 1018 snap 0 seq_num
>>>> > 1018) dirty exists
>>>> > 1019:  left oid 1019 (ObjNum 1018 snap 0 seq_num 1018)
>>>> > 1020:  finishing write tid 1 to nodezz25161-1020
>>>> > 1020:  finishing write tid 2 to nodezz25161-1020
>>>> > 1020:  finishing write tid 3 to nodezz25161-1020
>>>> > 1020:  finishing write tid 5 to nodezz25161-1020
>>>> > 1020:  finishing write tid 6 to nodezz25161-1020
>>>> > update_object_version oid 1020 v 1324 (ObjNum 1019 snap 0 seq_num
>>>> > 1019) dirty exists
>>>> > 1020:  left oid 1020 (ObjNum 1019 snap 0 seq_num 1019)
>>>> > 1021:  finishing write tid 1 to nodezz25161-1021
>>>> > 1021:  finishing write tid 2 to nodezz25161-1021
>>>> > 1021:  finishing write tid 3 to nodezz25161-1021
>>>> > 1021:  finishing write tid 5 to nodezz25161-1021
>>>> > 1021:  finishing write tid 6 to nodezz25161-1021
>>>> > update_object_version oid 1021 v 890 (ObjNum 1020 snap 0 seq_num 1020)
>>>> > dirty exists
>>>> > 1021:  left oid 1021 (ObjNum 1020 snap 0 seq_num 1020)
>>>> > 1022:  finishing write tid 1 to nodezz25161-1022
>>>> > 1022:  finishing write tid 2 to nodezz25161-1022
>>>> > 1022:  finishing write tid 3 to nodezz25161-1022
>>>> > 1022:  finishing write tid 5 to nodezz25161-1022
>>>> > 1022:  finishing write tid 6 to nodezz25161-1022
>>>> > update_object_version oid 1022 v 464 (ObjNum 1021 snap 0 seq_num 1021)
>>>> > dirty exists
>>>> > 1022:  left oid 1022 (ObjNum 1021 snap 0 seq_num 1021)
>>>> > 1023:  finishing write tid 1 to nodezz25161-1023
>>>> > 1023:  finishing write tid 2 to nodezz25161-1023
>>>> > 1023:  finishing write tid 3 to nodezz25161-1023
>>>> > 1023:  finishing write tid 5 to nodezz25161-1023
>>>> > 1023:  finishing write tid 6 to nodezz25161-1023
>>>> > update_object_version oid 1023 v 1516 (ObjNum 1022 snap 0 seq_num
>>>> > 1022) dirty exists
>>>> > 1023:  left oid 1023 (ObjNum 1022 snap 0 seq_num 1022)
>>>> > 1024:  finishing write tid 1 to nodezz25161-1024
>>>> > 1024:  finishing write tid 2 to nodezz25161-1024
>>>> > 1025: Error: oid 219 read returned error code -2
>>>> > ./test/osd/RadosModel.h: In function 'virtual void
>>>> > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fbb1bfff700 time
>>>> > 2016-03-17 10:53:53.071338
>>>> > ./test/osd/RadosModel.h: 1109: FAILED assert(0)
>>>> > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> > const*)+0x76) [0x4db956]
>>>> > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c]
>>>> > 3: (()+0x9791d) [0x7fbb30ff191d]
>>>> > 4: (()+0x72519) [0x7fbb30fcc519]
>>>> > 5: (()+0x13c178) [0x7fbb31096178]
>>>> > 6: (()+0x80a4) [0x7fbb30b2a0a4]
>>>> > 7: (clone()+0x6d) [0x7fbb2f4a504d]
>>>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> > needed to interpret this.
>>>> > terminate called after throwing an instance of 'ceph::FailedAssertion'
>>>> > Aborted
>>>> > ----------------
>>>> > Robert LeBlanc
>>>> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>> >
>>>> >
>>>> > On Thu, Mar 17, 2016 at 10:39 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>>>> >> On Thu, 17 Mar 2016, Robert LeBlanc wrote:
>>>> >>>
>>>> >>> I'm having trouble finding documentation about using ceph_test_rados. Can I
>>>> >>> run this on the existing cluster, and will that provide useful info? It seems
>>>> >>> running it in the build will not have the caching set up (vstart.sh).
>>>> >>>
>>>> >>> I have accepted a job with another company and only have until Wednesday to
>>>> >>> help with getting information about this bug. My new job will not be using
>>>> >>> Ceph, so I won't be able to provide any additional info after Tuesday. I want
>>>> >>> to leave the company on a good trajectory for upgrading, so any input you
>>>> >>> can provide will be helpful.
>>>> >>
>>>> >> I'm sorry to hear it!  You'll be missed.  :)
>>>> >>
>>>> >>> I've found:
>>>> >>>
>>>> >>> ./ceph_test_rados --op read 100 --op write 100 --op delete 50
>>>> >>> --max-ops 400000 --objects 1024 --max-in-flight 64 --size 4000000
>>>> >>> --min-stride-size 400000 --max-stride-size 800000 --max-seconds 600
>>>> >>> --op copy_from 50 --op snap_create 50 --op snap_remove 50 --op
>>>> >>> rollback 50 --op setattr 25 --op rmattr 25 --pool unique_pool_0
>>>> >>>
>>>> >>> Is that enough if I change --pool to the cached pool and do the toggling
>>>> >>> while ceph_test_rados is running? I think this will run for 10 minutes.
>>>> >>
>>>> >> Precisely.  You can probably drop copy_from and snap ops from the list
>>>> >> since your workload wasn't exercising those.
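
[Editorial note: the cache-mode flipping discussed above can be done while ceph_test_rados is running with commands along these lines. The pool name "hot-cache" is an example; newer Ceph releases may also require --yes-i-really-mean-it for forward mode.]

```shell
# Flip the cache tier between forward and writeback while the test runs.
ceph osd tier cache-mode hot-cache forward
sleep 30
ceph osd tier cache-mode hot-cache writeback

# Toggle the recency setting between runs as well.
ceph osd pool set hot-cache min_read_recency_for_promote 2
```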
>>>> >>
>>>> >> Thanks!
>>>> >> sage
>>>> >>
>>>> >>
>>>>
>>>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


