On Mon, Oct 23, 2017 at 4:51 PM, pascal.pucci@xxxxxxxxxxxxxxx <pascal.pucci@xxxxxxxxxxxxxxx> wrote:
Hello,
On 23/10/2017 at 02:05, Brad Hubbard wrote:
How is it possible? How can I fix it? I am sure that if I run a lot of reads, other objects like this will crash other OSDs.

2017-10-22 17:32:56.031086 7f3acaff5700 1 osd.14 pg_epoch: 72024 pg[37.1c( v 71593'41657 (60849'38594,71593'41657] local-les=72023 n=13 ec=7037 les/c/f 72023/72023/66447 72022/72022/72022) [14,1,41] r=0 lpr=72022 crt=71593'41657 lcod 0'0 mlcod 0'0 active+clean] hit_set_trim 37:38000000:.ceph-internal::hit_set_37.1c_archive_2017-08-31 01%3a03%3a24.697717Z_2017-08-31 01%3a52%3a34.767197Z:head not found
2017-10-22 17:32:56.033936 7f3acaff5700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&, unsigned int)' thread 7f3acaff5700 time 2017-10-22 17:32:56.031105
osd/ReplicatedPG.cc: 11782: FAILED assert(obc)

It appears to be looking for (and failing to find) a hitset object with a timestamp from August? Does that sound right to you? Of course, it appears an object for that timestamp does not exist.
(Cluster is OK now, I will probably destroy OSD 14 and recreate it).
How can I find this object?

You should be able to do a find on the OSD's filestore and grep the output for 'hit_set_37.1c_archive_2017-08-31'. I'd start with the OSDs responsible for pg 37.1c and then move on to the others if it's feasible.
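For example, something along these lines should work (a sketch only; the default filestore data path /var/lib/ceph/osd/ceph-<id> and the OSD id are assumptions to adapt to your deployment):
# find /var/lib/ceph/osd/ceph-14/current -type f 2>/dev/null | grep 'hit_set_37.1c_archive_2017-08-31'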
Let us know the results.
For information: all Ceph servers are NTP time-synchronized.
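For reference, one quick way to double-check the time zone and NTP status on every node (a sketch reusing the pdsh invocation shown later in this thread; assumes systemd's timedatectl is available):
# pdsh -R exec -w ceph-osd-01,ceph-osd-02,ceph-osd-03,ceph-osd-04 ssh -x %h 'date -u; timedatectl | grep -E "Time zone|NTP"'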
What are the settings for this cache tier?
Just a cache tier in "writeback" mode on an erasure-coded pool (2+1).
# ceph osd pool get cache-nvme-data all
size: 3
min_size: 2
crash_replay_interval: 0
pg_num: 512
pgp_num: 512
crush_ruleset: 10
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
hit_set_type: bloom
hit_set_period: 14400
hit_set_count: 12
hit_set_fpp: 0.05
use_gmt_hitset: 1
auid: 0
target_max_objects: 1000000
target_max_bytes: 100000000000
cache_target_dirty_ratio: 0.4
cache_target_dirty_high_ratio: 0.6
cache_target_full_ratio: 0.8
cache_min_flush_age: 600
cache_min_evict_age: 1800
min_read_recency_for_promote: 1
min_write_recency_for_promote: 1
fast_read: 0
hit_set_grade_decay_rate: 0
hit_set_search_last_n: 0
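As a side note (a rough calculation, not from the original thread): with hit_set_period 14400 s and hit_set_count 12, the cache tier should only keep roughly the last 12 × 14400 s = 48 hours of hit set archives, so an OSD still referencing a hit_set_37.1c_archive object dated 2017-08-31 in late October looks anomalous, which fits the failed lookup in hit_set_trim.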
# ceph osd pool get raid-2-1-data all
size: 3
min_size: 2
crash_replay_interval: 0
pg_num: 1024
pgp_num: 1024
crush_ruleset: 8
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
auid: 0
erasure_code_profile: raid-2-1
min_write_recency_for_promote: 0
fast_read: 0
# ceph osd erasure-code-profile get raid-2-1
jerasure-per-chunk-alignment=false
k=2
m=1
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8
Could you check your logs for any errors from the 'agent_load_hit_sets' function?
Attached log:
# pdsh -R exec -w ceph-osd-01,ceph-osd-02,ceph-osd-03,ceph-osd-04 ssh -x %h 'zgrep -B10 -A10 agent_load_hit_sets /var/log/ceph/ceph-osd.*gz' | less > log_agent_load_hit_sets.log
On the morning of 19 October, I restarted OSD 14.
thanks for your help.
regards,
On Mon, Oct 23, 2017 at 2:41 AM, pascal.pucci@xxxxxxxxxxxxxxx <pascal.pucci@xxxxxxxxxxxxxxx> wrote:
Hello,
Today I ran a lot of read IO with a simple rsync... and again, an OSD crashed:
But as before, I can't restart the OSD; it keeps crashing. So the OSD is out and the cluster is recovering.
I only had time to increase the OSD log level:
# ceph tell osd.14 injectargs --debug-osd 5/5
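If useful, the same command can raise the level to what was requested earlier (10 or greater, per http://tracker.ceph.com/issues/19185):
# ceph tell osd.14 injectargs '--debug-osd 10/10'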
Attached log:
# grep -B100 -100 objdump /var/log/ceph/ceph-osd.14.log
If I run another read, another OSD will probably crash.
Any idea?
I will probably plan to move the data from the erasure-coded pool to a 3x replicated pool. It has become unstable without any change on our side.
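One possible way to do that per image, once things are stable enough (a sketch only; 'replicated-pool' and 'myimage' are placeholder names, and the image should be unmapped/quiesced during the copy):
# rbd cp raid-2-1-data/myimage replicated-pool/myimage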
Regards,
PS: Last Sunday, I lost an RBD header while removing the cache tier... many thanks to http://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/ for helping me recreate it and resurrect the RBD disk :)
On 19/10/2017 at 00:19, Brad Hubbard wrote:
On Wed, Oct 18, 2017 at 11:16 PM, pascal.pucci@xxxxxxxxxxxxxxx <pascal.pucci@xxxxxxxxxxxxxxx> wrote:

hello,

For 2 weeks, I have been losing OSDs from time to time. Here is the trace:

0> 2017-10-18 05:16:40.873511 7f7c1e497700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&, unsigned int)' thread 7f7c1e497700 time 2017-10-18 05:16:40.869962
osd/ReplicatedPG.cc: 11782: FAILED assert(obc)

Can you try to capture a log with debug_osd set to 10 or greater as per http://tracker.ceph.com/issues/19185 ? This will allow us to see the output from the PrimaryLogPG::get_object_context() function which may help identify the problem. Please also check your machines all have the same time zone set and their clocks are in sync.

ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x55eec15a09e5]
2: (ReplicatedPG::hit_set_trim(std::unique_ptr<ReplicatedPG::OpContext, std::default_delete<ReplicatedPG::OpContext> >&, unsigned int)+0x6dd) [0x55eec107a52d]
3: (ReplicatedPG::hit_set_persist()+0xd7c) [0x55eec107d1bc]
4: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x1a92) [0x55eec109bbe2]
5: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x747) [0x55eec10588a7]
6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x55eec0f0bbad]
7: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d) [0x55eec0f0bdfd]
8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x77b) [0x55eec0f0f7db]
9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887) [0x55eec1590987]
10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55eec15928f0]
11: (()+0x7e25) [0x7f7c4fd52e25]
12: (clone()+0x6d) [0x7f7c4e3dc34d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I am using Jewel 10.2.10, with an erasure-coded pool (2+1) plus an NVMe cache tier (writeback, 3 replicas) holding simple RBD disks (12 SATA OSDs per node on 4 nodes + 1 NVMe per node = 48 SATA OSDs + 8 NVMe OSDs, since I split each NVMe in 2).

Last week, it was only NVMe OSDs which crashed, so I unmapped all disks, destroyed the cache tier and recreated it. From that day it worked fine. Today an OSD crashed again, but this time it was not an NVMe OSD, it was a normal (SATA) OSD.

Any idea? What about this 'void ReplicatedPG::hit_set_trim'?

thanks for your help,
Regards,
--
Cheers,
Brad
--
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com