Re: [Jewel] Crash Osd with void Hit_set_trim

On Mon, Oct 23, 2017 at 4:51 PM, pascal.pucci@xxxxxxxxxxxxxxx <pascal.pucci@xxxxxxxxxxxxxxx> wrote:

Hello,

On 23/10/2017 at 02:05, Brad Hubbard wrote:
2017-10-22 17:32:56.031086 7f3acaff5700  1 osd.14 pg_epoch: 72024 pg[37.1c( v 71593'41657 (60849'38594,71593'41657] local-les=72023 n=13 ec=7037 les/c/f 72023/72023/66447 72022/72022/72022) [14,1,41] r=0 lpr=72022 crt=71593'41657 lcod 0'0 mlcod 0'0 active+clean] hit_set_trim 37:38000000:.ceph-internal::hit_set_37.1c_archive_2017-08-31 01%3a03%3a24.697717Z_2017-08-31 01%3a52%3a34.767197Z:head not found
2017-10-22 17:32:56.033936 7f3acaff5700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&, unsigned int)' thread 7f3acaff5700 time 2017-10-22 17:32:56.031105
osd/ReplicatedPG.cc: 11782: FAILED assert(obc)

It appears to be looking for (and failing to find) a hit set archive object with a timestamp from August. Does that sound right to you? Evidently no object exists for that timestamp, which is what trips the assert.

How is that possible, and how can I fix it? I am sure that if I run a lot of reads, other objects like this will crash other OSDs.
(The cluster is OK now; I will probably destroy OSD 14 and recreate it.)
How can I find this object?

You should be able to do a find on the OSDs' filestores and grep the output for 'hit_set_37.1c_archive_2017-08-31'. I'd start with the OSDs responsible for pg 37.1c and then move on to the others if feasible.
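A minimal sketch (assuming the default /var/lib/ceph/osd mount points and a filestore backend; note that filestore escapes some characters such as '_' in on-disk object names, so matching on the PG id and the date is more forgiving than grepping for the full object name):

# find /var/lib/ceph/osd/ceph-*/current -type f 2>/dev/null | grep 37.1c | grep 2017-08-31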

Let us know the results.


For information: all Ceph servers are time-synchronized with NTP.

What are the settings for this cache tier?

Just a cache tier in "writeback" mode on top of an erasure-coded 2+1 pool.

# ceph osd pool get cache-nvme-data all
size: 3
min_size: 2
crash_replay_interval: 0
pg_num: 512
pgp_num: 512
crush_ruleset: 10
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
hit_set_type: bloom
hit_set_period: 14400
hit_set_count: 12
hit_set_fpp: 0.05
use_gmt_hitset: 1
auid: 0
target_max_objects: 1000000
target_max_bytes: 100000000000
cache_target_dirty_ratio: 0.4
cache_target_dirty_high_ratio: 0.6
cache_target_full_ratio: 0.8
cache_min_flush_age: 600
cache_min_evict_age: 1800
min_read_recency_for_promote: 1
min_write_recency_for_promote: 1
fast_read: 0
hit_set_grade_decay_rate: 0
hit_set_search_last_n: 0
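(For reference, the hit set parameters above are the ones driven by 'ceph osd pool set'; a sketch of the equivalent commands, not necessarily the exact ones used on this cluster:)

# ceph osd pool set cache-nvme-data hit_set_type bloom
# ceph osd pool set cache-nvme-data hit_set_period 14400
# ceph osd pool set cache-nvme-data hit_set_count 12
# ceph osd pool set cache-nvme-data hit_set_fpp 0.05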

#  ceph osd pool get raid-2-1-data all
size: 3
min_size: 2
crash_replay_interval: 0
pg_num: 1024
pgp_num: 1024
crush_ruleset: 8
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
auid: 0
erasure_code_profile: raid-2-1
min_write_recency_for_promote: 0
fast_read: 0

# ceph osd erasure-code-profile get raid-2-1
jerasure-per-chunk-alignment=false
k=2
m=1
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8

Could you check your logs for any errors from the 'agent_load_hit_sets' function?

Attached log: # pdsh -R exec -w ceph-osd-01,ceph-osd-02,ceph-osd-03,ceph-osd-04 ssh -x %h 'zgrep -B10 -A10 agent_load_hit_sets /var/log/ceph/ceph-osd.*gz' | less > log_agent_load_hit_sets.log

On 19 October, I restarted OSD 14 in the morning.

thanks for your help.

regards,


On Mon, Oct 23, 2017 at 2:41 AM, pascal.pucci@xxxxxxxxxxxxxxx <pascal.pucci@xxxxxxxxxxxxxxx> wrote:

Hello,

Today I ran a lot of read IO with a simple rsync... and again, an OSD crashed:

But as before, I can't restart the OSD; it keeps crashing. So the OSD is out and the cluster is recovering.

I just had time to increase the OSD debug level:

# ceph tell osd.14 injectargs '--debug-osd 5/5'

Attached log:

# grep -B100 -A100 objdump /var/log/ceph/ceph-osd.14.log

If I run another big read, another OSD will probably crash.

Any idea?

I will probably plan to move the data from the erasure-coded pool to a 3x replicated pool. It's becoming unstable without any change on our side.

Regards,

PS: Last Sunday, I lost an RBD header while removing the cache tier... many thanks to http://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/ for helping me recreate it and resurrect the RBD disk :)

On 19/10/2017 at 00:19, Brad Hubbard wrote:
On Wed, Oct 18, 2017 at 11:16 PM, pascal.pucci@xxxxxxxxxxxxxxx
<pascal.pucci@xxxxxxxxxxxxxxx> wrote:
hello,

For two weeks now, I have occasionally been losing OSDs. Here is the trace:

    0> 2017-10-18 05:16:40.873511 7f7c1e497700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&, unsigned int)' thread 7f7c1e497700 time 2017-10-18 05:16:40.869962
osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
Can you try to capture a log with debug_osd set to 10 or greater as
per http://tracker.ceph.com/issues/19185 ?

This will allow us to see the output from the
PrimaryLogPG::get_object_context() function which may help identify
the problem.
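For example (a sketch; assumes osd.14 is the next OSD to hit the assert and that it stays up long enough to accept the command):

# ceph tell osd.14 injectargs '--debug-osd 20'

or, to have the setting survive restarts, add "debug osd = 20" under [osd] in ceph.conf before starting the daemon again.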

Please also check your machines all have the same time zone set and
their clocks are in sync.
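A quick way to check on each node (assuming systemd and ntpd; with chrony, 'chronyc sources' is the equivalent):

# timedatectl | grep -i 'time zone'
# ntpq -p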

 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x55eec15a09e5]
 2: (ReplicatedPG::hit_set_trim(std::unique_ptr<ReplicatedPG::OpContext, std::default_delete<ReplicatedPG::OpContext> >&, unsigned int)+0x6dd) [0x55eec107a52d]
 3: (ReplicatedPG::hit_set_persist()+0xd7c) [0x55eec107d1bc]
 4: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x1a92) [0x55eec109bbe2]
 5: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x747) [0x55eec10588a7]
 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x55eec0f0bbad]
 7: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d) [0x55eec0f0bdfd]
 8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x77b) [0x55eec0f0f7db]
 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887) [0x55eec1590987]
 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55eec15928f0]
 11: (()+0x7e25) [0x7f7c4fd52e25]
 12: (clone()+0x6d) [0x7f7c4e3dc34d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I am using Jewel 10.2.10

I am using an erasure-coded pool (2+1) + an NVMe cache tier (writeback) with 3 replicas, serving simple RBD disks.
(12 SATA OSDs per node on 4 nodes + 1 NVMe per node = 48 SATA OSDs + 8 NVMe OSDs; I split each NVMe in 2.)
Last week it was only NVMe OSDs that crashed, so I unmapped all the disks, destroyed the cache and recreated it.
Since then it had worked fine. Today an OSD crashed again, but this time it was not an NVMe OSD, just a normal SATA OSD.

Any idea? What about this 'void ReplicatedPG::hit_set_trim'?
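(For reference, a writeback tier on an erasure-coded pool like the one described above is normally wired up along these lines; a sketch using the pool and profile names from this thread, not the exact commands that were run here:)

# ceph osd erasure-code-profile set raid-2-1 k=2 m=1 ruleset-failure-domain=host
# ceph osd pool create raid-2-1-data 1024 1024 erasure raid-2-1
# ceph osd tier add raid-2-1-data cache-nvme-data
# ceph osd tier cache-mode cache-nvme-data writeback
# ceph osd tier set-overlay raid-2-1-data cache-nvme-data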

Thanks for your help,

Regards,





--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
