Hello Jean-Charles!
I have finally caught the problem. It happened at 13:02.
[cephuser@storage-ru1-osd3 ~]$ ceph health detail
HEALTH_WARN 18 slow requests are blocked > 32 sec
REQUEST_SLOW 18 slow requests are blocked > 32 sec
3 ops are blocked > 65.536 sec
15 ops are blocked > 32.768 sec
osd.15 has blocked requests > 65.536 sec
[cephuser@storage-ru1-osd3 ~]$
But to my surprise, there is no information in ceph-osd.15.log at that time, only these RocksDB entries before and after it:
2018-05-08 12:54:26.105919 7f003f5f9700 4 rocksdb: (Original Log Time 2018/05/08-12:54:26.105843) EVENT_LOG_v1 {"time_micros": 1525773266105834, "job": 2793, "event": "trivial_move", "destination_level": 3, "files": 1, "total_files_size": 68316970}
2018-05-08 12:54:26.105926 7f003f5f9700 4 rocksdb: (Original Log Time 2018/05/08-12:54:26.105854) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_compaction_flush.cc:1537] [default] Moved #1 files to level-3 68316970 bytes OK: base level 1 max bytes base 268435456 files[0 4 45 403 722 0 0] max score 0.98
2018-05-08 13:07:29.711425 7f004f619700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_write.cc:684] reusing log 8051 from recycle list
2018-05-08 13:07:29.711497 7f004f619700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_write.cc:725] [default] New memtable created with log file: #8089. Immutable memtables: 0.
2018-05-08 13:07:29.726107 7f003fdfa700 4 rocksdb: (Original Log Time 2018/05/08-13:07:29.711524) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_compaction_flush.cc:1158] Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1
2018-05-08 13:07:29.726124 7f003fdfa700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/flush_job.cc:264] [default] [JOB 2794] Flushing memtable with next log file: 8089
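Next time the warning fires I suppose I could also dump the ops directly from the admin socket on whichever node hosts osd.15, as you suggested; something like this, run as root (dump_ops_in_flight and dump_historic_ops should be available on 12.2.4, I'm less sure about dump_blocked_ops):

ceph daemon osd.15 dump_ops_in_flight    # ops currently in flight, with their age and current state
ceph daemon osd.15 dump_historic_ops     # recently completed slow ops, in case the warning has already cleared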
Should I enable some deeper logging?
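If yes, I guess it would be something like this while the slow requests are active, reverted afterwards since these levels are very verbose (I believe the defaults are debug_osd 1/5 and debug_ms 0/5, but they can be checked first with ceph daemon osd.15 config get debug_osd on the OSD's node):

ceph tell osd.15 injectargs '--debug-osd 10 --debug-ms 1'
# ... wait for the next REQUEST_SLOW on osd.15, then put the levels back:
ceph tell osd.15 injectargs '--debug-osd 1/5 --debug-ms 0/5'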
Grigory Murashov
Voximplant
On 07.05.2018 18:59, Jean-Charles Lopez wrote:
Hi,
ceph health detail
This will tell you which OSDs are experiencing the problem so you can then go and inspect the logs and use the admin socket to find out which requests are at the source.
Regards
JC
On May 7, 2018, at 03:52, Grigory Murashov <murashov@xxxxxxxxxxxxxx> wrote:
Hello!
I'm not very experienced in Ceph troubleshooting, which is why I'm asking for help.
I have multiple warnings coming from Zabbix as a result of ceph -s:
REQUEST_SLOW: HEALTH_WARN : 21 slow requests are blocked > 32 sec
I don't see any hardware problems at that time.
I'm able to find matching lines in ceph.log and ceph-mon.log, like:
2018-05-07 12:37:57.375546 7f3037dae700 0 log_channel(cluster) log [WRN] : Health check failed: 12 slow requests are blocked > 32 sec (REQUEST_SLOW)
Now it's important to find the root of the problem.
How do I find out:
1. which OSDs are affected?
2. which particular requests were slow and blocked?
I assume I need more detailed logging - how do I enable that?
Appreciate your help.
--
Grigory Murashov
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com