Hey friends,
a month ago I had an issue with a few blocked requests, and some of my
VMs froze while this was happening.
I guessed the culprit was a spinning disk with a very high "delayed ECC"
count (shown via smartctl: 48701).
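(If someone wants to check this on their own disks: for SAS drives the
counter is part of the SCSI error counter log, so something like this
should show it, where /dev/sdX is of course a placeholder for the
suspect device:

#> smartctl -a /dev/sdX | grep -A 7 "Error counter log"

The value appears in the "delayed" column under "Errors Corrected by
ECC".)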
So we decided to take this OSD down/out to run some checks. After that,
the blocked requests were gone and we saw no more freezes.
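(For completeness, taking an OSD down/out on Jewel boils down to
something like this; osd.12 stands in here for the real OSD id:

#> ceph osd out 12
#> systemctl stop ceph-osd@12

and later, to bring it back:

#> systemctl start ceph-osd@12
#> ceph osd in 12
)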
Btw, related to the mentioned blocked requests, *dmesg* on the server
produced this (two times):
[4927177.901845] INFO: task filestore_sync:5907 blocked for more than 120 seconds.
[4927177.902147] Tainted: G I 4.4.0-43-generic #63-Ubuntu
[4927177.902416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4927177.902735] filestore_sync D ffff8810073e3e00 0 5907 1 0x00000000
[4927177.902741] ffff8810073e3e00 ffff88102a1f0db8 ffff8810367fb700 ffff8810281b0dc0
[4927177.902745] ffff8810073e4000 ffff88102a1f0de8 ffff88102a1f0a98 ffff8810073e3e8c
[4927177.902748] 00005638fa13e000 ffff8810073e3e18 ffffffff8182d7c5 ffff8810073e3e8c
[4927177.902751] Call Trace:
[4927177.902764] [<ffffffff8182d7c5>] schedule+0x35/0x80
[4927177.902771] [<ffffffff812378e8>] wb_wait_for_completion+0x58/0xa0
[4927177.902779] [<ffffffff810c3dd0>] ? wake_atomic_t_function+0x60/0x60
[4927177.902782] [<ffffffff8123b2d3>] sync_inodes_sb+0xa3/0x1f0
[4927177.902786] [<ffffffff812418ea>] sync_filesystem+0x5a/0xa0
[4927177.902789] [<ffffffff81241a7e>] SyS_syncfs+0x3e/0x70
[4927177.902794] [<ffffffff818318b2>] entry_SYSCALL_64_fastpath+0x16/0x71
Later (after a long smartctl self-test) we put the mentioned OSD back in
and have had no more issues since.
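(Next time, something like the following should help to pinpoint a slow
OSD while it is happening; osd.12 is again a placeholder, and the "ceph
daemon" calls have to be run on the host of that OSD:

#> ceph health detail
#> ceph daemon osd.12 dump_ops_in_flight
#> ceph daemon osd.12 dump_historic_ops
)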
Finally, my question :)
Is Ceph able to deal with "problematic" disks? How can this be tuned?
Perhaps with special timeouts?
I mean, let's say Ceph cannot read a shard of a PG because there is an
"I/O error"? Or when an OSD takes too long, like in the dmesg output
above?
In our setup we are using size = 3, so when a read/write request takes
too much time, Ceph should be able to use another copy of the shard (at
least for reads). See the snippet below for the kind of timeout options
I have in mind.
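Something like the following [osd] snippet is what I mean, though I am
not sure these are the right knobs (the values are just the Jewel
defaults, not a recommendation):

[osd]
# ops older than this are reported as slow/blocked requests
osd_op_complaint_time = 30
# how long peers/monitors wait without heartbeats before an OSD is marked down
osd_heartbeat_grace = 20
# a filestore commit taking longer than this makes the OSD assert
filestore_commit_timeout = 600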
This is my setup (in production):
*Software/OS*
- Jewel
#> ceph tell osd.* version | grep version | uniq
"version": "ceph version 10.2.3
(ecc23778eb545d8dd55e2e4735b53cc93f92e65b)"
#> ceph tell mon.* version
[...] ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
- Ubuntu 16.04.1 LTS on all OSD and MON servers
#> uname -a
Linux server 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
*Server*
4x OSD servers, 3 of them with
- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no
Hyper-Threading
- 64GB RAM
- 12x 4TB HGST 7K4000 SAS2 (6Gb/s) disks as OSDs
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as journaling device for 12 disks (20G journal size)
- 1x Samsung SSD 840/850 Pro only for the OS
and 1x OSD server with
- 1x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (10 Cores 20 Threads)
- 64GB RAM
- 23x 2TB TOSHIBA MK2001TRKB SAS2 (6Gb/s) disks as OSDs
- 1x SEAGATE ST32000445SS SAS2 (6Gb/s) disk as OSD
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as journaling device for 24 disks (15G journal size)
- 1x Samsung SSD 850 Pro only for the OS
3x MON servers
- Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4
Cores, 8 Threads)
- The third one has 2x Intel(R) Xeon(R) CPU L5430 @ 2.66GHz ==> 8 Cores,
no Hyper-Threading
- 32 GB RAM
- 1x RAID 10 (4 disks)
*Network*
- Each server and client has 2x 10GbE (LACP).
- We do not use jumbo frames yet.
- Public and cluster network Ceph traffic goes through this one active
(LACP) 10GbE interface on each server.
*ceph.conf*
[global]
fsid = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
public_network = xxx.16.0.0/24
cluster_network = xx.0.0.0/24
mon_initial_members = monserver1, monserver2, monserver3
mon_host = xxx.16.0.2,xxx.16.0.3,xxx.16.0.4
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_crush_initial_weight = 0
mon_osd_full_ratio = 0.90
mon_osd_nearfull_ratio = 0.80
[mon]
mon_allow_pool_delete = false
[osd]
#osd_journal_size = 20480
osd_journal_size = 15360
Please ask if you need more information.
Thanks so far.
- Mehmet