How to prevent blocked requests?

Hey friends,

A month ago I had an issue with a few blocked requests, during which some of my VMs froze. I guessed the culprit was a spinning disk with a lot of "delayed ECC" errors (48701, shown via smartctl).
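
For reference, this is roughly how those counters can be read with smartctl (just a sketch; /dev/sdX is a placeholder for the disk behind that OSD):

#> smartctl -a /dev/sdX          # full health output; for SAS disks this includes the error counter log with the fast/delayed ECC columns
#> smartctl -l error /dev/sdX    # only the error counter log
#> smartctl -t long /dev/sdX     # start a long self-test; results show up later via: smartctl -l selftest /dev/sdX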

So we decided to take this OSD down/out to do some checks. After that, the blocked requests were gone and we had no more freezes.
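
Roughly the steps for taking a single OSD out and back in (a sketch only; the id 12 is a placeholder, and the systemd unit name assumes Jewel on Ubuntu 16.04):

#> ceph osd out 12              # stop placing data on it, the cluster rebalances
#> systemctl stop ceph-osd@12   # stop the daemon for the offline checks
# ... smartctl / other checks here ...
#> systemctl start ceph-osd@12
#> ceph osd in 12               # take it back in once the checks look ok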

Btw, related to the mentioned blocked requests, *dmesg* on the server produced this (two times):
[4927177.901845] INFO: task filestore_sync:5907 blocked for more than 120 seconds.
[4927177.902147] Tainted: G I 4.4.0-43-generic #63-Ubuntu
[4927177.902416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4927177.902735] filestore_sync D ffff8810073e3e00 0 5907 1 0x00000000
[4927177.902741] ffff8810073e3e00 ffff88102a1f0db8 ffff8810367fb700 ffff8810281b0dc0
[4927177.902745] ffff8810073e4000 ffff88102a1f0de8 ffff88102a1f0a98 ffff8810073e3e8c
[4927177.902748] 00005638fa13e000 ffff8810073e3e18 ffffffff8182d7c5 ffff8810073e3e8c
[4927177.902751] Call Trace:
[4927177.902764]  [<ffffffff8182d7c5>] schedule+0x35/0x80
[4927177.902771]  [<ffffffff812378e8>] wb_wait_for_completion+0x58/0xa0
[4927177.902779] [<ffffffff810c3dd0>] ? wake_atomic_t_function+0x60/0x60
[4927177.902782]  [<ffffffff8123b2d3>] sync_inodes_sb+0xa3/0x1f0
[4927177.902786]  [<ffffffff812418ea>] sync_filesystem+0x5a/0xa0
[4927177.902789]  [<ffffffff81241a7e>] SyS_syncfs+0x3e/0x70
[4927177.902794] [<ffffffff818318b2>] entry_SYSCALL_64_fastpath+0x16/0x71
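
For reference, blocked requests like these can be traced to a specific OSD with something like the following (osd.12 is again just a placeholder; the daemon commands have to run on the host of that OSD):

#> ceph health detail                      # shows the "N requests are blocked > 32 sec" warnings and the OSDs involved
#> ceph daemon osd.12 dump_ops_in_flight   # ops currently stuck on that OSD
#> ceph daemon osd.12 dump_historic_ops    # recently finished slow ops with their durations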

Later (after a smartctl long self-test) we put the mentioned OSD back in and again had no more issues.

Finally, my question :)

Is Ceph able to deal with "problematic" disks, and how can this be tuned? Perhaps via special timeouts? I mean, what happens when Ceph cannot read a shard of a PG because of an I/O error? Or when an OSD takes too long, as in the dmesg output above?
In our setup we use a replication size of 3, so when a read/write request takes too much time, Ceph should be able to use another copy of the shard (at least for reads).
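
These are the options I have found so far that seem related; the values are just the Jewel defaults as far as I can tell, not a recommendation, so please correct me if these are the wrong knobs:

[osd]
# after how many seconds an op is reported as a blocked ("slow") request
osd_op_complaint_time = 30
# how long missing heartbeats are tolerated before peers report the OSD down
osd_heartbeat_grace = 20

[mon]
# how long a "down" OSD may stay in the cluster before it is marked "out"
# and recovery onto the remaining replicas starts
mon_osd_down_out_interval = 600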

This is my Setup (in production):

*Software/OS*
- Jewel
#> ceph tell osd.* version | grep version | uniq
"version": "ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)"

#> ceph tell mon.* version
 [...] ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)

- Ubuntu 16.04.1 LTS on all OSD and MON Servers
#> uname -a
Linux server 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

*Server*
4x OSD Servers, 3x with

- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no Hyper-Threading
- 64GB RAM
- 12x 4TB HGST 7K4000 SAS2 (6Gb/s) Disks as OSDs
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device for 12 Disks (20G Journal size)
- 1x Samsung SSD 840/850 Pro only for the OS

and 1x OSD Server with

- 1x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (10 Cores 20 Threads)
- 64GB RAM
- 23x 2TB TOSHIBA MK2001TRKB SAS2 (6Gb/s) Disks as OSDs
- 1x SEAGATE ST32000445SS SAS2 (6Gb/s) Disk as OSD
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device for 24 Disks (15G Journal size)
- 1x Samsung SSD 850 Pro only for the OS

3x MON Servers

- Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4 Cores, 8 Threads)
- The third one with 2x Intel(R) Xeon(R) CPU L5430 @ 2.66GHz ==> 8 Cores, no Hyper-Threading
- 32 GB RAM
- 1x Raid 10 (4 Disks)

*Network*

- Each Server and Client has 2x 10GbE (LACP).
- We do not use Jumbo Frames yet.
- Public and Cluster-Network related Ceph traffic both go through this one active (LACP) 10GbE interface on each Server.

*ceph.conf*
[global]
fsid = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
public_network = xxx.16.0.0/24
cluster_network = xx.0.0.0/24
mon_initial_members = monserver1, monserver2, monserver3
mon_host = xxx.16.0.2,xxx.16.0.3,xxx.16.0.4
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_crush_initial_weight = 0

mon_osd_full_ratio = 0.90
mon_osd_nearfull_ratio = 0.80

[mon]
mon_allow_pool_delete = false

[osd]
#osd_journal_size = 20480
osd_journal_size = 15360
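
And this is how I would verify what an OSD is actually running with, via the admin socket on the OSD host (osd.0 is just a placeholder):

#> ceph daemon osd.0 config get osd_journal_size
#> ceph daemon osd.0 config get osd_op_complaint_time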

Please ask if you need more information.
Thanks so far.

- Mehmet
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


