Re: ONE pg deep-scrub blocks cluster

Hello,

On Thu, 28 Jul 2016 14:46:58 +0200 c wrote:

> Hello Ceph alikes :)
> 
> I have a strange issue with one PG (0.223) in combination with "deep-scrub".
> 
> Whenever Ceph - or I manually - runs " ceph pg deep-scrub 0.223 ", it
> leads to so many "slow/blocked requests" that nearly all of my VMs
> stop working for a while.
> 
For the record, this ONLY happens with this PG and no others that share
the same OSDs, right?

If so, then we're looking at something (HDD- or FS-wise) that's
specific to the data of this PG.
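
To verify, you could list the other PGs that map to those OSDs and
deep-scrub one or two of them for comparison ("pg ls-by-osd" should be
available on Jewel):

#> ceph pg ls-by-osd 4
#> ceph pg deep-scrub <some-other-pgid>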

When doing the deep-scrub, monitor (atop, etc) all 3 nodes and see if a
particular OSD (HDD) stands out, as I would expect it to.
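
For example, on each OSD node while you kick off the scrub, watching
%util and await of the OSD disks:

#> iostat -x 5        (or: atop 5)
#> ceph pg deep-scrub 0.223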

Since you already removed osd.4 with the same result, continue to cycle
through the other OSDs.
Running a fsck on the (out) OSDs might be helpful, too.
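
Roughly like this per OSD, assuming your OSDs sit on XFS (adjust the
OSD id and device name to your setup):

#> ceph osd out 16
   (wait until recovery finishes and the cluster is HEALTH_OK again)
#> systemctl stop ceph-osd@16
#> umount /var/lib/ceph/osd/ceph-16
#> xfs_repair -n /dev/sdX1      (-n = check only, changes nothing)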

Christian

> This happens only with this one PG 0.223, and only in combination with
> deep-scrub (!). All other placement groups deep-scrub just fine, and
> the PG in question is also fine when a "normal" scrub occurs.
> 
> These OSDs are involved:
> 
> #> ceph pg map 0.223
> osdmap e7047 pg 0.223 (0.223) -> up [4,16,28] acting [4,16,28]
> 
> *The LogFiles*
> 
> "deep-scrub" starts @ 2016-07-28 12:44:00.588542 and takes approximately 
> 12 Minutes (End: 2016-07-28 12:56:31.891165)
> - ceph.log: http://pastebin.com/FSY45VtM
> 
> I have run " ceph tell osd.N injectargs '--debug-osd 5/5' " for the
> related OSDs 4, 16 and 28:
> 
> LogFile - osd.4
> - ceph-osd.4.log: http://slexy.org/view/s20zzAfxFH
> 
> LogFile - osd.16
> - ceph-osd.16.log: http://slexy.org/view/s25H3Zvkb0
> 
> LogFile - osd.28
> - ceph-osd.28.log: http://slexy.org/view/s21Ecpwd70
> 
> I have checked disks 4, 16 and 28 with smartctl and could not find
> any issues - there are also no odd "dmesg" messages.
> 
> *ceph -s*
>      cluster 98a410bf-b823-47e4-ad17-4543afa24992
>       health HEALTH_OK
>       monmap e2: 3 mons at 
> {monitor1=172.16.0.2:6789/0,monitor3=172.16.0.4:6789/0,monitor2=172.16.0.3:6789/0}
>              election epoch 38, quorum 0,1,2 monitor1,monitor2,monitor3
>       osdmap e7047: 30 osds: 30 up, 30 in
>              flags sortbitwise
>        pgmap v3253519: 1024 pgs, 1 pools, 2858 GB data, 692 kobjects
>              8577 GB used, 96256 GB / 102 TB avail
>                  1024 active+clean
>    client io 396 kB/s rd, 3141 kB/s wr, 55 op/s rd, 269 op/s wr
> 
> This is my Setup:
> 
> *Software/OS*
> 
> - Jewel
> #> ceph tell osd.* version | grep version | uniq
> "version": "ceph version 10.2.2 
> (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
> #> ceph tell mon.* version
> [...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> 
> - Ubuntu 16.04 LTS on all OSD and MON servers
> #> uname -a
> Linux galawyn 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 
> 2016 x86_64 x86_64 x86_64 GNU/Linux
> 
> *Server*
> 
> 3x OSD Server, each with
> - 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no 
> Hyper-Threading
> - 64GB RAM
> - 10x 4TB HGST 7K4000 SAS2 (6 Gb/s) disks as OSDs
> - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device for 
> 10-12 Disks
> - 1x Samsung SSD 840/850 Pro only for the OS
> 
> 3x MON Server
> - Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4 
> Cores, 8 Threads)
> - The third one has 2x Intel(R) Xeon(R) CPU L5430  @ 2.66GHz ==> 8 
> Cores, no Hyper-Threading
> - 32 GB RAM
> - 1x Raid 10 (4 Disks)
> 
> *Network*
> - Each server and client has one active 10 GbE connection; a second
> 10 GbE link is connected as well, but it only acts as a backup when
> the active switch fails - no LACP possible.
> - We do not use jumbo frames yet.
> - Both public and cluster-network Ceph traffic goes through this one
> active 10 GbE interface on each server.
> 
> Any ideas what is going on?
> Can I provide more input to find a solution?
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


