Re: ONE pg deep-scrub blocks cluster

c <ceph@xxxxxxxxxx> · Thu, 28 Jul 2016 17:46:09 +0200

On 2016-07-28 15:26, Bill Sharer wrote:
I suspect the data for one or more shards on this osd's underlying
filesystem has a marginally bad sector or sectors.  A read from the
deep scrub may be causing the drive to perform repeated seeks and
reads of the sector until it gets a good read from the filesystem.
You might want to look at the SMART info on the drive or drives in the
RAID set to see what the error counts suggest about this.  You may
also be looking at a drive that's about to fail.

Bill Sharer

Hello Bill,

Thank you for reading and answering my email :)

As I wrote, I have already checked the disks via "smartctl":

- osd.4: http://slexy.org/view/s2LR5ncr8G
- osd.16: http://slexy.org/view/s2LH6FBcYP
- osd.28: http://slexy.org/view/s21Yod9dUw

A long test " smartctl --test long /dev/DISK " is now running on 
all disks, just to be on the safe side. This will take a while.
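
For the error counts Bill mentions, here is a minimal sketch that pulls two numbers out of " smartctl -a " output. It assumes the layout smartctl usually prints for SAS/SCSI disks: an "Elements in grown defect list:" line, and a "read:" row in the error counter log whose last column is the total of uncorrected errors. The function names are made up for illustration, and SATA disks report different attributes.

```shell
# Reads `smartctl -a` output on stdin and prints the grown defect count.
# Assumes the SAS/SCSI-style line "Elements in grown defect list: N".
grown_defects() {
    awk -F': *' '/^Elements in grown defect list/ { print $2 }'
}

# Reads `smartctl -a` output on stdin and prints the last column of the
# "read:" row of the error counter log (total uncorrected read errors).
uncorrected_reads() {
    awk '$1 == "read:" { print $NF }'
}
```

Usage would be something like " smartctl -a /dev/DISK | grown_defects ". A non-zero value from either function would fit the marginal-sector theory; zeros do not rule it out.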

There is no RAID used for the OSDs!

I forgot to mention that, as a test, I had completely removed 
"osd.4" from the cluster and ran " ceph pg deep-scrub 0.223 " again, 
with the same result (nearly all of my VMs stopped working for a while).

- Mehmet

On 07/28/2016 08:46 AM, c wrote:
Hello Ceph alikes :)

I have a strange issue with one PG (0.223) in combination with "deep-scrub".

Whenever Ceph runs, or I manually run, " ceph pg deep-scrub 0.223 ", 
this leads to many slow/blocked requests, so that nearly all of my VMs 
stop working for a while.

This happens only to this one PG 0.223 and in combination with 
deep-scrub (!). All other Placement Groups where a deep-scrub occurs 
are fine. The mentioned PG also works fine when a "normal scrub" 
occurs.

These OSDs are involved:

#> ceph pg map 0.223
osdmap e7047 pg 0.223 (0.223) -> up [4,16,28] acting [4,16,28]
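
To reuse that mapping in later commands, a small sketch that extracts the acting OSD ids from the " ceph pg map " line. The helper name acting_osds is hypothetical, and the pattern assumes the output format shown above.

```shell
# Reads `ceph pg map <pgid>` output on stdin and prints the acting OSD ids,
# one per line. Relies on the "acting [a,b,c]" part of the line.
acting_osds() {
    sed -n 's/.*acting \[\([0-9,]*\)].*/\1/p' | tr ',' '\n'
}
```

Usage would be " ceph pg map 0.223 | acting_osds ", which for the line above yields 4, 16 and 28 on separate lines.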

*The LogFiles*

"deep-scrub" starts @ 2016-07-28 12:44:00.588542 and takes 
approximately 12 Minutes (End: 2016-07-28 12:56:31.891165)
- ceph.log: http://pastebin.com/FSY45VtM

I have run " ceph tell osd injectargs '--debug-osd = 5/5' " for the 
related OSDs 4, 16 and 28.
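
A minimal sketch of applying a debug level to each of the three OSDs in turn, assuming the usual " ceph tell osd.<id> injectargs '--debug-osd <level>' " form. The function name set_osd_debug and the CEPH override variable are made up for illustration.

```shell
# Raise the OSD debug level on every OSD id given after the level.
# Set CEPH=echo to preview the commands instead of running them.
set_osd_debug() {
    level=$1
    shift
    for id in "$@"; do
        ${CEPH:-ceph} tell "osd.$id" injectargs "--debug-osd $level"
    done
}
```

For example, " set_osd_debug 5/5 4 16 28 " raises the level on the three OSDs; calling it again with the previous level restores it once the logs have been collected.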

LogFile - osd.4
- ceph-osd.4.log: http://slexy.org/view/s20zzAfxFH

LogFile - osd.16
- ceph-osd.16.log: http://slexy.org/view/s25H3Zvkb0

LogFile - osd.28
- ceph-osd.28.log: http://slexy.org/view/s21Ecpwd70

I have checked the disks 4, 16 and 28 with smartctl and could not find 
any issues. There are also no odd "dmesg" messages.

*ceph -s*
    cluster 98a410bf-b823-47e4-ad17-4543afa24992
     health HEALTH_OK
     monmap e2: 3 mons at 
{monitor1=172.16.0.2:6789/0,monitor3=172.16.0.4:6789/0,monitor2=172.16.0.3:6789/0}
            election epoch 38, quorum 0,1,2 monitor1,monitor2,monitor3
     osdmap e7047: 30 osds: 30 up, 30 in
            flags sortbitwise
      pgmap v3253519: 1024 pgs, 1 pools, 2858 GB data, 692 kobjects
            8577 GB used, 96256 GB / 102 TB avail
                1024 active+clean
  client io 396 kB/s rd, 3141 kB/s wr, 55 op/s rd, 269 op/s wr

This is my Setup:

*Software/OS*

- Jewel
#> ceph tell osd.* version | grep version | uniq
"version": "ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374)"
#> ceph tell mon.* version
[...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

- Ubuntu 16.04 LTS on all OSD and MON servers
#> uname -a
Linux galawyn 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 
2016 x86_64 x86_64 x86_64 GNU/Linux

*Server*

3x OSD Server, each with
- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no 
Hyper-Threading
- 64GB RAM
- 10x 4TB HGST 7K4000 SAS2 (6Gb/s) Disks as OSDs
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device 
for 10-12 Disks
- 1x Samsung SSD 840/850 Pro only for the OS

3x MON Server
- Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4 
Cores, 8 Threads)
- The third one has 2x Intel(R) Xeon(R) CPU L5430  @ 2.66GHz ==> 8 
Cores, no Hyper-Threading
- 32 GB RAM
- 1x Raid 10 (4 Disks)

*Network*
- Each Server and Client has one active 10 Gbit/s connection. A second 
10 Gbit/s connection is attached as well, but it only serves as a backup 
when the active Switch fails - no LACP possible.
- We do not use Jumbo Frames yet.
- Public and Cluster-Network related Ceph traffic both go through 
this one active 10 Gbit/s Interface on each Server.

Any ideas what is going on?
Can I provide more input to find a solution?

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
