Re: Troubleshoot blocked OSDs


 



Load on all nodes is 1.04 to 1.07

I am now updating to Jewel 10.2 (from 9.2).

This is CephFS with SSD journals.

 

Hopefully the update to Jewel fixes a lot.

 

 

Brian Andrus

ITACS/Research Computing

Naval Postgraduate School

Monterey, California

voice: 831-656-6238

 

 

 

From: Lincoln Bryant [mailto:lincolnb@xxxxxxxxxxxx]
Sent: Thursday, April 28, 2016 12:56 PM
To: Andrus, Brian Contractor
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] Troubleshoot blocked OSDs

 

OK, a few more questions.

 

What does the load look like on the OSDs with ‘iostat’ during the rsync?

 

What version of Ceph? Are you using RBD, CephFS, something else? 

 

SSD journals or no?

 

—Lincoln

 

On Apr 28, 2016, at 2:53 PM, Andrus, Brian Contractor <bdandrus@xxxxxxx> wrote:

 

Lincoln,

 

That was the odd thing to me. Ceph health detail listed all 4 OSDs, so I checked all the systems.

I have since let it settle until health was OK again and then restarted the rsync. Within a couple of minutes, it started showing blocked requests, and they are indeed on all 4 OSDs.

 

Brian Andrus

ITACS/Research Computing

Naval Postgraduate School

Monterey, California

voice: 831-656-6238

 

 

 

From: Lincoln Bryant [mailto:lincolnb@xxxxxxxxxxxx] 
Sent: Thursday, April 28, 2016 12:31 PM
To: Andrus, Brian Contractor
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] Troubleshoot blocked OSDs

 

Hi Brian,

 

The first thing you can do is “ceph health detail”, which should give you some more information about which OSD(s) have blocked requests.

 

If it’s isolated to one OSD in particular, perhaps use iostat to check utilization and/or smartctl to check health. 
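To see quickly whether the blocked requests pile up on one OSD or are spread across all of them, the per-OSD lines of "ceph health detail" can be tallied. The following is only a rough sketch in Python. It assumes the per-OSD lines look like "N ops are blocked > S sec on osd.X", which can differ between Ceph releases. The function name blocked_by_osd and the sample text are made up for illustration and are not output from any real cluster.

```python
import re
from collections import defaultdict

# Assumed line shape (may differ between Ceph releases):
#   "30 ops are blocked > 524.288 sec on osd.0"
BLOCKED_RE = re.compile(
    r"^\s*(\d+) ops are blocked > ([\d.]+) sec on (osd\.\d+)\s*$"
)


def blocked_by_osd(health_detail):
    """Return {osd_name: (total_blocked_ops, largest_threshold_sec)}.

    Hypothetical helper: it only understands the line shape assumed above
    and silently skips every other line, including the summary line.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for line in health_detail.splitlines():
        match = BLOCKED_RE.match(line)
        if match is None:
            continue
        count = int(match.group(1))
        seconds = float(match.group(2))
        osd = match.group(3)
        totals[osd][0] += count
        totals[osd][1] = max(totals[osd][1], seconds)
    return {osd: (vals[0], vals[1]) for osd, vals in totals.items()}


# Made-up sample text for illustration only, not real cluster output.
SAMPLE = """\
HEALTH_WARN 80 requests are blocked > 32 sec; 4 osds have slow requests
10 ops are blocked > 524.288 sec on osd.0
20 ops are blocked > 262.144 sec on osd.0
25 ops are blocked > 131.072 sec on osd.1
15 ops are blocked > 65.536 sec on osd.2
10 ops are blocked > 32.768 sec on osd.3
4 osds have slow requests
"""

if __name__ == "__main__":
    summary = blocked_by_osd(SAMPLE)
    # List the OSDs with the most blocked ops first.
    for osd, (count, worst) in sorted(
        summary.items(), key=lambda item: item[1][0], reverse=True
    ):
        print(f"{osd}: {count} blocked ops, slowest bucket > {worst} sec")
```

On a live cluster the text would come from running "ceph health detail" (for example through subprocess.run) instead of the SAMPLE string. If one OSD dominates the tally, that is the disk to look at with iostat and smartctl. If the counts are roughly even, the cause is more likely cluster-wide.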

 

—Lincoln

 

On Apr 28, 2016, at 2:26 PM, Andrus, Brian Contractor <bdandrus@xxxxxxx> wrote:

 

All,

 

I have a small ceph cluster with 4 OSDs and 3 MONs on 4 systems.

I was rsyncing about 50TB of files and things got very slow, to the point that I stopped the rsync. But even with everything stopped, I see:

 

health HEALTH_WARN

            80 requests are blocked > 32 sec

 

The number was as high as 218, but they seem to be draining down.

I see no issues on any of the systems: CPU load is low and memory usage is low.

 

How do I go about finding why a request is blocked for so long? Some requests have been blocked for more than 500 seconds.

 

Brian Andrus

ITACS/Research Computing

Naval Postgraduate School

Monterey, California

voice: 831-656-6238

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 

