Re: Data not accessible after replacing OSD with larger volume

Also, I checked the Ceph logs and I see a ton of messages like the following, which seem like they could be (and probably are) related to the I/O issue:

2017-05-01 11:20:22.096657 7f12b6aa5700  0 -- 10.0.1.1:6846/3810 >> 10.0.33.1:6811/3413 pipe(0x76563600 sd=31 :53523 s=1 pgs=0 cs=1 l=0 c=0x8264ec60).connect claims to be 10.0.33.1:6811/3389 not 10.0.33.1:6811/3413 - wrong node!

2017-05-01 11:20:22.098839 7f12b6aa5700  0 -- 10.0.1.1:6846/3810 >> 10.0.33.1:6811/3413 pipe(0x76563600 sd=31 :53524 s=1 pgs=0 cs=1 l=0 c=0x8264ec60).connect claims to be 10.0.33.1:6811/3389 not 10.0.33.1:6811/3413 - wrong node!

2017-05-01 11:20:22.105574 7f12b6aa5700  0 -- 10.0.1.1:6846/3810 >> 10.0.33.1:6811/3413 pipe(0x76563600 sd=31 :53525 s=1 pgs=0 cs=1 l=0 c=0x8264ec60).connect claims to be 10.0.33.1:6811/3389 not 10.0.33.1:6811/3413 - wrong node!

2017-05-01 11:20:22.108402 7f12b6aa5700  0 -- 10.0.1.1:6846/3810 >> 10.0.33.1:6811/3413 pipe(0x76563600 sd=31 :53526 s=1 pgs=0 cs=1 l=0 c=0x8264ec60).connect claims to be 10.0.33.1:6811/3389 not 10.0.33.1:6811/3413 - wrong node!
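For what it's worth, "wrong node" messages like these usually mean the peer OSD process restarted and is answering on the same IP:port with a new nonce (the /3389 vs. /3413 suffix), so other daemons are still dialing a stale session. A quick way to cross-check which OSD currently owns that address (the osd id below is only a placeholder):

ceph osd dump | grep '10.0.33.1:6811'   # which osd is currently registered at that address?
ceph osd find 12                        # "12" is hypothetical; use the id found above to confirm its host/ip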




Kind Regards,
Scott Lewis
Sr. Developer & Head of Content
Iconfinder Aps


"Helping Designers Make a Living Doing What They Love" 

On Mon, May 1, 2017 at 12:39 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
The crush weight should match the size of your OSDs. The 100 GB OSDs showing 0.090 is probably down to GiB vs. GB. Your 2 TB OSDs should have a weight of 2.000, or thereabouts. Your reweight values will be able to go back much closer to 1 once you fix the weights of the larger OSDs. Fixing that might allow your cluster to finish backfilling.
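A minimal sketch of that adjustment, assuming the enlarged volume is osd.36 (the id and the 2.000 weight are placeholders for the actual values):

ceph osd crush reweight osd.36 2.000   # set the crush weight to match the 2 TB size
ceph osd reweight osd.36 1.0           # then bring the temporary reweight back toward 1
ceph osd tree                          # check the WEIGHT and REWEIGHT columns afterwards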

How do you access your images? Is it through CephFS, RGW, or RBD? Your current health doesn't look like it should prevent access to your images. The only other thing I can think of, besides an MDS or RGW not running, would be to issue a deep scrub on some of the PGs on the newly enlarged OSD to see if any of them are inconsistent.
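One way to do that, with placeholder ids (osd.36 and the pg id are assumptions):

ceph pg ls-by-osd osd.36                   # list the pgs mapped to the enlarged osd
ceph pg deep-scrub 0.1f                    # deep-scrub one of them (0.1f is a made-up pg id)
ceph health detail | grep -i inconsistent  # see whether any pgs come back inconsistent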

On Sun, Apr 30, 2017, 10:40 AM Scott Lewis <scott@xxxxxxxxxxxxxx> wrote:
Hi,

I am a complete n00b to Ceph and cannot seem to figure out why my cluster isn't working as expected. We have 39 OSDs, 36 of which are 100 GB volumes and 3 of which are 2 TB volumes, managed under AWS EC2.

Yesterday I replaced one of the 100 GB volumes with a new 2 TB volume, which involved creating a snapshot, detaching the old volume, attaching the new volume, and then using parted to set the start/end of the data partition correctly. This all went smoothly, with no issues reported from AWS or the server.
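For the parted step, something along these lines is the usual approach, assuming a parted new enough to have resizepart (3.2+) and that the data partition is partition 1 on /dev/xvdf (both are assumptions here):

sudo parted /dev/xvdf unit s print       # inspect the current partition start/end sectors
sudo parted /dev/xvdf resizepart 1 100%  # extend partition 1 to the end of the new 2 TB volume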

However, when I started reweighting the OSDs, the health status went to HEALTH_WARN with over 500 PGs stuck unclean and about 14% of objects misplaced. I am adding the health detail, crush map, and OSD tree here:
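For reference, that kind of output is typically gathered with commands along these lines (the dump path is arbitrary):

ceph health detail
ceph osd tree
ceph osd getcrushmap -o /tmp/crushmap && crushtool -d /tmp/crushmap -o /tmp/crushmap.txt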


We use Ceph to store our image inventory, which is about 5 million images. If you do a search on our site, https://iconfinder.com, none of the images shows up.

This all started after doing the reweights when the new volume was added. I tried setting all of the weights back to their original settings but this did not help.

The only other thing that I changed was to raise kernel.pid_max to the maximum allowed. I reset it to the original value (below), but that didn't work either.

sudo sysctl -w kernel.pid_max=32768


Thanks in advance for any help.

Scott Lewis
Sr. Developer & Head of Content
Iconfinder Aps


"Helping Designers Make a Living Doing What They Love" 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
