Re: recovering from 95% full osd

On Tuesday, January 8, 2013 at 10:52 PM, Sage Weil wrote:
> On Wed, 9 Jan 2013, Roman Hlynovskiy wrote:
> > Thanks a lot Greg,
> > 
> > that was the black magic command I was looking for )
> > 
> > I deleted some obsolete data and reached those figures:
> > 
> > chef@cephgw:~$ ./clu.sh exec "df -kh"|grep osd
> > /dev/mapper/vg00-osd 252G 153G 100G 61% /var/lib/ceph/osd/ceph-0
> > /dev/mapper/vg00-osd 252G 180G 73G 72% /var/lib/ceph/osd/ceph-1
> > /dev/mapper/vg00-osd 252G 213G 40G 85% /var/lib/ceph/osd/ceph-2
> > 
> > which, in comparison to the previous figures:
> > 
> > /dev/mapper/vg00-osd 252G 173G 80G 69% /var/lib/ceph/osd/ceph-0
> > /dev/mapper/vg00-osd 252G 203G 50G 81% /var/lib/ceph/osd/ceph-1
> > /dev/mapper/vg00-osd 252G 240G 13G 96% /var/lib/ceph/osd/ceph-2
> > 
> > show that 20 GB were freed from osd-1, 23 GB from osd-2, and 27 GB from osd-3.
> > So the cleaned-up space is also somewhat disproportionate.
> > 
> > at the same time:
> > chef@cephgw:~$ ceph osd tree
> > 
> > # id    weight  type name           up/down reweight
> > -1      3       pool default
> > -3      3           rack unknownrack
> > -2      1               host ceph-node01
> > 0       1                   osd.0       up      1
> > -4      1               host ceph-node02
> > 1       1                   osd.1       up      1
> > -5      1               host ceph-node03
> > 2       1                   osd.2       up      1
> > 
> > 
> > all OSD weights are the same. I guess there is no automatic way to
> > balance storage usage in my case, and I have to play with OSD weights
> > using 'ceph osd reweight-by-utilization xx' until storage is used more
> > or less equally, and then set the weights back to 1?
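A minimal sketch of that manual approach, in case it helps (osd.2 is only an example target here; pick whichever OSD is fullest, and note that reweight-by-utilization takes a percentage threshold relative to average utilization, e.g. 110):

ceph osd reweight 2 0.9              # temporarily lower the weight of the fullest OSD so PGs move off it
ceph osd reweight-by-utilization 110 # or: let ceph reweight any OSD above 110% of average utilization
ceph osd reweight 2 1.0              # once usage evens out, restore the weight
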
> 
> 
> 
> How many pgs do you have? ('ceph osd dump | grep ^pool').

I believe this is it. 384 PGs total, but spread across three pools, of which only one (or maybe a second one, sort of) is actually in use. Automatically setting the right PG counts is coming some day, but until then, having to pick the right PG counts when a pool is created is a big gotcha. :(
Depending on how mutable the data is, you could recreate the pools that are in use with larger PG counts. Otherwise we can do something more detailed.
-Greg
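A rough sketch of checking and recreating, with 'data-new' and 512 PGs purely as illustrative values (how you actually migrate the data depends on how the pools are used — rados cppool only makes sense for quiescent, plain RADOS pools, and CephFS tracks its pools by ID rather than name, so a simple swap does not apply there):

ceph osd dump | grep ^pool        # shows pg_num / pgp_num for each pool
ceph osd pool create data-new 512 # example: create a replacement pool with more PGs
rados cppool data data-new        # copy objects across, if the data is static and this tool suits your case
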
 
> 
> You might also adjust the crush tunables, see
> 
> http://ceph.com/docs/master/rados/operations/crush-map/?highlight=tunable#tunables
> 
> sage
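One way to apply those in this era of Ceph is by editing the decompiled CRUSH map; a sketch, with the tunable values below taken as examples from that doc page rather than a recommendation for this particular cluster (older clients and kernels may not support newer tunables, as the page explains):

ceph osd getcrushmap -o crush.map   # export the current CRUSH map
crushtool -d crush.map -o crush.txt # decompile it to editable text
# add/adjust tunables at the top of crush.txt, for example:
#   tunable choose_local_tries 0
#   tunable choose_local_fallback_tries 0
#   tunable choose_total_tries 50
crushtool -c crush.txt -o crush.new # recompile
ceph osd setcrushmap -i crush.new   # inject the updated map
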
> 
> > 
> > 
> > 
> > > 2013/1/8 Gregory Farnum <greg@xxxxxxxxxxx>:
> > > On Tue, Jan 8, 2013 at 2:42 AM, Roman Hlynovskiy
> > > <roman.hlynovskiy@xxxxxxxxx> wrote:
> > > > Hello,
> > > > 
> > > > I am running Ceph v0.56 and at the moment I am trying to recover a
> > > > cluster which got completely stuck after one OSD filled to 95%. It looks
> > > > like the distribution algorithm is not perfect, since all 3 OSDs I use
> > > > are 256 GB each, yet one of them filled up faster than the others:
> > > > 
> > > > osd-1:
> > > > Filesystem Size Used Avail Use% Mounted on
> > > > /dev/mapper/vg00-osd 252G 173G 80G 69% /var/lib/ceph/osd/ceph-0
> > > > 
> > > > osd-2:
> > > > Filesystem Size Used Avail Use% Mounted on
> > > > /dev/mapper/vg00-osd 252G 203G 50G 81% /var/lib/ceph/osd/ceph-1
> > > > 
> > > > osd-3:
> > > > Filesystem Size Used Avail Use% Mounted on
> > > > /dev/mapper/vg00-osd 252G 240G 13G 96% /var/lib/ceph/osd/ceph-2
> > > > 
> > > > 
> > > > At the moment the MDS is showing the following behaviour:
> > > > 2013-01-08 16:25:47.006354 b4a73b70 0 mds.0.objecter FULL, paused
> > > > modify 0x9ba63c0 tid 23448
> > > > 2013-01-08 16:26:47.005211 b4a73b70 0 mds.0.objecter FULL, paused
> > > > modify 0xca86c30 tid 23449
> > > > 
> > > > so it does not respond to any mount requests.
> > > > 
> > > > I've played around with all types of commands like:
> > > > ceph mon tell \* injectargs '--mon-osd-full-ratio 98'
> > > > ceph mon tell \* injectargs '--mon-osd-full-ratio 0.98'
> > > > 
> > > > and
> > > > 
> > > > 'mon osd full ratio = 0.98' in mon configuration for each mon
> > > > 
> > > > however
> > > > 
> > > > chef@ceph-node03:/var/log/ceph$ ceph health detail
> > > > HEALTH_ERR 1 full osd(s)
> > > > osd.2 is full at 95%
> > > > 
> > > > the MDS still believes 95% is the threshold, so there are no responses to mount requests.
> > > > 
> > > > chef@ceph-node03:/var/log/ceph$ rados -p data bench 10 write
> > > > Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
> > > > Object prefix: benchmark_data_ceph-node03_3903
> > > > 2013-01-08 16:33:02.363206 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa467ff0 tid 1
> > > > 2013-01-08 16:33:02.363618 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa468780 tid 2
> > > > 2013-01-08 16:33:02.363741 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa468f88 tid 3
> > > > 2013-01-08 16:33:02.364056 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa469348 tid 4
> > > > 2013-01-08 16:33:02.364171 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa469708 tid 5
> > > > 2013-01-08 16:33:02.365024 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa469ac8 tid 6
> > > > 2013-01-08 16:33:02.365187 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa46a2d0 tid 7
> > > > 2013-01-08 16:33:02.365296 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa46a690 tid 8
> > > > 2013-01-08 16:33:02.365402 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa46aa50 tid 9
> > > > 2013-01-08 16:33:02.365508 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa46ae10 tid 10
> > > > 2013-01-08 16:33:02.365635 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa46b1d0 tid 11
> > > > 2013-01-08 16:33:02.365742 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa46b590 tid 12
> > > > 2013-01-08 16:33:02.365868 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa46b950 tid 13
> > > > 2013-01-08 16:33:02.365975 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa46bd10 tid 14
> > > > 2013-01-08 16:33:02.366096 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa46c0d0 tid 15
> > > > 2013-01-08 16:33:02.366203 b6be3710 0 client.9958.objecter FULL,
> > > > paused modify 0xa46c490 tid 16
> > > > sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> > > >   0      16        16         0         0         0         -         0
> > > >   1      16        16         0         0         0         -         0
> > > >   2      16        16         0         0         0         -         0
> > > > 
> > > > rados doesn't work.
> > > > 
> > > > chef@ceph-node03:/var/log/ceph$ ceph osd reweight-by-utilization
> > > > no change: average_util: 0.812678, overload_util: 0.975214. overloaded
> > > > osds: (none)
> > > > 
> > > > This one doesn't help either.
> > > > 
> > > > 
> > > > Is there any chance to recover the cluster?
> > > 
> > > "ceph pg set_full_ratio 0.98"
> > > 
> > > However, as Mark mentioned, you want to figure out why one OSD is so
> > > much fuller than the others first. Even in a small cluster I don't
> > > think you should be able to see that kind of variance. Simply setting
> > > the full ratio to 98% and then continuing to run could cause bigger
> > > problems if that OSD continues to get a disproportionate share of the
> > > writes and fills up its disk.
> > > -Greg
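To spell that out, the temporary-bump-then-restore approach looks something like this (0.95 being the default full ratio):

ceph pg set_full_ratio 0.98 # temporarily raise the full threshold so the cluster unblocks and data can be deleted
# ... free up space: delete data, add OSDs, or rebalance ...
ceph pg set_full_ratio 0.95 # restore the default once utilization is back under control
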
> > 
> > 
> > 
> > 
> > 
> > -- 
> > ...WBR, Roman Hlynovskiy
> 


