At this point, it's probably best to delete the pool. I'm assuming the pool only contains benchmark data, and nothing important.
Assuming you can delete the pool:
First, figure out the ID of the data pool. You can get that from the output of ceph osd dump | grep '^pool'.
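For example, on a Firefly-era cluster the output looks roughly like this (illustrative only; your pg_num and flags will differ). The pool ID is the number right after the word "pool":

    $ ceph osd dump | grep '^pool'
    pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
    ...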
Once you have the number, delete the data pool: rados rmpool data data --yes-i-really-really-mean-it (yes, the pool name is given twice; that's part of the safety check).
That will only free up space on OSDs that are up. You'll need to manually delete some PGs on the OSDs that are 100% full. Go to /var/lib/ceph/osd/ceph-<OSDID>/current, and delete a few directories whose names start with your data pool ID. You don't need to delete all of them. Once the disk is below 95% full, you should be able to start that OSD. Once it's up, it will finish deleting the pool.
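Something like this on the full OSD, with that OSD stopped (the OSD ID and the PG directory names below are placeholders; I'm assuming 'data' is pool 0, so its PG directories look like 0.<hex>_head):

    cd /var/lib/ceph/osd/ceph-3/current    # substitute your OSD ID
    ls -d 0.*_head                         # PG directories belonging to pool 0
    rm -rf 0.1a_head 0.2f_head             # delete a few of them (made-up names)
    df -h .                                # repeat until usage is below 95%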
If you can't delete the pool, recovery is still possible, but it's more work, and you still run the risk of losing data if you make a mistake. You need to disable backfilling, then delete some PGs on each OSD that's full. Try to delete only one copy of each PG; if you delete every copy of a PG on all OSDs, you lose the data that was in that PG. As before, once you've deleted enough that the disk is less than 95% full, you can start the OSD. Once you start it, start deleting your benchmark data out of the data pool. Once that's done, you can re-enable backfilling. You may need to scrub or deep-scrub the OSDs you deleted data from to get everything back to normal.
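A rough sketch of that sequence (the OSD number is a placeholder, and I'm assuming your benchmark objects still have the default rados bench prefix, benchmark_data_*):

    ceph osd set nobackfill                   # stop backfill while you work
    # on each 100% full OSD: delete ONE copy of a few PG directories, as above,
    # then start that OSD
    rados -p data ls | grep '^benchmark_data' | \
      while read obj; do rados -p data rm "$obj"; done
    ceph osd unset nobackfill                 # re-enable backfill
    ceph osd deep-scrub 3                     # deep-scrub each OSD you deleted from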
So how did you get the disks 100% full anyway? Ceph normally won't let you do that. Did you increase mon_osd_full_ratio, osd_backfill_full_ratio, or osd_failsafe_full_ratio?
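You can check what the daemons are actually running with via the admin socket (the socket paths below are the defaults, and the daemon names are taken from your "ceph osd tree" and monmap, so adjust as needed):

    ceph --admin-daemon /var/run/ceph/ceph-osd.6.asok config show | grep full_ratio
    ceph --admin-daemon /var/run/ceph/ceph-mon.2.asok config show | grep full_ratio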
On Mon, Nov 17, 2014 at 7:00 AM, han vincent <hangzws@xxxxxxxxx> wrote:
Hello, everyone:
These days a problem with Ceph has been troubling me.
I built a cluster with 3 hosts, each host with three OSDs in it.
After that,
I used the command "rados bench 360 -p data -b 4194304 -t 300 write
--no-cleanup"
to test the write performance of the cluster.
When the cluster was near full, no more data could be written to
it. Unfortunately,
a host then hung, and a lot of PGs started to migrate to other OSDs.
After a while, a lot of OSDs were marked down and out, and my cluster couldn't work
any more.
The following is the output of "ceph -s":
cluster 002c3742-ab04-470f-8a7a-ad0658b547d6
health HEALTH_ERR 103 pgs degraded; 993 pgs down; 617 pgs
incomplete; 1008 pgs peering; 12 pgs recovering; 534 pgs stale; 1625
pgs stuck inactive; 534 pgs stuck stale; 1728 pgs stuck unclean;
recovery 945/29649 objects degraded (3.187%); 1 full osd(s); 1 mons
down, quorum 0,2 2,1
monmap e1: 3 mons at
{0=10.0.0.97:6789/0,1=10.0.0.98:6789/0,2=10.0.0.70:6789/0}, election
epoch 40, quorum 0,2 2,1
osdmap e173: 9 osds: 2 up, 2 in
flags full
pgmap v1779: 1728 pgs, 3 pools, 39528 MB data, 9883 objects
37541 MB used, 3398 MB / 40940 MB avail
945/29649 objects degraded (3.187%)
34 stale+active+degraded+remapped
176 stale+incomplete
320 stale+down+peering
53 active+degraded+remapped
408 incomplete
1 active+recovering+degraded
673 down+peering
1 stale+active+degraded
15 remapped+peering
3 stale+active+recovering+degraded+remapped
3 active+degraded
33 remapped+incomplete
8 active+recovering+degraded+remapped
The following is the output of "ceph osd tree":
# id weight type name up/down reweight
-1 9 root default
-3 9 rack unknownrack
-2 3 host 10.0.0.97
0 1 osd.0 down 0
1 1 osd.1 down 0
2 1 osd.2 down 0
-4 3 host 10.0.0.98
3 1 osd.3 down 0
4 1 osd.4 down 0
5 1 osd.5 down 0
-5 3 host 10.0.0.70
6 1 osd.6 up 1
7 1 osd.7 up 1
8 1 osd.8 down 0
The following is part of the output of osd.0.log:
-3> 2014-11-14 17:33:02.166022 7fd9dd1ab700 0
filestore(/data/osd/osd.0) error (28) No space left on device not
handled on operation 10 (15804.0.13, or op 13, counting from 0)
-2> 2014-11-14 17:33:02.216768 7fd9dd1ab700 0
filestore(/data/osd/osd.0) ENOSPC handling not implemented
-1> 2014-11-14 17:33:02.216783 7fd9dd1ab700 0
filestore(/data/osd/osd.0) transaction dump:
...
...
0> 2014-11-14 17:33:02.541008 7fd9dd1ab700 -1 os/FileStore.cc: In
function 'unsigned int
FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int,
ThreadPool::TPHandle*)' thread 7fd9dd1ab700 time
2014-11-14 17:33:02.251570
os/FileStore.cc: 2540: FAILED assert(0 == "unexpected error")
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0x17f8675]
2: (FileStore::_do_transaction(ObjectStore::Transaction&,
unsigned long, int, ThreadPool::TPHandle*)+0x4855) [0x1534c21]
3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*,
std::allocator<ObjectStore::Transaction*> >&, unsigned long,
ThreadPool::TPHandle*)+0x101) [0x152d67d]
4: (FileStore::_do_op(FileStore::OpSequencer*,
ThreadPool::TPHandle&)+0x57b) [0x152bdc3]
5: (FileStore::OpWQ::_process(FileStore::OpSequencer*,
ThreadPool::TPHandle&)+0x2f) [0x1553c6f]
6: (ThreadPool::WorkQueue<FileStore::OpSequencer>::_void_process(void*,
ThreadPool::TPHandle&)+0x37) [0x15625e7]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x7a4) [0x18801de]
8: (ThreadPool::WorkThread::entry()+0x23) [0x1881f2d]
9: (Thread::_entry_func(void*)+0x23) [0x1998117]
10: (()+0x79d1) [0x7fd9e92bf9d1]
11: (clone()+0x6d) [0x7fd9e78ca9dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
It seems the error code was ENOSPC (No space left on device). Why did the OSD
process exit with an assert at
this point? If there was no space left, why did the cluster choose
to migrate at all? Only osd.6
and osd.7 were alive. I tried to restart the other OSDs, but after a
while those OSDs crashed again.
And now I can't read the data any more.
Is it a bug? Can anyone help me?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com