Re: osd crashed while there was no space

At this point, it's probably best to delete the pool.  I'm assuming the pool only contains benchmark data, and nothing important.

Assuming you can delete the pool:
First, figure out the ID of the data pool.  You can get that from ceph osd dump | grep '^pool', as in the example below.

Once you have the number, delete the data pool: rados rmpool data data --yes-i-really-really-mean-it
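For example (output abridged and illustrative; 'data' is usually pool 0 on a default install, but check your own output):

    # ceph osd dump | grep '^pool'
    pool 0 'data' replicated size 3 ... pg_num 64 pgp_num 64 ...
    pool 1 'metadata' replicated size 3 ...
    pool 2 'rbd' replicated size 3 ...
    # rados rmpool data data --yes-i-really-really-mean-it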

That will only free up space on OSDs that are up.  You'll need to manually delete some PGs on the OSDs that are 100% full.  Go to /var/lib/ceph/osd/ceph-<OSDID>/current, and delete a few directories whose names start with your data pool ID.  You don't need to delete all of them.  Once the disk is below 95% full, you should be able to start that OSD.  Once it's up, it will finish deleting the pool.
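Roughly, on the host with the full OSD, it would look something like this (osd.0 and pool ID 0 are just examples; your osd.0 log shows the data path /data/osd/osd.0, so substitute that if your OSDs don't live under /var/lib/ceph, and start the OSD however you normally do):

    # df -h /var/lib/ceph/osd/ceph-0        # check how full the disk is
    # cd /var/lib/ceph/osd/ceph-0/current
    # ls -d 0.*_head                        # PG directories belonging to pool 0
    # rm -rf 0.1a_head 0.2c_head            # remove a few (these names are made up)
    # df -h /var/lib/ceph/osd/ceph-0        # repeat until usage drops below 95%
    # service ceph start osd.0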

If you can't delete the pool, recovery is still possible, but it's more work, and you still run the risk of losing data if you make a mistake.  You need to disable backfilling, then delete some PGs on each OSD that's full.  Try to delete only one copy of each PG: if you delete every copy of a PG from all OSDs, you've lost the data that was in that PG.  As before, once you've deleted enough that the disk is less than 95% full, you can start the OSD.  Once it's started, start deleting your benchmark data out of the data pool.  Once that's done, you can re-enable backfilling.  You may need to scrub or deep-scrub the OSDs you deleted data from to get everything back to normal.
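A rough sketch of that sequence, again using osd.0 and data pool ID 0 as placeholders (double-check that every PG directory you remove still has a surviving copy on another OSD before you delete it):

    # ceph osd set nobackfill                  # keep backfill from shuffling data while you work
    # cd /var/lib/ceph/osd/ceph-0/current
    # rm -rf 0.1a_head                         # one copy of one PG (name is an example)
    # service ceph start osd.0                 # once the disk is under 95% full
    # rados -p data ls | grep '^benchmark_data' | \
          while read obj; do rados -p data rm "$obj"; done
                                               # removes only the rados bench objects
    # ceph osd unset nobackfill
    # ceph osd deep-scrub 0                    # recheck the OSDs you deleted PG copies from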


So how did you get the disks 100% full anyway?  Ceph normally won't let you do that.  Did you increase mon_osd_full_ratio, osd_backfill_full_ratio, or osd_failsafe_full_ratio?
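If you're not sure, you can ask one of the OSDs that is still up what it thinks those are set to, via its admin socket (output trimmed to the three options above; the values shown are the 0.80 defaults):

    # ceph --admin-daemon /var/run/ceph/ceph-osd.6.asok config show | grep full_ratio
      "mon_osd_full_ratio": "0.95",
      "osd_backfill_full_ratio": "0.85",
      "osd_failsafe_full_ratio": "0.97",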


On Mon, Nov 17, 2014 at 7:00 AM, han vincent <hangzws@xxxxxxxxx> wrote:
Hello, everyone:

    A Ceph problem has been troubling me for days now.

    I built a cluster with 3 hosts, each with three OSDs in it. After
that, I used the command "rados bench 360 -p data -b 4194304 -t 300
write --no-cleanup" to test the write performance of the cluster.

    When the cluster was nearly full, no more data could be written to
it. Unfortunately, a host then hung, and a lot of PGs started migrating
to other OSDs. After a while, a lot of OSDs were marked down and out,
and my cluster couldn't work any more.

    The following is the output of "ceph -s":

    cluster 002c3742-ab04-470f-8a7a-ad0658b547d6
    health HEALTH_ERR 103 pgs degraded; 993 pgs down; 617 pgs
incomplete; 1008 pgs peering; 12 pgs recovering; 534 pgs stale; 1625
pgs stuck inactive; 534 pgs stuck stale; 1728 pgs stuck unclean;
recovery 945/29649 objects degraded (3.187%); 1 full osd(s); 1 mons
down, quorum 0,2 2,1
     monmap e1: 3 mons at
{0=10.0.0.97:6789/0,1=10.0.0.98:6789/0,2=10.0.0.70:6789/0}, election
epoch 40, quorum 0,2 2,1
     osdmap e173: 9 osds: 2 up, 2 in
            flags full
      pgmap v1779: 1728 pgs, 3 pools, 39528 MB data, 9883 objects
            37541 MB used, 3398 MB / 40940 MB avail
            945/29649 objects degraded (3.187%)
                  34 stale+active+degraded+remapped
                 176 stale+incomplete
                 320 stale+down+peering
                  53 active+degraded+remapped
                 408 incomplete
                   1 active+recovering+degraded
                 673 down+peering
                   1 stale+active+degraded
                  15 remapped+peering
                   3 stale+active+recovering+degraded+remapped
                   3 active+degraded
                  33 remapped+incomplete
                   8 active+recovering+degraded+remapped

    The following is the output of "ceph osd tree":
    # id    weight  type name       up/down reweight
    -1      9       root default
    -3      9               rack unknownrack
    -2      3                       host 10.0.0.97
     0       1                               osd.0   down    0
     1       1                               osd.1   down    0
     2       1                               osd.2   down    0
     -4      3                       host 10.0.0.98
     3       1                               osd.3   down    0
     4       1                               osd.4   down    0
     5       1                               osd.5   down    0
     -5      3                       host 10.0.0.70
     6       1                               osd.6   up      1
     7       1                               osd.7   up      1
     8       1                               osd.8   down    0

The following is part of the output of osd.0.log:

    -3> 2014-11-14 17:33:02.166022 7fd9dd1ab700  0
filestore(/data/osd/osd.0)  error (28) No space left on device not
handled on operation 10 (15804.0.13, or op 13, counting from 0)
    -2> 2014-11-14 17:33:02.216768 7fd9dd1ab700  0
filestore(/data/osd/osd.0) ENOSPC handling not implemented
    -1> 2014-11-14 17:33:02.216783 7fd9dd1ab700  0
filestore(/data/osd/osd.0)  transaction dump:
    ...
    ...
    0> 2014-11-14 17:33:02.541008 7fd9dd1ab700 -1 os/FileStore.cc: In
function 'unsigned int
FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int,
ThreadPool::TPHandle*)' thread 7fd9dd1ab700             time
2014-11-14 17:33:02.251570
      os/FileStore.cc: 2540: FAILED assert(0 == "unexpected error")

      ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
     1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0x17f8675]
     2: (FileStore::_do_transaction(ObjectStore::Transaction&,
unsigned long, int, ThreadPool::TPHandle*)+0x4855)         [0x1534c21]
     3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*,
std::allocator<ObjectStore::Transaction*> >&,      unsigned long,
ThreadPool::TPHandle*)+0x101) [0x152d67d]
     4: (FileStore::_do_op(FileStore::OpSequencer*,
ThreadPool::TPHandle&)+0x57b) [0x152bdc3]
     5: (FileStore::OpWQ::_process(FileStore::OpSequencer*,
ThreadPool::TPHandle&)+0x2f) [0x1553c6f]
     6: (ThreadPool::WorkQueue<FileStore::OpSequencer>::_void_process(void*,
ThreadPool::TPHandle&)+0x37)      [0x15625e7]
     7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x7a4) [0x18801de]
     8: (ThreadPool::WorkThread::entry()+0x23) [0x1881f2d]
     9: (Thread::_entry_func(void*)+0x23) [0x1998117]
    10: (()+0x79d1) [0x7fd9e92bf9d1]
    11: (clone()+0x6d) [0x7fd9e78ca9dd]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

    It seems the error code was ENOSPC (no space left on device), so why
did the OSD process exit with an assert at this point? If there was no
space left, why did the cluster choose to migrate? Only osd.6 and osd.7
were alive. I tried to restart the other OSDs, but after a while those
OSDs crashed again, and now I can't read the data any more.
    Is this a bug? Can anyone help me?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

