Thanks for the tips
A single OSD was indeed 95% full, and after removing it there is 24TB of usable space and everything is working again. :D
I hope no other OSD hits 95% during the backfilling as well.
It's a bit odd that with ~140 OSDs a single full one can take everything down with it.
I would understand it if, since 8/2 erasure coding spreads data over 10 disks, a full disk in that set meant the capacity of the other 9 couldn't be used.
But it seems the cluster can only use free capacity up to the level of the fullest OSD in the whole cluster.
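For reference, next time I'll probably check per-OSD utilization earlier and nudge data off the fullest one before it trips the full ratio. Something along these lines (the osd id 137 and the 0.95 weight are just placeholders, not recommendations):

# show per-OSD utilization to spot anything approaching the full ratio
ceph osd df

# lower the override weight of the fullest OSD so backfill moves data off it
ceph osd reweight 137 0.95

# or let Ceph pick the candidates automatically
ceph osd reweight-by-utilization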
On Thu, May 9, 2019 at 1:25 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
One full OSD stops everything.
You can change what's considered 'full'; the default is 95%:
ceph osd set-full-ratio 0.95
Never let an OSD run 100% full; that will lead to lots of real
problems. 95% is a good default (it's not exact: some metadata might
not always be accounted for, or it might temporarily need more).
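For reference, you can check the currently configured thresholds, and the related nearfull/backfillfull ratios are adjusted the same way (the values below are just examples):

# show the configured full / backfillfull / nearfull ratios
ceph osd dump | grep -i ratio

# companion thresholds, set the same way as the full ratio
ceph osd set-backfillfull-ratio 0.90
ceph osd set-nearfull-ratio 0.85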
A quick and dirty work-around if only one OSD is full: take it down ;)
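Something along these lines, with osd id 42 as a placeholder (marking it out lets its PGs get remapped to other OSDs; the systemd unit name can vary by deployment):

# mark the full OSD out so its data is remapped elsewhere
ceph osd out 42

# and/or stop the daemon on its host
systemctl stop ceph-osd@42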
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Thu, May 9, 2019 at 2:08 PM Kári Bertilsson <karibertils@xxxxxxxxx> wrote:
>
> Hello
>
> I am running CephFS with 8/2 erasure coding. I had about 40TB usable free (110TB raw), one small disk crashed and I added 2x10TB disks. Now it's backfilling & recovering with 0B free and I can't read a single file from the file system...
>
> This happened with max backfills at 4, but I have increased max backfills to 128 to hopefully get this over with a little faster, since the system has been unusable for 12 hours anyway. Not sure yet if that was a good idea.
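> For reference, I bumped it at runtime roughly like this (the injectargs syntax is just what I had at hand; newer clusters may prefer ceph config set):
>
> # raise per-OSD backfill concurrency at runtime; 128 is aggressive
> ceph tell osd.* injectargs '--osd_max_backfills 128'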
>
> 131TB of raw space was somehow not enough to keep things running. Any tips to avoid this kind of scenario in the future?
>
> GLOBAL:
>     SIZE       AVAIL     RAW USED     %RAW USED
>     489TiB     131TiB    358TiB       73.17
> POOLS:
>     NAME                 ID     USED        %USED      MAX AVAIL     OBJECTS
>     ec82_pool            41     278TiB      100.00     0B            28549450
>     cephfs_metadata      42     174MiB      0.04       381GiB        666939
>     rbd                  51     99.3GiB     20.68      381GiB        25530
>
>   data:
>     pools:   3 pools, 704 pgs
>     objects: 29.24M objects, 278TiB
>     usage:   358TiB used, 131TiB / 489TiB avail
>     pgs:     1265432/287571907 objects degraded (0.440%)
>              12366014/287571907 objects misplaced (4.300%)
>              536 active+clean
>              137 active+remapped+backfilling
>              27  active+undersized+degraded+remapped+backfilling
>              4   active+remapped+backfill_toofull
>
>   io:
>     client:   64.0KiB/s wr, 0op/s rd, 7op/s wr
>     recovery: 1.17GiB/s, 113objects/s
>
> Is there anything I can do to restore reading? I can understand writing not working, but why is it blocking reading also? Any tips?
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com