Re: Getting "No space left on device" when reading from cephfs

One full OSD stops everything.

You can change what's considered 'full'; the default is 95%:

ceph osd set-full-ratio 0.95

Never let an OSD run 100% full; that will lead to lots of real
problems. 95% is a good default (it's not exact: some metadata might
not always be accounted for, or the OSD might temporarily need more).
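
For example, to check the current thresholds and temporarily bump the
backfillfull/full ratios so backfill can proceed (illustrative values,
assuming a Luminous or newer cluster; lower them back once recovery
finishes):

ceph osd dump | grep ratio
ceph osd set-backfillfull-ratio 0.92
ceph osd set-full-ratio 0.97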

A quick and dirty work-around if only one OSD is full: take it down ;)
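
E.g. (assuming the full one is osd.12), stopping its daemon on the
host marks it down:

systemctl stop ceph-osd@12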

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, May 9, 2019 at 2:08 PM Kári Bertilsson <karibertils@xxxxxxxxx> wrote:
>
> Hello
>
> I am running CephFS with 8/2 erasure coding. I had about 40 TB usable free (110 TB raw); one small disk crashed, and I added 2x10 TB disks. Now it's backfilling & recovering with 0B free, and I can't read a single file from the file system...
>
> This happened with max backfills at 4, but I have increased max backfills to 128 to hopefully get this over with a little faster, since the system has been unusable for 12 hours anyway. Not sure yet if that was a good idea.
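> 
> For anyone curious, a runtime bump like that can be injected into
> all OSDs with something like:
> 
> ceph tell osd.* injectargs '--osd-max-backfills 128'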
>
> 131 TiB of raw space was somehow not enough to keep things running. Any tips to avoid this kind of scenario in the future?
>
> GLOBAL:
>    SIZE       AVAIL      RAW USED     %RAW USED
>    489TiB     131TiB       358TiB         73.17
> POOLS:
>    NAME                ID     USED        %USED      MAX AVAIL     OBJECTS
>    ec82_pool           41      278TiB     100.00            0B     28549450
>    cephfs_metadata     42      174MiB       0.04        381GiB       666939
>    rbd                 51     99.3GiB      20.68        381GiB        25530
>
>   data:
>    pools:   3 pools, 704 pgs
>    objects: 29.24M objects, 278TiB
>    usage:   358TiB used, 131TiB / 489TiB avail
>    pgs:     1265432/287571907 objects degraded (0.440%)
>             12366014/287571907 objects misplaced (4.300%)
>             536 active+clean
>             137 active+remapped+backfilling
>             27  active+undersized+degraded+remapped+backfilling
>             4   active+remapped+backfill_toofull
>
>  io:
>    client:   64.0KiB/s wr, 0op/s rd, 7op/s wr
>    recovery: 1.17GiB/s, 113objects/s
>
> Is there anything I can do to restore reading? I can understand writing not working, but why is it blocking reading also? Any tips?
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



