Re: Full OSD with 29% free

Bryan,

We are setting up a cluster using XFS and have been a bit concerned
about running into issues similar to the ones you described below.

I am just wondering whether you came across any potential downsides to
using a 2K block size with XFS on your OSDs.

Thanks,

Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649

________________________________________
From: ceph-users-bounces@xxxxxxxxxxxxxx [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Bryan Stillwell [bstillwell@xxxxxxxxxxxxxxx]
Sent: Wednesday, October 30, 2013 2:18 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Full OSD with 29% free

I wanted to report back on this since I've made some progress on
fixing this issue.

After converting every OSD on a single server to use a 2K block size,
I've been able to cross 90% utilization without running into the 'No
space left on device' problem.  They're currently between 51% and 75%,
but I hit 90% over the weekend after a couple of OSDs died during
recovery.
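
For anyone wanting to attempt the same conversion, a rough per-OSD
sequence might look something like the following.  This is a sketch
only, not the exact commands from this thread: the OSD id and device
name are placeholders, and the details depend on how the OSDs were
deployed.

# Take the OSD out and make sure its data has healthy replicas
# elsewhere in the cluster before wiping the disk.
ceph osd out 0
service ceph stop osd.0

# Recreate the filesystem with a 2K block size and remount it.
umount /var/lib/ceph/osd/ceph-0
mkfs.xfs -f -b size=2048 /dev/sdb1
mount -o rw,noatime,inode64 /dev/sdb1 /var/lib/ceph/osd/ceph-0

# Reinitialize the OSD data directory, refresh its key, and bring it
# back into the cluster so it can backfill.
ceph-osd -i 0 --mkfs --mkkey
ceph auth del osd.0
ceph auth add osd.0 osd 'allow *' mon 'allow rwx' \
    -i /var/lib/ceph/osd/ceph-0/keyring
service ceph start osd.0
ceph osd in 0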

This conversion was pretty rough, though, with OSDs randomly dying
multiple times during the process (the logs point at suicide
timeouts).  When looking at top I would frequently see xfsalloc
pegging multiple cores, so I wonder if that has something to do with
it.  I also had the 'xfs_db -r "-c freesp -s"' command segfault on me
a few times, which was fixed by running xfs_repair on those
partitions.  This has me wondering how well XFS is tested with
non-default block sizes on CentOS 6.4...
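
(In case it saves anyone a search: xfs_repair needs the filesystem
unmounted, so each fix amounts to roughly the sequence below.  The
device and mount point here are just examples.)

service ceph stop osd.0
umount /var/lib/ceph/osd/ceph-0
xfs_repair /dev/sdb1
mount -o rw,noatime,inode64 /dev/sdb1 /var/lib/ceph/osd/ceph-0
service ceph start osd.0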

Anyway, after about a week I was finally able to get the cluster to
fully recover today.  Now I need to repeat the process on 7 more
servers before I can finish populating my cluster...

In case anyone is wondering how I switched to a 2K block size, this is
what I added to my ceph.conf:

[osd]
osd_mount_options_xfs = "rw,noatime,inode64"
osd_mkfs_options_xfs = "-f -b size=2048"
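
Note that the mkfs options above only take effect when the OSD
filesystem is created by the ceph tooling (e.g. via ceph-disk); they
don't alter an already-formatted disk.  To double-check that a
rebuilt OSD really ended up with the smaller block size, running
xfs_info against the mount point (the path below is just an example)
should report bsize=2048 on its data line:

xfs_info /var/lib/ceph/osd/ceph-0 | grep bsize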


The cluster is currently running the 0.71 release.

Bryan

On Mon, Oct 21, 2013 at 2:39 PM, Bryan Stillwell
<bstillwell@xxxxxxxxxxxxxxx> wrote:
> So I'm running into this issue again, and after spending a bit of
> time reading the XFS mailing lists, I believe the free space is too
> fragmented:
>
> [root@den2ceph001 ceph-0]# xfs_db -r "-c freesp -s" /dev/sdb1
>    from      to extents  blocks    pct
>       1       1   85773   85773   0.24
>       2       3  176891  444356   1.27
>       4       7  430854 2410929   6.87
>       8      15 2327527 30337352  86.46
>      16      31   75871 1809577   5.16
> total free extents 3096916
> total free blocks 35087987
> average free extent size 11.33
>
>
> Compared to a drive which isn't reporting 'No space left on device':
>
> [root@den2ceph008 ~]# xfs_db -r "-c freesp -s" /dev/sdc1
>    from      to extents  blocks    pct
>       1       1  133148  133148   0.15
>       2       3  320737  808506   0.94
>       4       7  809748 4532573   5.27
>       8      15 4536681 59305608  68.96
>      16      31   31531  751285   0.87
>      32      63     364   16367   0.02
>      64     127      90    9174   0.01
>     128     255       9    2072   0.00
>     256     511      48   18018   0.02
>     512    1023     128  102422   0.12
>    1024    2047     290  451017   0.52
>    2048    4095     538 1649408   1.92
>    4096    8191     851 5066070   5.89
>    8192   16383     746 8436029   9.81
>   16384   32767     194 4042573   4.70
>   32768   65535      15  614301   0.71
>   65536  131071       1   66630   0.08
> total free extents 5835119
> total free blocks 86005201
> average free extent size 14.7392
>
>
> What I'm wondering is whether reducing the block size from 4K to 2K
> (or 1K) would help.  I'm pretty sure this would require re-running
> mkfs.xfs on every OSD to fix if that's the case...
>
> Thanks,
> Bryan
>
>
> On Mon, Oct 14, 2013 at 5:28 PM, Bryan Stillwell
> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>>
>> The filesystem isn't as full now, but the fragmentation is pretty low:
>>
>> [root@den2ceph001 ~]# df /dev/sdc1
>> Filesystem           1K-blocks      Used Available Use% Mounted on
>> /dev/sdc1            486562672 270845628 215717044  56% /var/lib/ceph/osd/ceph-1
>> [root@den2ceph001 ~]# xfs_db -c frag -r /dev/sdc1
>> actual 3481543, ideal 3447443, fragmentation factor 0.98%
>>
>> Bryan
>>
>> On Mon, Oct 14, 2013 at 4:35 PM, Michael Lowe <j.michael.lowe@xxxxxxxxx> wrote:
>> >
>> > How fragmented is that file system?
>> >
>> > Sent from my iPad
>> >
>> > > On Oct 14, 2013, at 5:44 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxxxxxx> wrote:
>> > >
>> > > This appears to be more of an XFS issue than a ceph issue, but I've
>> > > run into a problem where some of my OSDs failed because the filesystem
>> > > was reported as full even though there was 29% free:
>> > >
>> > > [root@den2ceph001 ceph-1]# touch blah
>> > > touch: cannot touch `blah': No space left on device
>> > > [root@den2ceph001 ceph-1]# df .
>> > > Filesystem           1K-blocks      Used Available Use% Mounted on
>> > > /dev/sdc1            486562672 342139340 144423332  71% /var/lib/ceph/osd/ceph-1
>> > > [root@den2ceph001 ceph-1]# df -i .
>> > > Filesystem            Inodes   IUsed   IFree IUse% Mounted on
>> > > /dev/sdc1            60849984 4097408 56752576    7% /var/lib/ceph/osd/ceph-1
>> > > [root@den2ceph001 ceph-1]#
>> > >
>> > > I've tried remounting the filesystem with the inode64 option like a
>> > > few people recommended, but that didn't help (probably because it
>> > > doesn't appear to be running out of inodes).
>> > >
>> > > This happened while I was on vacation and I'm pretty sure it was
>> > > caused by another OSD failing on the same node.  I've been able to
>> > > recover from the situation by bringing the failed OSD back online, but
>> > > it's only a matter of time until I run into this issue again
>> > > since my cluster is still being populated.
>> > >
>> > > Any ideas on things I can try the next time this happens?
>> > >
>> > > Thanks,
>> > > Bryan
>> > > _______________________________________________
>> > > ceph-users mailing list
>> > > ceph-users@xxxxxxxxxxxxxx
>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--


Bryan Stillwell
SENIOR SYSTEM ADMINISTRATOR

E: bstillwell@xxxxxxxxxxxxxxx
O: 303.228.5109
M: 970.310.6085
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

