Shain,

After getting the segfaults when running 'xfs_db -r "-c freesp -s"' on a
couple of partitions, I'm concerned that 2K block sizes aren't nearly as
well tested as 4K block sizes.  This could just be a problem with
RHEL/CentOS 6.4 though, so if you're using a newer kernel the problem
might already be fixed.  There also appears to be more overhead with 2K
block sizes, which I believe manifests as high CPU usage by the xfsalloc
processes.

However, my cluster has been running in a clean state for over 24 hours
and none of the scrubs have found a problem yet.

According to 'ceph -s' my cluster has the following stats:

   osdmap e16882: 40 osds: 40 up, 40 in
    pgmap v3520420: 2808 pgs, 13 pools, 5694 GB data, 72705 kobjects
          18095 GB used, 13499 GB / 31595 GB avail

That's about 78k per object on average, so if your files aren't that small
I would stay with 4K block sizes to avoid headaches.
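
Rough math, in case anyone is curious where the 78k figure comes from
(it's just the data size divided by the object count, treating GB as
10^9 bytes):

  $ echo "5694 * 10^9 / (72705 * 10^3)" | bc
  78316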
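
If you do end up experimenting with 2K blocks, it's worth double-checking
what block size an OSD filesystem actually ended up with.  Something like
the following should do it (the device and mount point are only examples,
adjust them for your own OSDs):

  # roughly what the ceph.conf options quoted further down boil down to:
  mkfs.xfs -f -b size=2048 /dev/sdb1
  mount -o rw,noatime,inode64 /dev/sdb1 /var/lib/ceph/osd/ceph-0

  # confirm the block size the filesystem was created with:
  xfs_info /var/lib/ceph/osd/ceph-0 | grep bsize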

Bryan

On Thu, Oct 31, 2013 at 6:43 AM, Shain Miley <SMiley@xxxxxxx> wrote:
>
> Bryan,
>
> We are setting up a cluster using xfs and have been a bit concerned
> about running into similar issues to the ones you described below.
>
> I am just wondering if you came across any potential downsides to using
> a 2K block size with xfs on your osd's.
>
> Thanks,
>
> Shain
>
> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
> smiley@xxxxxxx | 202.513.3649
>
> ________________________________________
> From: ceph-users-bounces@xxxxxxxxxxxxxx [ceph-users-bounces@xxxxxxxxxxxxxx]
> on behalf of Bryan Stillwell [bstillwell@xxxxxxxxxxxxxxx]
> Sent: Wednesday, October 30, 2013 2:18 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Full OSD with 29% free
>
> I wanted to report back on this since I've made some progress on fixing
> this issue.
>
> After converting every OSD on a single server to use a 2K block size,
> I've been able to cross 90% utilization without running into the 'No
> space left on device' problem.  They're currently between 51% and 75%,
> but I hit 90% over the weekend after a couple of OSDs died during
> recovery.
>
> This conversion was pretty rough though, with OSDs randomly dying
> multiple times during the process (the logs point at suicide timeouts).
> When looking at top I would frequently see xfsalloc pegging multiple
> cores, so I wonder if that has something to do with it.  I also had the
> 'xfs_db -r "-c freesp -s"' command segfault on me a few times, which was
> fixed by running xfs_repair on those partitions.  This has me wondering
> how well XFS is tested with non-default block sizes on CentOS 6.4...
>
> Anyways, after about a week I was finally able to get the cluster to
> fully recover today.  Now I need to repeat the process on 7 more servers
> before I can finish populating my cluster...
>
> In case anyone is wondering how I switched to a 2K block size, this is
> what I added to my ceph.conf:
>
> [osd]
> osd_mount_options_xfs = "rw,noatime,inode64"
> osd_mkfs_options_xfs = "-f -b size=2048"
>
> The cluster is currently running the 0.71 release.
>
> Bryan
>
> On Mon, Oct 21, 2013 at 2:39 PM, Bryan Stillwell
> <bstillwell@xxxxxxxxxxxxxxx> wrote:
> > So I'm running into this issue again, and after spending a bit of time
> > reading the XFS mailing lists, I believe the free space is too
> > fragmented:
> >
> > [root@den2ceph001 ceph-0]# xfs_db -r "-c freesp -s" /dev/sdb1
> >    from      to  extents     blocks    pct
> >       1       1    85773      85773   0.24
> >       2       3   176891     444356   1.27
> >       4       7   430854    2410929   6.87
> >       8      15  2327527   30337352  86.46
> >      16      31    75871    1809577   5.16
> > total free extents 3096916
> > total free blocks 35087987
> > average free extent size 11.33
> >
> > Compared to a drive which isn't reporting 'No space left on device':
> >
> > [root@den2ceph008 ~]# xfs_db -r "-c freesp -s" /dev/sdc1
> >    from      to  extents     blocks    pct
> >       1       1   133148     133148   0.15
> >       2       3   320737     808506   0.94
> >       4       7   809748    4532573   5.27
> >       8      15  4536681   59305608  68.96
> >      16      31    31531     751285   0.87
> >      32      63      364      16367   0.02
> >      64     127       90       9174   0.01
> >     128     255        9       2072   0.00
> >     256     511       48      18018   0.02
> >     512    1023      128     102422   0.12
> >    1024    2047      290     451017   0.52
> >    2048    4095      538    1649408   1.92
> >    4096    8191      851    5066070   5.89
> >    8192   16383      746    8436029   9.81
> >   16384   32767      194    4042573   4.70
> >   32768   65535       15     614301   0.71
> >   65536  131071        1      66630   0.08
> > total free extents 5835119
> > total free blocks 86005201
> > average free extent size 14.7392
> >
> > What I'm wondering is whether reducing the block size from 4K to 2K
> > (or 1K) would help?  I'm pretty sure this would require re-running
> > mkfs.xfs on every OSD to fix if that's the case...
> >
> > Thanks,
> > Bryan
> >
> >
> > On Mon, Oct 14, 2013 at 5:28 PM, Bryan Stillwell
> > <bstillwell@xxxxxxxxxxxxxxx> wrote:
> >>
> >> The filesystem isn't as full now, but the fragmentation is pretty low:
> >>
> >> [root@den2ceph001 ~]# df /dev/sdc1
> >> Filesystem     1K-blocks      Used Available Use% Mounted on
> >> /dev/sdc1      486562672 270845628 215717044  56% /var/lib/ceph/osd/ceph-1
> >> [root@den2ceph001 ~]# xfs_db -c frag -r /dev/sdc1
> >> actual 3481543, ideal 3447443, fragmentation factor 0.98%
> >>
> >> Bryan
> >>
> >> On Mon, Oct 14, 2013 at 4:35 PM, Michael Lowe <j.michael.lowe@xxxxxxxxx> wrote:
> >> >
> >> > How fragmented is that file system?
> >> >
> >> > Sent from my iPad
> >> >
> >> > > On Oct 14, 2013, at 5:44 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxxxxxx> wrote:
> >> > >
> >> > > This appears to be more of an XFS issue than a ceph issue, but I've
> >> > > run into a problem where some of my OSDs failed because the
> >> > > filesystem was reported as full even though there was 29% free:
> >> > >
> >> > > [root@den2ceph001 ceph-1]# touch blah
> >> > > touch: cannot touch `blah': No space left on device
> >> > > [root@den2ceph001 ceph-1]# df .
> >> > > Filesystem     1K-blocks      Used Available Use% Mounted on
> >> > > /dev/sdc1      486562672 342139340 144423332  71% /var/lib/ceph/osd/ceph-1
> >> > > [root@den2ceph001 ceph-1]# df -i .
> >> > > Filesystem       Inodes   IUsed    IFree IUse% Mounted on
> >> > > /dev/sdc1      60849984 4097408 56752576    7% /var/lib/ceph/osd/ceph-1
> >> > > [root@den2ceph001 ceph-1]#
> >> > >
> >> > > I've tried remounting the filesystem with the inode64 option like a
> >> > > few people recommended, but that didn't help (probably because it
> >> > > doesn't appear to be running out of inodes).
> >> > >
> >> > > This happened while I was on vacation and I'm pretty sure it was
> >> > > caused by another OSD failing on the same node.
> >> > > I've been able to recover from the situation by bringing the
> >> > > failed OSD back online, but it's only a matter of time until I run
> >> > > into this issue again since my cluster is still being populated.
> >> > >
> >> > > Any ideas on things I can try the next time this happens?
> >> > >
> >> > > Thanks,
> >> > > Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com