Bryan,

We are setting up a cluster using XFS and have been a bit concerned about
running into issues similar to the ones you described below. I am just
wondering if you came across any potential downsides to using a 2K block
size with XFS on your OSDs.

Thanks,
Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media |
smiley@xxxxxxx | 202.513.3649

________________________________________
From: ceph-users-bounces@xxxxxxxxxxxxxx [ceph-users-bounces@xxxxxxxxxxxxxx]
on behalf of Bryan Stillwell [bstillwell@xxxxxxxxxxxxxxx]
Sent: Wednesday, October 30, 2013 2:18 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Full OSD with 29% free

I wanted to report back on this since I've made some progress on fixing
this issue.  After converting every OSD on a single server to use a 2K
block size, I've been able to cross 90% utilization without running into
the 'No space left on device' problem.  They're currently between 51% and
75%, but I hit 90% over the weekend after a couple of OSDs died during
recovery.

The conversion was pretty rough, though, with OSDs randomly dying multiple
times during the process (the logs point at suicide timeouts).  When
looking at top I would frequently see xfsalloc pegging multiple cores, so
I wonder if that has something to do with it.  I also had the
'xfs_db -r "-c freesp -s"' command segfault on me a few times, which was
fixed by running xfs_repair on those partitions.  This has me wondering
how well XFS is tested with non-default block sizes on CentOS 6.4...

Anyway, after about a week I was finally able to get the cluster to fully
recover today.  Now I need to repeat the process on 7 more servers before
I can finish populating my cluster...

In case anyone is wondering how I switched to a 2K block size, this is
what I added to my ceph.conf:

[osd]
osd_mount_options_xfs = "rw,noatime,inode64"
osd_mkfs_options_xfs = "-f -b size=2048"

The cluster is currently running the 0.71 release.

Bryan
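For anyone repeating this, a minimal sketch of how one might sanity-check
the result after rebuilding an OSD (the mount point /var/lib/ceph/osd/ceph-0
and the device /dev/sdb1 below are illustrative placeholders, substitute
your own):

    # confirm the new filesystem's data section really got a 2K block size
    xfs_info /var/lib/ceph/osd/ceph-0 | grep bsize

    # summarize free-space fragmentation on the OSD's partition (read-only)
    xfs_db -r -c "freesp -s" /dev/sdb1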
On Mon, Oct 21, 2013 at 2:39 PM, Bryan Stillwell
<bstillwell@xxxxxxxxxxxxxxx> wrote:
> So I'm running into this issue again, and after spending a bit of time
> reading the XFS mailing lists, I believe the free space is too
> fragmented:
>
> [root@den2ceph001 ceph-0]# xfs_db -r "-c freesp -s" /dev/sdb1
>    from      to  extents    blocks    pct
>       1       1    85773     85773   0.24
>       2       3   176891    444356   1.27
>       4       7   430854   2410929   6.87
>       8      15  2327527  30337352  86.46
>      16      31    75871   1809577   5.16
> total free extents 3096916
> total free blocks 35087987
> average free extent size 11.33
>
> Compared to a drive which isn't reporting 'No space left on device':
>
> [root@den2ceph008 ~]# xfs_db -r "-c freesp -s" /dev/sdc1
>    from      to  extents    blocks    pct
>       1       1   133148    133148   0.15
>       2       3   320737    808506   0.94
>       4       7   809748   4532573   5.27
>       8      15  4536681  59305608  68.96
>      16      31    31531    751285   0.87
>      32      63      364     16367   0.02
>      64     127       90      9174   0.01
>     128     255        9      2072   0.00
>     256     511       48     18018   0.02
>     512    1023      128    102422   0.12
>    1024    2047      290    451017   0.52
>    2048    4095      538   1649408   1.92
>    4096    8191      851   5066070   5.89
>    8192   16383      746   8436029   9.81
>   16384   32767      194   4042573   4.70
>   32768   65535       15    614301   0.71
>   65536  131071        1     66630   0.08
> total free extents 5835119
> total free blocks 86005201
> average free extent size 14.7392
>
> What I'm wondering is whether reducing the block size from 4K to 2K (or
> 1K) would help?  I'm pretty sure fixing this would require re-running
> mkfs.xfs on every OSD if that's the case...
>
> Thanks,
> Bryan
>
> On Mon, Oct 14, 2013 at 5:28 PM, Bryan Stillwell
> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>>
>> The filesystem isn't as full now, but the fragmentation is pretty low:
>>
>> [root@den2ceph001 ~]# df /dev/sdc1
>> Filesystem     1K-blocks       Used  Available Use% Mounted on
>> /dev/sdc1      486562672  270845628  215717044  56% /var/lib/ceph/osd/ceph-1
>> [root@den2ceph001 ~]# xfs_db -c frag -r /dev/sdc1
>> actual 3481543, ideal 3447443, fragmentation factor 0.98%
>>
>> Bryan
>>
>> On Mon, Oct 14, 2013 at 4:35 PM, Michael Lowe <j.michael.lowe@xxxxxxxxx> wrote:
>> >
>> > How fragmented is that file system?
>> >
>> > Sent from my iPad
>> >
>> > > On Oct 14, 2013, at 5:44 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxxxxxx> wrote:
>> > >
>> > > This appears to be more of an XFS issue than a Ceph issue, but I've
>> > > run into a problem where some of my OSDs failed because the filesystem
>> > > was reported as full even though there was 29% free:
>> > >
>> > > [root@den2ceph001 ceph-1]# touch blah
>> > > touch: cannot touch `blah': No space left on device
>> > > [root@den2ceph001 ceph-1]# df .
>> > > Filesystem     1K-blocks       Used  Available Use% Mounted on
>> > > /dev/sdc1      486562672  342139340  144423332  71% /var/lib/ceph/osd/ceph-1
>> > > [root@den2ceph001 ceph-1]# df -i .
>> > > Filesystem       Inodes    IUsed     IFree IUse% Mounted on
>> > > /dev/sdc1      60849984  4097408  56752576    7% /var/lib/ceph/osd/ceph-1
>> > > [root@den2ceph001 ceph-1]#
>> > >
>> > > I've tried remounting the filesystem with the inode64 option like a
>> > > few people recommended, but that didn't help (probably because it
>> > > doesn't appear to be running out of inodes).
>> > >
>> > > This happened while I was on vacation, and I'm pretty sure it was
>> > > caused by another OSD failing on the same node.  I've been able to
>> > > recover from the situation by bringing the failed OSD back online, but
>> > > it's only a matter of time until I run into this issue again, since my
>> > > cluster is still being populated.
>> > >
>> > > Any ideas on things I can try the next time this happens?
>> > >
>> > > Thanks,
>> > > Bryan
>> > > _______________________________________________
>> > > ceph-users mailing list
>> > > ceph-users@xxxxxxxxxxxxxx
>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Bryan Stillwell
SENIOR SYSTEM ADMINISTRATOR

E: bstillwell@xxxxxxxxxxxxxxx
O: 303.228.5109
M: 970.310.6085

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com