Shain,

I investigated the segfault a little more since I sent this message
and found this email thread:

http://oss.sgi.com/archives/xfs/2012-06/msg00066.html

After reading that I did the following:

[root@den2ceph001 ~]# xfs_db -r "-c freesp -s" /dev/sdb1
Segmentation fault (core dumped)
[root@den2ceph001 ~]# service ceph stop osd.0
=== osd.0 ===
Stopping Ceph osd.0 on den2ceph001...kill 2407...kill 2407...done
[root@den2ceph001 ~]# umount /dev/sdb1
[root@den2ceph001 ~]# xfs_db -r "-c freesp -s" /dev/sdb1
   from      to extents   blocks    pct
      1       1   44510    44510   0.05
      2       3   60341   142274   0.16
      4       7   68836   355735   0.39
      8      15  274122  3212122   3.50
     16      31 1429274 37611619  41.02
     32      63   43225  1945740   2.12
     64     127   39480  3585579   3.91
    128     255   36046  6544005   7.14
    256     511   30946 10899979  11.89
    512    1023   14119  9907129  10.80
   1024    2047    5727  7998938   8.72
   2048    4095    2647  6811258   7.43
   4096    8191     362  1940622   2.12
   8192   16383      59   603690   0.66
  16384   32767       5    90464   0.10
total free extents 2049699
total free blocks 91693664
average free extent size 44.7352

That gives me a little more confidence in using 2K block sizes now. :)

Bryan

On Thu, Oct 31, 2013 at 11:02 AM, Bryan Stillwell
<bstillwell@xxxxxxxxxxxxxxx> wrote:
> Shain,
>
> After getting the segfaults when running 'xfs_db -r "-c freesp -s"' on
> a couple partitions, I'm concerned that 2K block sizes aren't nearly
> as well tested as 4K block sizes. This could just be a problem with
> RHEL/CentOS 6.4 though, so if you're using a newer kernel the problem
> might already be fixed. There also appears to be more overhead with
> 2K block sizes which I believe manifests as high CPU usage by the
> xfsalloc processes. However, my cluster has been running in a clean
> state for over 24 hours and none of the scrubs have found a problem
> yet.
>
> According to 'ceph -s' my cluster has the following stats:
>
>    osdmap e16882: 40 osds: 40 up, 40 in
>     pgmap v3520420: 2808 pgs, 13 pools, 5694 GB data, 72705 kobjects
>           18095 GB used, 13499 GB / 31595 GB avail
>
> That's about 78k per object on average, so if your files aren't that
> small I would stay with 4K block sizes to avoid headaches.
>
> Bryan
>
>
> On Thu, Oct 31, 2013 at 6:43 AM, Shain Miley <SMiley@xxxxxxx> wrote:
>>
>> Bryan,
>>
>> We are setting up a cluster using xfs and have been a bit concerned about running into similar issues to the ones you described below.
>>
>> I am just wondering if you came across any potential downsides to using a 2K block size with xfs on your OSDs.
>>
>> Thanks,
>>
>> Shain
>>
>> Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649
>>
>> ________________________________________
>> From: ceph-users-bounces@xxxxxxxxxxxxxx [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Bryan Stillwell [bstillwell@xxxxxxxxxxxxxxx]
>> Sent: Wednesday, October 30, 2013 2:18 PM
>> To: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: Full OSD with 29% free
>>
>> I wanted to report back on this since I've made some progress on
>> fixing this issue.
>>
>> After converting every OSD on a single server to use a 2K block size,
>> I've been able to cross 90% utilization without running into the 'No
>> space left on device' problem. They're currently between 51% and 75%,
>> but I hit 90% over the weekend after a couple OSDs died during
>> recovery.
>>
>> This conversion was pretty rough though with OSDs randomly dying
>> multiple times during the process (logs point at suicide timeouts).
>> When looking at top I would frequently see xfsalloc pegging multiple
>> cores, so I wonder if that has something to do with it. I also had
>> the 'xfs_db -r "-c freesp -s"' command segfault on me a few times
>> which was fixed by running xfs_repair on those partitions.
>> This has me wondering how well XFS is tested with non-default block
>> sizes on CentOS 6.4...
>>
>> Anyways, after about a week I was finally able to get the cluster to
>> fully recover today. Now I need to repeat the process on 7 more
>> servers before I can finish populating my cluster...
>>
>> In case anyone is wondering how I switched to a 2K block size, this is
>> what I added to my ceph.conf:
>>
>> [osd]
>> osd_mount_options_xfs = "rw,noatime,inode64"
>> osd_mkfs_options_xfs = "-f -b size=2048"
>>
>>
>> The cluster is currently running the 0.71 release.
>>
>> Bryan
>>
>> On Mon, Oct 21, 2013 at 2:39 PM, Bryan Stillwell
>> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>> > So I'm running into this issue again and after spending a bit of time
>> > reading the XFS mailing lists, I believe the free space is too
>> > fragmented:
>> >
>> > [root@den2ceph001 ceph-0]# xfs_db -r "-c freesp -s" /dev/sdb1
>> >    from      to extents   blocks    pct
>> >       1       1   85773    85773   0.24
>> >       2       3  176891   444356   1.27
>> >       4       7  430854  2410929   6.87
>> >       8      15 2327527 30337352  86.46
>> >      16      31   75871  1809577   5.16
>> > total free extents 3096916
>> > total free blocks 35087987
>> > average free extent size 11.33
>> >
>> >
>> > Compared to a drive which isn't reporting 'No space left on device':
>> >
>> > [root@den2ceph008 ~]# xfs_db -r "-c freesp -s" /dev/sdc1
>> >    from      to extents   blocks    pct
>> >       1       1  133148   133148   0.15
>> >       2       3  320737   808506   0.94
>> >       4       7  809748  4532573   5.27
>> >       8      15 4536681 59305608  68.96
>> >      16      31   31531   751285   0.87
>> >      32      63     364    16367   0.02
>> >      64     127      90     9174   0.01
>> >     128     255       9     2072   0.00
>> >     256     511      48    18018   0.02
>> >     512    1023     128   102422   0.12
>> >    1024    2047     290   451017   0.52
>> >    2048    4095     538  1649408   1.92
>> >    4096    8191     851  5066070   5.89
>> >    8192   16383     746  8436029   9.81
>> >   16384   32767     194  4042573   4.70
>> >   32768   65535      15   614301   0.71
>> >   65536  131071       1    66630   0.08
>> > total free extents 5835119
>> > total free blocks 86005201
>> > average free extent size 14.7392
>> >
>> >
>> > What I'm wondering is if reducing the block size from 4K to 2K (or 1K)
>> > would help? I'm pretty sure this would require re-running
>> > mkfs.xfs on every OSD to fix if that's the case...
>> >
>> > Thanks,
>> > Bryan
>> >
>> >
>> > On Mon, Oct 14, 2013 at 5:28 PM, Bryan Stillwell
>> > <bstillwell@xxxxxxxxxxxxxxx> wrote:
>> >>
>> >> The filesystem isn't as full now, but the fragmentation is pretty low:
>> >>
>> >> [root@den2ceph001 ~]# df /dev/sdc1
>> >> Filesystem     1K-blocks      Used Available Use% Mounted on
>> >> /dev/sdc1      486562672 270845628 215717044  56% /var/lib/ceph/osd/ceph-1
>> >> [root@den2ceph001 ~]# xfs_db -c frag -r /dev/sdc1
>> >> actual 3481543, ideal 3447443, fragmentation factor 0.98%
>> >>
>> >> Bryan
>> >>
>> >> On Mon, Oct 14, 2013 at 4:35 PM, Michael Lowe <j.michael.lowe@xxxxxxxxx> wrote:
>> >> >
>> >> > How fragmented is that file system?
>> >> >
>> >> > Sent from my iPad
>> >> >
>> >> > > On Oct 14, 2013, at 5:44 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxxxxxx> wrote:
>> >> > >
>> >> > > This appears to be more of an XFS issue than a ceph issue, but I've
>> >> > > run into a problem where some of my OSDs failed because the filesystem
>> >> > > was reported as full even though there was 29% free:
>> >> > >
>> >> > > [root@den2ceph001 ceph-1]# touch blah
>> >> > > touch: cannot touch `blah': No space left on device
>> >> > > [root@den2ceph001 ceph-1]# df .
>> >> > > Filesystem     1K-blocks      Used Available Use% Mounted on
>> >> > > /dev/sdc1      486562672 342139340 144423332  71% /var/lib/ceph/osd/ceph-1
>> >> > > [root@den2ceph001 ceph-1]# df -i .
>> >> > > Filesystem       Inodes   IUsed    IFree IUse% Mounted on
>> >> > > /dev/sdc1      60849984 4097408 56752576    7% /var/lib/ceph/osd/ceph-1
>> >> > > [root@den2ceph001 ceph-1]#
>> >> > >
>> >> > > I've tried remounting the filesystem with the inode64 option like a
>> >> > > few people recommended, but that didn't help (probably because it
>> >> > > doesn't appear to be running out of inodes).
>> >> > >
>> >> > > This happened while I was on vacation and I'm pretty sure it was
>> >> > > caused by another OSD failing on the same node. I've been able to
>> >> > > recover from the situation by bringing the failed OSD back online, but
>> >> > > it's only a matter of time until I'll be running into this issue again
>> >> > > since my cluster is still being populated.
>> >> > >
>> >> > > Any ideas on things I can try the next time this happens?
>> >> > >
>> >> > > Thanks,
>> >> > > Bryan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
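
For anyone comparing 'xfs_db -r "-c freesp -s"' histograms like the ones in this thread, here is a rough Python sketch (not from the original posts) that summarizes how much of the free space sits in small extents. The names Bucket, parse_freesp, share_at_or_below and average_extent are made up for illustration, and the parser assumes the five-column layout shown in the thread (from, to, extents, blocks, pct). The sample text is the den2ceph001 /dev/sdb1 histogram from the Oct 21 message, the drive that was reporting 'No space left on device'.

```python
from typing import List, NamedTuple


class Bucket(NamedTuple):
    low: int      # smallest extent size in this bucket, in filesystem blocks
    high: int     # largest extent size in this bucket, in filesystem blocks
    extents: int  # number of free extents in the bucket
    blocks: int   # total free blocks held by those extents


def parse_freesp(text: str) -> List[Bucket]:
    """Parse the histogram rows of freesp output (hypothetical helper)."""
    buckets = []
    for line in text.splitlines():
        fields = line.split()
        # Histogram rows have five fields: from, to, extents, blocks, pct.
        # The header and the "total"/"average" summary lines are skipped
        # because their leading fields are not plain integers.
        if len(fields) == 5 and all(f.isdigit() for f in fields[:4]):
            low, high, extents, blocks = (int(f) for f in fields[:4])
            buckets.append(Bucket(low, high, extents, blocks))
    return buckets


def share_at_or_below(buckets: List[Bucket], limit: int) -> float:
    """Fraction of free blocks held in extents of at most `limit` blocks."""
    total = sum(b.blocks for b in buckets)
    small = sum(b.blocks for b in buckets if b.high <= limit)
    return small / total if total else 0.0


def average_extent(buckets: List[Bucket]) -> float:
    """Average free extent size in blocks."""
    extents = sum(b.extents for b in buckets)
    return sum(b.blocks for b in buckets) / extents if extents else 0.0


# Histogram copied from the thread (den2ceph001, /dev/sdb1).
FULL_OSD = """\
   from      to extents   blocks    pct
      1       1   85773    85773   0.24
      2       3  176891   444356   1.27
      4       7  430854  2410929   6.87
      8      15 2327527 30337352  86.46
     16      31   75871  1809577   5.16
total free extents 3096916
total free blocks 35087987
average free extent size 11.33
"""

buckets = parse_freesp(FULL_OSD)
print("largest free extent bucket tops out at", max(b.high for b in buckets), "blocks")
print("share of free blocks in extents of <= 15 blocks:",
      round(share_at_or_below(buckets, 15), 4))
print("average free extent size:", round(average_extent(buckets), 2))
```

In practice the text would come from running xfs_db against an unmounted or read-only device and capturing its output. The 15-block limit is an arbitrary choice for the example, not a threshold taken from XFS.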