Shain,

After getting the segfaults when running 'xfs_db -r "-c freesp -s"' on a
couple of partitions, I'm concerned that 2K block sizes aren't nearly as
well tested as 4K block sizes.  This could just be a problem with
RHEL/CentOS 6.4 though, so if you're using a newer kernel the problem
might already be fixed.  There also appears to be more overhead with 2K
block sizes, which I believe manifests as high CPU usage by the xfsalloc
processes.

However, my cluster has been running in a clean state for over 24 hours
and none of the scrubs have found a problem yet.

According to 'ceph -s' my cluster has the following stats:

   osdmap e16882: 40 osds: 40 up, 40 in
    pgmap v3520420: 2808 pgs, 13 pools, 5694 GB data, 72705 kobjects
          18095 GB used, 13499 GB / 31595 GB avail

That's about 78k per object on average, so if your files aren't that small
I would stay with 4K block sizes to avoid headaches.
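
Rough math, in case anyone is curious where the 78k figure comes from
(it's just the data size divided by the object count, treating GB as
10^9 bytes):

  $ echo "5694 * 10^9 / (72705 * 10^3)" | bc
  78316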
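
If you do end up experimenting with 2K blocks, it's worth double-checking
what block size an OSD filesystem actually ended up with.  Something like
the following should do it (the device and mount point are only examples,
adjust them for your own OSDs):

  # roughly what the ceph.conf options quoted further down boil down to:
  mkfs.xfs -f -b size=2048 /dev/sdb1
  mount -o rw,noatime,inode64 /dev/sdb1 /var/lib/ceph/osd/ceph-0

  # confirm the block size the filesystem was created with:
  xfs_info /var/lib/ceph/osd/ceph-0 | grep bsize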

Bryan

On Thu, Oct 31, 2013 at 6:43 AM, Shain Miley <SMiley@xxxxxxx> wrote:
>
> Bryan,
>
> We are setting up a cluster using xfs and have been a bit concerned
> about running into similar issues to the ones you described below.
>
> I am just wondering if you came across any potential downsides to using
> a 2K block size with xfs on your osd's.
>
> Thanks,
>
> Shain
>
> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
> smiley@xxxxxxx | 202.513.3649
>
> ________________________________________
> From: ceph-users-bounces@xxxxxxxxxxxxxx [ceph-users-bounces@xxxxxxxxxxxxxx]
> on behalf of Bryan Stillwell [bstillwell@xxxxxxxxxxxxxxx]
> Sent: Wednesday, October 30, 2013 2:18 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Full OSD with 29% free
>
> I wanted to report back on this since I've made some progress on fixing
> this issue.
>
> After converting every OSD on a single server to use a 2K block size,
> I've been able to cross 90% utilization without running into the 'No
> space left on device' problem.  They're currently between 51% and 75%,
> but I hit 90% over the weekend after a couple of OSDs died during
> recovery.
>
> This conversion was pretty rough though, with OSDs randomly dying
> multiple times during the process (the logs point at suicide timeouts).
> When looking at top I would frequently see xfsalloc pegging multiple
> cores, so I wonder if that has something to do with it.  I also had the
> 'xfs_db -r "-c freesp -s"' command segfault on me a few times, which was
> fixed by running xfs_repair on those partitions.  This has me wondering
> how well XFS is tested with non-default block sizes on CentOS 6.4...
>
> Anyways, after about a week I was finally able to get the cluster to
> fully recover today.  Now I need to repeat the process on 7 more servers
> before I can finish populating my cluster...
>
> In case anyone is wondering how I switched to a 2K block size, this is
> what I added to my ceph.conf:
>
> [osd]
> osd_mount_options_xfs = "rw,noatime,inode64"
> osd_mkfs_options_xfs = "-f -b size=2048"
>
> The cluster is currently running the 0.71 release.
>
> Bryan
>
> On Mon, Oct 21, 2013 at 2:39 PM, Bryan Stillwell
> <bstillwell@xxxxxxxxxxxxxxx> wrote:
> > So I'm running into this issue again, and after spending a bit of time
> > reading the XFS mailing lists, I believe the free space is too
> > fragmented:
> >
> > [root@den2ceph001 ceph-0]# xfs_db -r "-c freesp -s" /dev/sdb1
> >    from      to  extents     blocks    pct
> >       1       1    85773      85773   0.24
> >       2       3   176891     444356   1.27
> >       4       7   430854    2410929   6.87
> >       8      15  2327527   30337352  86.46
> >      16      31    75871    1809577   5.16
> > total free extents 3096916
> > total free blocks 35087987
> > average free extent size 11.33
> >
> > Compared to a drive which isn't reporting 'No space left on device':
> >
> > [root@den2ceph008 ~]# xfs_db -r "-c freesp -s" /dev/sdc1
> >    from      to  extents     blocks    pct
> >       1       1   133148     133148   0.15
> >       2       3   320737     808506   0.94
> >       4       7   809748    4532573   5.27
> >       8      15  4536681   59305608  68.96
> >      16      31    31531     751285   0.87
> >      32      63      364      16367   0.02
> >      64     127       90       9174   0.01
> >     128     255        9       2072   0.00
> >     256     511       48      18018   0.02
> >     512    1023      128     102422   0.12
> >    1024    2047      290     451017   0.52
> >    2048    4095      538    1649408   1.92
> >    4096    8191      851    5066070   5.89
> >    8192   16383      746    8436029   9.81
> >   16384   32767      194    4042573   4.70
> >   32768   65535       15     614301   0.71
> >   65536  131071        1      66630   0.08
> > total free extents 5835119
> > total free blocks 86005201
> > average free extent size 14.7392
> >
> > What I'm wondering is whether reducing the block size from 4K to 2K
> > (or 1K) would help?  I'm pretty sure this would require re-running
> > mkfs.xfs on every OSD to fix if that's the case...
> >
> > Thanks,
> > Bryan
> >
> >
> > On Mon, Oct 14, 2013 at 5:28 PM, Bryan Stillwell
> > <bstillwell@xxxxxxxxxxxxxxx> wrote:
> >>
> >> The filesystem isn't as full now, but the fragmentation is pretty low:
> >>
> >> [root@den2ceph001 ~]# df /dev/sdc1
> >> Filesystem     1K-blocks      Used Available Use% Mounted on
> >> /dev/sdc1      486562672 270845628 215717044  56% /var/lib/ceph/osd/ceph-1
> >> [root@den2ceph001 ~]# xfs_db -c frag -r /dev/sdc1
> >> actual 3481543, ideal 3447443, fragmentation factor 0.98%
> >>
> >> Bryan
> >>
> >> On Mon, Oct 14, 2013 at 4:35 PM, Michael Lowe <j.michael.lowe@xxxxxxxxx> wrote:
> >> >
> >> > How fragmented is that file system?
> >> >
> >> > Sent from my iPad
> >> >
> >> > > On Oct 14, 2013, at 5:44 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxxxxxx> wrote:
> >> > >
> >> > > This appears to be more of an XFS issue than a ceph issue, but I've
> >> > > run into a problem where some of my OSDs failed because the
> >> > > filesystem was reported as full even though there was 29% free:
> >> > >
> >> > > [root@den2ceph001 ceph-1]# touch blah
> >> > > touch: cannot touch `blah': No space left on device
> >> > > [root@den2ceph001 ceph-1]# df .
> >> > > Filesystem     1K-blocks      Used Available Use% Mounted on
> >> > > /dev/sdc1      486562672 342139340 144423332  71% /var/lib/ceph/osd/ceph-1
> >> > > [root@den2ceph001 ceph-1]# df -i .
> >> > > Filesystem       Inodes   IUsed    IFree IUse% Mounted on
> >> > > /dev/sdc1      60849984 4097408 56752576    7% /var/lib/ceph/osd/ceph-1
> >> > > [root@den2ceph001 ceph-1]#
> >> > >
> >> > > I've tried remounting the filesystem with the inode64 option like a
> >> > > few people recommended, but that didn't help (probably because it
> >> > > doesn't appear to be running out of inodes).
> >> > >
> >> > > This happened while I was on vacation and I'm pretty sure it was
> >> > > caused by another OSD failing on the same node.
> >> > > I've been able to recover from the situation by bringing the
> >> > > failed OSD back online, but it's only a matter of time until I run
> >> > > into this issue again since my cluster is still being populated.
> >> > >
> >> > > Any ideas on things I can try the next time this happens?
> >> > >
> >> > > Thanks,
> >> > > Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com