Re: Issue with free Inodes


 



On Mon, 23 Mar 2015 15:26:07 +0300 Kamil Kuramshin wrote:

> Yes, I understand that.
> 
> The initial purpose of my first email was just to give advice to newcomers. My 
> mistake was that I selected ext4 as the backend for the SSD disks,
> but I did not foresee that the inode count could reach its limit before the 
> free space runs out :)
> 
> And maybe there should be a warning not only for free space in 
> MiB/GiB/TiB, but also a dedicated warning about free inodes 
> for filesystems with static inode allocation like ext4.
> Because once an OSD reaches the inode limit it becomes totally unusable and 
> immediately goes down, and from that moment there is no way to start it!
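
Until such a dedicated warning exists, a stopgap check could be run from cron.
This is only a sketch, assuming GNU df with --output support; the 90%
threshold is arbitrary:

    $ df --output=ipcent,target | \
        awk 'NR > 1 && int($1) >= 90 { print "WARNING: inode usage " $1 " on " $2 }'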
> 
While all that is true and should probably be addressed, please re-read
what I wrote before.

With 3.3 million inodes used, and thus likely as many files (did you
verify this?), and 4MB objects, that would put you somewhere in the 12TB
ballpark.
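(Roughly: 3,335,808 inodes x 4 MiB per object = 13,343,232 MiB, which is about
12.7 TiB.)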

Something very very strange and wrong is going on with your cache tier.

Christian

> 
> 23.03.2015 13:42, Thomas Foster wrote:
> > You could fix this by changing the block size when formatting the 
> > mount point with the mkfs -b option.  I had this same issue when 
> > dealing with the filesystem under glusterfs, and the solution is to 
> > either use a filesystem that allocates inodes automatically or change 
> > the block size when you build the filesystem.  Unfortunately, the only 
> > way to fix the problem that I have seen is to reformat.
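
A minimal sketch of such a reformat, assuming mkfs.ext4/mke2fs; /dev/sdX1 is
only a placeholder, and the command destroys all data on it.  -b sets the
block size as described above, while -i (bytes-per-inode) or -N set the inode
count directly:

    $ mkfs.ext4 -b 1024 /dev/sdX1         # smaller block size
    $ mkfs.ext4 -i 4096 /dev/sdX1         # or: one inode per 4 KiB of space
    $ mkfs.ext4 -N 12000000 /dev/sdX1     # or: an explicit inode count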
> >
> > On Mon, Mar 23, 2015 at 5:51 AM, Kamil Kuramshin 
> > <kamil.kuramshin@xxxxxxxx> wrote:
> >
> >     In my case there was a cache pool for an EC-pool serving RBD images,
> >     the object size is 4MB, and the client was a kernel-rbd client.
> >     Each SSD disk is a 60G disk, 2 disks per node, 6 nodes in total = 12
> >     OSDs in total.
> >
> >
> >     23.03.2015 12:00, Christian Balzer wrote:
> >>     Hello,
> >>
> >>     This is rather confusing, as cache-tiers are just normal
> >> OSDs/pools and thus should have Ceph objects of around 4MB in size by
> >> default.
> >>
> >>     This matches what I see with ext4 here (normal OSD, not a
> >> cache tier):
> >>     ---
> >>     size:
> >>     /dev/sde1       2.7T  204G  2.4T   8% /var/lib/ceph/osd/ceph-0
> >>     inodes:
> >>     /dev/sde1      183148544 55654 183092890    1% /var/lib/ceph/osd/ceph-0
> >>     ---
> >>
> >>     On a more fragmented cluster I see a 5:1 size to inode ratio.
> >>
> >>     I just can't fathom how there could be 3.3 million inodes (and
> >> thus a close number of files) using 30G, making the average file size
> >> less than 10KB.
> >>
> >>     Something other than your choice of file system is probably at
> >> play here.
> >>
> >>     How fragmented are those SSDs?
> >>     What's your default Ceph object size?
> >>     Where _are_ those 3 million files in that OSD, are they actually
> >> in the object files like:
> >>     -rw-r--r-- 1 root root 4194304 Jan  9 15:27 /var/lib/ceph/osd/ceph-0/current/3.117_head/DIR_7/DIR_1/DIR_5/rb.0.23a8f.238e1f29.000000027632__head_C4F3D517__3
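
One quick way to check that, sketched with GNU find/awk against the affected
OSD's data directory (ceph-45 in your output):

    $ find /var/lib/ceph/osd/ceph-45/current -type f -printf '%s\n' | \
        awk '{ n++; s += $1 } END { print n " files, average " s/n " bytes" }'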
> >>
> >>     What's your use case, RBD, CephFS, RadosGW?
> >>
> >>     Regards,
> >>
> >>     Christian
> >>
> >>     On Mon, 23 Mar 2015 10:32:55 +0300 Kamil Kuramshin wrote:
> >>
> >>>     I recently ran into a problem with OSDs based on SSD disks used in
> >>> the cache tier for an EC-pool:
> >>>
> >>>     superuser@node02:~$ df -i
> >>>     Filesystem                    Inodes   IUsed   IFree IUse% Mounted on
> >>>     <...>
> >>>     /dev/sdb1                    3335808 3335808       0  100% /var/lib/ceph/osd/ceph-45
> >>>     /dev/sda1                    3335808 3335808       0  100% /var/lib/ceph/osd/ceph-46
> >>>
> >>>     Now those OSDs are down on every ceph node and cache tiering is not
> >>>     working.
> >>>
> >>>     superuser@node01:~$ sudo tail /var/log/ceph/ceph-osd.45.log
> >>>     2015-03-23 10:04:23.631137 7fb105345840  0 ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e), process ceph-osd, pid 1453465
> >>>     2015-03-23 10:04:23.640676 7fb105345840  0 filestore(/var/lib/ceph/osd/ceph-45) backend generic (magic 0xef53)
> >>>     2015-03-23 10:04:23.640735 7fb105345840 -1 genericfilestorebackend(/var/lib/ceph/osd/ceph-45) detect_features: unable to create /var/lib/ceph/osd/ceph-45/fiemap_test: (28) No space left on device
> >>>     2015-03-23 10:04:23.640763 7fb105345840 -1 filestore(/var/lib/ceph/osd/ceph-45) _detect_fs: detect_features error: (28) No space left on device
> >>>     2015-03-23 10:04:23.640772 7fb105345840 -1 filestore(/var/lib/ceph/osd/ceph-45) FileStore::mount : error in _detect_fs: (28) No space left on device
> >>>     2015-03-23 10:04:23.640783 7fb105345840 -1  ** ERROR: error converting store /var/lib/ceph/osd/ceph-45: (28) No space left on device
> >>>
> >>>     At the same time, df -h is confusing:
> >>>
> >>>     superuser@node01:~$ df -h
> >>>     Filesystem                  Size  Used Avail Use% Mounted on
> >>>     <...>
> >>>     /dev/sda1                    50G   29G   20G  60% /var/lib/ceph/osd/ceph-45
> >>>     /dev/sdb1                    50G   27G   21G  56% /var/lib/ceph/osd/ceph-46
> >>>
> >>>
> >>>     The filesystem used on the affected OSDs is ext4. All OSDs are
> >>> deployed with ceph-deploy:
> >>>     $ ceph-deploy osd create --zap-disk --fs-type ext4 <node-name>:<device>
> >>>
> >>>
> >>>     Luckily it was only a test deployment; all EC-pool
> >>> data was lost, since I could not start the OSDs and the ceph cluster stayed
> >>> degraded until I removed all affected tiered pools (cache & EC).
> >>>     So this is just my observation of the kind of problems you can
> >>> face if you choose the wrong filesystem for the OSD backend.
> >>>     And now I *strongly* recommend choosing the *XFS* or *Btrfs*
> >>> filesystems, because both support dynamic inode allocation and
> >>> this problem cannot arise with them.
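
For comparison, here is the same deployment command as above with XFS instead
of ext4 (only a sketch; the node and device placeholders are unchanged):

    $ ceph-deploy osd create --zap-disk --fs-type xfs <node-name>:<device>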
> >>>
> >>>
> >
> >
> >     _______________________________________________
> >     ceph-users mailing list
> >     ceph-users@xxxxxxxxxxxxxx
> >     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




