Re: Issue with free Inodes

Kamil Kuramshin <kamil.kuramshin@xxxxxxxx> · Mon, 23 Mar 2015 15:26:07 +0300

    Yes, I understand that. 

    The initial purpose of my first email was just to give advice to
    newcomers. My mistake was selecting ext4 as the backend filesystem
    for the SSD disks.

    But I did not foresee that the inode count could reach its limit
    before the free space did :)

    Maybe there should be a warning not only for free space in MiB
    (GiB, TiB), but also a dedicated warning about free inodes for
    filesystems with static inode allocation, like ext4.

    Because if an OSD reaches the inode limit, it becomes totally
    unusable and immediately goes down, and from that moment there is
    no way to start it!
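
A minimal sketch of the kind of free-inode warning described above, assuming a plain Python check run from cron or a monitoring agent. The function names and the 85% threshold are illustrative assumptions, not part of Ceph:

```python
import os


def inode_usage(path):
    """Return (total, free, used_fraction) inode figures for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_files  # total inodes; some filesystems report 0 here
    free = st.f_ffree   # free inodes
    if total == 0:
        # No fixed inode count reported, so there is nothing meaningful to warn about.
        return total, free, 0.0
    return total, free, (total - free) / total


def inode_warning(path, threshold=0.85):
    """Return a warning string when inode usage reaches threshold, else None."""
    total, free, used = inode_usage(path)
    if used >= threshold:
        return f"WARNING: {path} inode usage {used:.0%} ({free} of {total} inodes free)"
    return None


if __name__ == "__main__":
    # Hypothetical OSD data directory; adjust to the local layout.
    for mount in ["/var/lib/ceph/osd/ceph-45"]:
        if os.path.exists(mount):
            message = inode_warning(mount)
            if message:
                print(message)
```

The same figures are what df -i prints; the sketch only turns them into a threshold check.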

    23.03.2015 13:42, Thomas Foster wrote:

      You could fix this by changing the block size when formatting the
      mount point with the mkfs -b command. I had the same issue with
      the filesystem under glusterfs, and the solution is either to use
      a filesystem that allocates inodes automatically or to change the
      block size when you build the filesystem. Unfortunately, the only
      way I have seen to fix the problem is to reformat.
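
A rough sketch of how the inode count is fixed at format time on ext4. Besides the block size, mke2fs also accepts -i (bytes-per-inode) and -N (number-of-inodes). The arithmetic below is an approximation, since the real tool rounds per block group, and the helper names, the default 16384 ratio and the headroom factor are assumptions for illustration:

```python
def estimated_inodes(fs_bytes, bytes_per_inode=16384):
    """Approximate inode count mke2fs would create for a filesystem of fs_bytes."""
    return fs_bytes // bytes_per_inode


def suggested_bytes_per_inode(avg_file_bytes, headroom=2.0, block_size=4096):
    """Pick a bytes-per-inode ratio that leaves `headroom` times more inodes than files.

    The ratio is never allowed below the block size, because a smaller ratio
    would create more inodes than the filesystem has blocks to use them.
    """
    ratio = int(avg_file_bytes / headroom)
    return max(block_size, ratio)


if __name__ == "__main__":
    # 3335808 inodes (the count reported in this thread) at 16384 bytes per inode
    # corresponds to a filesystem of roughly 50G.
    fs_bytes = 3335808 * 16384
    print(estimated_inodes(fs_bytes))
    print(suggested_bytes_per_inode(9000))
```

The resulting ratio would be passed to mke2fs -i when the filesystem is created; it cannot be changed afterwards without reformatting.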

        On Mon, Mar 23, 2015 at 5:51 AM, Kamil Kuramshin <kamil.kuramshin@xxxxxxxx> wrote:

              In my case there was a cache pool for an EC pool serving
              RBD images, the object size is 4 MB, and the client was a
              kernel RBD client.

              Each SSD is a 60G disk, 2 disks per node, 6 nodes in
              total = 12 OSDs in total.

              23.03.2015 12:00, Christian Balzer wrote:

                    Hello,

This is rather confusing, as cache-tiers are just normal OSDs/pools and
thus should have Ceph objects of around 4MB in size by default.

This matches what I see with ext4 here (normal OSD, not a cache
tier):
---
size:
/dev/sde1       2.7T  204G  2.4T   8% /var/lib/ceph/osd/ceph-0
inodes:
/dev/sde1      183148544 55654 183092890    1% /var/lib/ceph/osd/ceph-0
---

On a more fragmented cluster I see a 5:1 size to inode ratio.

I just can't fathom how there could be 3.3 million inodes (and thus a
similar number of files) using 30G, making the average file size below 10
KBytes.
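
A quick sanity check of that average, using the df figures quoted in this thread (29G used with 3335808 inodes on ceph-45, further down, and 204G with 55654 inodes on the ext4 OSD above). It assumes df -h reports binary units and that nearly every used inode is a regular file; the function name is illustrative:

```python
GIB = 1024 ** 3


def average_file_bytes(used_gib, used_inodes):
    """Average bytes per used inode, given df -h 'Used' in GiB and df -i 'IUsed'."""
    return used_gib * GIB / used_inodes


if __name__ == "__main__":
    cache_tier = average_file_bytes(29, 3335808)  # affected SSD OSD
    normal_osd = average_file_bytes(204, 55654)   # healthy ext4 OSD
    print(f"cache tier OSD: {cache_tier / 1024:.1f} KiB per inode")
    print(f"normal OSD:     {normal_osd / 1024 ** 2:.2f} MiB per inode")
```

The healthy OSD comes out close to the 4 MB default object size, while the cache-tier OSD averages roughly 9 KiB per inode, which is what makes the inode count run out long before the space does.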

Something other than your choice of file system is probably at play here.

How fragmented are those SSDs?
What's your default Ceph object size?
Where _are_ those 3 million files in that OSD, are they actually in the
object files like:
-rw-r--r-- 1 root root 4194304 Jan  9 15:27 /var/lib/ceph/osd/ceph-0/current/3.117_head/DIR_7/DIR_1/DIR_5/rb.0.23a8f.238e1f29.000000027632__head_C4F3D517__3

What's your use case, RBD, CephFS, RadosGW?

Regards,

Christian

On Mon, 23 Mar 2015 10:32:55 +0300 Kamil Kuramshin wrote:

I recently ran into a problem with SSD-based OSDs used in a cache tier
for an EC pool:

superuser@node02:~$ df -i
Filesystem                    Inodes   IUsed   IFree IUse% Mounted on
<...>
/dev/sdb1                    3335808 3335808       0  100% /var/lib/ceph/osd/ceph-45
/dev/sda1                    3335808 3335808       0  100% /var/lib/ceph/osd/ceph-46

Now those OSDs are down on each Ceph node and cache tiering is not
working.

superuser@node01:~$ sudo tail /var/log/ceph/ceph-osd.45.log
2015-03-23 10:04:23.631137 7fb105345840  0 ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e), process ceph-osd, pid 1453465
2015-03-23 10:04:23.640676 7fb105345840  0 filestore(/var/lib/ceph/osd/ceph-45) backend generic (magic 0xef53)
2015-03-23 10:04:23.640735 7fb105345840 -1 genericfilestorebackend(/var/lib/ceph/osd/ceph-45) detect_features: unable to create /var/lib/ceph/osd/ceph-45/fiemap_test: (28) No space left on device
2015-03-23 10:04:23.640763 7fb105345840 -1 filestore(/var/lib/ceph/osd/ceph-45) _detect_fs: detect_features error: (28) No space left on device
2015-03-23 10:04:23.640772 7fb105345840 -1 filestore(/var/lib/ceph/osd/ceph-45) FileStore::mount : error in _detect_fs: (28) No space left on device
2015-03-23 10:04:23.640783 7fb105345840 -1  ** ERROR: error converting store /var/lib/ceph/osd/ceph-45: (28) No space left on device

At the same time, the df -h output is confusing:

superuser@node01:~$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
<...>
/dev/sda1                    50G   29G   20G  60% /var/lib/ceph/osd/ceph-45
/dev/sdb1                    50G   27G   21G  56% /var/lib/ceph/osd/ceph-46
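
The "(28) No space left on device" error is ENOSPC, which the kernel returns both when data blocks run out and when no free inode is left, so df -h alone cannot tell the two apart. A small sketch of how the two cases could be distinguished from statvfs figures; the function names and return strings are assumptions for illustration:

```python
import errno
import os


def classify_enospc(total_inodes, free_inodes, avail_blocks):
    """Guess the cause of an ENOSPC error from statvfs-style counters."""
    if total_inodes > 0 and free_inodes == 0:
        return "inodes exhausted"
    if avail_blocks == 0:
        return "blocks exhausted"
    return "space and inodes available"


def diagnose_path(path):
    """Apply classify_enospc to the filesystem that holds path."""
    st = os.statvfs(path)
    return classify_enospc(st.f_files, st.f_ffree, st.f_bavail)


if __name__ == "__main__":
    print(errno.ENOSPC, os.strerror(errno.ENOSPC))
    # Counters shaped like the affected OSD: all inodes used, blocks still free.
    print(classify_enospc(3335808, 0, 5000000))
```

With counters like those reported for ceph-45 (every inode used, about 20G of blocks still available), the check points at the inode table rather than the data blocks.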

The filesystem used on the affected OSDs is ext4. All OSDs were deployed
with ceph-deploy:
$ ceph-deploy osd create --zap-disk --fs-type ext4 <node-name>:<device>

What helped me out is that this was just a test deployment. All EC-pool
data was lost, since I couldn't start the OSDs, and the Ceph cluster stayed
degraded until I removed all affected tiered pools (cache & EC).
So this is just my observation of the kind of problems you can face if
you choose the wrong filesystem for the OSD backend.
I now *strongly* recommend choosing the *XFS* or *Btrfs* filesystems,
because both support dynamic inode allocation and this problem
can't arise with them.

            _______________________________________________

            ceph-users mailing list

            ceph-users@xxxxxxxxxxxxxx

            http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
