Re: Issue with free Inodes

Yes, I read it, but I do not understand what you mean by *verify this*.
All 3335808 inodes are definitely files and directories created by the ceph OSD process:

tune2fs 1.42.5 (29-Jul-2012)
Filesystem volume name:   <none>
Last mounted on:          /var/lib/ceph/tmp/mnt.05NAJ3
Filesystem UUID:          e4dcca8a-7b68-4f60-9b10-c164dc7f9e33
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              3335808
Block count:              13342945
Reserved block count:     667147
Free blocks:              5674105
Free inodes:              0
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1020
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8176
Inode blocks per group:   511
Flex block group size:    16
Filesystem created:       Fri Feb 20 16:44:25 2015
Last mount time:          Tue Mar 24 09:33:19 2015
Last write time:          Tue Mar 24 09:33:27 2015
Mount count:              7
Maximum mount count:      -1
Last checked:             Fri Feb 20 16:44:25 2015
Check interval:           0 (<none>)
Lifetime writes:          4116 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      148ee5dd-7ee0-470c-a08a-b11c318ff90b
Journal backup:           inode blocks
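
For reference, the static inode budget implied by the tune2fs figures above works out to roughly the mke2fs default bytes-per-inode ratio. A quick check of that arithmetic (figures copied from the dump; the awk one-liner is just my own back-of-envelope sketch):

```shell
# Bytes-per-inode ratio implied by the tune2fs dump:
#   Block count * Block size / Inode count
awk 'BEGIN { printf "%.0f\n", 13342945 * 4096 / 3335808 }'   # prints 16384
```

That is ~16 KiB per inode, so the filesystem is guaranteed to run out of inodes before blocks whenever the average file is smaller than ~16 KiB.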


fsck.ext4 /dev/sda1
e2fsck 1.42.5 (29-Jul-2012)
/dev/sda1: clean, 3335808/3335808 files, 7668840/13342945 blocks
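
As a sanity check, the used-inode figure can also be compared against a raw name count on the mounted filesystem. A sketch, with the mount path as a placeholder (hard links would be counted once per name, so the numbers can differ slightly from `df -i`):

```shell
# Count every name (files + directories) on this filesystem only;
# -xdev stops find at mount-point boundaries. Path is a placeholder.
OSD_MOUNT=${OSD_MOUNT:-/var/lib/ceph/osd/ceph-45}
find "$OSD_MOUNT" -xdev | wc -l
```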


On 23.03.2015 17:09, Christian Balzer wrote:
On Mon, 23 Mar 2015 15:26:07 +0300 Kamil Kuramshin wrote:

Yes, I understand that.

The initial purpose of my first email was just advice for newcomers. My
fault was that I selected ext4 as the backend for the SSD disks,
but I did not foresee that the inode count could reach its limit before
the free space did :)

And maybe there should be a warning not only for free space in
MiB (GiB, TiB) but also a dedicated warning about free inodes,
for filesystems with static inode allocation like ext4.
Because if an OSD reaches its inode limit it becomes totally unusable and
immediately goes down, and from that moment there is no way to start it!
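
Until Ceph grows such a warning, a small cron-able check can fill the gap. A minimal sketch (the 90% threshold and the script shape are my own choices, not anything Ceph ships):

```shell
#!/bin/sh
# Warn about filesystems whose inode usage crosses a threshold.
# POSIX `df -iP` columns: Filesystem Inodes IUsed IFree IUse% Mounted-on
THRESHOLD=${THRESHOLD:-90}
df -iP | awk -v t="$THRESHOLD" '
    NR > 1 {
        use = $5
        sub(/%/, "", use)                 # "100%" -> "100"
        if (use + 0 >= t)
            print $6 ": " use "% of inodes used"
    }'
```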

While all that is true and should probably be addressed, please re-read
what I wrote before.

With the 3.3 million inodes used and thus likely as many files (did you
verify this?) and 4MB objects, that would make something in the 12TB
ballpark.
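
Christian's ballpark is easy to reproduce; a quick back-of-envelope check (my own one-liner, using the inode count from the `df -i` output further down):

```shell
# 3335808 files at 4 MiB each, expressed in TiB
awk 'BEGIN { printf "%.1f TiB\n", 3335808 * 4 / 1024 / 1024 }'   # prints 12.7 TiB
```

That is two orders of magnitude more than a 60G SSD can hold, which is exactly why the inode count looks so suspicious.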

Something very very strange and wrong is going on with your cache tier.

Christian

On 23.03.2015 13:42, Thomas Foster wrote:
You could fix this by changing your block size when formatting the
mount point with the mkfs -b option. I had the same issue when
dealing with the filesystem using glusterfs, and the solution is
either to use a filesystem that allocates inodes dynamically or to change
the block size when you build the filesystem. Unfortunately, the only
way to fix the problem that I have seen is to reformat.
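
For ext4 specifically, the inode budget is fixed at mkfs time by the bytes-per-inode ratio (`-i`) or an explicit inode count (`-N`), alongside the block size Thomas mentions. A sketch of the sizing arithmetic, with /dev/sdX1 purely as a placeholder:

```shell
# How many inodes does a 50 GiB filesystem get at various ratios?
# (16384 bytes-per-inode is the mke2fs default)
awk 'BEGIN {
    size = 50 * 1024 * 1024 * 1024            # 50 GiB in bytes
    for (ratio = 4096; ratio <= 16384; ratio *= 2)
        printf "-i %5d  -> ~%d inodes\n", ratio, size / ratio
}'

# Then format with a denser inode table, e.g.:
#   mkfs.ext4 -i 4096 /dev/sdX1               # one inode per 4 KiB
```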

On Mon, Mar 23, 2015 at 5:51 AM, Kamil Kuramshin 
<kamil.kuramshin@xxxxxxxx> wrote:

    In my case there was a cache pool for an EC pool serving RBD images,
    the object size is 4MB, and the client was a kernel-rbd client.
    Each SSD is a 60G disk, 2 disks per node, 6 nodes in total = 12
    OSDs in total.


    On 23.03.2015 12:00, Christian Balzer wrote:
    Hello,

    This is rather confusing, as cache tiers are just normal OSDs/pools
    and thus should have Ceph objects of around 4MB in size by default.

    This matches what I see with Ext4 here (normal OSD, not a cache tier):
    ---
    size:
    /dev/sde1       2.7T  204G  2.4T   8% /var/lib/ceph/osd/ceph-0
    inodes:
    /dev/sde1      183148544 55654 183092890   1% /var/lib/ceph/osd/ceph-0
    ---

    On a more fragmented cluster I see a 5:1 size to inode ratio.

    I just can't fathom how there could be 3.3 million inodes (and thus
    a close number of files) using 30G, making the average file size
    below 10 KB.

    Something other than your choice of file system is probably at
    play here.

    How fragmented are those SSDs?
    What's your default Ceph object size?
    Where _are_ those 3 million files in that OSD, are they actually in the object files like:
    -rw-r--r-- 1 root root 4194304 Jan  9 15:27 /var/lib/ceph/osd/ceph-0/current/3.117_head/DIR_7/DIR_1/DIR_5/rb.0.23a8f.238e1f29.000000027632__head_C4F3D517__3

    What's your use case, RBD, CephFS, RadosGW?
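
One way to answer the "where are those files" question is a per-directory census. A rough sketch (the mount path is a placeholder and would need root on a real OSD):

```shell
# Count regular files per containing directory under the OSD's data dir
# and show the ten busiest directories. Path is a placeholder.
OSD=${OSD:-/var/lib/ceph/osd/ceph-0/current}
find "$OSD" -xdev -type f | sed 's|/[^/]*$||' | sort | uniq -c | sort -rn | head
```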

    Regards,

    Christian

    On Mon, 23 Mar 2015 10:32:55 +0300 Kamil Kuramshin wrote:

    Recently I got a problem with OSDs based on SSD disks used in the
    cache tier for an EC pool:

    superuser@node02:~$ df -i
    Filesystem                    Inodes   IUsed *IFree* IUse% Mounted on
    <...>
    /dev/sdb1                    3335808 3335808     *0*  100% /var/lib/ceph/osd/ceph-45
    /dev/sda1                    3335808 3335808     *0*  100% /var/lib/ceph/osd/ceph-46

    Now those OSDs are down on every ceph node and cache tiering is not
    working.

    superuser@node01:~$ sudo tail /var/log/ceph/ceph-osd.45.log
    2015-03-23 10:04:23.631137 7fb105345840  0 ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e), process ceph-osd, pid 1453465
    2015-03-23 10:04:23.640676 7fb105345840  0 filestore(/var/lib/ceph/osd/ceph-45) backend generic (magic 0xef53)
    2015-03-23 10:04:23.640735 7fb105345840 -1 genericfilestorebackend(/var/lib/ceph/osd/ceph-45) detect_features: unable to create /var/lib/ceph/osd/ceph-45/fiemap_test: (28) No space left on device
    2015-03-23 10:04:23.640763 7fb105345840 -1 filestore(/var/lib/ceph/osd/ceph-45) _detect_fs: detect_features error: (28) No space left on device
    2015-03-23 10:04:23.640772 7fb105345840 -1 filestore(/var/lib/ceph/osd/ceph-45) FileStore::mount : error in _detect_fs: (28) No space left on device
    2015-03-23 10:04:23.640783 7fb105345840 -1  ** ERROR: error converting store /var/lib/ceph/osd/ceph-45: (28) *No space left on device*

    At the same time *df -h* is confusing:

    superuser@node01:~$ df -h
    Filesystem                  Size  Used *Avail* Use% Mounted on
    <...>
    /dev/sda1                    50G   29G   *20G*  60% /var/lib/ceph/osd/ceph-45
    /dev/sdb1                    50G   27G   *21G*  56% /var/lib/ceph/osd/ceph-46


    The filesystem used on the affected OSDs is ext4. All OSDs were
    deployed with ceph-deploy:
    $ ceph-deploy osd create --zap-disk --fs-type ext4 <node-name>:<device>


    Luckily it was just a test deployment, and all EC-pool data was lost,
    since I *could not start the OSDs* and the ceph cluster *became
    degraded* until I removed all affected tiered pools (cache & EC).
    So this is just my observation of what kind of problems you can face
    if you choose the wrong filesystem for the OSD backend.
    And now I *strongly* recommend choosing *XFS* or *Btrfs*, because
    both support dynamic inode allocation and this problem cannot arise
    with them.



    _______________________________________________
    ceph-users mailing list
    ceph-users@xxxxxxxxxxxxxx
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

