Re: Inodes on /cephfs


 



Am 01.05.19 um 00:51 schrieb Patrick Donnelly:
> On Tue, Apr 30, 2019 at 8:01 AM Oliver Freyermuth
> <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> Dear Cephalopodians,
>>
>> we have a classic libvirtd / KVM based virtualization cluster using Ceph-RBD (librbd) as backend and sharing the libvirtd configuration between the nodes via CephFS
>> (all on Mimic).
>>
>> To share the libvirtd configuration between the nodes, we have symlinked some folders from /etc/libvirt to their counterparts on /cephfs,
>> so all nodes see the same configuration.
>> In general, this works very well (of course, there's a "gotcha": Libvirtd needs reloading / restart for some changes to the XMLs, we have automated that),
>> but there is one issue caused by Yum's cleverness (that's on CentOS 7). Whenever there's a libvirtd update, unattended upgrades fail, and we see:
>>
>>    Transaction check error:
>>      installing package libvirt-daemon-driver-network-4.5.0-10.el7_6.7.x86_64 needs 2 inodes on the /cephfs filesystem
>>      installing package libvirt-daemon-config-nwfilter-4.5.0-10.el7_6.7.x86_64 needs 18 inodes on the /cephfs filesystem
>>
>> So it seems yum follows the symlinks and checks the available inodes on /cephfs. Sadly, that reveals:
>>    [root@kvm001 libvirt]# LANG=C df -i /cephfs/
>>    Filesystem     Inodes IUsed IFree IUse% Mounted on
>>    ceph-fuse          68    68     0  100% /cephfs
>>
>> I think that's just because there is no real "limit" on the maximum inodes on CephFS. However, returning 0 breaks some existing tools (notably, Yum).
>>
>> What do you think? Should CephFS return something different than 0 here to not break existing tools?
>> Or should the tools behave differently? But one might also argue that if the total number of Inodes matches the used number of Inodes, the FS is indeed "full".
>> It's just unclear to me who to file a bug against ;-).
>>
>> Right now, I am just using:
>> yum -y --setopt=diskspacecheck=0 update
>> as a manual workaround, but this is naturally rather cumbersome.
> 
> This is fallout from [1]. See discussion on setting f_free to 0 here
> [2]. In summary, userland tools are trying to be too clever by looking
> at f_free. [I could be convinced to go back to f_free = ULONG_MAX if
> there are other instances of this.]
> 
> [1] https://github.com/ceph/ceph/pull/23323
> [2] https://github.com/ceph/ceph/pull/23323#issuecomment-409249911

Thanks for the references! They certainly clarify why this decision was taken, and I appreciate the intent of preventing misleading monitoring. 
Still, even though I don't have other instances at hand (yet), I am not convinced that "0" is a better choice than "ULONG_MAX". 
It does alert users / monitoring software that they are asking the wrong question, but it also breaks a check which every file system I have encountered so far allows. 

Yum (and any other package manager trying to operate safely) needs to ensure it can fully install a package in an "atomic" way before doing so,
since rolling back may be complex or even impossible on most file systems. So it needs a way to check, before placing any data, whether the file system can hold the additional files in terms of both space and inodes,
or it risks installing something only partially and potentially being unable to roll back. 
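
To make that concrete, here is a minimal sketch (in Python, purely illustrative - not Yum's actual implementation) of the kind of pre-transaction check a package manager performs via statvfs(); with the current ceph-fuse behaviour it fails for any non-zero inode requirement:

import os

def can_install(path, extra_bytes, extra_inodes):
    """Rough pre-install check in the spirit of Yum's diskspacecheck
    (illustrative only): refuse the transaction unless the target
    file system reports enough free bytes *and* free inodes."""
    st = os.statvfs(path)
    free_bytes = st.f_bavail * st.f_frsize
    free_inodes = st.f_favail  # reported as 0 on our ceph-fuse mount
    return free_bytes >= extra_bytes and free_inodes >= extra_inodes

# With f_favail == 0 this is False for any extra_inodes > 0,
# which is exactly what makes the libvirt update above fail.
print(can_install("/cephfs", 1 << 20, 18))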

In most cases, the number of free inodes allows exactly that check. Of course, that number has no (direct) meaning for CephFS, so one might argue the tools should add an exception for CephFS - 
but as the discussion correctly stated, there is no defined way to find out whether a file system even has a notion of "free inodes", and - if we go for special-casing a list of file systems - 
not even a "clean" way to find out that the file system is CephFS (the tools will only see FUSE for a ceph-fuse mount) [1]. 
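
Just to illustrate that detection problem (a rough sketch, not something I would propose as a real solution): statvfs() does not carry a file-system type at all, so a tool wanting to special-case CephFS would have to fall back on something like /proc/mounts, and even there a ceph-fuse mount only identifies itself as a FUSE subtype:

import os

def fs_type(path):
    # Naive longest-prefix match against /proc/mounts; good enough for
    # a sketch, not for production use.
    path = os.path.realpath(path)
    best_mountpoint, best_type = "", "unknown"
    with open("/proc/mounts") as mounts:
        for line in mounts:
            device, mountpoint, fstype = line.split()[:3]
            if path.startswith(mountpoint) and len(mountpoint) > len(best_mountpoint):
                best_mountpoint, best_type = mountpoint, fstype
    return best_type

# Expected results (depending on how /cephfs is mounted):
#   kernel client -> "ceph"
#   ceph-fuse     -> something like "fuse.ceph-fuse"
print(fs_type("/cephfs"))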

So my question is: 
How should tools which need to ensure that a file system can accept a given number of bytes and inodes, before actually placing the data there, perform that check in the case of CephFS? 
And if they should not check at all, how do they find out that this check, which is valid on e.g. ext4, is not useful on CephFS? 
(Or, in other words: if I were to file a bug report against Yum, I could not think of any implementation they could adopt to solve this issue.)

Of course, if it's just us, we can live with the workaround. We already monitor space consumption on all file systems, and may start monitoring free inodes on our ext4 file systems as well, 
so that we can safely disable the Yum check on the affected nodes. 
But I wonder whether this is the best way to go: the current behaviour blocks a valid use case of a package manager, and I am not aware of any clean way to fix this inside Yum. 
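
For completeness, the monitoring side of that workaround could be as simple as the following sketch (mount points and threshold are made-up examples); it only checks the local ext4 mounts, since /cephfs will always report 100% inode usage:

import os

INODE_WARN_FRACTION = 0.9  # made-up threshold

def inode_usage(path):
    st = os.statvfs(path)
    if st.f_files == 0:
        return 0.0
    return (st.f_files - st.f_ffree) / float(st.f_files)

# Only local ext4 mounts are checked (example list); /cephfs is left out
# on purpose, as it always reports all inodes as used.
for mount in ("/", "/var", "/home"):
    usage = inode_usage(mount)
    if usage > INODE_WARN_FRACTION:
        print("WARNING: %s at %.0f%% inode usage" % (mount, usage * 100))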

Hence, my personal preference would be ULONG_MAX, but of course feel free to stay with 0. If nobody else complains, it's probably a non-issue for other users ;-). 

Cheers,
	Oliver

[1] https://github.com/ceph/ceph/pull/23323#issuecomment-409249911


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
