On 2021/6/26 3:06 AM, Martin Steigerwald wrote:
Hi! I have repeatedly found that Baloo indexes the same files twice, or even more often, after a while. I reported this upstream in: Bug 438434 - Baloo appears to be indexing twice the number of files than are actually in my home directory https://bugs.kde.org/show_bug.cgi?id=438434 I was told that if the device number changes, Baloo will think it has new files even though the path is still the same. I also found over time that the device number for the single BTRFS filesystem on an NVMe SSD in a ThinkPad T14 Gen1 AMD can change. It is not RAID 1 (maybe not yet). I do have BTRFS RAID 1 in another laptop, and I have already had this issue there as well.
Since btrfs has multi-device support by default, it reports an anonymous device number, just as if you were using a filesystem over LVM. The real question is why the anonymous device number changes. If the fs is always mounted in a fixed sequence with a fixed set of snapshot/subvolume mounts, it should not get a new anonymous device number. But if snapshots or new subvolumes are involved, or subvolumes are simply mounted/read in a different order, then the device number of each subvolume will change.
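To see which device number a path actually reports, one can look at `st_dev` from `stat()`. The sketch below is only an illustration under the assumption of a Linux system: anonymous device numbers are allocated with major number 0, so on btrfs each subvolume should show up as `0:N`, with `N` depending on the order in which subvolumes were first accessed. The helper names `device_number` and `is_anonymous` are hypothetical.

```python
import os
import sys


def device_number(path: str) -> tuple[int, int]:
    """Return the (major, minor) pair of the st_dev that stat() reports for path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


def is_anonymous(path: str) -> bool:
    """Anonymous device numbers (btrfs subvolumes, tmpfs, proc, ...) use major 0."""
    return device_number(path)[0] == 0


if __name__ == "__main__":
    # Pass e.g. two subvolume mount points; defaults to the current directory.
    for p in sys.argv[1:] or ["."]:
        major, minor = device_number(p)
        kind = "anonymous" if is_anonymous(p) else "block device"
        print(f"{p}: {major}:{minor} ({kind})")
```

Comparing the output for the same subvolume across reboots shows whether its minor number stayed stable.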
I argued that a desktop application has no business relying on a device number, and was told that search/indexing sits in the middle between an application and system software, and that Baloo needs an "invariant" for a file. See comment #11 of that bug report: https://bugs.kde.org/show_bug.cgi?id=438434#c11
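The failure mode described in the bug report can be illustrated with a toy index keyed by `(st_dev, st_ino)`. This is a hypothetical sketch, not Baloo's actual code: it only assumes that the index treats the device number as part of a file's identity, and the device numbers used are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileKey:
    """Hypothetical index key: the device and inode numbers stat() reports."""
    dev: int
    ino: int


class ToyIndex:
    """A toy file index for illustration only (NOT Baloo's implementation)."""

    def __init__(self) -> None:
        self.entries: dict[FileKey, str] = {}

    def add(self, dev: int, ino: int, path: str) -> bool:
        """Record a file; return True if the index considered it a new file."""
        key = FileKey(dev, ino)
        is_new = key not in self.entries
        self.entries[key] = path
        return is_new


index = ToyIndex()
# First boot: the subvolume happened to get anonymous device 0:45 (made-up value).
print(index.add(dev=45, ino=1234, path="/home/user/notes.txt"))  # True
# After a reboot it got 0:47: same path, same inode, but a different key.
print(index.add(dev=47, ino=1234, path="/home/user/notes.txt"))  # True
print(len(index.entries))  # 2
```

The same file now has two entries, which matches the "indexing twice the number of files" symptom.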
Well, a lot of tools rely on the device number to detect filesystem boundaries, like find. Thus it's a little hard to argue against. But on the other hand, it also means Baloo can't handle a regular fs over LVM well either.
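As an illustration of how such tools use the device number, here is a sketch of a directory walk that, in the spirit of `find ROOT -xdev`, does not descend into directories whose `st_dev` differs from the starting point. It is not find's implementation; the function name `walk_one_fs` is hypothetical. On btrfs, the same check also stops at nested subvolumes and snapshots, since each has its own anonymous device number.

```python
import os
from typing import Iterator


def walk_one_fs(root: str) -> Iterator[str]:
    """Yield paths under root, never descending into a directory on another st_dev."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        same_dev = []
        for d in dirnames:
            path = os.path.join(dirpath, d)
            # The boundary directory itself is listed, but not descended into.
            yield path
            if os.lstat(path).st_dev == root_dev:
                same_dev.append(d)
        # Pruning dirnames in place tells os.walk (top-down) what to visit next.
        dirnames[:] = same_dev
        for f in filenames:
            yield os.path.join(dirpath, f)
```

For example, `list(walk_one_fs("/home"))` would list a nested subvolume's mount point but none of its contents.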
I got the suggestion to try to find a way to tell the kernel to use a fixed device number.
I don't think it's possible for btrfs, as each subvolume gets its anonymous device number assigned when it is first read. Thus it's really hard to make it fixed, as the whole point of anonymous device numbers is to avoid conflicts.
I still think that an application or an infrastructure service for a desktop environment, or anything else in user space, should not rely on a device number being fixed and never changing across reboots.
Well, LVM/device mapper does the same thing, and changing such behavior is never a good idea for the kernel. Thus, for use cases where we really need a proper mapping, we use hashes rather than just the device number, like what we did in duperemove.
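A minimal sketch of the "use hashes, not the device number" idea, assuming that identifying a file by a SHA-256 digest of its contents is acceptable for the use case. This is not how duperemove is implemented (it works on file extents/blocks for deduplication); `content_key` is a hypothetical helper.

```python
import hashlib


def content_key(path: str, chunk_size: int = 1 << 20) -> str:
    """Identify a file by a SHA-256 digest of its contents instead of (st_dev, st_ino).

    The digest does not change when the device number changes, but it does change
    whenever the file content is modified, so it identifies content, not a file.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The trade-off is that two files with identical content share a key, which suits deduplication but would need extra information (such as the path) for an indexer.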
But maybe you see it differently and think it is okay for a userspace component to do that. I would like to hear your view on that. Another question is whether I could somehow make sure that the device number does not change, even if just as a workaround.
If you really just want a fixed device number, you can get one by:

- Making sure all users of anonymous devices are set up in a fixed sequence
  Things like device mapper/LVM and btrfs should get loaded/initialized in a fixed order.

- Making sure the subvolume you care about always gets mounted/read before any other subvolume
  That way the target subvolume always gets the first device number in the pool.

But this also means that any later subvolumes outside the fixed mount/read sequence cannot get a fixed number.

Thanks,
Qu
I know that for NFS there is an fsid= export option, but it does not appear to be anything generic; at least the mount man page seems to mention nothing related to fsid.

Best,