Re: Assumption on fixed device numbers in Plasma's desktop search Baloo

On 2021/6/26 3:06 AM, Martin Steigerwald wrote:
Hi!

I found repeatedly that Baloo indexes the same files twice or even more
often after a while.

I reported this upstream in:

Bug 438434 - Baloo appears to be indexing twice the number of files than
are actually in my home directory

https://bugs.kde.org/show_bug.cgi?id=438434

And got back that if the device number changes, Baloo will think it has
new files even though the path is still the same. And found over time that
the device number for the single BTRFS filesystem on an NVMe SSD in a
ThinkPad T14 Gen1 AMD can change. It is not (maybe yet) RAID 1. I do
have BTRFS RAID 1 in another laptop and there I also had this issue
already.

Since btrfs has multi-device support by default, it reports an anonymous
device number, just as if you were using a filesystem on top of LVM.

The question is why the anonymous device number changes.

If the fs is always mounted in a fixed sequence, with the same
snapshots/subvolumes mounted, it should not get a new anonymous device
number.

But if snapshots or new subvolumes are involved, or subvolumes are just
mounted/read in a different order, then the device number for each
subvolume can change.
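
To see which device number the kernel currently reports for a path, one
can look at st_dev from stat(). The following is only a minimal sketch;
the helper name device_number is made up for illustration. On Linux,
anonymous device numbers normally show up with major number 0, so a btrfs
subvolume would typically print as 0:<something>, and that minor part is
what may differ after a reboot or a different mount order.

```python
import os


def device_number(path):
    """Return (major, minor) of the device that stat() reports for path."""
    st = os.stat(path)
    # st_dev packs major and minor into one integer; split it for display.
    return os.major(st.st_dev), os.minor(st.st_dev)


# Example: print the device number of the current directory.
major, minor = device_number(".")
print(f".: st_dev={major}:{minor}")
```

Running this on the same subvolume across several boots is a simple way
to check whether the number really changes on a given setup.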


I argued that a desktop application has no business to rely on a device
number and got back that search/indexing is in the middle between an
application and system software. And that Baloo needs an "invariant" for
a file. See comment #11 of that bug report:

https://bugs.kde.org/show_bug.cgi?id=438434#c11

Well, a lot of tools rely on the device number to detect filesystem
boundaries, like find.
Thus it's a little hard to argue.
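
As a rough illustration of that kind of boundary check (in the spirit of
find -xdev, not its actual implementation), a directory walk can compare
st_dev of each subdirectory with that of the starting point and refuse to
descend when they differ. The function name walk_one_filesystem is
hypothetical. Note that such a check only compares device numbers within
a single run; it does not need them to stay the same across reboots.

```python
import os


def walk_one_filesystem(top):
    """Yield directories under top without crossing into another st_dev."""
    top_dev = os.lstat(top).st_dev
    for dirpath, dirnames, _filenames in os.walk(top):
        # Prune in place so os.walk does not descend into directories
        # that report a different device number (e.g. other mounts or
        # btrfs subvolumes).
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == top_dev
        ]
        yield dirpath
```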

But on the other hand, it also means baloo can't handle the regular
fs-over-LVM case well either.


I got the suggestion to try to find a way to tell the kernel to use a
fixed device number.

I don't think it's possible for btrfs, as each subvolume gets its
anonymous device number assigned when it is first read.

Thus it's really hard to make it fixed, as the whole reason for anonymous
device numbers is to avoid conflicts.


I still think, an application or an infrastructure service for a desktop
environment or even anything else in user space should not rely on a
device number to be fixed and never change upon reboots.

Well, LVM/device mapper does the same thing, and changing a lot of
behavior is never a good idea for the kernel.

Thus for use cases where we really need a proper mapping, we use hashes,
not just the device number, like what we did in duperemove.
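
A minimal sketch of the hash idea, assuming a plain SHA-256 over the file
contents (this is not how duperemove is actually implemented, and the
helper name content_hash is made up): the resulting digest depends only
on the data, so it stays the same no matter which device number the
kernel hands out.

```python
import hashlib


def content_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The obvious trade-off is cost: hashing reads the whole file, while a
(device number, inode) lookup only needs a stat() call.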


But maybe you have a different idea about that and it is okay for a
userspace component to do that. I would like to hear your idea about
that.

Another question would be whether I could somehow make sure that the
device number does not change, even if just as a work-around.

If you really just want a fixed device number, you can ensure that by:

- Make sure all users of anonymous devices are set up in a fixed sequence
  Things like device mapper/LVM and btrfs should get loaded/initialized
  in a fixed order.

- Make sure the subvolume you care about always gets mounted/read before
  any other subvolume
  That way the target subvolume always gets the first device number in
  the pool.

  But this also means that all later subvolumes not in the fixed
  mount/read sequence cannot get a fixed number.

Thanks,
Qu

I know for
NFS there is a fsid= mount option, but it does not appear to be
something generic, at least the mount man page seems to have nothing
related to fsid.


Best,




