Re: Assumption on fixed device numbers in Plasma's desktop search Baloo

On 2021/6/26 4:49 PM, Martin Steigerwald wrote:
Qu Wenruo - 26.06.21, 02:27:54 CEST:
On 2021/6/26 3:06 AM, Martin Steigerwald wrote:
Hi!

I found repeatedly that Baloo indexes the same files twice or even
more often after a while.

I reported this upstream in:

Bug 438434 - Baloo appears to be indexing twice the number of files
than are actually in my home directory

https://bugs.kde.org/show_bug.cgi?id=438434

And got back that if the device number changes, Baloo will think it
has new files even though the path is still the same. And found over
time that the device number for the single BTRFS filesystem on a
NVMe SSD in a ThinkPad T14 Gen1 AMD can change. It is not (maybe
yet) RAID 1. I do have BTRFS RAID 1 in another laptop and there I
also had this issue already.

Since btrfs has multi-device support by default, it reports an anonymous
device number, just as when you use a filesystem over LVM.
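For readers who want to see the number being discussed: it is the st_dev field returned by stat(). The following is a minimal Python sketch, not anything Baloo itself runs; the helper name device_number is made up for illustration.

```python
import os

def device_number(path):
    """Return the (major, minor) pair of the st_dev reported for `path`."""
    st_dev = os.stat(path).st_dev
    return os.major(st_dev), os.minor(st_dev)

if __name__ == "__main__":
    # Run this after different boots to see whether the pair stays the same.
    major, minor = device_number(".")
    print(f"st_dev of current directory: {major}:{minor}")
```

Running it on the same path after several reboots is one way to check whether the value really moves on a given setup.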

Ah, this!

I forgot to mention that: I use BTRFS on top of LVM on top of LUKS based
dm-crypt on a partition on the NVMe SSD. Sorry, somehow I forgot to
mention that here. I mentioned it in the bug report. I'd use a different
approach if there were one that gave me full disk encryption. I am
not willing to use ecryptfs on top of BTRFS and as far as I know BTRFS
cannot yet encrypt by itself.

I still think this could give a fixed order of loading:

1. Unlock LUKS.

2. Activate LVM logical volumes. No idea whether that happens in a fixed
order though or whether it can have a different order on each boot.

LVM/LUKS normally isn't a big deal, as most of them are initialized
before btrfs, and have a pretty fixed initialization sequence.

Unless you change the LVM setup, at least all your LVs should have
a fixed device number.
(But there are still cases where a kernel update may change them.)


3. Mount BTRFS. /home is always on the same subvolume. So that should
not change.

Normally it won't change.

But it depends more on the btrfs behavior.

Thus I'm not that confident it will never change.

But by now I guess you get the point: under normal circumstances, with
no config change, the device number won't change.

However, any change in kernel/storage stack/config can lead to a different
device number.


The problem is why the anonymous device number changes.

Good question. Maybe I have an idea about that. See below.

I argued that a desktop application has no business to rely on a
device number and got back that search/indexing is in the middle
between an application and system software. And that Baloo needs an
"invariant" for a file. See comment #11 of that bug report:

https://bugs.kde.org/show_bug.cgi?id=438434#c11

Well, a lot of tools rely on the device number to detect filesystem
boundaries, like find.
Thus it's a little hard to argue.
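To illustrate the legitimate use mentioned here: comparing st_dev within a single run to stay on one filesystem is fine, because the number only has to be stable while the filesystem is mounted. A rough Python sketch of that idea (similar in spirit to find's -xdev option; the function name walk_one_filesystem is made up for illustration):

```python
import os

def walk_one_filesystem(root):
    """Yield file paths under `root` without descending into directories
    that report a different st_dev (other mounts, btrfs subvolumes)."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend across the boundary.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

The difference to the Baloo case is that nothing here is stored: the device number is read and compared within the same mount lifetime.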

But on the other hand, it also means baloo can't handle a regular fs
over LVM well either.

Yes. Also it could not handle the case of a driver loading race
condition with two or more different controllers in a desktop machine.

Thus the idea from Neil should help: instead of using the device number,
using f_fsid from statfs() should provide a much more stable result.

And f_fsid can also handle btrfs subvolumes pretty well.

But this also means, if one day you change your default/mounted
subvolume, baloo will again rebuild the cache using the new f_fsid.
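A minimal sketch of what keying on f_fsid could look like, in Python. This is an assumption about how an indexer might use it, not Baloo's actual code; the helper name file_key is invented. Python exposes the value through os.statvfs() (the f_fsid attribute needs Python 3.7 or later), which wraps statvfs() rather than statfs().

```python
import os

def file_key(path):
    """Return (f_fsid, st_ino) as a candidate persistent identity for `path`.

    Assumption: the filesystem reports an f_fsid that survives reboots,
    as the thread suggests for btrfs. Not every filesystem guarantees that.
    """
    fsid = os.statvfs(path).f_fsid  # filesystem ID, available since Python 3.7
    inode = os.stat(path).st_ino    # inode number within that filesystem
    return fsid, inode

if __name__ == "__main__":
    print(file_key("."))
```

As noted above, the key still changes when a different subvolume gets mounted at the same path, so an index built on it would be rebuilt in that case.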


I got the suggestion to try to find a way to tell the kernel to use
a fixed device number.

I don't think it's possible for btrfs, as each subvolume gets its
anonymous device number assigned when it is first read.

Thus it's really hard to make it fixed, as the reason for anonymous
device number is to avoid conflicts.

Fair enough.

I still think, an application or an infrastructure service for a
desktop environment or even anything else in user space should not
rely on a device number to be fixed and never change upon reboots.

Well, LVM/device mapper is doing the same thing, and a big behavior
change is never a good idea for the kernel.

Thus for use cases where we really need a proper mapping, we use
hashes, not just the device number, like what we did in duperemove.
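As a rough illustration of the hash approach: identify a file by its content instead of by where the kernel says it lives. The sketch below hashes the whole file with SHA-256; that choice and the helper name content_hash are assumptions for the example, not the scheme the tool mentioned above actually uses.

```python
import hashlib

def content_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The trade-off is cost: every file has to be read once to compute the hash, whereas a stat-based key is almost free.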

I think I suggested that some time ago.

Another question would be whether I could somehow make sure that the
device number does not change, even if just as a work-around.

If you really just want a fixed device number, you can ensure that by:

- Make sure all users of anonymous devices get a fixed sequence
    Things like device mapper/LVM and btrfs should get loaded/initialized
    in a fixed order.

Ah, I see.

- Make sure the subvolume you care about always gets mounted/read before
    any other subvolumes
    So that the target subvolume always gets the first device number in
    the pool.

Hmm, that may be a pointer. This is what I currently have in fstab:

/dev/nvme/home /home btrfs lazytime,compress=zstd 0 0
/dev/nvme/home /zeit/home btrfs subvol=zeit 0 0

In the first line the default subvolume is used which I changed
accordingly after creating this BTRFS. I use the approach to keep
(temporary) snapshots separated from the directory tree in /home.

Could it be that this order between these two mounts is not the same on
every boot?
I use Devuan with Runit, so the mounting would happen by
some init scripts (instead of Systemd).

Then it's out of the scope of btrfs.

I was just wondering if systemd is involved, but you just ruled it out.
But still, if the init tool chooses to shuffle the mount sequence to do
more parallel mounts, then the device number will be even more unreliable.
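One way to check whether the numbers actually move between boots is to record them per mount point and compare. A small Python sketch of that, with made-up helper names (device_map, changed); the example values in the comment are invented, not taken from the system in this thread.

```python
import os

def device_map(mount_points):
    """Map each path to the "major:minor" string of its current st_dev."""
    result = {}
    for mp in mount_points:
        st_dev = os.stat(mp).st_dev
        result[mp] = f"{os.major(st_dev)}:{os.minor(st_dev)}"
    return result

def changed(before, after):
    """Return {path: (old, new)} for paths whose device number differs."""
    return {
        mp: (before[mp], after[mp])
        for mp in before
        if mp in after and before[mp] != after[mp]
    }

# Hypothetical usage: save device_map(["/home", "/zeit/home"]) on one boot,
# then compare it with the map from the next boot, e.g.
#   changed({"/home": "0:27"}, {"/home": "0:45"})
```

If the comparison comes back empty over many boots with a fixed mount order, the ordering workaround is at least holding on that machine.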


I am not aware of an option for fstab to mount this one first and then
the other second, but I could set the second mount to noauto and mount
it when I need it.

    But this also means, all later subvolumes not in the fixed
mount/read sequence can not get a fixed number.

I somehow thought this would get complicated.

It's already complicated.

So this just proves Neil is right: the device number is only reliable
within the lifespan of the fs, nothing else.

Thanks,
Qu


Best,




