On Mon, 02 Aug 2021, Martin Steigerwald wrote:
> Hi Neil!
>
> Wow, this is a bit overwhelming for me. However, I got a very specific
> question for userspace developers in order to probably provide valuable
> input to the KDE Baloo desktop search developers:
>
> NeilBrown - 02.08.21, 06:18:29 CEST:
> > The "obvious" choice for a replacement is the file handle provided by
> > name_to_handle_at() (falling back to st_ino if name_to_handle_at isn't
> > supported by the filesystem). This returns an extensible opaque
> > byte-array. It is *already* more reliable than st_ino. Comparing
> > st_ino is only a reliable way to check if two files are the same if
> > you have both of them open. If you don't, then one of the files
> > might have been deleted and the inode number reused for the other. A
> > filehandle contains a generation number which protects against this.
> >
> > So I think we need to strongly encourage user-space to start using
> > name_to_handle_at() whenever there is a need to test if two things are
> > the same.
>
> How could that work for Baloo's use case to see whether a file it
> encounters is already in its database or whether it is a new file.
>
> Would Baloo compare the whole file handle or just certain fields or make a
> hash of the filehandle or what ever? Could you, in pseudo code or
> something, describe the approach you'd suggest. I'd then share it on:

Yes, the whole filehandle.

    struct file_handle {
        unsigned int  handle_bytes;   /* Size of f_handle [in, out] */
        int           handle_type;    /* Handle type [out] */
        unsigned char f_handle[0];    /* File identifier (sized by caller) [out] */
    };

i.e. compare handle_type, handle_bytes, and handle_bytes worth of f_handle.

This file_handle is local to the filesystem. Two different filesystems can
use the same filehandle for different files. So the identity of the
filesystem needs to be combined with the file_handle.

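A minimal sketch of that comparison in C, assuming Linux with glibc (or
another libc that exposes name_to_handle_at() under _GNU_SOURCE). The
helper names same_file_handle() and get_file_handle() are made up for
illustration; they are not part of any existing API. The sketch only
compares handles and does not record which filesystem a handle came from.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>    /* name_to_handle_at(), struct file_handle, MAX_HANDLE_SZ */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: true if two handles are identical.
 * Only meaningful when both handles come from the same filesystem.
 */
static bool same_file_handle(const struct file_handle *a,
                             const struct file_handle *b)
{
    return a->handle_type == b->handle_type &&
           a->handle_bytes == b->handle_bytes &&
           memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}

/*
 * Hypothetical helper: fetch the handle for 'path'.
 * Returns NULL if allocation fails or the call fails (for example because
 * the filesystem does not support file handles), in which case the caller
 * would fall back to st_ino. The caller frees the result.
 */
static struct file_handle *get_file_handle(const char *path)
{
    struct file_handle *fh;
    int mount_id;   /* filled in by the kernel; unused in this sketch */

    assert(path != NULL);

    fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (fh == NULL)
        return NULL;

    fh->handle_bytes = MAX_HANDLE_SZ;   /* tell the kernel how much room we have */
    if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) == -1) {
        free(fh);
        return NULL;
    }
    return fh;
}
```

An indexer could store handle_type, handle_bytes and the f_handle bytes
next to its filesystem identifier, and treat a file as already known only
when all of them match.
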
>
> Bug 438434 - Baloo appears to be indexing twice the number of files than
> are actually in my home directory
>
> https://bugs.kde.org/438434

This bug wouldn't be addressed by using the filehandle. Using a filehandle
allows you to compare two files within a single filesystem. This bug is
about comparing two filesystems either side of a reboot, to see if they
are the same.

As has already been mentioned in that bug, statfs().f_fsid is the best
solution (unless comparing the mount point is satisfactory).

NeilBrown