On Thu, Oct 31, 2024 at 03:56:40PM GMT, наб wrote:
> On Thu, Oct 31, 2024 at 09:58:19AM +0100, Karel Zak wrote:
> > On Mon, Oct 28, 2024 at 07:19:30PM GMT, наб wrote:
> > > --list-duplicates codifies what everyone keeps re-implementing with
> > > find -exec b2sum or src:perforate's finddup or whatever.
> > >
> > > hardlink already knows this, so make the data available thusly,
> > > in a format well-suited for pipeline processing
> > > (fixed-width key for uniq/cut/&c.,
> > > tab delimiter for cut &c.,
> > > -z for correct filename handling).
> >
> > Why do we need a 16-byte discriminator? The list consists of absolute
> > paths, so it should be unique enough. This seems like an unusual
> > thing,
>
> Well, the point is to have a list of lists of files, right.
> hardlink(1) finds, within the given domain,
> a set of sets of "these files are identical"
> (or, the logical set of "these are the link names of this file"
> for all eligible files).
> The only way to flatten this to a single-layer list is by having a
> list of filenames discriminated by the set to which they belong, so
>   [[a, b], [c, d, e]]
> discriminated as
>   0 a
>   0 b
>   1 c
>   1 d
>   1 e
> which allows you to reconstruct the sets live while stream-processing
> (the implementation uses a unique ASLR-randomised discriminator
> because the order isn't stable anyway, I think? but same difference).
>
> A list of just filenames is useless.

I see, thanks.

> On Thu, Oct 31, 2024 at 09:51:00AM +0100, Karel Zak wrote:
> > The new option should also be added to the "bash-completion/hardlink"
> > file. However, I can fix this after merging locally.
>
> I missed this. I'll include it in v2 if we get to v2, but if we don't,
> please do, thanks.

Merged and bash-completion updated.

    Karel

-- 
Karel Zak <kzak@xxxxxxxxxx>
http://karelzak.blogspot.com
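[Editor's note: the set reconstruction described above can be sketched in a few lines. This is not the hardlink implementation; it only assumes the record shape discussed in the thread, i.e. one "<fixed-width key>\t<filename>" record per line, with records of the same set adjacent and the key treated as opaque.]

```python
# Sketch: rebuild duplicate sets from discriminator-prefixed records
# in a single streaming pass. Assumes "<key>\t<filename>" lines as
# described in the mail; keys are opaque, only adjacent equality matters.
from itertools import groupby

def parse_duplicate_sets(lines):
    """Group consecutive records by their discriminator key."""
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    return [[path for _, path in group]
            for _, group in groupby(records, key=lambda rec: rec[0])]

# The [[a, b], [c, d, e]] example from the thread, flattened:
sample = [
    "0\ta\n",
    "0\tb\n",
    "1\tc\n",
    "1\td\n",
    "1\te\n",
]
print(parse_duplicate_sets(sample))  # → [['a', 'b'], ['c', 'd', 'e']]
```

Because grouping only compares neighbouring keys, it works even though the key values themselves (ASLR-randomised per the thread) carry no stable meaning across runs.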