Re: [PATCH 1/3] revision: complicated pathspecs disable filters

On Wed, 15 Apr 2020 at 20:37, Derrick Stolee <stolee@xxxxxxxxx> wrote:
[...]
> -->8--
> From 89beb9598daabb19e3c896bbceeb0fc1b9ccc6ca Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> Date: Wed, 15 Apr 2020 18:04:25 +0000
> Subject: [PATCH] bloom: compute all Bloom hashes from lowercase
>
> The changed-path Bloom filters currently hash path strings using
> the exact string for the path. This makes it difficult* to use the
> filters when restricting to case-insensitive pathspecs.
>
> * I say "difficult" because it is possible to generate all 2^n
>   options for the case of a path and test them all, but this is
>   a bad idea and should not be done. "Impossible" is an appropriate
>   alternative.
>
> THIS IS A BREAKING CHANGE. Commit-graph files with changed-path
> Bloom filters computed by a previous commit will not be compatible
> with the filters computed in this commit, nor will we get correct
> results when testing across these incompatible versions. Normally,
> this would be a completely unacceptable change, but the filters
> have not been released and hence are still possible to update
> before release.
>
> TODO: If we decide to move in this direction, then the following
> steps should be done (and some of them should be done anyway):
>
> * We need to document the Bloom filter format to specify exactly
>   how we compute the filter data. The details should be careful
>   enough that someone can reproduce the exact file format without
>   looking at the C code.
>
> * That document would include the tolower() transformation that is
>   being done here.

Why not modify the BDAT chunk to include a version field
identifying the case-folding transformation (or other
collation algorithm) that is applied to the path before
computing the Bloom filter key? Though that might be
unnecessary flexibility...

For example, a value of 0x00 in such a field of the BDAT
chunk header would mean no transformation, while
a value of 0x01 would mean per-character tolower()
or the Unicode equivalent of it.
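
To make that concrete, here is a rough sketch in C of how a reader
could act on such a field. The enum names, the function name
bloom_transform_path() and the choice to return an error for unknown
values are all hypothetical and not taken from the existing code; only
the values 0x00 and 0x01 come from the proposal above. Note that plain
tolower() works on single bytes according to the current locale, so a
Unicode-aware variant would need its own value and implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for a "path transformation" field in the BDAT header. */
enum bloom_path_transform {
	BLOOM_PATH_TRANSFORM_NONE = 0x00,    /* hash the path bytes as-is */
	BLOOM_PATH_TRANSFORM_TOLOWER = 0x01  /* per-character tolower() first */
};

/*
 * Apply the transformation named by 'transform' to the 'len' bytes at
 * 'path', writing exactly 'len' bytes to 'out' (no NUL is appended).
 * Returns 0 on success, or -1 if the value is not recognized.
 */
static int bloom_transform_path(uint32_t transform, const char *path,
				size_t len, char *out)
{
	size_t i;

	switch (transform) {
	case BLOOM_PATH_TRANSFORM_NONE:
		memcpy(out, path, len);
		return 0;
	case BLOOM_PATH_TRANSFORM_TOLOWER:
		for (i = 0; i < len; i++)
			out[i] = (char)tolower((unsigned char)path[i]);
		return 0;
	default:
		/* Unknown transformation: the caller cannot build a matching key. */
		return -1;
	}
}
```

A reader that gets -1 here could simply ignore the filters and fall
back to the regular tree diff, which would be slower but still correct.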

Best,
-- 
Jakub Narębski



