Robin Rosenberg <robin.rosenberg@xxxxxxxxxx> writes:

> Could we just have a lookup table index extension for identifying the
> duplicates (when checking is enabled using core configuration option #3324)?
> That table would keep a mapping from a normalized form (maybe include
> canonical encoding while we're at it) to the actual octet sequence(s) used.

I would agree that the index extension, if we are ever going to do this,
would be the right place to store this information, at the level of a
single repository.

However, this opens up a can of worms. What should the canonical key be?
If you want to protect yourself from a Unicode-normalizing filesystem, you
would use one canonicalization, while if you want to protect yourself from
a case-losing filesystem you would use another. Or do we downcase and
NFD-normalize at the same time and be done with it?

And where should the configuration be stored? If a project wants to be
interoperable across Linux and vfat, for example, that canonicalization
needs to be enabled in the repositories of all participants, be they on
Linux or vfat. Only then can people on Linux be prevented from creating
and registering two files, xt_mark.c and xt_MARK.c, in the same directory,
so that people who extract the source on vfat won't have trouble.

Which means the information needs to be in-tree. But it should not be in
.gitattributes (which by definition is for per-path things).
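
To make the "what should the canonical key be" question concrete, here is a minimal sketch in Python of the kind of lookup table being discussed: a mapping from a canonical key to the actual octet sequences that produce it. It is not git code. The names canonical_key and find_collisions, and the two switches fold_case and normalize, are hypothetical and exist only for illustration; the sketch also assumes paths are UTF-8 and uses a plain lower() as the downcasing step.

```python
import unicodedata
from collections import defaultdict


def canonical_key(path: bytes, fold_case: bool = True, normalize: bool = True) -> str:
    """Return a hypothetical canonical key for a path given as raw octets."""
    # Assumption: paths are UTF-8; undecodable bytes are kept via surrogateescape.
    text = path.decode("utf-8", errors="surrogateescape")
    if fold_case:
        # Guards against a case-losing filesystem such as vfat.
        text = text.lower()
    if normalize:
        # Guards against a Unicode-normalizing filesystem.
        text = unicodedata.normalize("NFD", text)
    return text


def find_collisions(paths, fold_case: bool = True, normalize: bool = True):
    """Map each canonical key to the octet sequences sharing it; keep only clashes."""
    table = defaultdict(list)
    for path in paths:
        table[canonical_key(path, fold_case, normalize)].append(path)
    return {key: octets for key, octets in table.items() if len(octets) > 1}


if __name__ == "__main__":
    paths = [
        b"net/xt_mark.c",
        b"net/xt_MARK.c",
        "caf\u00e9.txt".encode("utf-8"),   # precomposed e-acute
        "cafe\u0301.txt".encode("utf-8"),  # e followed by combining acute
        b"README",
    ]
    for key, octets in find_collisions(paths).items():
        print(repr(key), octets)
```

With both switches on, the two xt_mark.c spellings and the two forms of the accented name each share a key. Turning one switch off makes the corresponding pair stop colliding, which is exactly why the choice of canonicalization has to be agreed on by all participants.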