On Tue, Jul 02, 2024 at 06:59:56PM -0700, Darrick J. Wong wrote:
> > > $ wget https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
> > > $ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt
> >
> > Should this be automated?
>
> That will require a bit more thought -- many distro build systems these
> days operate in a sealed box with no network access, so you can't really
> automate this.  libicu (the last time I looked) didn't have a predicate
> to tell you if a particular code point was one of the invisible ones.

Oh, I absolutely do not suggest running the wget from a normal build!

But if you look at the kernel unicode CI support, it allows you to place
the downloaded file into the kernel tree, and then has makefile rules to
regenerate the tables from it (see fs/unicode/Makefile).
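
To make the idea concrete, here is a rough sketch of an offline regeneration
step: it reads an already-downloaded UnicodeData.txt, applies the same
pattern as the grep above, and prints a C table. This is only an
illustration, not the kernel's actual generator; the function names, the
table name, and the output layout are all made up for the example.

```python
import re
import sys

# Same pattern as the grep in this thread, matched case-insensitively
# against the whole line (so the old Unicode 1.0 name field counts too).
PATTERN = re.compile(r"(zero width|invisible|joiner|application)", re.IGNORECASE)


def invisible_code_points(lines):
    """Return sorted (code point, name) pairs for matching UnicodeData.txt lines."""
    found = []
    for line in lines:
        line = line.strip()
        if not line or not PATTERN.search(line):
            continue
        # UnicodeData.txt is ';'-separated: field 0 is the hex code point,
        # field 1 is the character name.
        fields = line.split(";")
        found.append((int(fields[0], 16), fields[1]))
    return sorted(found)


def emit_c_table(entries, name="invisible_code_points"):
    """Format the entries as a C array; 'name' is a hypothetical identifier."""
    out = ["static const unsigned int %s[] = {" % name]
    for cp, uname in entries:
        out.append("\t0x%04X,\t/* %s */" % (cp, uname))
    out.append("};")
    return "\n".join(out)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): gen_table.py UnicodeData.txt > table.h
    with open(sys.argv[1], encoding="utf-8") as f:
        print(emit_c_table(invisible_code_points(f)))
```

A makefile rule could then rebuild the table only when someone drops a new
UnicodeData.txt into the tree, so a sealed-box build never needs the
network.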