On Wed, May 18 2022 at 09:42, Allison Randal wrote:
> On 5/17/22 7:31 PM, Thomas Gleixner wrote:
> I actually thought you just ran out of easily scriptable fixes, but it's
> nice to hear that there's still substantially more we can do with
> scancode rules.

I ran out of cycles :)

> With the auto-generated patches, you will probably need to rate-limit
> like you did in 2019, since the tools can generate patches far more
> rapidly than the humans can review them.

Sure.

> If you have the time and energy to do another burst, go for it. I don't
> know that we'll ever get to 100%, but every file we clean up is helpful,
> so it's worth continuing.

I started to get some structure into this mess.

For the first step I excluded the Documentation directory, except for
files in there which fit into match rules that also apply to source
files. I'll tend to the Documentation directory in a separate step.

Then I categorized the remaining match rules into the following:

  Nr  Category      Rules  Files affected
   1  GPLv2[+]        141            1607
   2  GPL unknown      84            1663
   3  MIT              28            3275
   4  GPLv2/MIT         2              36
   5  BSD              20             114
   6  GPL/BSD          32            1004
   7  ISC               4             343
   8  X11               1               3
   9  Other             9              50
  10  Unclear          63             916
  11  Unknown          78             321
  12  Nasty            16              48
  13  Bogus            21             861

  #1    Pretty clear GPLv2[or later] and LGPL matches.

  #2    The nasty 'under GPL' ones. Quite a few of them reference COPYING

  #3-9  Pretty clear matches for MIT/BSD/ISC/X11/ZLIB and GPL combos of
        those

  #10   The unclear (at least to me) ones

  #11   Licenses the kernel does not have (yet) in the LICENSES
        directory, but some of them are not really clear to me

  #12   GPL version 1 and version 3, reiserfs and some proprietary

  #13   A set of bogosities in scancode which I need to discuss with
        Philippe.

I probably made some mistakes here and there, but that's what I have
now.

I've generated static HTML pages from the data, which are available
here:

    https://tglx.de/~tglx/spdx/index.html

so you can get a taste of what is coming to you sooner rather than
later.
The categories link to pages with rules, and the rules link to a
per-rule details page. The latter has links to a Linux cross reference
site in case you want to look at the real thing instead of the
'normalized' match patterns on the rule page.

My plan is to start with categories #1 and #3-9 and send out batches of
patches to the list. What batch size and what rate do you folks prefer?

Thanks,

        tglx
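
To put rough numbers on the batch size question, here is a small sketch
that tallies the categories planned for the first round (#1 and #3-9)
from the table in this mail. The rule and file counts are copied from
that table. Everything else is an assumption made for illustration: the
helper names are made up, one patch per rule is only a guess at the
granularity, and 25 is an arbitrary example batch size. Files that match
rules in more than one category would be counted more than once, so the
file total is an upper bound at best.

```python
# Per-category figures from the table: number -> (name, rules, files affected).
CATEGORIES = {
    1: ("GPLv2[+]", 141, 1607),
    2: ("GPL unknown", 84, 1663),
    3: ("MIT", 28, 3275),
    4: ("GPLv2/MIT", 2, 36),
    5: ("BSD", 20, 114),
    6: ("GPL/BSD", 32, 1004),
    7: ("ISC", 4, 343),
    8: ("X11", 1, 3),
    9: ("Other", 9, 50),
    10: ("Unclear", 63, 916),
    11: ("Unknown", 78, 321),
    12: ("Nasty", 16, 48),
    13: ("Bogus", 21, 861),
}

# Categories named for the first round: #1 and #3-9.
FIRST_ROUND = [1, 3, 4, 5, 6, 7, 8, 9]


def totals(numbers):
    """Sum the rule and file counts over the given category numbers."""
    rules = sum(CATEGORIES[n][1] for n in numbers)
    files = sum(CATEGORIES[n][2] for n in numbers)
    return rules, files


def batch_count(patches, batch_size):
    """Number of batches needed for `patches` (ceiling division)."""
    return -(-patches // batch_size)


if __name__ == "__main__":
    rules, files = totals(FIRST_ROUND)
    print(f"first round: {rules} rules, {files} file matches")
    # Assumption: one patch per rule, 25 patches per batch (example value).
    print(f"batches of 25: {batch_count(rules, 25)}")
```

Changing the batch size in the last call shows how many postings a given
size would mean for the first round.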