On Tue, Aug 16, 2022 at 10:53:45PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
> Hey Drew,
> Thanks for piping up.
>
> On 16/08/2022 15:06, Andrew Jones wrote:
> > On Mon, Aug 15, 2022 at 07:18:02PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
> >> Any takers on trashing my regex? Otherwise I'll just submit
> >> a v2 with the regex and it can be shat on there instead :)
> >>
> >> On 09/08/2022 19:36, Conor Dooley wrote:
> >>> On 09/08/2022 15:14, Rob Herring wrote:
> >>>> On Mon, Aug 08, 2022 at 10:01:11PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
> >>>>> On 08/08/2022 22:34, Jessica Clarke wrote:
> >>>>>> On Fri, Aug 05, 2022 at 05:28:42PM +0100, Conor Dooley wrote:
> >>>>>>> From: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
> >>>>>>> The final patch adds some new ISA strings
> >>>>>>> which needs scrutiny from someone with more knowledge about what ISA
> >>>>>>> extension strings should be reported in a dt than I have.
> >>>>>>
> >>>>>> Listing every possible ISA string supported by the Linux kernel really
> >>>>>> is not going to scale...
> >>>>
> >>>> How does the kernel scale? (No need to answer)
> >>>>
> >>>>> Yeah, totally correct there. Case for adding a regex I suppose, but I
> >>>>> am not sure how to go about handling the multi-letter extensions or
> >>>>> if parsing them is required from a binding compliance point of view.
> >>>>> Hoping for some input from Palmer really.
> >>>>
> >>>> Yeah, looks like a regex pattern is needed.
> >>>
> >>> I started pottering away at this but I have arrived at:
> >>> rv64imaf?d?c?h?(_z[imafdqcbvkh]([a-z])*)*$
> >
> > Don't forget the ^ at the start.
> >
> > Do we need to worry about optional major and minor version numbers?
> > Or check that Z names have at least one character following the category
> > character? Actually, the first letter after Z being a category is only a
> > convention. Maybe we don't want to enforce that. What about X extensions?
>
> For the character after Z, I think we could operate on the assumption
> that that's a convention until things change. The regex isn't set in
> stone forever.
> With x, it becomes - which to me makes bad worse:
> ^rv64imaf?d?q?c?b?v?k?h?(?:(?:_z[imafdqcbvkh]|_x)(?:[a-z])*)*$

I think we should change the ([a-z]*) to ([a-z]+).

>
> and then for the version numbers it becomes completely awful.
> I'd argue that if we are going to support those, then we should
> do that as another regex. We are already forcing lower case in
> these ISA strings - is there an actual benefit in adding the
> numbers, or might we want to "encourage" removing those too?
>
> I hope I am missing something, as my regex-fu isn't that good, to
> enforce the ordering & the numbers - even for the simple case of the
> major number only, we'd need to convert "f?" to "(?:f\d+)?" and so
> on for every single extension. I don't think we reduce that either
> as we want to enforce the ordering.
>
> For the minor versions it goes to "(?:f\d+p\d+)?". At that point I
> don't think we are adding any value but w/e, who am I to decide.
> That ballooned out to 194 characters for me. I then decided to have
> a bit of fun, and just do both number sets as a oneliner, using
> some named match groups. That was about 255 characters. 😍
> Anyway, dt-schema had a panic attack at something I was doing
> so I think that /may/ be a bad idea.

I presume if a version is used it means one cannot rely on the default
version. So we can't always encourage them to be removed. To simplify
things we can always require minor numbers. We can also always require
underscores. Something like

  ^rv64_i(\d+p\d+)?_m\1?_a\1?(_f\1?)?(_d\1?)? ...
  ((_z[imafdqcbvkh]|_x)[a-z]+\1?)*$

>
> I vote for allowing the x extensions, keep the convention for standard
> extensions & revisit this in the future if needed...

Sounds good. We can also easily add _s|_h|_zxm to the OR if we want. But,
there is a problem with the OR. By using it we don't enforce order. To be
pedantic we should ensure _z comes before _s, then _h, then _zxm, then _x.

Thanks,
drew
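
For anyone wanting to poke at this, here is a rough Python sketch of the unversioned pattern with the ^ anchor and the [a-z]+ change applied. It also shows the ordering problem with the OR, plus one possible way around it: splitting the OR into consecutive groups. The sample strings, the helper names and the ordered pattern are my own assumptions for illustration, not anything taken from the binding, and Python's re is not necessarily the regex dialect dt-schema uses. The _s, _h and _zxm groups could presumably be slotted in between the Z and X groups in the same way.

```python
import re

# Unversioned pattern from the thread, with "+" instead of "*" so that a
# bare "_z<category>" or "_x" with no name after it is rejected.
ISA_RE = re.compile(
    r"^rv64imaf?d?q?c?b?v?k?h?(?:(?:_z[imafdqcbvkh]|_x)(?:[a-z])+)*$"
)

# Assumed variant (not from the thread): one group per extension class,
# in sequence, so every Z extension has to come before any X extension.
ORDERED_ISA_RE = re.compile(
    r"^rv64imaf?d?q?c?b?v?k?h?(?:_z[imafdqcbvkh][a-z]+)*(?:_x[a-z]+)*$"
)


def is_valid_isa(isa: str) -> bool:
    """True if isa matches the OR-based pattern (order not enforced)."""
    return ISA_RE.match(isa) is not None


def is_valid_isa_ordered(isa: str) -> bool:
    """True if isa matches the assumed Z-before-X pattern."""
    return ORDERED_ISA_RE.match(isa) is not None


# Hypothetical sample strings, chosen only to exercise the patterns.
samples = [
    "rv64imafdc",                 # single-letter extensions only
    "rv64imafdc_zicsr_zifencei",  # multi-letter Z extensions
    "rv64imac_xvendor",           # made-up vendor (X) extension
    "rv64imac_zi",                # category letter with no name after it
    "rv64imac_xvendor_zicsr",     # X listed before Z
]

for s in samples:
    print(f"{s:28} or={is_valid_isa(s)} ordered={is_valid_isa_ordered(s)}")
```

The last sample is the interesting one: the OR-based pattern accepts it even though the X extension comes before the Z extension, while the sequenced variant rejects it.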