On Sun, Oct 29, 2017 at 2:29 PM, Antoine Beaupré <anarcat@xxxxxxxxxx> wrote: > On 2017-10-29 13:24:03, Eric Sunshine wrote: >> On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anarcat@xxxxxxxxxx> wrote: >> So, the idea is that if the input has form "something:number", then >> you want to look up "something" as a namespace name. Anything else >> (such as "something:foobar") is not considered a valid page reference. >> Right? > > frankly, i have no idea what's going on here. > >> The multiple 'return's are a bit messy. Perhaps collapse the entire >> function to something like this: >> >> sub get_mw_namespace_id_for_page { >> my $arg = shift; >> if ($arg =~ /^([^:]+):\d+$/) { >> return get_mw_namespace_id($1); >> } >> return undef; >> } >> >> In fact, it may be that the intent of the original code *was* meant to >> do exactly the same as shown in my example above, but that the person >> who wrote it accidentally typed: >> >> return get_mw_namespace_id($namespace); >> >> instead of the intended: >> >> return get_mw_namespace_id($1); >> >> So, a minimal fix would be simply to change $namespace to $1. >> Tightening the regex as I did in my example would be a bonus (though >> probably ought to be a separate patch). > > so while i'm happy to just copy-paste your code in there, that's kind of > a sensitive area of the code, as it was originally used only in the > upload procedure, which I haven't tested at all. so i'm hesitant in just > merging that in as is. I don't think there's a need to copy/paste my example code. If you instead make the minimal suggested fix, then the resulting code will be effectively equivalent to my example (minus the tighter regex). > i don't understand why or how this even works, to be honest: page names > don't necessarily look like numbers, in fact, they generally don't. i > don't understand why the patch submitted here even touches that function > at all, considering that the function is only used on uploads. I just > cargo-culted it from the original issue... I, myself, am not familiar with or a user of Mediawiki or with the Git bridging, and I don't know what page names look like, but I'm pretty well convinced from reading both the existing code and this patch that the changes to get_mw_namespace_id_for_page() are really just a bug fix to that function. My interpretation is that the function really was intended to strip the ":id" portion of "name:id" before calling get_mw_namespace_id(); the fact that the original code neglects to do so seems just an oversight. The fact that the regex uses capturing parentheses implies strongly that it was indeed the intention to use $1 in the call to get_mw_namespace_id(). Unlike the "fix" in the patch you posted from Kevin, which is perhaps unnecessarily complicated, the fix I suggested above is about a minimal as possible. That is, changing: return get_mw_namespace_id($namespace); to: return get_mw_namespace_id($1); should achieve the same result. (It could be made more robust by tightening the regex as in my example, but that's a separate topic, not needed just to get the function to work as intended.)