On Tue, Jul 12, 2022 at 11:46 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > Hence if we restrict EOF block deduping to both the src and dst > files having matching EOF offsets in their EOF blocks like so: > > - if (pos_in + count == size_in) { > + if (pos_in + count == size_in && > + (!(remap_flags & REMAP_FILE_DEDUP) || pos_out + count == size_out)) { > bcount = ALIGN(size_in, bs) - pos_in; I agree with checking the target size too. And I can see how missing that might cause the problem. I don't think that is limited to the REMAP_FILE_DEDUP case, though. Even if you a clone operation, you cannot just clone the EOF block to some random part of the destination. Anyway, isn't all of this supposed to be done by generic_remap_check_len()? That function already takes care of a similar concern for REMAP_FILE_CAN_SHORTEN, where the size of the *output* file matters. So generic_remap_check_len() basically already does one EOF block check for the output file. It just doesn't do it for the input side. And currently generic_remap_check_len() is done too late for REMAP_FILE_DEDUP, which did its handling just before calling it. So while I agree with your patch from a "this seems to be the underlying bug", I think the fix should be to move this "both EOF blocks have to match" logic to generic_remap_check_len(), and just do that *before* that if (remap_flags & REMAP_FILE_DEDUP) { in generic_remap_file_range_prep(). No? That said, the rest of that code in generic_remap_checks() still makes little to no sense to me. Look: > bcount = ALIGN(size_in, bs) - pos_in; and we literally *just* checked that "pos_in + count == size_in". So we can write that as > bcount = ALIGN(pos_in + count, bs) - pos_in; That doesn't look simpler, but... Again, just a few lines above this all, we had > if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_out, bs)) > return -EINVAL; so we know that 'pos_in' is aligned wrt bs. So we can rewrite that "ALIGN(pos_in + count, bs)" as "pos_in + ALIGN(count, bs)", because 'pos_in' doesn't change anything wrt an alignment operation. And then trivial simplification ("pos_in - pos_in goes away") makes the whole expression be just > bcount = ALIGN(count, bs); which just once more makes me go "maybe this code works, but it is clearly written for maximum nonsensical value". The "else" side is equally overly complex too, and does if (!IS_ALIGNED(count, bs)) count = ALIGN_DOWN(count, bs); bcount = count; which is just a really complicated way to write bcount = ALIGN_DOWN(count, bs); count = bcount; so that side if the if-statement knew that it could just align the count directly, but decided to do that in the least obvious way too. If 'count' was already aligned, ALIGN_DOWN() does nothing. And masking is much cheaper than testing and branching. Not to mention just *simpler*: One case aligns up to the next block boundary ("include the shared EOF block"), the other case aligns down ("only try to merge full blocks")./ Now the code makes sense, although it's still somewhat subtle in that the align-down case will also update 'count' (which is returned), while the EOF code will only set 'bcount' (which is only used for the overlapping range check) . But then, when you look at that and understand what's going on, that in turn then makes *another* thing obvious: the whole existence of 'bcount' is entirely pointless. Because 'bcount' is only used for that range check, and for the non-EOF case it's the same as 'count'. And for the EOF case, doing that alignment is entirely pointless, since if the in/out inodes are the same, then the file size is going to be the same, and the EOF block is going to overlap whether bcount was aligned to block boundary or not. So the EOF case might as well just have made 'bcount = count' without any alignment to the next block boundary at all. And once it does that, now bcount is _always_ the same as count, and there is no point in having bcount at all. So after doing all the above simplification, you can then get rid of 'bcount' entirely. > So, yeah, I think arguing about permissions and API and all that > stuff is just going completely down the wrong track because it > doesn't actually address the root cause of the information leak.... I agree that getting this "check the right range" thing right is the prime thing. The code being *very* hard to follow and not having any obvious rules really does not help, though. The permission checks are odd. And the range checks were odd and inscrutable and buggy to boot. Yes, I require that people don't break user space. That's the #1 rule of kernel development. But that does not mean or imply "write incomprehensible code". And honestly, I think your suggested patch just makes incomprehensibly and pointlessly complex code even more so. Which is why I'm suggesting the real fix is to clean it up, and mover that EOF offset check to generic_remap_check_len() where it belongs, and where we already have that comment: * ... For deduplication we accept a partial EOF block only if it ends at the * destination file's EOF (can not link it into the middle of a file). but that comment doesn't actually match the code in that function. In fact, that comment currently doesn't match any existing code at all (your suggested patch would be that code, but in another place entirely). Can we please all agree that this code is too obscure for its own good? Linus