On Sat, Nov 13, 2021 at 2:30 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> The above snippet is actually ok since if *pagep returned via
> shmem_getpage()'s parameter is not NULL, then ret is 0.

That's a random implementation detail, and is not ok to rely on.

It may or may not be true, and is not part of the rules of error
handling. If a function returns an error, you shouldn't be looking at
the other stuff it returned.

Here's a very recent example of the same kind of problem:

    https://lore.kernel.org/lkml/163663333331.414.639840290224641315.tip-bot2@tip-bot2/

where people didn't actually look properly at the return value of the
function, and instead looked at the page pointers that the function
filled in.

See? EXACT same logic. And completely buggy.

> When shmem_getpage() returns error code, *pagep is NULL IIUC.

No.

When a function returns an error code, you check for the error code,
and don't rely on whether the function then filled in other data (or
left it alone, or whatever).

So the code should

 (a) check and handle error returns properly

 (b) be legible

That (b) basically means that if it's not entirely trivial (and none
of this was entirely trivial), then when you get an error, you just
deal with it right away. You return early, and undo anything you need
to undo.

You don't do "oh, let's keep that error, and then do something else
that maybe also generates an error".

That "don't handle the error directly" was why
shmem_read_mapping_page_gfp() was buggy and would cause an oops. And
while the shmem_write_begin() code might not cause an oops, it had the
same fundamental bad pattern.

Error handling is where 99% of all problems occur. But that also means
that you should do the obvious thing wrt error handling, and not have
some crazy "if function X returned an error, it will have left the
return array untouched" which may or may not be true.

When a function returns an error code, you do error handling based on
that code. Not on some random other state.

              Linus
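
A minimal user-space sketch of the rule above: check the return code,
return early on error, and never inspect the out-parameter when the
call failed. Every name here (struct page_sketch, get_page_sketch(),
write_begin_sketch()) is hypothetical and only illustrates the
pattern. It is not the actual shmem code or the actual fix.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; NOT the kernel's struct page. */
struct page_sketch {
	int index;
};

#define NR_SKETCH_PAGES 4

static struct page_sketch backing_pages[NR_SKETCH_PAGES];

/*
 * Hypothetical lookup helper. Returns 0 on success and stores the page
 * in *pagep; returns a negative errno on failure. On failure it
 * deliberately leaves a stale non-NULL pointer in *pagep, to show why
 * the caller must not use *pagep to decide whether the call worked.
 */
static int get_page_sketch(int index, struct page_sketch **pagep)
{
	if (index < 0 || index >= NR_SKETCH_PAGES) {
		*pagep = &backing_pages[0];	/* meaningless on error */
		return -EINVAL;
	}

	backing_pages[index].index = index;
	*pagep = &backing_pages[index];
	return 0;
}

/*
 * Caller: look at the return code first, bail out right away on error,
 * and only touch the out-parameter on the success path.
 */
static int write_begin_sketch(int index, int *index_out)
{
	struct page_sketch *page = NULL;
	int ret;

	ret = get_page_sketch(index, &page);
	if (ret)
		return ret;	/* error: do not look at 'page' at all */

	*index_out = page->index;
	return 0;
}
```

A caller that tested "page != NULL" instead of "ret" would treat the
failing call as a success here, because this helper happens to leave a
non-NULL pointer behind on error.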