On Sat, Nov 13, 2021 at 2:58 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Nov 13, 2021 at 2:30 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
> >
> > The above snippet is actually ok since if *pagep returned via
> > shmem_getpage()'s parameter is not NULL, then ret is 0.
>
> That's a random implementation detail, and is not ok to rely on.
>
> It may or may not be true, and is not part of the rules of error handling.
>
> If a function returns an error, you shouldn't be looking at the other
> stuff it returned.
>
> Here's a very recent example of the same kind of problem:
>
>    https://lore.kernel.org/lkml/163663333331.414.639840290224641315.tip-bot2@tip-bot2/
>
> where people didn't actually look properly at the return value of the
> function, and instead looked at the page pointers that the function
> filled in.
>
> See? EXACT same logic. And completely buggy.

Yes, I agree it is too fragile to rely on.

>
> > When shmem_getpage() returns error code, *pagep is NULL IIUC.
>
> No.
>
> When a function returns an error code, you check for the error code,
> and don't rely on whether the function then filled in other data (or
> left it alone, or whatever).
>
> So the code should
>
>  (a) check and handle error returns properly
>
>  (b) be legible
>
> That (b) basically means that if it's not entirely trivial (and none
> of this was entirely trivial), then when you get an error, you just
> deal with it right away. You return early, and undo anything you need
> to undo.
>
> You don't do "oh, let's keep that error, and then do something else
> that maybe also generates an error".
>
> That "don't handle the error directly" was why
> shmem_read_mapping_page_gfp() was buggy and would cause an oops.
>
> And while the shmem_write_begin() code might not cause an oops, it had
> the same fundamental bad pattern.
>
> Error handling is where 99% of all problems occur. But that also means
> that you should do the obvious thing wrt error handling, and not have
> some crazy "if function X returned an error, it will have left the
> return array untouched" which may or may not be true.
>
> When a function returns an error code, you do error handling based on
> that code. Not on some random other state.

Thanks a lot for the thorough explanation. Preparing a new patch.

>
>               Linus
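
For illustration, here is a minimal userspace sketch of the rule discussed above. It is not the actual shmem code: get_page_sketch() and struct page_sketch are hypothetical stand-ins for shmem_getpage() and struct page, and the stand-in deliberately writes to *pagep even when it fails, to show what can go wrong when the caller tests the out-parameter instead of the return value.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel type. */
struct page_sketch {
	int id;
};

static struct page_sketch the_page = { .id = 1 };

/*
 * Hypothetical stand-in for shmem_getpage(): returns 0 on success and a
 * negative errno on failure. It writes to *pagep in both cases, so the
 * out-parameter says nothing about whether the call succeeded.
 */
static int get_page_sketch(int should_fail, struct page_sketch **pagep)
{
	*pagep = &the_page;
	if (should_fail)
		return -ENOMEM;
	return 0;
}

/* Fragile: decides success by looking at the out-parameter. */
static int use_page_fragile(int should_fail)
{
	struct page_sketch *page = NULL;
	int ret = get_page_sketch(should_fail, &page);

	if (page)
		return page->id;	/* error in ret is silently dropped */
	return ret;
}

/* Checked: handles the error code first and returns early. */
static int use_page_checked(int should_fail)
{
	struct page_sketch *page = NULL;
	int ret = get_page_sketch(should_fail, &page);

	if (ret)
		return ret;	/* do not look at page after an error */
	return page->id;
}
```

With this stand-in, use_page_fragile(1) returns 1 as though the call had succeeded, while use_page_checked(1) returns -ENOMEM.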