On Tue, Feb 25, 2025 at 07:08:09PM +0900, Harry Yoo wrote: > On Mon, Feb 24, 2025 at 02:12:13PM +0200, Lilith Gkini wrote: > > On Mon, Feb 24, 2025 at 09:00:14AM +0900, Harry Yoo wrote: > > > On second thought (after reading your email), > > > I think it should be (fp == NULL) && (search == NULL)? > > > > > > When fp is NULL after the loop, it means (the end of the freelist > > > is NULL) AND (there were equal to or less than (slab->objects - nr) objects > > > on the freelist). > > > > > > If the loop has ended after observing fp == NULL, > > > on_freelist() should return true only when search == NULL. > > > > > > If fp != NULL, it means there were more objects than it should have on > > > the free list. In that case, we can't take risk of iterating the loop > > > forever and it's reasonable to assume that the freelist does not > > > end with NULL. > > > > > > > The reason I m saying this is cause the While loop is looking through > > > > every non-NULL freelist pointer for the "search" pattern. If someone > > > > wants to search for NULL in the freelist that return 1 will never > > > > trigger, even for normal freelists. If "fp" was ever null it wouldn't re > > > > enter the loop. Thus the result of the function would be search == NULL > > > > because the original writers assumed the freelist will always end with > > > > NULL. > > > > > > Agreed. > > > > > > > As you correctly pointed out there might be a case where the freelist > > > > is corrupted and it doesnt end in NULL. In that case two things could happen: > > > > > > > > 1) The check_valid_pointer() catches it and fixes it, restoring the > > > > corrupted freelist. > > > > > > > > 2) The check_valid_pointer() fails to catch it and there isn't any NULL. > > > > > > > > > > > > In the first case the problem fixes itself and thats probably why > > > > validate_slab() worked fine all this time. > > > > > > > > The second case is pretty rare, but thats the case that validate_slab() > > > > wants to rout out when it checks the `!on_freelist(s, slab, NULL)`, > > > > right? > > > > > > Yes. > > > > > > > Also to make my added suggestion of `return fp == NULL` work in the first > > > > case (since it does corrrect the freelist we want it to now return TRUE) > > > > we could also add a line that nulls out the fp right after the > > > > `set_freepointer(s, object, NULL);`. > > > > > > Why? > > > fp = get_freepionter() will observe NULL anyway. > > > > > > > But words are cheap, I should test it out dynamically rather than just > > > > reading the code to see how it behaves. Feel free to also test it > > > > yourself. > > > > > > Yes, testing is important, and you should test > > > to some extent before submitting a patch. > > > > > > > I know I am supposed to send a proper Patch with the multiple added > > > > lines, but for now we are mostly brainstorming solutions. It will be > > > > better to see its behavior first in a debugger I think. > > > > > > I think it's generally considered good practice to discuss before > > > writing any code. > > > > > > > I hope I am making sense with the thought process I outlined for the > > > > return thing. I probably shouldn't be writing emails ealry Saturday > > > > morning, haha. > > > > > > > > Also I really appreciate the kind feedback! The Linux Kernel is infamously > > > > known for how rude and unhinged people can be, which can make it a bit > > > > stressful and intimidating sending any emails, especially for someone > > > > starting out such as myself. > > > > > > You're welcome ;-) > > > > > > -- > > > Cheers, > > > Harry > > Hi, Lilith. > > If you don't mind, could you please avoid bottom posting and > reply with inline comments like how I reply to you? > It makes it easier to follow the conversation. > > > Alright, I managed to run some tests through a debugger. > > > > You are right, my code isn't correct, `return fp == search` should be > > more appropriate. > > > > So I think the right way would be to do these changes: > > - while (fp && nr < slab->objects) { > > > > - fp = NULL; > > > > - return fp == search; > > > > The first line removes the "=" so it doesnt iterate more times than > > slab->objects. > > Yes. > > > The second line sets fp to NULL for the case where the code caught a > > corrupted freelist (check_valid_pointer) and fixes it by setting > > the freepointer to NULL (set_freepointer). Now NULL will be in the > > freelist. > > Yes. but do we care about the return value of on_freelist() > when the freelist is corrupted? (we don't know if it was null-terminated > or not because it's corrupted.) > > > The third line is the return value: > > TRUE if the final fp we got happens to be the search value the caller > > was looking for in the freelist. > > FALSE if the final fp we got was not the same as the search value. > > > > This should make the validate_slab() work right and if anyone ever uses > > the on_freelist() again with some other search value other than NULL it > > should also work as intended. > > Yes! that makes sense to me. > > > For my tests I wrote a kernel module that creates an isolated cache with > > 8 objects per slab, allocated all 8 of them and freed them. Then called > > validate_slab_cache() with my cache and set a breakpoint at on_freelist(). > > From there I could set any value I wanted and observe its behavior > > through GDB (I used QEMU and GDB-ed remotely from my host). > > This way I didn't have to set a bunch of EXPORT_SYMBOL() and change > > stuff to not static; It made testing a lot easier. > > > > Here are the tests I did with this new change I just mentioned. > > > > Note: By "full slab" I mean that I allocated every object in the slab > > and freed them. > > FYI in slab terms (slab->inuse == slab->objects) means full slab, > You probably meant empty slab? > > > By "partial" I mean that I only allocated and freed some > > objects, but not every object in the slab, ie if the total objects can > > be 8 I only alloc-ed and freed 7. > > > > search == NULL > > - full slab > > - correct tail true > > > - corrupted tail with garbage false > > - corrupted tail with valid address false > > Two falses because when the freepointer of the tail object > is corrupted, the loop stops when nr equals slab->objects? > > > - partial slab > > - correct tail true > > > - corrupted tail with garbage true > > This is true because for partial slabs, the number of objects > plus 1 for the garbage, does not exceed slab->objects? > > > - corrupted tail with valid address false > > > > search == some fake address > > - full slab > > - correct tail false > > - corrupted tail with garbage false > > - corrupted tail with valid address false > > > > - partial slab > > - correct tail false > > - corrupted tail with garbage false > > - corrupted tail with valid address false > > > > > > search == some address in the freelist > > - full slab > > - correct tail true > > - corrupted tail with garbage true > > - corrupted tail with valid address true > > > > - partial slab > > - correct tail true > > - corrupted tail with garbage true > > - corrupted tail with valid address true > > The result seems valid to me. At least, this is the best SLUB can do > while avoiding the risk of infinite loop iteration. > > > I apologize if am going into too many details with my testing, I just > > wanna make sure I didn't miss anything. > > No, it's ok ;-) > > > If my proposed changes look confusing I can send a proper patch. > > Should I send it in this chain as a reply, or send a new email > > and add you as well to the cc? > > You can either send a new email or reply to this email with > In-Reply-To header. And please cc SLAB ALLOCATOR folks in MAINTAINERS file > including me—I recently changed my name and email (previously Hyeonggon Yoo). > > -- > Cheers, > Harry Hey, Harry. > Yes. but do we care about the return value of on_freelist() > when the freelist is corrupted? (we don't know if it was null-terminated > or not because it's corrupted.) Oh! So we shouldn't care about the return value of on_freelist() if the freelist is corrupted? My thinking was that if the check_valid_pointer() detects a corrupted freelist and fixes it by setting the freepointer to NULL it fixes it, so we would care about the result. But maybe if the whole object_err() triggers, I think that causes an OOPs if I recall, we should consider this as corrupted and we shouldn't care about what it returns, cause something went terribly wrong? I guess that sounds like a sound logic. If that is how we should do this then yeah, my second suggested line is moot and in my later patch I shouldn't include it. > FYI in slab terms (slab->inuse == slab->objects) means full slab, > You probably meant empty slab? Thats fair. I was thinking more like "I allocated everything and then freed everything in the slab, therefore the freelist will be full". But hopefully since I specified what I meant it wasn't too confusing. > Two falses because when the freepointer of the tail object > is corrupted, the loop stops when nr equals slab->objects? Yeah! Since the freelist pointer is corrupted and doesn't end in NULL on_freelist() won't find a NULL, and in the case of the tail having garbage that the check_valid_pointer() would normally catch, the code won't catch it, because it's at the tail which would exceed the slab->objects and the While loop will exit before that, since the freelist* (not slab, as you pointed out) is full. > This is true because for partial slabs, the number of objects > plus 1 for the garbage, does not exceed slab->objects? Yeah. That goes back to that second line I suggested in the previous email. If the number of objects in the freelist is less than the slab->objects and the corrupted freelist has some garbage address that the check_valid_pointer() will catch, it will set it to NULL instead, and as I said since the freelist now includes a NULL, my thinking was that on_freelist() should return true if we were searching for NULL. But if thats not how on_freelist() should work in a case of a corrupted freelist and we don't add my second line that nulls the "fp" then this will return false instead. > The result seems valid to me. At least, this is the best SLUB can do > while avoiding the risk of infinite loop iteration. Yeah! There were no infinite loop iterations in my tests inside on_freelist(). There were in some cases in other functions, but not in on_freelist(). Fortunately the While condition prevents any infinite loopings, even without my patch. > If you don't mind, could you please avoid bottom posting and > reply with inline comments like how I reply to you? > It makes it easier to follow the conversation. Hopefully I did that correctly this time. Should I always include all the previous messages in the reply chain? It was getting rather long and I thought it would look messy. > You can either send a new email or reply to this email with > In-Reply-To header. And please cc SLAB ALLOCATOR folks in MAINTAINERS file > including me—I recently changed my name and email (previously Hyeonggon Yoo). Oh! You are one of the maintainers? That explains a lot, haha. I just assumed you were someone from the mailing list who happened to be really passionate about SLUB.