Greetings all, On Tue, 2008-02-12 at 17:05 +0100, Bart Van Assche wrote: > On Feb 6, 2008 1:11 AM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote: > > I have always observed the case with LIO SE/iSCSI target mode ... > > Hello Nicholas, > > Are you sure that the LIO-SE kernel module source code is ready for > inclusion in the mainstream Linux kernel ? As you know I tried to test > the LIO-SE iSCSI target. Already while configuring the target I > encountered a kernel crash that froze the whole system. I can > reproduce this kernel crash easily, and I reported it 11 days ago on > the LIO-SE mailing list (February 4, 2008). One of the call stacks I > posted shows a crash in mempool_alloc() called from jbd. Or: the crash > is most likely the result of memory corruption caused by LIO-SE. > So I was able to FINALLY track this down to: -# CONFIG_SLUB_DEBUG is not set -# CONFIG_SLAB is not set -CONFIG_SLUB=y +CONFIG_SLAB=y in both your and Chris Weiss's configs that was causing the reproduceable general protection faults. I also disabled CONFIG_RELOCATABLE and crash dump because I was debugging using kdb in x86_64 VM on 2.6.24 with your config. I am pretty sure you can leave this (crash dump) in your config for testing. This can take a while to compile and take up alot of space, esp. with all of the kernel debug options enabled, which on 2.6.24, really amounts to alot of CPU time when building. Also with your original config, I was seeing some strange undefined module objects after Stage 2 Link with iscsi_target_mod with modpost with the SLUB the lockups (which are not random btw, and are tracked back to __kmalloc()).. Also, at module load time with the original config, there where some warning about symbol objects (I believe it was SCSI related, same as the ones with modpost). In any event, the dozen 1000 loop discovery test is now working fine (as well as IPoIB) with the above config change, and you should be ready to go for your testing. Tomo, Vlad, Andrew and Co: Do you have any ideas why this would be the case with LIO-Target..? Is anyone else seeing something similar to this with their target mode (mabye its all out of tree code..?) that is having an issue..? I am using Debian x86_64 and Bart and Chris are using Ubuntu x86_64 and we both have this problem with CONFIG_SLUB on >= 2.6.22 kernel.org kernels. Also, I will recompile some of my non x86 machines with the above enabled and see if I can reproduce.. Here the Bart's config again: http://groups.google.com/group/linux-iscsi-target-dev/browse_thread/thread/30835aede1028188 > Because I was curious to know why it took so long to fix such a severe > crash, I started browsing through the LIO-SE source code. Analysis of > the LIO-SE kernel module source code learned me that this crash is not > a coincidence. Dynamic memory allocation (kmalloc()/kfree()) in the > LIO-SE kernel module is complex and hard to verify. What the LIO-SE Target module does is complex. :P Sorry for taking so long, I had to start tracking this down by CONFIG_ option with your config on an x86_64 VM. > There are 412 > memory allocation/deallocation calls in the current version of the > LIO-SE kernel module source code, which is a lot. Additionally, > because of the complexity of the memory handling in LIO-SE, it is not > possible to verify the correctness of the memory handling by analyzing > a single function at a time. In my opinion this makes the LIO-SE > source code hard to maintain. > Furthermore, the LIO-SE kernel module source code does not follow > conventions that have proven their value in the past like grouping all > error handling at the end of a function. As could be expected, the > consequence is that error handling is not correct in several > functions, resulting in memory leaks in case of an error. I would be more than happy to point the release paths for iSCSI Target and LIO-SE to show they are not actual memory leaks (as I mentioned, this code has been stable for a number of years) for some particular SE or iSCSI Target logic if you are interested.. Also, if we are talking about target mode storage engine that should be going upstream, the API to the current stable and future storage systems, and of course the Mem->SG and SG->Mem that handles all possible cases of max_sectors and sector_size to past, present, and future. I really glad that you have been taking a look at this, because some of the code (as you mention) can get very complex to make this a reality as it has been with LIO-Target since v2.2. > Some > examples of functions in which error handling is clearly incorrect: > * transport_allocate_passthrough(). > * iscsi_do_build_list(). > You did find the one in transport_allocate_passthrough() and the strncpy() + strlen() in userspace. Also, thanks for pointing me to the missing sg_init_table() and sg_mark_end() usage for 2.6.24. I will post an update to my thread about how to do this for other drivers.. I will have a look at your new changes and post them on LIO-Target-Dev for your review. Please feel free to Ack them when I post. (Thanks Bart !!) PS: Sometimes it takes a while when you are on the bleeding edge of development to track these types of issues down. :-) --nab - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html