Hi,

My name is Aviad, and I'm a student at Tel Aviv University. I've been working on a simple SSD simulator with tgt. The basic idea is that data is saved in RAM. The "disk" itself is multi-threaded, with one thread handling requests and spreading them among other threads (representing NAND flash chips). I'm using tgt-1.0.1 on a multi-core machine (128 cores) with kernel 3.2, so things should be pretty well parallelized.

Some minor implementation details: every scsi_cmd request is "translated" into multiple small 4KB requests to my code, which services them and responds. When all parts of the scsi_cmd have been serviced, I terminate the scsi_cmd.

Things have been working well. I've been able to use dd to issue requests, and to mkfs and mount a file system (ext3) on top of it. But whenever I run file system benchmarks, which means more sophisticated workloads, at some point I get resets in the tgt code and requests are denied. tgt simply hangs for some time (from several seconds up to 60s or so, in any case longer than the ext3 flush interval), and most requests get lost or are only processed after a very long time.

Meanwhile, my part of the code does nothing; it just waits for incoming requests to arrive, with no requests in its own internal queues. So I suspect the problem is not there.

When I run with debug output I get messages like the ones below, which suggest that a timeout has been exceeded (from what I've read here). But I don't understand how this could happen, since I am not servicing any requests in the meantime!

tgtd: iscsi_noop_out_rx_start(1607) ffffffff 5e 0
tgtd: iscsi_task_queue(1514) 88d1 88d1 40
tgtd: iscsi_task_tx_start(1860) found a task 0 4294967295 0 0
tgtd: iscsi_task_tx_start(1885) no more data
tgtd: iscsi_noop_out_rx_start(1607) ffffffff 8 0
tgtd: iscsi_task_queue(1514) 88d1 88d1 40
tgtd: iscsi_task_tx_start(1860) found a task 0 4294967295 0 0
tgtd: iscsi_task_tx_start(1885) no more data
tgtd: iscsi_noop_out_rx_start(1607) ffffffff c 0
tgtd: iscsi_task_queue(1514) 88d1 88d1 40
tgtd: iscsi_task_tx_start(1860) found a task 0 4294967295 0 0
tgtd: iscsi_task_tx_start(1885) no more data
tgtd: iscsi_task_queue(1514) 88d1 88d1 42
tgtd: abort_task_set(1008) found 14 0
tgtd: iscsi_task_tx_start(1860) found a task 0 855638016 0 0
tgtd: iscsi_task_tx_start(1885) no more data
tgtd: iscsi_task_queue(1514) 88d1 88d1 42
tgtd: abort_task_set(1008) found 0 0
tgtd: abort_cmd(984) found 33 6

When I run tgt under gdb and check the state of the threads while tgt hangs, I get this backtrace from the tgt threads:

Thread 6 (Thread 0x7ffff5631710 (LWP 66238)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000004215e7 in bs_thread_worker_fn (arg=<value optimized out>) at bs.c:196
#2  0x00007ffff7bc69ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#3  0x00007ffff771c69d in clone () from /lib/tls/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7ffff5e32710 (LWP 66237)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000004215e7 in bs_thread_worker_fn (arg=<value optimized out>) at bs.c:196
#2  0x00007ffff7bc69ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#3  0x00007ffff771c69d in clone () from /lib/tls/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7ffff6633710 (LWP 66236)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000004215e7 in bs_thread_worker_fn (arg=<value optimized out>) at bs.c:196
#2  0x00007ffff7bc69ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#3  0x00007ffff771c69d in clone () from /lib/tls/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7ffff6e34710 (LWP 66235)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000004215e7 in bs_thread_worker_fn (arg=<value optimized out>) at bs.c:196
#2  0x00007ffff7bc69ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#3  0x00007ffff771c69d in clone () from /lib/tls/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7ffff7635710 (LWP 66234)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000004217b5 in bs_thread_ack_fn (arg=<value optimized out>) at bs.c:89
#2  0x00007ffff7bc69ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#3  0x00007ffff771c69d in clone () from /lib/tls/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7fb1700 (LWP 66220)):
#0  0x00007ffff771cc93 in epoll_wait () from /lib/tls/libc.so.6
#1  0x000000000040fde0 in event_loop () at tgtd.c:263
#2  0x0000000000410309 in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:438

Any idea what might be going on? I'm at a loss here. I even tried running the file system (ext3) in sync mode to lower the stress on tgt, but that did not help at all. Using tgt 1.0.36 did not resolve this either; I get the same reset problem.

I'm a newbie to tgt, so it is possible I'm missing something. I'd appreciate your help.

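In case it helps to picture the splitting described above, here is a minimal sketch of the idea, not the actual simulator code. All names (split_cmd, split_count, split_init, split_part_done) are hypothetical and are not tgt structures or functions. It also assumes sub-requests are cut on 4KB boundaries and that a C11 atomic counter tracks the outstanding parts.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_SIZE 4096u  /* sub-request size (4KB) */

/* Hypothetical per-command bookkeeping; NOT a tgt structure. */
struct split_cmd {
    uint64_t offset;      /* byte offset of the whole command */
    uint32_t length;      /* byte length of the whole command */
    atomic_uint pending;  /* sub-requests not yet serviced */
};

/* Number of 4KB-aligned chunks touched by [offset, offset + length).
 * Cutting on 4KB boundaries is an assumption of this sketch. */
static uint32_t split_count(uint64_t offset, uint32_t length)
{
    if (length == 0)
        return 0;
    uint64_t first = offset / CHUNK_SIZE;
    uint64_t last = (offset + length - 1) / CHUNK_SIZE;
    return (uint32_t)(last - first + 1);
}

/* Record the command and how many parts must finish before it is done. */
static void split_init(struct split_cmd *c, uint64_t offset, uint32_t length)
{
    c->offset = offset;
    c->length = length;
    atomic_init(&c->pending, split_count(offset, length));
}

/* Called by a chip thread when one sub-request has been serviced.
 * Returns true for exactly one caller: the one that finished the last
 * outstanding part and should therefore finish the whole scsi_cmd. */
static bool split_part_done(struct split_cmd *c)
{
    return atomic_fetch_sub(&c->pending, 1) == 1;
}
```

With this scheme a 12KB command starting at offset 0 becomes three parts, and only the third call to split_part_done() returns true. The atomic decrement is there so that two chip threads finishing at the same moment cannot both (or neither) report the command as done.
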
Thank you