Hi Ted,

I collected some JBD2 tracing logs while running our test program; most runs take 10 seconds or so, whereas on a healthy system the same test takes only about 2 seconds. We do see spikes in the "running" time, but after checking the kernel source I still don't understand what they mean. Any insights?

<...>-220601 [015] .... 181555.139321: jbd2_run_stats: dev 259,3 tid 1064541 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 4 blocks 1 blocks_logged 2
<...>-220601 [015] .... 181555.141161: jbd2_run_stats: dev 259,3 tid 1064542 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 4 blocks 1 blocks_logged 2
<...>-220601 [015] .... 181555.143399: jbd2_run_stats: dev 259,3 tid 1064543 wait 0 request_delay 0 running 1 locked 0 flushing 0 logging 0 handle_count 4 blocks 3 blocks_logged 4
<...>-220601 [015] .... 181555.143710: jbd2_run_stats: dev 259,3 tid 1064544 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 4 blocks 1 blocks_logged 2
<...>-220601 [015] .... 181555.143840: jbd2_run_stats: dev 259,3 tid 1064545 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 1 blocks 4 blocks_logged 5
<...>-220601 [015] .... 181555.176298: jbd2_run_stats: dev 259,3 tid 1064546 wait 0 request_delay 0 running 32 locked 0 flushing 0 logging 0 handle_count 31 blocks 5 blocks_logged 6
jbd2/nvme0n1p1--964 [014] .... 181555.189997: jbd2_run_stats: dev 259,1 tid 293294 wait 0 request_delay 0 running 5000 locked 0 flushing 0 logging 0 handle_count 15613 blocks 4 blocks_logged 5
<...>-220601 [015] .... 181556.126407: jbd2_run_stats: dev 259,3 tid 1064547 wait 0 request_delay 0 running 929 locked 0 flushing 0 logging 0 handle_count 30 blocks 3 blocks_logged 4
<...>-220601 [014] .... 181556.159874: jbd2_run_stats: dev 259,3 tid 1064548 wait 0 request_delay 0 running 10 locked 0 flushing 0 logging 0 handle_count 29 blocks 1 blocks_logged 2
<...>-220601 [000] .... 181557.123872: jbd2_run_stats: dev 259,3 tid 1064549 wait 0 request_delay 0 running 941 locked 0 flushing 0 logging 0 handle_count 30 blocks 3 blocks_logged 4
<...>-220601 [015] .... 181558.135168: jbd2_run_stats: dev 259,3 tid 1064550 wait 0 request_delay 1 running 1 locked 0 flushing 0 logging 0 handle_count 8 blocks 6 blocks_logged 7
<...>-220601 [015] .... 181558.138895: jbd2_run_stats: dev 259,3 tid 1064551 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 1 blocks 4 blocks_logged 5
<...>-220601 [015] .... 181558.139091: jbd2_run_stats: dev 259,3 tid 1064552 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 1 handle_count 2 blocks 4 blocks_logged 5
<...>-220601 [015] .... 181558.139886: jbd2_run_stats: dev 259,3 tid 1064553 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 12 blocks 6 blocks_logged 7
<...>-220601 [015] .... 181558.147189: jbd2_run_stats: dev 259,3 tid 1064554 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 1 blocks 4 blocks_logged 5

Thanks,
Roy

----------------------------------------
> From: bsdnet@xxxxxxxxxxx
> To: tytso@xxxxxxx
> CC: linux-ext4@xxxxxxxxxxxxxxx
> Subject: RE: fsync stuck at jbd2_log_wait_commit on NVMe devices
> Date: Wed, 12 Aug 2015 12:28:10 -0700
>
> Hi Ted,
>
> I will collect some info about jbd2 and get back to this list.
>
> To answer your questions, our workload is mostly sequential writes from a couple of daemons.
> Every second we create 6 to 10 files and write around 8 MB into them. We only
> extend or truncate when needed, and I am sure we rarely do that on the NVMe devices.
>
> Reading the kernel documentation, I also see that ext4's "ordered" data mode ties data
> writeback to the metadata journaling. If the data blocks are large and the workload is
> heavy, can JBD2 become saturated?
>
> Thanks,
>
> Roy
>
> ----------------------------------------
>> Date: Wed, 12 Aug 2015 14:37:50 -0400
>> From: tytso@xxxxxxx
>> To: bsdnet@xxxxxxxxxxx
>> CC: linux-ext4@xxxxxxxxxxxxxxx
>> Subject: Re: fsync stuck at jbd2_log_wait_commit on NVMe devices
>>
>> Hi Roy,
>>
>> My suggestion is to collect information from the jbd2_run_stats and
>> jbd2_checkpoint_stats tracepoints.
>>
>> cd /sys/kernel/debug/tracing
>> echo 1 > events/jbd2/jbd2_run_stats/enable
>> echo 1 > events/jbd2/jbd2_checkpoint_stats/enable
>> cat trace_pipe > /tmp/traces &
>> tail -f /tmp/traces
>>
>> The jbd2_handle_stats tracepoint can be informative, but it's also far
>> more voluminous.
>>
>> That will give us a hint about where things are getting bottlenecked.
>>
>> What sort of workload is your application doing? Is it primarily
>> doing random writes into a preallocated file? Is it creating or
>> deleting files? Extending or truncating files? Etc.
>>
>> - Ted
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
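A quick way to spot the slow commits in a capture like the one above is to aggregate the "running" phase times with awk. This is only a sketch: it assumes the exact jbd2_run_stats field layout shown in the trace, with phase times reported in milliseconds, and the /tmp/traces path from Ted's commands.

```shell
# Summarize the "running" phase (ms) across all jbd2_run_stats lines:
# number of commits, worst-case running time, and the average.
awk '/jbd2_run_stats/ {
    for (i = 1; i <= NF; i++)
        if ($i == "running") {          # value follows the "running" key
            run = $(i + 1)
            sum += run; n++
            if (run > max) max = run
        }
}
END { printf "commits=%d max_running_ms=%d avg_running_ms=%.1f\n", n, max, (n ? sum / n : 0) }' /tmp/traces
```

The same loop pattern works for the other phases (wait, locked, flushing, logging) by changing the key it matches on.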
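For anyone trying to reproduce this on a scratch mount, one second of the workload Roy describes (6 to 10 new files, roughly 8 MB of sequential writes, fsync'd) can be approximated with dd. This is a rough sketch, not the actual daemon code: the target directory, the file count of 8, and using dd's conv=fsync to stand in for the application's fsync are all assumptions.

```shell
# One second's worth of the described workload: create ~8 files,
# write 8 MB of sequential data into each, and fsync before close
# (conv=fsync makes dd fsync the output file before exiting).
TARGET=${TARGET:-.}    # assumed target directory on the ext4/NVMe mount
for i in 1 2 3 4 5 6 7 8; do
    dd if=/dev/zero of="$TARGET/file.$i" bs=1M count=8 conv=fsync status=none
done
```

Running this in a loop with a `sleep 1` between iterations while the jbd2 tracepoints are enabled should show whether the long "running" times correlate with the fsync bursts.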