Hello,

I'm working on version 2.6.35.9 of the Linux kernel and am trying to disable Command Completion Coalescing (CCC). I have Native Command Queuing enabled by activating RAID mode in the BIOS.

I was looking at the Serial ATA AHCI 1.3 Specification and found the following on page 115:

    The CCC feature is only in use when CCC_CTL.EN is set to ‘1’. If CCC_CTL.EN is set to ‘0’, no CCC interrupts shall be generated.

Next, I looked at the relevant code (namely, the AHCI files) for this kernel version but wasn't able to make any progress. I found the following enum constant in drivers/ata/ahci.h:

    HOST_CAP_CCC = (1 << 7)

but I'm not sure how it should be modified to disable command coalescing. I did set HOST_CAP_CCC to 0, but my experiments showed that responses were still being batched.

In one experiment, I issued requests of size 64KB from my driver code. 64KB corresponds to 128 sectors (each sector = 512 bytes). When I look at the differences between response timestamps, here is what I find:

    Timestamp at    Timestamp at    Difference (microsecs)
    ------------------------------------------------------
    Sector 255    - Sector 127    = 510
    Sector 383    - Sector 255    = 3068
    Sector 511    - Sector 383    = 22
    Sector 639    - Sector 511    = 22
    Sector 767    - Sector 639    = 12
    Sector 895    - Sector 767    = 19
    Sector 1023   - Sector 895    = 13
    Sector 1151   - Sector 1023   = 402

As you can see, the response timestamp differences suggest that several write completion interrupts are being batched and raised as a single interrupt, which might explain the really low numbers (tens of microseconds). There seems to be some interrupt batching involved here, which I need to disable so that an interrupt is raised for each and every write request. Will disabling CCC do the trick, or is there some more complexity involved?
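For reference, the two bits mentioned above live in different registers: HOST_CAP_CCC masks the "CCC supported" bit of the read-only HBA capabilities register, while the sentence quoted from the spec refers to the EN bit of the separate CCC_CTL register. The following is only a userspace sketch of that distinction, not kernel code. It assumes the AHCI 1.3 layout in which EN is bit 0 of CCC_CTL; the names CCC_CTL_EN, ccc_supported, ccc_enabled and ccc_disable are made up for illustration and do not come from drivers/ata/ahci.h.

```c
#include <stdint.h>

/* Mask from drivers/ata/ahci.h: bit 7 of the capabilities register (CAP). */
#define HOST_CAP_CCC (1u << 7)

/* Hypothetical name: EN is assumed to be bit 0 of CCC_CTL (AHCI 1.3). */
#define CCC_CTL_EN (1u << 0)

/* Does the capabilities value advertise CCC support? */
static int ccc_supported(uint32_t cap)
{
    return (cap & HOST_CAP_CCC) != 0;
}

/* Is coalescing currently switched on in a CCC_CTL value? */
static int ccc_enabled(uint32_t ccc_ctl)
{
    return (ccc_ctl & CCC_CTL_EN) != 0;
}

/* Return the CCC_CTL value with EN cleared; all other fields are kept. */
static uint32_t ccc_disable(uint32_t ccc_ctl)
{
    return ccc_ctl & ~CCC_CTL_EN;
}
```

Under these assumptions, changing the HOST_CAP_CCC constant only changes the mask used when testing the capability bit; it does not write anything to CCC_CTL.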
And yes, I did disable the write cache and a few other caches as well, using the following commands:

    hdparm -a0 -W0 /dev/sdd
    hdparm -m0 --yes-i-know-what-i-am-doing /dev/sdd
    hdparm -A0 /dev/sdd

Here is another experiment that I tried. I create a bio structure in my driver and call the __make_request() function of the lower-level driver. Only one 2560-byte write request is sent from my driver. Once this write is serviced, an interrupt is generated, which is intercepted by do_IRQ(). Finally, the function blk_complete_request() is called. Keep in mind that we are still in the top half of the interrupt handler (i.e., interrupt context, not process context).

Now, we compose another struct bio in blk_complete_request() and call the __make_request() function of the lower-level driver. We record a timestamp at this point (say T_0). When the request completion callback is invoked, we record another timestamp (call it T_1). The difference T_1 - T_0 is always above 1 ms.

This experiment was repeated numerous times, and each time the destination sector affected the difference T_1 - T_0. It was observed that if the destination sectors of consecutive requests are separated by approximately 350 sectors, the time difference is about 1.2 ms for requests of size 2560 bytes.

Every time, the next write request is sent only after the previous request has been serviced. So all these requests are chained, and the disk has to service only one request at a time.

My understanding is that since the destination sectors of consecutive requests are separated by a fairly large amount, by the time the next request is issued, the requested sector should be almost under the disk head. The write should therefore happen immediately, and T_1 - T_0 should be small (at least < 1 ms).
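The T_1 - T_0 measurement described above boils down to converting two struct timeval readings to microseconds and subtracting them. Here is a minimal userspace sketch of that arithmetic; the helper names timeval_to_us and elapsed_us are hypothetical and are not part of the kernel or of my driver. It does the conversion in 64-bit arithmetic, because tv_sec * 1000000 does not fit in a 32-bit long.

```c
#include <stdint.h>
#include <sys/time.h>

/* Convert a timeval to microseconds, widening to 64 bits first. */
static int64_t timeval_to_us(const struct timeval *tv)
{
    return (int64_t)tv->tv_sec * 1000000 + (int64_t)tv->tv_usec;
}

/* Microseconds elapsed between the send time t0 and the completion time t1. */
static int64_t elapsed_us(const struct timeval *t0, const struct timeval *t1)
{
    return timeval_to_us(t1) - timeval_to_us(t0);
}
```

For example, a send time of 1 s + 999000 us and a completion time of 2 s + 200 us give an elapsed time of 1200 us, i.e. the roughly 1.2 ms figure mentioned above.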
The following lines of code were inserted into block/blk-softirq.c starting at line number 112:

    do_gettimeofday(&tv);
    /* Despite its name, time_ms holds microseconds. */
    time_ms = (tv.tv_sec * 1000000) + (tv.tv_usec);

    if (req && req->rq_disk && req->rq_disk->disk_name) {
        if (!strncmp(req->rq_disk->disk_name, "sdd", 3)) {
            /*
             * The experiment involves a total of 10 requests:
             * 1 sent from my driver, and the remaining 9 from here.
             */
            if (count < 10) {
                /* Only react to completed writes (bi_rw == 1). */
                if (req->bio && (req->bio->bi_rw == 1) &&
                    req->bio->bi_bdev && req->bio->bi_bdev->bd_disk &&
                    req->bio->bi_bdev->bd_disk->queue) {
                    tracing_on();
                    trace_printk("Count = %d: Receive Timestamp for sector #%llu = %lu microsecs; bi_size = %u\n",
                                 count, req->bio->bi_sector, time_ms,
                                 req->bio->bi_size);

                    /* This function (defined below) populates a bio structure. */
                    compose_bio_rw(&biop, req->bio->bi_bdev, NULL, NULL, 2560, 1);
                    /* Target the next write 350 sectors past the completed one. */
                    biop->bi_sector = req->bio->bi_sector + 350;
                    subq = req->bio->bi_bdev->bd_disk->queue;

                    if (subq && subq->make_request_fn) {
                        /* T_0: timestamp taken just before issuing the next write. */
                        do_gettimeofday(&tv);
                        time_ms = (tv.tv_sec * 1000000) + (tv.tv_usec);
                        trace_printk("Send Timestamp for sector #%llu = %lu microsecs\n",
                                     biop->bi_sector, time_ms);
                        count++;
                        subq->make_request_fn(subq, biop);
                    }
                }
            } else {
                count = 0;
                tracing_off();
            }
        }
    }

    static int compose_bio_rw(struct bio **biop, struct block_device *bdev,
                              bio_end_io_t *bi_end_io, void *bi_private,
                              int bi_size, int bi_vec_size)
    {
        struct page *bio_page;
        struct bio *bio;
        int order = 0, i = 0;

        /* Grab a free page and free bio to hold the log record header */
        while (!(bio_page = alloc_pages(GFP_KERNEL, order))) {
            printk("allocate header_page fails in compose_bio\n");
            schedule();
        }

        while (!(bio = bio_alloc(GFP_ATOMIC, bi_vec_size /* MAX_BIO_VEC_NUM */))) {
            printk("Allocate header_bio fails in compose_bio\n");
            schedule();
        }

        for (i = 0; i < bi_vec_size; i++) {
            bio->bi_io_vec[i].bv_page = &bio_page[i];
            bio->bi_io_vec[i].bv_offset = 0;
            bio->bi_io_vec[i].bv_len = 2560;
        }

        bio->bi_sector = -1;    /* we do not know the dest_LBA yet */
        bio->bi_bdev = bdev;    /* set header_bio with same value as bio */
        bio->bi_vcnt = bi_vec_size;
        bio->bi_idx = 0;
        bio->bi_rw = 1;         /* write */
        bio->bi_size = bi_size;
        bio->bi_end_io = bi_end_io;
        bio->bi_private = bi_private;

        *biop = bio;
        return 0;
    }

The mass storage controller in the system is a Promise Technology, Inc. PDC20268 (Ultra100 TX2) (rev 02), and the HDD being used is a WD Caviar Black (model number WD1001FALS).

Thank you for reading this really long mail and for helping me resolve this issue!

Regards,
Pallav