On Wed, Mar 14, 2018 at 12:05 PM, Michael Christie <mchristi@xxxxxxxxxx> wrote:
On 03/14/2018 01:27 PM, Michael Christie wrote:
> On 03/14/2018 01:24 PM, Maxim Patlasov wrote:
>> On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdillama@xxxxxxxxxx
>> <mailto:jdillama@xxxxxxxxxx>> wrote:
>>
>> Maxim, can you provide steps for a reproducer?
>>
>>
>> Yes, but it involves adding two artificial delays: one in tcmu-runner
>> and another in kernel iscsi. If you're willing to take pains of
>
> Send the patches for the changes.
>
>> ...
Where you send the patches that add your delays could you send the
target side /var/log/tcmu-runner.log with log_level = 4.
...
Mike, see please patches and /var/log/tcmu-runner.log in attachment.
Time-line was like this:
1) 13:56:31 the client (iscsi-initiator) sends a request leading to "Acquired exclusive lock." on gateway.
2) 13:56:49 tcmu-runner is suspended by SIGSTOP
3) 13:56:50 the client executes:
dd of=/dev/mapper/mpatha if=/dev/zero oflag=direct bs=1536 count=1 seek=10 &
dd of=/dev/mapper/mpatha if=/dev/zero oflag=direct bs=2560 count=1 seek=10 &
dd of=/dev/mapper/mpatha if=/dev/zero oflag=direct bs=3584 count=1 seek=10
dd of=/dev/mapper/mpatha if=/dev/zero oflag=direct bs=1536 count=1 seek=10 &
dd of=/dev/mapper/mpatha if=/dev/zero oflag=direct bs=2560 count=1 seek=10 &
dd of=/dev/mapper/mpatha if=/dev/zero oflag=direct bs=3584 count=1 seek=10
4) 13:56:51 gateway is detached from client (and neighbor gateways) by "iptables ... -j DROP"
5) 13:57:06 the client switches to another path, completes these requests above
6) 13:57:07 the client executes (that another path is still active):
dd of=/dev/mapper/mpatha if=/dev/urandom oflag=direct bs=3584 count=1 seek=10
7) 13:57:09 tcmu-runner is resumed by SIGCONT
8) 13:57:15 tcmu-runner successfully processes the third request (zero bs=3584) overwriting newer data.
9) verify that newer data was really overwritten:
# dd if=/dev/mapper/mpatha iflag=direct bs=3584 count=1 skip=10 |od -x
1+0 records in
1+0 records out
3584 bytes (3.6 kB) copied, 0.00232227 s, 1.5 MB/s
0000000 0000 0000 0000 0000 0000 0000 0000 0000
# dd if=/dev/mapper/mpatha iflag=direct bs=3584 count=1 skip=10 |od -x
1+0 records in
1+0 records out
3584 bytes (3.6 kB) copied, 0.00232227 s, 1.5 MB/s
0000000 0000 0000 0000 0000 0000 0000 0000 0000
Thanks,
Maxim
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 9eb10d3..f48ee2c 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -1291,6 +1291,13 @@ int iscsit_process_scsi_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd, immed_ret = iscsit_handle_immediate_data(cmd, hdr, cmd->first_burst_len); + + if (be32_to_cpu(hdr->data_length) == 3584) { + u64 end_time = ktime_get_ns() + 25ULL * 1000 * 1000 * 1000; + while (ktime_get_ns() < end_time) + schedule_timeout_uninterruptible(HZ); + } + after_immediate_data: if (immed_ret == IMMEDIATE_DATA_NORMAL_OPERATION) { /*
diff --git a/tcmur_cmd_handler.c b/tcmur_cmd_handler.c index 6d89a5f..1b77f11 100644 --- a/tcmur_cmd_handler.c +++ b/tcmur_cmd_handler.c @@ -2292,6 +2292,9 @@ static int tcmur_cmd_handler(struct tcmu_device *dev, struct tcmulib_cmd *cmd) struct tcmur_device *rdev = tcmu_get_daemon_dev_private(dev); uint8_t *cdb = cmd->cdb; + if (cmd->iovec[0].iov_len == 2560) + sleep(1); + track_aio_request_start(rdev); if (tcmu_dev_in_recovery(dev)) {
2018-03-14 13:56:03.042 953 [DEBUG] main:1056: handler path: /usr/lib64/tcmu-runner 2018-03-14 13:56:03.043 953 [INFO] dyn_config_start:410: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS 2018-03-14 13:56:03.059 953 [INFO] load_our_module:537: Inserted module 'target_core_user' 2018-03-14 13:56:03.102 953 [DEBUG] main:1069: 2 runner handlers found 2018-03-14 13:56:03.104 953 [DEBUG] dbus_bus_acquired:441: bus org.kernel.TCMUService1 acquired 2018-03-14 13:56:03.105 953 [DEBUG] dbus_name_acquired:457: name org.kernel.TCMUService1 acquired 2018-03-14 13:56:05.474 953 [DEBUG] handle_netlink:207: cmd 1. Got header version 2. Supported 2. 2018-03-14 13:56:05.474 953 [DEBUG] dev_added:763 rbd/rbd.disk_1: Got block_size 512, size in bytes 2147483648 2018-03-14 13:56:05.474 953 [DEBUG] tcmu_rbd_open:739 rbd/rbd.disk_1: tcmu_rbd_open config rbd/rbd/disk_1;osd_op_timeout=30 block size 512 num lbas 4194304. 2018-03-14 13:56:05.482 953 [DEBUG] timer_check_and_set_def:378 rbd/rbd.disk_1: The cluster's default osd op timeout(0.000000), osd heartbeat grace(20) interval(6) 2018-03-14 13:56:05.505 953 [DEBUG] tcmu_rbd_detect_device_class:295 rbd/rbd.disk_1: Pool rbd using crush rule "replicated_rule" 2018-03-14 13:56:05.506 953 [DEBUG] tcmu_rbd_detect_device_class:311 rbd/rbd.disk_1: SSD not a registered device class. 2018-03-14 13:56:11.933 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:56:11.935 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:56:11.937 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:56:11.938 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:56:12.930 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:56:12.932 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:56:12.935 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:56:12.938 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:56:31.152 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:56:31.161 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 0 2018-03-14 13:56:31.161 953 [DEBUG] tcmu_block_device:403 rbd/rbd.disk_1: blocking kernel device 2018-03-14 13:56:31.161 953 [DEBUG] tcmu_block_device:409 rbd/rbd.disk_1: block done 2018-03-14 13:56:31.162 953 [DEBUG] tcmu_acquire_dev_lock:266 rbd/rbd.disk_1: Waiting for outstanding commands to complete 2018-03-14 13:56:31.162 953 [DEBUG] tcmu_flush_device:383 rbd/rbd.disk_1: waiting for ring to clear 2018-03-14 13:56:31.162 953 [DEBUG] tcmu_flush_device:386 rbd/rbd.disk_1: ring clear 2018-03-14 13:56:31.162 953 [DEBUG] tcmu_acquire_dev_lock:281 rbd/rbd.disk_1: lock call state 2 retries 0 2018-03-14 13:56:31.162 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:56:31.164 953 [DEBUG] tcmu_rbd_lock_break:577 rbd/rbd.disk_1: Attempting to break lock from 192.168.122.216:0/3984316315. 2018-03-14 13:56:31.379 953 [WARN] tcmu_rbd_lock:631 rbd/rbd.disk_1: Acquired exclusive lock. 2018-03-14 13:56:31.379 953 [DEBUG] tcmu_acquire_dev_lock:313 rbd/rbd.disk_1: lock call done. lock state 1 2018-03-14 13:56:31.380 953 [DEBUG] tcmu_unblock_device:421 rbd/rbd.disk_1: unblocking kernel device 2018-03-14 13:56:31.380 953 [DEBUG] tcmu_unblock_device:427 rbd/rbd.disk_1: unblock done 2018-03-14 13:56:31.383 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.408 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.410 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.412 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.414 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.415 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.416 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.417 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.418 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.419 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.421 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.422 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.424 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.425 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.426 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.427 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.427 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.428 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.429 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.430 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.431 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.432 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.433 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.433 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.433 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.433 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.434 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.434 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.436 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.436 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.436 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.437 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.438 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.438 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.439 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.439 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.439 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.439 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.440 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.440 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.440 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.442 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.442 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.442 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.442 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.443 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:31.444 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:56:32.948 953 [DEBUG] tcmu_rbd_has_lock:508 rbd/rbd.disk_1: Is owner 2018-03-14 13:56:38.750 953 [DEBUG] tcmu_rbd_has_lock:508 rbd/rbd.disk_1: Is owner 2018-03-14 13:56:38.753 953 [DEBUG] tcmu_rbd_has_lock:508 rbd/rbd.disk_1: Is owner 2018-03-14 13:56:38.755 953 [DEBUG] tcmu_rbd_has_lock:508 rbd/rbd.disk_1: Is owner 2018-03-14 13:56:38.757 953 [DEBUG] tcmu_rbd_has_lock:508 rbd/rbd.disk_1: Is owner 2018-03-14 13:56:38.759 953 [DEBUG] tcmu_rbd_has_lock:508 rbd/rbd.disk_1: Is owner 2018-03-14 13:57:09.644 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1 2018-03-14 13:57:09.645 953 [WARN] tcmu_notify_lock_lost:199 rbd/rbd.disk_1: Async lock drop. Old state 1 2018-03-14 13:57:10.645 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 0 2018-03-14 13:57:10.645 953 [DEBUG] tcmu_block_device:403 rbd/rbd.disk_1: blocking kernel device 2018-03-14 13:57:10.645 953 [DEBUG] tcmu_block_device:409 rbd/rbd.disk_1: block done 2018-03-14 13:57:10.645 953 [DEBUG] tcmu_acquire_dev_lock:266 rbd/rbd.disk_1: Waiting for outstanding commands to complete 2018-03-14 13:57:10.645 953 [DEBUG] tcmu_flush_device:383 rbd/rbd.disk_1: waiting for ring to clear 2018-03-14 13:57:10.645 953 [DEBUG] tcmu_flush_device:386 rbd/rbd.disk_1: ring clear 2018-03-14 13:57:10.646 953 [DEBUG] tcmu_acquire_dev_lock:281 rbd/rbd.disk_1: lock call state 2 retries 0 2018-03-14 13:57:10.649 953 [DEBUG] tcmu_acquire_dev_lock:297 rbd/rbd.disk_1: Try to reopen device. 2018-03-14 13:57:10.649 953 [DEBUG] __tcmu_reopen_dev:52 rbd/rbd.disk_1: Waiting for outstanding commands to complete 2018-03-14 13:57:10.649 953 [DEBUG] __tcmu_reopen_dev:83 rbd/rbd.disk_1: Closing device. 2018-03-14 13:57:10.658 953 [DEBUG] __tcmu_reopen_dev:92 rbd/rbd.disk_1: Opening device. 2018-03-14 13:57:10.658 953 [DEBUG] tcmu_rbd_open:739 rbd/rbd.disk_1: tcmu_rbd_open config rbd/rbd/disk_1;osd_op_timeout=30 block size 512 num lbas 4194304. 2018-03-14 13:57:10.667 953 [DEBUG] timer_check_and_set_def:378 rbd/rbd.disk_1: The cluster's default osd op timeout(0.000000), osd heartbeat grace(20) interval(6) 2018-03-14 13:57:10.692 953 [DEBUG] tcmu_rbd_detect_device_class:295 rbd/rbd.disk_1: Pool rbd using crush rule "replicated_rule" 2018-03-14 13:57:10.693 953 [DEBUG] tcmu_rbd_detect_device_class:311 rbd/rbd.disk_1: SSD not a registered device class. 2018-03-14 13:57:10.723 953 [DEBUG] tcmu_acquire_dev_lock:281 rbd/rbd.disk_1: lock call state 2 retries 1 2018-03-14 13:57:10.723 953 [DEBUG] tcmu_rbd_has_lock:511 rbd/rbd.disk_1: Not owner 2018-03-14 13:57:10.723 953 [DEBUG] tcmu_rbd_lock_break:577 rbd/rbd.disk_1: Attempting to break lock from 192.168.122.216:0/1192112454. 2018-03-14 13:57:11.023 953 [WARN] tcmu_rbd_lock:631 rbd/rbd.disk_1: Acquired exclusive lock. 2018-03-14 13:57:11.023 953 [DEBUG] tcmu_acquire_dev_lock:313 rbd/rbd.disk_1: lock call done. lock state 1 2018-03-14 13:57:11.023 953 [DEBUG] tcmu_unblock_device:421 rbd/rbd.disk_1: unblocking kernel device 2018-03-14 13:57:11.023 953 [DEBUG] tcmu_unblock_device:427 rbd/rbd.disk_1: unblock done 2018-03-14 13:57:15.144 953 [DEBUG] alua_implicit_transition:447 rbd/rbd.disk_1: lock state 1
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com