Re: Why do NBD requests prevent hibernation, and FUSE requests do not?

Nikolaus Rath <Nikolaus@xxxxxxxx> · Fri, 16 Sep 2022 09:05:22 +0100

Hi Wouter,

Following up on this: should the NBD server perhaps set
PR_SET_IO_FLUSHER, and the kernel freeze tasks with this flag last?

Best,
-Nikolaus

On Sep 02 2022, Wouter Verhelst <w@xxxxxxx> wrote:
> Hi Nikolaus,
>
> I do not know how FUSE works, so can't comment on that.
>
> NBD, however, is a message-passing protocol: the client sends a message
> to request something over a network socket, which causes the server to
> do some processing, and then to send a message back. As far as the
> kernel is concerned (at least outside nbd.ko), there is no connection
> between the request message and the reply message.
>
> As such, when the kernel suspends the nbd server, it has no way of
> knowing that the in-kernel client is still waiting on a reply for a
> message that was sent earlier.
>
> I'm guessing that for FUSE, there is such a link?
>
> On Tue, Aug 30, 2022 at 07:31:31AM +0100, Nikolaus Rath wrote:
>> Hello,
>> 
>> I am comparing the behavior of FUSE and NBD when attempting to hibernate
>> the system.
>> 
>> FUSE seems to be mostly compatible, I am able to suspend the system even
>> when there is ongoing I/O on the fuse filesystem.
>> 
>> With NBD, on the other hand, most I/O seems to prevent hibernation the
>> system. Example hibernation error:
>> 
>>   kernel: Freezing user space processes ... 
>>   kernel: Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
>>   kernel: task:rsync           state:D stack:    0 pid:348105 ppid:348104 flags:0x00004004
>>   kernel: Call Trace:
>>   kernel:  <TASK>
>>   kernel:  __schedule+0x308/0x9e0
>>   kernel:  schedule+0x4e/0xb0
>>   kernel:  schedule_timeout+0x88/0x150
>>   kernel:  ? __bpf_trace_tick_stop+0x10/0x10
>>   kernel:  io_schedule_timeout+0x4c/0x80
>>   kernel:  __cv_timedwait_common+0x129/0x160 [spl]
>>   kernel:  ? dequeue_task_stop+0x70/0x70
>>   kernel:  __cv_timedwait_io+0x15/0x20 [spl]
>>   kernel:  zio_wait+0x129/0x2b0 [zfs]
>>   kernel:  dmu_buf_hold+0x5b/0x90 [zfs]
>>   kernel:  zap_lockdir+0x4e/0xb0 [zfs]
>>   kernel:  zap_cursor_retrieve+0x1ae/0x320 [zfs]
>>   kernel:  ? dbuf_prefetch+0xf/0x20 [zfs]
>>   kernel:  ? dmu_prefetch+0xc8/0x200 [zfs]
>>   kernel:  zfs_readdir+0x12a/0x440 [zfs]
>>   kernel:  ? preempt_count_add+0x68/0xa0
>>   kernel:  ? preempt_count_add+0x68/0xa0
>>   kernel:  ? aa_file_perm+0x120/0x4c0
>>   kernel:  ? rrw_exit+0x65/0x150 [zfs]
>>   kernel:  ? _copy_to_user+0x21/0x30
>>   kernel:  ? cp_new_stat+0x150/0x180
>>   kernel:  zpl_iterate+0x4c/0x70 [zfs]
>>   kernel:  iterate_dir+0x171/0x1c0
>>   kernel:  __x64_sys_getdents64+0x78/0x110
>>   kernel:  ? __ia32_sys_getdents64+0x110/0x110
>>   kernel:  do_syscall_64+0x38/0xc0
>>   kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>   kernel: RIP: 0033:0x7f03c897a9c7
>>   kernel: RSP: 002b:00007ffd41e3c518 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
>>   kernel: RAX: ffffffffffffffda RBX: 0000561eff64dd40 RCX: 00007f03c897a9c7
>>   kernel: RDX: 0000000000008000 RSI: 0000561eff64dd70 RDI: 0000000000000000
>>   kernel: RBP: 0000561eff64dd70 R08: 0000000000000030 R09: 00007f03c8a72be0
>>   kernel: R10: 0000000000020000 R11: 0000000000000293 R12: ffffffffffffff80
>>   kernel: R13: 0000561eff64dd44 R14: 0000000000000000 R15: 0000000000000001
>>   kernel:  </TASK>
>> 
>> (this is with ZFS on top of the NBD device).
>> 
>> 
>> As far as I can tell, the problem is that while an NBD request is
>> pending, the atsk that waits for the result (in this case *rsync*) is
>> refusing to freeze. This happens even when setting a 5 minute timeout
>> for freezing (which is more than enough time for the NBD request to
>> complete), so I suspect that the NBD server task (in this case nbdkit)
>> has already been frozen and is thus unable to make progress.
>> 
>> However, I do not understand why the same is not happening for FUSE
>> (with FUSE requests being stuck because the FUSE daemon is already
>> frozen). Was I just very lucky in my tests? Or are tasks waiting for
>> FUSE request in a different kind of state? Or is NBD a red-herring here,
>> and the real trouble is with ZFS?
>> 
>> It would be great if someone  could shed some light on what's going on.
>> 
>> 
>> Best,
>> -Nikolaus
>> 
>> -- 
>> GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
>> 
>>              »Time flies like an arrow, fruit flies like a Banana.«
>> 
>> 
>
> -- 
>      w@uter.{be,co.za}
> wouter@{grep.be,fosdem.org,debian.org}
>
> I will have a Tin-Actinium-Potassium mixture, thanks.

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«