On Mon, May 14, 2018 at 5:37 PM, Josef Zelenka
<josef.zelenka@xxxxxxxxxxxxxxxx> wrote:
> Hi everyone, we've encountered an unusual thing in our setup (4 nodes, 48
> OSDs, 3 monitors - ceph Jewel, Ubuntu 16.04 with kernel 4.4.0). Yesterday
> we were doing a HW upgrade of the nodes, so they went down one by one.
> The cluster was in good shape during the upgrade, as we've done this
> numerous times and we're quite sure that the redundancy wasn't broken
> while doing this. However, during this upgrade one of the clients that
> does backups to cephfs (mounted via the kernel driver) failed to write
> the backup file correctly to the cluster, with the following trace after
> we turned off one of the nodes:
>
> [2585732.529412] ffff8800baa279a8 ffffffff813fb2df ffff880236230e00 ffff8802339c0000
> [2585732.529414] ffff8800baa28000 ffff88023fc96e00 7fffffffffffffff ffff8800baa27b20
> [2585732.529415] ffffffff81840ed0 ffff8800baa279c0 ffffffff818406d5 0000000000000000
> [2585732.529417] Call Trace:
> [2585732.529505] [<ffffffff813fb2df>] ? cpumask_next_and+0x2f/0x40
> [2585732.529558] [<ffffffff81840ed0>] ? bit_wait+0x60/0x60
> [2585732.529560] [<ffffffff818406d5>] schedule+0x35/0x80
> [2585732.529562] [<ffffffff81843825>] schedule_timeout+0x1b5/0x270
> [2585732.529607] [<ffffffff810642be>] ? kvm_clock_get_cycles+0x1e/0x20
> [2585732.529609] [<ffffffff81840ed0>] ? bit_wait+0x60/0x60
> [2585732.529611] [<ffffffff8183fc04>] io_schedule_timeout+0xa4/0x110
> [2585732.529613] [<ffffffff81840eeb>] bit_wait_io+0x1b/0x70
> [2585732.529614] [<ffffffff81840c6e>] __wait_on_bit_lock+0x4e/0xb0
> [2585732.529652] [<ffffffff8118f3cb>] __lock_page+0xbb/0xe0
> [2585732.529674] [<ffffffff810c4460>] ? autoremove_wake_function+0x40/0x40
> [2585732.529676] [<ffffffff8119078d>] pagecache_get_page+0x17d/0x1c0
> [2585732.529730] [<ffffffffc056b3a8>] ? ceph_pool_perm_check+0x48/0x700 [ceph]
> [2585732.529732] [<ffffffff811907f6>] grab_cache_page_write_begin+0x26/0x40
> [2585732.529738] [<ffffffffc056a6a8>] ceph_write_begin+0x48/0xe0 [ceph]
> [2585732.529739] [<ffffffff8118fd6e>] generic_perform_write+0xce/0x1c0
> [2585732.529763] [<ffffffff8122bdb9>] ? file_update_time+0xc9/0x110
> [2585732.529769] [<ffffffffc05651c9>] ceph_write_iter+0xf89/0x1040 [ceph]
> [2585732.529792] [<ffffffff81199c19>] ? __alloc_pages_nodemask+0x159/0x2a0
> [2585732.529808] [<ffffffff8120fedb>] new_sync_write+0x9b/0xe0
> [2585732.529811] [<ffffffff8120ff46>] __vfs_write+0x26/0x40
> [2585732.529812] [<ffffffff812108c9>] vfs_write+0xa9/0x1a0
> [2585732.529814] [<ffffffff81211585>] SyS_write+0x55/0xc0
> [2585732.529817] [<ffffffff818447f2>] entry_SYSCALL_64_fastpath+0x16/0x71

Is there any hung OSD request in /sys/kernel/debug/ceph/xxxx/osdc?

> I have encountered this behavior on Luminous, but not on Jewel. Anyone who
> has a clue why the write fails? As far as I'm concerned, it should always
> work if all the PGs are available. Thanks
> Josef
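
A note on checking that debugfs file: osdc lists the OSD requests the kernel
client still has in flight, so a write stuck in ceph_write_begin/__lock_page
will usually show up there as one or more pending entries. Below is a minimal
Python sketch of how one might dump and count them on the client; the debugfs
mount point /sys/kernel/debug, the fsid directory matched by the wildcard, and
the assumption that request lines begin with a numeric tid are all assumptions
here, since the exact osdc layout differs between kernel versions.

#!/usr/bin/env python3
# Rough helper: dump /sys/kernel/debug/ceph/*/osdc and count the request
# lines still pending on a kernel Ceph client (cephfs or rbd).
# Needs root, and debugfs mounted at /sys/kernel/debug (both assumptions).
import glob
import sys

def dump_osdc():
    paths = glob.glob("/sys/kernel/debug/ceph/*/osdc")
    if not paths:
        sys.exit("no /sys/kernel/debug/ceph/*/osdc found "
                 "(debugfs not mounted, or not running as root?)")
    for path in paths:
        print("=== %s ===" % path)
        with open(path) as f:
            lines = [line.rstrip("\n") for line in f]
        # Header lines such as "REQUESTS ..." or "LINGER REQUESTS" vary by
        # kernel version; treat lines starting with a digit (the request tid)
        # as pending request entries.
        pending = [line for line in lines if line[:1].isdigit()]
        for line in lines:
            print(line)
        print("--> %d request line(s) still pending" % len(pending))

if __name__ == "__main__":
    dump_osdc()

If request lines keep sitting there while the write hangs, and they point at
OSDs on the node that was powered off, that would suggest the client is still
waiting at the RADOS level (for new OSD maps / PG peering) rather than on
anything CephFS-specific.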