Re: Cephfs write fail when node goes down

Which kernel version are you using? If it's an older kernel, consider using the ceph-fuse client instead.
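For reference, mounting through the userspace client instead of the kernel driver looks roughly like this (a sketch - the monitor address and mount point are placeholders, and it assumes /etc/ceph/ceph.conf and a keyring are already in place):

    # Mount CephFS via FUSE, bypassing the 4.4 kernel client
    sudo apt install ceph-fuse
    sudo ceph-fuse -m 192.168.0.10:6789 /mnt/cephfs

The userspace client picks up CephFS fixes with Ceph releases rather than kernel upgrades, which is why it is often the safer choice on older kernels.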

Paul


2018-05-14 11:37 GMT+02:00 Josef Zelenka <josef.zelenka@xxxxxxxxxxxxxxxx>:
Hi everyone, we've encountered an unusual thing in our setup (4 nodes, 48 OSDs, 3 monitors - Ceph Jewel, Ubuntu 16.04 with kernel 4.4.0). Yesterday we were doing a HW upgrade of the nodes, so they went down one by one - the cluster was in good shape throughout the upgrade, as we've done this numerous times before and we're quite sure the redundancy wasn't compromised while doing it (a sketch of the typical maintenance sequence follows the trace below). However, during this upgrade one of the clients that writes backups to CephFS (mounted via the kernel driver) failed to write the backup file correctly, producing the following trace after we turned off one of the nodes:

[2585732.529412]  ffff8800baa279a8 ffffffff813fb2df ffff880236230e00 ffff8802339c0000
[2585732.529414]  ffff8800baa28000 ffff88023fc96e00 7fffffffffffffff ffff8800baa27b20
[2585732.529415]  ffffffff81840ed0 ffff8800baa279c0 ffffffff818406d5 0000000000000000
[2585732.529417] Call Trace:
[2585732.529505]  [<ffffffff813fb2df>] ? cpumask_next_and+0x2f/0x40
[2585732.529558]  [<ffffffff81840ed0>] ? bit_wait+0x60/0x60
[2585732.529560]  [<ffffffff818406d5>] schedule+0x35/0x80
[2585732.529562]  [<ffffffff81843825>] schedule_timeout+0x1b5/0x270
[2585732.529607]  [<ffffffff810642be>] ? kvm_clock_get_cycles+0x1e/0x20
[2585732.529609]  [<ffffffff81840ed0>] ? bit_wait+0x60/0x60
[2585732.529611]  [<ffffffff8183fc04>] io_schedule_timeout+0xa4/0x110
[2585732.529613]  [<ffffffff81840eeb>] bit_wait_io+0x1b/0x70
[2585732.529614]  [<ffffffff81840c6e>] __wait_on_bit_lock+0x4e/0xb0
[2585732.529652]  [<ffffffff8118f3cb>] __lock_page+0xbb/0xe0
[2585732.529674]  [<ffffffff810c4460>] ? autoremove_wake_function+0x40/0x40
[2585732.529676]  [<ffffffff8119078d>] pagecache_get_page+0x17d/0x1c0
[2585732.529730]  [<ffffffffc056b3a8>] ? ceph_pool_perm_check+0x48/0x700 [ceph]
[2585732.529732]  [<ffffffff811907f6>] grab_cache_page_write_begin+0x26/0x40
[2585732.529738]  [<ffffffffc056a6a8>] ceph_write_begin+0x48/0xe0 [ceph]
[2585732.529739]  [<ffffffff8118fd6e>] generic_perform_write+0xce/0x1c0
[2585732.529763]  [<ffffffff8122bdb9>] ? file_update_time+0xc9/0x110
[2585732.529769]  [<ffffffffc05651c9>] ceph_write_iter+0xf89/0x1040 [ceph]
[2585732.529792]  [<ffffffff81199c19>] ? __alloc_pages_nodemask+0x159/0x2a0
[2585732.529808]  [<ffffffff8120fedb>] new_sync_write+0x9b/0xe0
[2585732.529811]  [<ffffffff8120ff46>] __vfs_write+0x26/0x40
[2585732.529812]  [<ffffffff812108c9>] vfs_write+0xa9/0x1a0
[2585732.529814]  [<ffffffff81211585>] SyS_write+0x55/0xc0
[2585732.529817]  [<ffffffff818447f2>] entry_SYSCALL_64_fastpath+0x16/0x71
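For context, the usual way to keep a planned node shutdown from looking like a failure is the noout flag - a sketch of the typical sequence (standard Ceph CLI commands):

    ceph osd set noout     # keep CRUSH from marking OSDs out and rebalancing
    ceph -s                # confirm HEALTH_OK / all PGs active+clean beforehand
    # ... shut the node down, do the HW work, bring it back up ...
    ceph osd unset noout   # restore normal out-marking afterwards

Note that even with noout set, writes to a PG can still block if the number of up replicas drops below the pool's min_size while the node is offline.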


I have encountered this behavior on Luminous before, but never on Jewel until now. Does anyone have a clue why the write fails? As far as I understand, it should always work as long as all the PGs are available. Thanks
Josef
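One way to check whether all PGs really stayed available during the outage, and whether min_size is what blocked the writes, is something like the following sketch (cephfs_data is the usual default CephFS data pool name and may differ in this setup):

    ceph health detail                        # lists degraded/inactive PGs and blocked requests
    ceph pg dump_stuck inactive               # any PG stuck inactive will block client I/O
    ceph osd pool get cephfs_data min_size    # writes stall when up replicas < min_size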




--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
