Re: wip-msgr2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Luis Henriques <lhenriques@xxxxxxx> writes:

> Ilya Dryomov <idryomov@xxxxxxxxx> writes:
>
>> On Mon, Dec 14, 2020 at 4:55 PM Luis Henriques <lhenriques@xxxxxxx> wrote:
>>>
>>> Ilya Dryomov <idryomov@xxxxxxxxx> writes:
>>>
>>> > Hello,
>>> >
>>> > I've pushed wip-msgr2 and opened a dummy PR in ceph-client:
>>> >
>>> >   https://github.com/ceph/ceph-client/pull/22
>>> >
>>> > This set has been through a over a dozen krbd test suite runs with no
>>> > issues other than those with the test suite itself.  The diffstat is
>>> > rather big, so I didn't want to spam the list.  If someone wants it
>>> > posted, let me know.  Any comments are welcome!
>>>
>>> That's *awesome*!  Thanks for sharing, Ilya.  Obviously this will need a
>>> lot of time to digest but a quick attempt to do a mount using a v2 monitor
>>> is just showing me a bunch of:
>>>
>>> libceph: mon0 (1)192.168.155.1:40898 socket closed (con state V1_BANNER)
>>>
>>> Note that this was just me giving it a try with a dummy vstart cluster
>>> (octopus IIRC), so nothing that could be considered testing.  I'll try to
>>> find out what I'm doing wrong in the next couple of days or, worst case,
>>> after EOY vacations.
>>
>> Hi Luis,
>>
>> This is because the kernel continues to default to msgr1.  The socket
>> gets closed by the mon right after it sees msgr1 banner and you should
>> see "peer ... is using msgr V1 protocol" error in the log.
>>
>> For msgr2, you need to select a connection mode using the new ms_mode
>> option:
>>
>>   ms_mode=legacy        - msgr1 (default)
>>   ms_mode=crc           - crc mode, if denied fail
>>   ms_mode=secure        - secure mode, if denied fail
>>   ms_mode=prefer-crc    - crc mode, if denied agree to secure mode
>>   ms_mode=prefer-secure - secure mode, if denied agree to crc mode
>
> Ah, right.  I should have took a quick look at the patches first to check
> for any new parameters.  Thanks for pointing me at that, I'll retry my
> test using ms_mode.

Maybe you're already aware of this, but here it goes: I'm seeing the
following BUG when doing a mount using ms_mode=secure:

[  159.648716] ==================================================================                                                                                             
[  159.652328] BUG: KASAN: slab-out-of-bounds in prepare_head_secure_small+0x101/0x110                                                                                        
[  159.656203] Write of size 16 at addr ffff8881014eb5c0 by task kworker/0:1/12                                                                                               
[  159.659764]                                                                                                                                                                
[  159.660600] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc2+ #55                                                                                                   
[  159.663985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014                                              
[  159.669423] Workqueue: ceph-msgr ceph_con_workfn                                                                                                                           
[  159.671559] Call Trace:                                                                                                                                                    
[  159.672657]  dump_stack+0x9a/0xcc                                                                                                                                          
[  159.674115]  ? prepare_head_secure_small+0x101/0x110                                                                                                                       
[  159.676127]  print_address_description.constprop.0+0x1c/0x220                                                                                                              
[  159.678455]  ? prepare_head_secure_small+0x101/0x110                                                                                                                       
[  159.680334]  ? prepare_head_secure_small+0x101/0x110                                                                                                                       
[  159.682245]  kasan_report.cold+0x1f/0x37                                                                                                                                   
[  159.683730]  ? queue_zeros+0x141/0x1c0                                                                                                                                     
[  159.685069]  ? prepare_head_secure_small+0x101/0x110                                                                                                                       
[  159.686849]  check_memory_region+0x145/0x1a0                                                                                                                               
[  159.688306]  memset+0x20/0x40                                                                                                                                              
[  159.689318]  prepare_head_secure_small+0x101/0x110
[  159.691034]  ? gcm_crypt+0x190/0x190
[  159.692266]  ? memset+0x20/0x40
[  159.693349]  ? encode_preamble+0xc0/0xd0
[  159.694591]  __prepare_control+0x3f1/0x450
[  159.695463]  ? prepare_read_preamble+0x190/0x190
[  159.696450]  ? kasan_unpoison_shadow+0x33/0x40
[  159.697439]  ? ceph_kvmalloc+0x7e/0x110
[  159.698369]  __handle_control+0x1115/0x2120
[  159.699294]  ? process_hello+0x510/0x510
[  159.700130]  ? lock_chain_count+0x20/0x20
[  159.700995]  ? lock_acquire+0x1b1/0x580
[  159.701780]  ? create_object.isra.0+0x239/0x4e0
[  159.702776]  ? lockdep_enabled+0x39/0x50
[  159.703667]  ? find_held_lock+0x85/0xa0
[  159.704448]  ? __kmalloc+0x16d/0x330
[  159.705231]  ? mark_held_locks+0x65/0x90
[  159.706020]  ? lockdep_enabled+0x39/0x50
[  159.706856]  ? ceph_kvmalloc+0x7e/0x110
[  159.707588]  ceph_con_v2_try_read+0x13dc/0x1e00
[  159.708456]  ? populate_out_iter+0x14b0/0x14b0
[  159.709306]  ? __mutex_lock+0x2e9/0xb70
[  159.710000]  ? trace_hardirqs_on+0x3e/0x110
[  159.710770]  ? ceph_con_workfn+0x41/0xb00
[  159.711503]  ? lock_acquire+0x1b1/0x580
[  159.712187]  ? process_one_work+0x4b2/0xaa0
[  159.712935]  ? lock_release+0x410/0x410
[  159.713617]  ? lock_downgrade+0x380/0x380
[  159.714353]  ceph_con_workfn+0x2df/0xb00
[  159.715048]  process_one_work+0x599/0xaa0
[  159.715729]  ? pwq_dec_nr_in_flight+0x110/0x110
[  159.716489]  ? rwlock_bug.part.0+0x60/0x60
[  159.717187]  worker_thread+0x2bc/0x780
[  159.717832]  ? process_one_work+0xaa0/0xaa0
[  159.718542]  kthread+0x1de/0x200
[  159.719087]  ? kthread_create_worker_on_cpu+0xd0/0xd0
[  159.719892]  ret_from_fork+0x22/0x30
[  159.720488] 
[  159.720744] Allocated by task 12:
[  159.721285]  kasan_save_stack+0x1b/0x40
[  159.721901]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[  159.722678]  ceph_kvmalloc+0x68/0x110
[  159.723249]  alloc_conn_buf+0x42/0xd0
[  159.723817]  __handle_control+0x10bb/0x2120
[  159.724456]  ceph_con_v2_try_read+0x13dc/0x1e00
[  159.725149]  ceph_con_workfn+0x2df/0xb00
[  159.725755]  process_one_work+0x599/0xaa0
[  159.726382]  worker_thread+0x2bc/0x780
[  159.726957]  kthread+0x1de/0x200
[  159.727445]  ret_from_fork+0x22/0x30
[  159.727976] 
[  159.728216] The buggy address belongs to the object at ffff8881014eb580
[  159.728216]  which belongs to the cache kmalloc-96 of size 96
[  159.730011] The buggy address is located 64 bytes inside of
[  159.730011]  96-byte region [ffff8881014eb580, ffff8881014eb5e0)
[  159.731648] The buggy address belongs to the page:
[  159.732333] page:00000000892842f3 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1014eb
[  159.733656] flags: 0x8000000000000200(slab)
[  159.734264] raw: 8000000000000200 ffffea000415eac0 0000001900000019 ffff888100041780
[  159.735355] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[  159.736437] page dumped because: kasan: bad access detected
[  159.737229] 
[  159.737454] Memory state around the buggy address:
[  159.738130]  ffff8881014eb480: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[  159.739533]  ffff8881014eb500: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[  159.740898] >ffff8881014eb580: 00 00 00 00 00 00 00 00 04 fc fc fc fc fc fc fc
[  159.742221]                                            ^
[  159.743035]  ffff8881014eb600: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  159.744053]  ffff8881014eb680: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  159.745091] ==================================================================

Looks like the memset in prepare_head_secure_small() is cleaning behind
the base limits.  Unfortunately, I didn't really had time to dig deeper
into this.

Cheers,
--
Luis



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux