Re: Luminous cephfs maybe not as stable as expected?

Forgot to add these

[@ ~]# cat /sys/kernel/debug/ceph/0f1701f5-453a-4a3b-928d-f652a2bbbcb0.client3574310/osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
BACKOFFS

[@~]# cat /sys/kernel/debug/ceph/0f1701f5-453a-4a3b-928d-f652a2bbbcb0.client3584224/osdc
REQUESTS 38 homeless 0
317841  osd0    20.d6ec44c1     20.1    [0,28,5]/0      [0,28,5]/0      e65040  10001b44a70.00000000    0x40001c        101139  read
317853  osd0    20.5956d31b     20.1b   [0,5,10]/0      [0,5,10]/0      e65040  10001ad8962.00000000    0x40001c        39847   read
317835  osd3    20.ede889de     20.1e   [3,12,27]/3     [3,12,27]/3     e65040  10001ad80f6.00000000    0x40001c        87758   read
317838  osd3    20.7b730a4e     20.e    [3,31,9]/3      [3,31,9]/3      e65040  10001ad89d8.00000000    0x40001c        83444   read
317844  osd3    20.feead84c     20.c    [3,13,18]/3     [3,13,18]/3     e65040  10001ad8733.00000000    0x40001c        77267   read
317852  osd3    20.bd2658e      20.e    [3,31,9]/3      [3,31,9]/3      e65040  10001ad7e00.00000000    0x40001c        39331   read
317830  osd4    20.922e6d04     20.4    [4,16,27]/4     [4,16,27]/4     e65040  10001ad80f2.00000000    0x40001c        86326   read
317837  osd4    20.fe93d4ab     20.2b   [4,14,25]/4     [4,14,25]/4     e65040  10001ad80fb.00000000    0x40001c        78951   read
317839  osd4    20.d7af926b     20.2b   [4,14,25]/4     [4,14,25]/4     e65040  10001ad80ee.00000000    0x40001c        77556   read
317849  osd5    20.5fcb95c5     20.5    [5,18,29]/5     [5,18,29]/5     e65040  10001ad7f75.00000000    0x40001c        61147   read
317857  osd5    20.28764e9a     20.1a   [5,7,28]/5      [5,7,28]/5      e65040  10001ad8a10.00000000    0x40001c        30369   read
317859  osd5    20.7bb79985     20.5    [5,18,29]/5     [5,18,29]/5     e65040  10001ad7fe8.00000000    0x40001c        27942   read
317836  osd8    20.e7bf5cf4     20.34   [8,5,10]/8      [8,5,10]/8      e65040  10001ad7d79.00000000    0x40001c        133699  read
317842  osd8    20.abbb9df4     20.34   [8,5,10]/8      [8,5,10]/8      e65040  10001d5903f.00000000    0x40001c        125308  read
317850  osd8    20.ecd0034      20.34   [8,5,10]/8      [8,5,10]/8      e65040  10001ad89b2.00000000    0x40001c        68348   read
317854  osd8    20.cef50134     20.34   [8,5,10]/8      [8,5,10]/8      e65040  10001ad8728.00000000    0x40001c        57431   read
317861  osd8    20.3e859bb4     20.34   [8,5,10]/8      [8,5,10]/8      e65040  10001ad8108.00000000    0x40001c        50642   read
317847  osd9    20.fc9e9f43     20.3    [9,29,17]/9     [9,29,17]/9     e65040  10001ad8101.00000000    0x40001c        88464   read
317848  osd9    20.d32b6ac3     20.3    [9,29,17]/9     [9,29,17]/9     e65040  10001ad8100.00000000    0x40001c        85929   read
317862  osd11   20.ee6cc689     20.9    [11,0,12]/11    [11,0,12]/11    e65040  10001ad7d64.00000000    0x40001c        40266   read
317843  osd12   20.a801f0e9     20.29   [12,26,8]/12    [12,26,8]/12    e65040  10001ad7f07.00000000    0x40001c        86610   read
317851  osd12   20.8bb48de9     20.29   [12,26,8]/12    [12,26,8]/12    e65040  10001ad7e4f.00000000    0x40001c        46746   read
317860  osd12   20.47815f36     20.36   [12,0,28]/12    [12,0,28]/12    e65040  10001ad8035.00000000    0x40001c        35249   read
317831  osd15   20.9e3acb53     20.13   [15,0,1]/15     [15,0,1]/15     e65040  10001ad8978.00000000    0x40001c        85329   read
317840  osd15   20.2a40efdf     20.1f   [15,4,17]/15    [15,4,17]/15    e65040  10001ad7ef8.00000000    0x40001c        76282   read
317846  osd15   20.8143f15f     20.1f   [15,4,17]/15    [15,4,17]/15    e65040  10001ad89d1.00000000    0x40001c        61297   read
317864  osd15   20.c889a49c     20.1c   [15,0,31]/15    [15,0,31]/15    e65040  10001ad89fb.00000000    0x40001c        24385   read
317832  osd18   20.f76227a      20.3a   [18,6,15]/18    [18,6,15]/18    e65040  10001ad8020.00000000    0x40001c        82852   read
317833  osd18   20.d8edab31     20.31   [18,29,14]/18   [18,29,14]/18   e65040  10001ad8952.00000000    0x40001c        82852   read
317858  osd18   20.8f69d231     20.31   [18,29,14]/18   [18,29,14]/18   e65040  10001ad8176.00000000    0x40001c        32400   read
317855  osd22   20.b3342c0f     20.f    [22,18,31]/22   [22,18,31]/22   e65040  10001ad8146.00000000    0x40001c        51024   read
317863  osd23   20.cde0ce7b     20.3b   [23,1,6]/23     [23,1,6]/23     e65040  10001ad856c.00000000    0x40001c        34521   read
317865  osd23   20.702d2dfe     20.3e   [23,9,22]/23    [23,9,22]/23    e65040  10001ad8a5e.00000000    0x40001c        30664   read
317866  osd23   20.cb4a32fe     20.3e   [23,9,22]/23    [23,9,22]/23    e65040  10001ad8575.00000000    0x40001c        29683   read
317867  osd23   20.9a008910     20.10   [23,12,6]/23    [23,12,6]/23    e65040  10001ad7d24.00000000    0x40001c        29683   read
317834  osd25   20.6efd4911     20.11   [25,4,0]/25     [25,4,0]/25     e65040  10001ad8023.00000000    0x40001c        147589  read
317856  osd26   20.febb382a     20.2a   [26,0,18]/26    [26,0,18]/26    e65040  10001ad8145.00000000    0x40001c        65169   read
317845  osd27   20.5b433067     20.27   [27,7,14]/27    [27,7,14]/27    e65040  10001ad8965.00000000    0x40001c        124461  read
LINGER REQUESTS
BACKOFFS
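(In case it helps, this is roughly how I grab these for every kernel client on a box; osdc and mdsc are the debugfs files I know of, there may be more:)

for f in /sys/kernel/debug/ceph/*/osdc /sys/kernel/debug/ceph/*/mdsc; do
    echo "== $f"        # one directory per fsid.client<id> mount
    cat "$f"
done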




-----Original Message-----
Subject:  Luminous cephfs maybe not as stable as expected?


Maybe this requires some attention. I have a default CentOS 7 setup (maybe 
not the most recent kernel though) with Ceph Luminous, i.e. no exotic 
kernels.

This is the 2nd or 3rd time that a VM has gone into a high load (151) and 
stopped its services. I have two VMs, both mounting the same two CephFS 
'shares'. After the last incident I unmounted the shares on the 2nd server 
(we are migrating to a new environment, so this 2nd server is not doing 
anything). Last time I thought this might be related to my work on the 
switch from the stupid allocator to the bitmap allocator.

Anyway, yesterday I decided to mount the two shares on the 2nd server 
again and see what happens. This morning the high load was back. AFAIK the 
2nd server is only running a cron job on the CephFS mounts, creating 
snapshots.

1) I still have increased load on the OSD nodes coming from CephFS. How 
can I see which client is causing this? I don't seem to get this from 
'ceph daemon mds.c session ls', but 'ceph osd pool stats | grep 
client -B 1' indicates it is CephFS.
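(For what it's worth, the closest I have come to tying ops to a client is 
via the OSD admin socket; osd.8 below is just an example, pick a busy one 
and run this on that OSD's node:)

# tally which client ids show up in the in-flight ops of one OSD
ceph daemon osd.8 dump_ops_in_flight | grep -oE 'client\.[0-9]+' | sort | uniq -c | sort -rn
# the recently completed ops can be tallied the same way
ceph daemon osd.8 dump_historic_ops  | grep -oE 'client\.[0-9]+' | sort | uniq -c | sort -rn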

2) ceph osd blacklist ls
No blacklist entries

3) The first server keeps generating messages like these, while there is 
no issue with connectivity:

[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: osd25 192.168.10.114:6804 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: osd18 192.168.10.112:6802 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: osd22 192.168.10.111:6811 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session established
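(To see how bad it is I just tally which peers show up in these messages; 
nothing fancy:)

# count "session lost" / "io error" style messages per mon/osd peer
dmesg | grep 'libceph:' | grep -oE '(mon|osd)[0-9]+ [0-9.]+:[0-9]+' | sort | uniq -c | sort -rn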

PS: dmesg -T gives me strange times; as you can see, these are in the 
future. The OS time is 2 minutes behind, which is the correct one (ntpd-synced).
[@ ]# uptime
 10:39:17 up 50 days, 13:31,  2 users,  load average: 3.60, 3.02, 2.57
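(My understanding, and please correct me if this is wrong, is that dmesg -T 
just adds the raw seconds-since-boot stamp to an estimated boot time, so 
NTP corrections applied after boot show up as a fixed offset. Something 
like this reproduces the shifted time; ts is a hypothetical raw stamp taken 
from plain dmesg output:)

ts=4370482                                              # hypothetical [seconds] stamp from plain `dmesg`, integer part only
boot=$(( $(date +%s) - $(cut -d. -f1 /proc/uptime) ))   # approximate boot time, roughly how dmesg -T derives it
date -d "@$(( boot + ts ))"                             # the wall-clock time dmesg -T would print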

4) Unmounting the filesystem on the first server fails.
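(What I mean by "fails": the plain umount just hangs in D state, see the 
stack of pid 20039 further down. The force/lazy variants below are simply 
the options I know of, not something I can say will help here:)

umount /home/mail-archive/          # hangs (see /proc/20039/stack below)
umount -f /home/mail-archive/       # force unmount
umount -l /home/mail-archive/       # lazy detach, last resort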

5) Evicting the CephFS sessions of the first server does not change the 
CephFS load on the OSD nodes.
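(This is roughly how I evicted them, from memory, so treat the exact 
syntax as an assumption on my side; <client-id> is a placeholder taken 
from 'session ls':)

ceph daemon mds.c session ls                  # note the client "id" / "inst"
ceph tell mds.c client evict id=<client-id>   # Luminous-era syntax as far as I know
ceph osd blacklist ls                         # eviction normally adds a blacklist entry, but I disabled that (see ceph.conf below)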

6) Unmounting all CephFS clients still leaves me with CephFS activity on 
the data pool and on the OSD nodes.

[@c03 ~]# ceph daemon mds.c session ls
[] 
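(This is how I keep an eye on the leftover activity, simply re-running the 
pool stats:)

watch -n 2 "ceph osd pool stats | grep -B 1 client"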

7) On the first server
[@~]# ps -auxf| grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      6716  3.0  0.0      0     0 ?        D    10:18   0:59  \_ [kworker/0:2]
root     20039  0.0  0.0 123520  1212 pts/0    D+   10:28   0:00  |       \_ umount /home/mail-archive/
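(To gather these in one go, this is the loop I use; it needs root for 
/proc/<pid>/stack:)

# dump the kernel stack of every task stuck in uninterruptible (D) state
for p in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
    echo "== pid $p ($(cat /proc/$p/comm))"
    cat /proc/$p/stack
done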

[@ ~]# cat /proc/6716/stack
[<ffffffff8385e110>] __wait_on_freeing_inode+0xb0/0xf0
[<ffffffff8385e1e9>] find_inode+0x99/0xc0
[<ffffffff8385e281>] ilookup5_nowait+0x71/0x90
[<ffffffff8385f09f>] ilookup5+0xf/0x60
[<ffffffffc060fb35>] remove_session_caps+0xf5/0x1d0 [ceph]
[<ffffffffc06158fc>] dispatch+0x39c/0xb00 [ceph]
[<ffffffffc052afb4>] try_read+0x514/0x12c0 [libceph]
[<ffffffffc052bf64>] ceph_con_workfn+0xe4/0x1530 [libceph]
[<ffffffff836b9e3f>] process_one_work+0x17f/0x440
[<ffffffff836baed6>] worker_thread+0x126/0x3c0
[<ffffffff836c1d21>] kthread+0xd1/0xe0
[<ffffffff83d75c37>] ret_from_fork_nospec_end+0x0/0x39
[<ffffffffffffffff>] 0xffffffffffffffff

[@ ~]# cat /proc/20039/stack
[<ffffffff837b5e14>] __lock_page+0x74/0x90
[<ffffffff837c744c>] truncate_inode_pages_range+0x6cc/0x700
[<ffffffff837c74ef>] truncate_inode_pages_final+0x4f/0x60
[<ffffffff8385f02c>] evict+0x16c/0x180
[<ffffffff8385f87c>] iput+0xfc/0x190
[<ffffffff8385aa18>] shrink_dcache_for_umount_subtree+0x158/0x1e0
[<ffffffff8385c3bf>] shrink_dcache_for_umount+0x2f/0x60
[<ffffffff8384426f>] generic_shutdown_super+0x1f/0x100
[<ffffffff838446b2>] kill_anon_super+0x12/0x20
[<ffffffffc05ea130>] ceph_kill_sb+0x30/0x80 [ceph]
[<ffffffff83844a6e>] deactivate_locked_super+0x4e/0x70
[<ffffffff838451f6>] deactivate_super+0x46/0x60
[<ffffffff8386373f>] cleanup_mnt+0x3f/0x80
[<ffffffff838637d2>] __cleanup_mnt+0x12/0x20
[<ffffffff836be88b>] task_work_run+0xbb/0xe0
[<ffffffff8362bc65>] do_notify_resume+0xa5/0xc0
[<ffffffff83d76134>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
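(An alternative that dumps all blocked tasks to the kernel log in one 
shot, assuming sysrq is enabled:)

echo 1 > /proc/sys/kernel/sysrq     # enable all sysrq functions if needed
echo w > /proc/sysrq-trigger        # dump tasks in uninterruptible (D) state to the kernel log
dmesg | tail -n 100                 # read back the blocked-task report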


What to do now? In ceph.conf I only have these entries; I am not sure if I 
should still keep them.

# 100k+ files in 2 folders
mds bal fragment size max = 120000
mds_session_blacklist_on_timeout = false
mds_session_blacklist_on_evict = false
mds_cache_memory_limit = 8000000000
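(To double-check what the running MDS actually uses, rather than what is 
in ceph.conf, the admin socket can report the current values:)

ceph daemon mds.c config get mds_cache_memory_limit
ceph daemon mds.c config get mds_session_blacklist_on_evict
ceph daemon mds.c config get mds_session_blacklist_on_timeout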









_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

