problem with RGW

Hello everybody,

We have a Ceph cluster that consists of 8 hosts with 12 OSDs per host, all on 2 TB SATA disks.

[13:23]:[root@se087  ~]# ceph osd tree
ID  WEIGHT    TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
 -1 182.99203 root default                                                       
 -2 182.99203     region RU                                                      
 -3  91.49487         datacenter ru-msk-comp1p                                   
 -9  22.87500             host 1                                             
 48   1.90599                 osd.48                up  1.00000          1.00000 
 49   1.90599                 osd.49                up  1.00000          1.00000 
 50   1.90599                 osd.50                up  1.00000          1.00000 
 51   1.90599                 osd.51                up  1.00000          1.00000 
 52   1.90599                 osd.52                up  1.00000          1.00000 
 53   1.90599                 osd.53                up  1.00000          1.00000 
 54   1.90599                 osd.54                up  1.00000          1.00000 
 55   1.90599                 osd.55                up  1.00000          1.00000 
 56   1.90599                 osd.56                up  1.00000          1.00000 
 57   1.90599                 osd.57                up  1.00000          1.00000 
 58   1.90599                 osd.58                up  1.00000          1.00000 
 59   1.90599                 osd.59                up  1.00000          1.00000 
-10  22.87216             host 2                                             
 60   1.90599                 osd.60                up  1.00000          1.00000 
 61   1.90599                 osd.61                up  1.00000          1.00000 
 62   1.90599                 osd.62                up  1.00000          1.00000 
 63   1.90599                 osd.63                up  1.00000          1.00000 
 64   1.90599                 osd.64                up  1.00000          1.00000 
 65   1.90599                 osd.65                up  1.00000          1.00000 
 66   1.90599                 osd.66                up  1.00000          1.00000 
 67   1.90599                 osd.67                up  1.00000          1.00000 
 69   1.90599                 osd.69                up  1.00000          1.00000 
 70   1.90599                 osd.70                up  1.00000          1.00000 
 71   1.90599                 osd.71                up  1.00000          1.00000 
 68   1.90627                 osd.68                up  1.00000          1.00000 
-11  22.87500             host 3                                             
 72   1.90599                 osd.72                up  1.00000          1.00000 
 73   1.90599                 osd.73                up  1.00000          1.00000 
 74   1.90599                 osd.74                up  1.00000          1.00000 
 75   1.90599                 osd.75                up  1.00000          1.00000 
 76   1.90599                 osd.76                up  1.00000          1.00000 
 77   1.90599                 osd.77                up  1.00000          1.00000 
 78   1.90599                 osd.78                up  1.00000          1.00000 
 79   1.90599                 osd.79                up  1.00000          1.00000 
 80   1.90599                 osd.80                up  1.00000          1.00000 
 81   1.90599                 osd.81                up  1.00000          1.00000 
 82   1.90599                 osd.82                up  1.00000          1.00000 
 83   1.90599                 osd.83                up  1.00000          1.00000 
-12  22.87271             host 4                                             
 84   1.90599                 osd.84                up  1.00000          1.00000 
 86   1.90599                 osd.86                up  1.00000          1.00000 
 89   1.90599                 osd.89                up  1.00000          1.00000 
 90   1.90599                 osd.90                up  1.00000          1.00000 
 91   1.90599                 osd.91                up  1.00000          1.00000 
 92   1.90599                 osd.92                up  1.00000          1.00000 
 93   1.90599                 osd.93                up  1.00000          1.00000 
 94   1.90599                 osd.94                up  1.00000          1.00000 
 95   1.90599                 osd.95                up  1.00000          1.00000 
 85   1.90627                 osd.85                up  1.00000          1.00000 
 88   1.90627                 osd.88                up  1.00000          1.00000 
 87   1.90627                 osd.87                up  1.00000          1.00000 
 -4  91.49716         datacenter ru-msk-vol51                                    
 -5  22.87216             host 5                                             
  1   1.90599                 osd.1                 up  1.00000          1.00000 
  2   1.90599                 osd.2                 up  1.00000          1.00000 
  3   1.90599                 osd.3                 up  1.00000          1.00000 
  4   1.90599                 osd.4                 up  1.00000          1.00000 
  5   1.90599                 osd.5                 up  1.00000          1.00000 
  6   1.90599                 osd.6                 up  1.00000          1.00000 
  7   1.90599                 osd.7                 up  1.00000          1.00000 
  8   1.90599                 osd.8                 up  1.00000          1.00000 
  9   1.90599                 osd.9                 up  1.00000          1.00000 
 10   1.90599                 osd.10                up  1.00000          1.00000 
 11   1.90599                 osd.11                up  1.00000          1.00000 
  0   1.90627                 osd.0                 up  1.00000          1.00000 
 -6  22.87500             host 6                                             
 12   1.90599                 osd.12                up  1.00000          1.00000 
 13   1.90599                 osd.13                up  1.00000          1.00000 
 14   1.90599                 osd.14                up  1.00000          1.00000 
 15   1.90599                 osd.15                up  1.00000          1.00000 
 16   1.90599                 osd.16                up  1.00000          1.00000 
 17   1.90599                 osd.17                up  1.00000          1.00000 
 18   1.90599                 osd.18                up  1.00000          1.00000 
 19   1.90599                 osd.19                up  1.00000          1.00000 
 20   1.90599                 osd.20                up  1.00000          1.00000 
 21   1.90599                 osd.21                up  1.00000          1.00000 
 22   1.90599                 osd.22                up  1.00000          1.00000 
 23   1.90599                 osd.23                up  1.00000          1.00000 
 -7  22.87500             host 7                                             
 24   1.90599                 osd.24                up  1.00000          1.00000 
 25   1.90599                 osd.25                up  1.00000          1.00000 
 26   1.90599                 osd.26                up  1.00000          1.00000 
 27   1.90599                 osd.27                up  1.00000          1.00000 
 28   1.90599                 osd.28                up  1.00000          1.00000 
 29   1.90599                 osd.29                up  1.00000          1.00000 
 30   1.90599                 osd.30                up  1.00000          1.00000 
 31   1.90599                 osd.31                up  1.00000          1.00000 
 32   1.90599                 osd.32                up  1.00000          1.00000 
 33   1.90599                 osd.33                up  1.00000          1.00000 
 34   1.90599                 osd.34                up  1.00000          1.00000 
 35   1.90599                 osd.35                up  1.00000          1.00000 
 -8  22.87500             host 8                                             
 36   1.90599                 osd.36                up  1.00000          1.00000 
 37   1.90599                 osd.37                up  1.00000          1.00000 
 38   1.90599                 osd.38                up  1.00000          1.00000 
 39   1.90599                 osd.39                up  1.00000          1.00000 
 40   1.90599                 osd.40                up  1.00000          1.00000 
 41   1.90599                 osd.41                up  1.00000          1.00000 
 42   1.90599                 osd.42                up  1.00000          1.00000 
 43   1.90599                 osd.43                up  1.00000          1.00000 
 44   1.90599                 osd.44                up  1.00000          1.00000 
 45   1.90599                 osd.45                up  1.00000          1.00000 
 46   1.90599                 osd.46                up  1.00000          1.00000 
 47   1.90599                 osd.47                up  1.00000          1.00000

Now we have two problems:
1) Blocked requests on an OSD (osd.0)
ceph -s
.....
18 requests are blocked > 32 sec
....

ceph health detail

...
1 ops are blocked > 4194.3 sec
34 ops are blocked > 2097.15 sec
1 ops are blocked > 4194.3 sec on osd.0
34 ops are blocked > 2097.15 sec on osd.0
1 osds have slow requests
....
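
To see at a glance which OSDs hold the blocked ops, the per-OSD lines of `ceph health detail` can be summed with a small script. This is only a sketch: the function name `blocked_ops_by_osd` is made up for illustration, and the line format is assumed to match the excerpt above.

```python
import re
from collections import defaultdict

# Sample lines copied from the `ceph health detail` excerpt above.
HEALTH_DETAIL = """\
1 ops are blocked > 4194.3 sec
34 ops are blocked > 2097.15 sec
1 ops are blocked > 4194.3 sec on osd.0
34 ops are blocked > 2097.15 sec on osd.0
1 osds have slow requests
"""

# Matches only the per-OSD lines, e.g. "34 ops are blocked > 2097.15 sec on osd.0".
PER_OSD = re.compile(r"^(\d+) ops are blocked > ([\d.]+) sec on (osd\.\d+)$")


def blocked_ops_by_osd(text):
    """Return {osd_name: total blocked ops} from `ceph health detail` text."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = PER_OSD.match(line.strip())
        if match:
            count, _threshold, osd = match.groups()
            totals[osd] += int(count)
    return dict(totals)


if __name__ == "__main__":
    for osd, total in sorted(blocked_ops_by_osd(HEALTH_DETAIL).items()):
        print(osd, total)
```

For the excerpt above this attributes every blocked op to osd.0, which agrees with the "35 slow requests" count in that OSD's log.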

In the osd.0 log:

2015-07-31 14:03:24.490774 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 35 slow requests, 9 included below; oldest blocked for > 3003.952332 secs
2015-07-31 14:03:24.490782 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.179599 seconds old, received at 2015-07-31 13:47:24.311080: osd_op(client.67321.0:7856 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [writefull 0~0] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:24.490791 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.179357 seconds old, received at 2015-07-31 13:47:24.311323: osd_op(client.67321.0:7857 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [writefull 0~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:24.490794 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.167539 seconds old, received at 2015-07-31 13:47:24.323141: osd_op(client.67321.0:7858 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 524288~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:24.490797 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.155554 seconds old, received at 2015-07-31 13:47:24.335126: osd_op(client.67321.0:7859 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 1048576~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:24.490801 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.145867 seconds old, received at 2015-07-31 13:47:24.344813: osd_op(client.67321.0:7860 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 1572864~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:25.491062 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 35 slow requests, 4 included below; oldest blocked for > 3004.952621 secs
2015-07-31 14:03:25.491078 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 961.140790 seconds old, received at 2015-07-31 13:47:24.350178: osd_op(client.67321.0:7861 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 2097152~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:25.491084 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 961.097870 seconds old, received at 2015-07-31 13:47:24.393098: osd_op(client.67321.0:7862 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 2621440~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:25.491089 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 961.093229 seconds old, received at 2015-07-31 13:47:24.397740: osd_op(client.67321.0:7863 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 3145728~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:25.491095 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 961.002957 seconds old, received at 2015-07-31 13:47:24.488012: osd_op(client.67321.0:7864 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 3670016~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
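
All nine slow requests listed above are writes to the same object with the same `26.f9af7c89` value, which I read as the pool ID plus the object's placement hash (that reading is my assumption). A rough sketch for grouping such log lines by object, so that a single hot object or placement group stands out, could look like this. The function name `group_slow_requests` is hypothetical, and the regular expression only targets the line format shown above.

```python
import re
from collections import Counter

# Two lines copied verbatim from the osd.0 log excerpt above.
SAMPLE_LOG = """\
2015-07-31 14:03:24.490782 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.179599 seconds old, received at 2015-07-31 13:47:24.311080: osd_op(client.67321.0:7856 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [writefull 0~0] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:24.490791 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.179357 seconds old, received at 2015-07-31 13:47:24.311323: osd_op(client.67321.0:7857 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [writefull 0~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
"""

# Extracts the client op id, object name, operation, extent and the
# "pool.hash" style field from a "slow request" line.
SLOW_REQUEST = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, .*?"
    r"osd_op\((?P<client>\S+) (?P<object>\S+) "
    r"\[(?P<op>\w+) (?P<extent>\d+~\d+)\] (?P<placement>\S+) "
)


def group_slow_requests(text):
    """Count slow requests per (object name, placement field) pair."""
    counts = Counter()
    for line in text.splitlines():
        match = SLOW_REQUEST.search(line)
        if match:
            counts[(match.group("object"), match.group("placement"))] += 1
    return counts


if __name__ == "__main__":
    for (obj, placement), count in group_slow_requests(SAMPLE_LOG).most_common():
        print(count, placement, obj)
```

If the full log shows the same picture, the stall would seem to be tied to one object or placement group rather than to the whole disk, but the script only summarizes the log and does not prove that.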

How can I avoid these blocked requests? What is the root cause of this problem?

2) Strange log messages in RGW
2015-07-31 13:22:20.204781 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:25.204941 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:30.205093 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:35.205318 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:40.205428 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:45.205567 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:50.205742 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:55.205908 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:23:00.206063 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:23:05.206251 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
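
The same RGW worker thread is reported every 5 seconds in the lines above. To check whether one thread or many are affected, the messages can be grouped per thread. The sketch below assumes the exact line format shown above and treats the trailing `600` as a timeout in seconds (my assumption); `summarize_heartbeat_timeouts` is a made-up helper name.

```python
import re
from datetime import datetime

# First and last line copied verbatim from the RGW log excerpt above.
SAMPLE_LOG = """\
2015-07-31 13:22:20.204781 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:23:05.206251 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
"""

HEARTBEAT = re.compile(
    r"^(\S+ \S+) \S+\s+\d+ heartbeat_map is_healthy "
    r"'(.+?) thread (0x[0-9a-f]+)' had timed out after (\d+)$"
)


def summarize_heartbeat_timeouts(text):
    """Return {thread_id: info} with message count, timeout and first/last timestamps."""
    summary = {}
    for line in text.splitlines():
        match = HEARTBEAT.match(line.strip())
        if not match:
            continue
        stamp, _pool, thread, timeout = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        info = summary.setdefault(
            thread, {"count": 0, "timeout": int(timeout), "first": when, "last": when}
        )
        info["count"] += 1
        info["first"] = min(info["first"], when)
        info["last"] = max(info["last"], when)
    return summary


if __name__ == "__main__":
    for thread, info in summarize_heartbeat_timeouts(SAMPLE_LOG).items():
        span = (info["last"] - info["first"]).total_seconds()
        print(thread, info["count"], info["timeout"], span)
```

A single thread ID in the result, as in this excerpt, would suggest one stuck request rather than a general RGW slowdown.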

Could you help me solve these problems?

-- 
Best Regards,
Stanislav Butkeev
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



