Re: Fwd: High IOWait Issue

Budai Laszlo <laszlo.budai@xxxxxxxxx> · Mon, 26 Mar 2018 02:41:42 +0300

Besides checking what David suggested, you can tune the scrub operation (your ceph -s shows 2 deep scrubs in progress, which could be affecting your user traffic).
For instance, you could set the following parameters:

osd scrub chunk max = 5
osd scrub chunk min = 1
osd scrub sleep = 0.1

You can display the current values for an OSD using the "ceph -n osd.xy --show-config" command and check the above parameters there.
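
If you want to pull just these three values out of the full config dump, something like the following might help. It is only a rough sketch: I am assuming the output has one "name = value" pair per line and that the option names appear with underscores (osd_scrub_sleep and so on). The function names are made up for this example.

```python
import subprocess

# Underscored forms of the "osd scrub ..." options mentioned above
# (assumed to be how they appear in the config dump).
SCRUB_KEYS = ("osd_scrub_chunk_min", "osd_scrub_chunk_max", "osd_scrub_sleep")


def parse_show_config(text, keys=SCRUB_KEYS):
    """Return {name: value} for the requested keys.

    Assumes one "name = value" pair per line; lines without "=" are skipped.
    Spaces in option names are normalised to underscores.
    """
    wanted = set(keys)
    found = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue
        name = name.strip().replace(" ", "_")
        if name in wanted:
            found[name] = value.strip()
    return found


def current_scrub_settings(osd_id):
    """Run the show-config command for one OSD (needs the ceph CLI on the host)."""
    out = subprocess.run(
        ["ceph", "-n", "osd.%d" % osd_id, "--show-config"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    return parse_show_config(out)
```

Calling current_scrub_settings(0) on an OSD host should then give a small dict with the scrub values for osd.0, which is easier to compare across OSDs than the full dump.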

Kind regards,
Laszlo

On 25.03.2018 17:30, David Turner wrote:
> I recommend that people check their disk controller caches/batteries as well as checking for subfolder splitting on filestore (which is the only option on Jewel). The former leads to high await, the latter contributes to blocked requests.
>
> On Sun, Mar 25, 2018, 3:36 AM Sam Huracan <nowitzki.sammy@xxxxxxxxx> wrote:
>
>     Thank you all.
>
>     1. Here is my ceph.conf file:
>     https://pastebin.com/xpF2LUHs
>
>     2. Here is the result of ceph -s:
>     root@ceph1:/etc/ceph# ceph -s
>         cluster 31154d30-b0d3-4411-9178-0bbe367a5578
>          health HEALTH_OK
>          monmap e3: 3 mons at {ceph1=10.0.30.51:6789/0,ceph2=10.0.30.52:6789/0,ceph3=10.0.30.53:6789/0}
>                 election epoch 18, quorum 0,1,2 ceph1,ceph2,ceph3
>          osdmap e2473: 63 osds: 63 up, 63 in
>                 flags sortbitwise,require_jewel_osds
>           pgmap v34069952: 4096 pgs, 6 pools, 21534 GB data, 5696 kobjects
>                 59762 GB used, 135 TB / 194 TB avail
>                     4092 active+clean
>                        2 active+clean+scrubbing
>                        2 active+clean+scrubbing+deep
>       client io 36096 kB/s rd, 41611 kB/s wr, 1643 op/s rd, 1634 op/s wr
>
>
>
>     3. We use 1 SSD (/dev/sdi) to journal 7 HDDs, with 16 GB set for each journal. Here is the result of the ceph-disk list command:
>
>     /dev/sda :
>      /dev/sda1 ceph data, active, cluster ceph, osd.0, journal /dev/sdi1
>     /dev/sdb :
>      /dev/sdb1 ceph data, active, cluster ceph, osd.1, journal /dev/sdi2
>     /dev/sdc :
>      /dev/sdc1 ceph data, active, cluster ceph, osd.2, journal /dev/sdi3
>     /dev/sdd :
>      /dev/sdd1 ceph data, active, cluster ceph, osd.3, journal /dev/sdi4
>     /dev/sde :
>      /dev/sde1 ceph data, active, cluster ceph, osd.4, journal /dev/sdi5
>     /dev/sdf :
>      /dev/sdf1 ceph data, active, cluster ceph, osd.5, journal /dev/sdi6
>     /dev/sdg :
>      /dev/sdg1 ceph data, active, cluster ceph, osd.6, journal /dev/sdi7
>     /dev/sdh :
>      /dev/sdh3 other, LVM2_member
>      /dev/sdh1 other, vfat, mounted on /boot/efi
>     /dev/sdi :
>      /dev/sdi1 ceph journal, for /dev/sda1
>      /dev/sdi2 ceph journal, for /dev/sdb1
>      /dev/sdi3 ceph journal, for /dev/sdc1
>      /dev/sdi4 ceph journal, for /dev/sdd1
>      /dev/sdi5 ceph journal, for /dev/sde1
>      /dev/sdi6 ceph journal, for /dev/sdf1
>      /dev/sdi7 ceph journal, for /dev/sdg1
>
>     4. For iostat, we just ran "iostat -x 2". /dev/sdi is the journal SSD, /dev/sdh is the OS disk, and the rest are OSD disks.
>     root@ceph1:/etc/ceph# lsblk
>     NAME                             MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>     sda                                8:0    0   3.7T  0 disk 
>     └─sda1                             8:1    0   3.7T  0 part /var/lib/ceph/osd/ceph-0
>     sdb                                8:16   0   3.7T  0 disk 
>     └─sdb1                             8:17   0   3.7T  0 part /var/lib/ceph/osd/ceph-1
>     sdc                                8:32   0   3.7T  0 disk 
>     └─sdc1                             8:33   0   3.7T  0 part /var/lib/ceph/osd/ceph-2
>     sdd                                8:48   0   3.7T  0 disk 
>     └─sdd1                             8:49   0   3.7T  0 part /var/lib/ceph/osd/ceph-3
>     sde                                8:64   0   3.7T  0 disk 
>     └─sde1                             8:65   0   3.7T  0 part /var/lib/ceph/osd/ceph-4
>     sdf                                8:80   0   3.7T  0 disk 
>     └─sdf1                             8:81   0   3.7T  0 part /var/lib/ceph/osd/ceph-5
>     sdg                                8:96   0   3.7T  0 disk 
>     └─sdg1                             8:97   0   3.7T  0 part /var/lib/ceph/osd/ceph-6
>     sdh                                8:112  0 278.9G  0 disk 
>     ├─sdh1                             8:113  0   512M  0 part /boot/efi
>     └─sdh3                             8:115  0 278.1G  0 part 
>       ├─hnceph--hdd1--vg-swap (dm-0) 252:0    0  59.6G  0 lvm  [SWAP]
>       └─hnceph--hdd1--vg-root (dm-1) 252:1    0 218.5G  0 lvm  /
>     sdi                                8:128  0 185.8G  0 disk 
>     ├─sdi1                             8:129  0  16.6G  0 part 
>     ├─sdi2                             8:130  0  16.6G  0 part 
>     ├─sdi3                             8:131  0  16.6G  0 part 
>     ├─sdi4                             8:132  0  16.6G  0 part 
>     ├─sdi5                             8:133  0  16.6G  0 part 
>     ├─sdi6                             8:134  0  16.6G  0 part 
>     └─sdi7                             8:135  0  16.6G  0 part 
>
>     Could you give me some ideas on what to check next?
>
>
>     2018-03-25 12:25 GMT+07:00 Budai Laszlo <laszlo.budai@xxxxxxxxx>:
>
>         Could you post the result of "ceph -s"? Besides the health status, there are other details that could help, like the status of your PGs. The result of "ceph-disk list" would also be useful to understand how your disks are organized. For instance, with 1 SSD for 7 HDDs, the SSD could be the bottleneck.
>         From the outputs you gave us, we don't know which are the spinning disks and which is the SSD (looking at the numbers, I suspect that sdi is your SSD). We also don't know which parameters you used when you ran the iostat command.
>
>         Unfortunately it's difficult to help you without knowing more about your system.
>
>         Kind regards,
>         Laszlo
>
>         On 24.03.2018 20:19, Sam Huracan wrote:
>         > This is from iostat:
>         >
>         > I'm using Ceph Jewel, with no HW errors.
>         > Ceph health is OK, and we've used only 50% of the total volume.
>         >
>         >
>         > 2018-03-24 22:20 GMT+07:00 <ceph@xxxxxxxxxx>:
>         >
>         >     I would also check the utilization of your disks with tools like atop. Perhaps there is something related in dmesg or similar?
>         >
>         >     - Mehmet
>         >
>         >     On 24 March 2018 08:17:44 CET, Sam Huracan <nowitzki.sammy@xxxxxxxxx> wrote:
>         >
>         >
>         >         Hi guys,
>         >         We are running a production OpenStack backed by Ceph.
>         >
>         >         At present, we are facing an issue with high iowait in VMs. In some MySQL VMs, iowait sometimes reaches abnormally high peaks, which leads to an increase in slow queries even though the load is stable (we test with a script that simulates real load), as you can see in the graph.
>         >         https://prnt.sc/ivndni
>         >
>         >         The MySQL VMs are placed on a Ceph HDD cluster, with 1 SSD journal for 7 HDDs. In this cluster, iowait on each Ceph host is about 20%.
>         >         https://prnt.sc/ivne08
>         >
>         >
>         >         Can you guys help me find the root cause of this issue and how to eliminate this high iowait?
>         >
>         >         Thanks in advance.
>         >
>         >
>         >     _______________________________________________
>         >     ceph-users mailing list
>         >     ceph-users@xxxxxxxxxxxxxx
>         >     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
