Great thanks, Alex, you give me hope. I'll try SCST later in the
configuration you suggest.

2015-11-09 16:25 GMT+03:00 Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>:
> Hi Timofey,
>
> With Nick's, Jan's, RedHat's and others' help we have a stable and, in my
> best judgement, well-performing system using SCST as the iSCSI delivery
> framework. SCST allows the use of the Linux page cache when utilizing the
> vdisk_fileio backend. LIO should be able to do this too, using a FILEIO
> backstore with the block device name as the file name, but I have not
> tried that due to having switched to SCST for stability.
>
> The page cache will improve latency, since reads and writes first occur
> in RAM. Naturally, all the usual considerations apply as to the loss of
> dirty pages on a machine crash, so tuning the vm.dirty* parameters is
> quite important.
>
> This setting was critically important to avoid hangs and major issues due
> to some problem with XFS and the page cache on OSD nodes:
>
> sysctl vm.min_free_kbytes=1048576
>
> (reserved memory when using vm.swappiness = 1)
>
> 10 GbE networking seems to be helping a lot; it could be just the
> superior response of a higher-end switch.
>
> Using the blk_mq scheduler has been reported to improve performance on
> random IO.
>
> Good luck!
>
> --
> Alex Gorbachev
> Storcium
>
> On Sun, Nov 8, 2015 at 5:07 PM, Timofey Titovets <nefelim4ag@xxxxxxxxx>
> wrote:
>>
>> Big thanks anyway, Nick.
>> Now I'm catching hangs of both ESXi and the proxy =_=''
>> /* Proxy VM: Ubuntu 15.10 / Kernel 4.3 / LIO / Ceph 0.94 / ESXi 6.0
>> Software iSCSI */
>> I've moved to an NFS-RBD proxy and am now trying to make it HA.
>>
>> 2015-11-07 18:59 GMT+03:00 Nick Fisk <nick@xxxxxxxxxx>:
>> > Hi Timofey,
>> >
>> > You are most likely experiencing the effects of Ceph's write latency
>> > in combination with the sync write behaviour of ESXi. You will
>> > probably struggle to get much under 2ms write latency with Ceph,
>> > assuming a minimum of 2 copies in Ceph. This will limit you to around
>> > 500 IOPS for a QD of 1. Because of this you will also experience slow
>> > file/VM copies, as ESXi moves the blocks of data around in 64kb sync
>> > IOs. 500 x 64kb = ~30MB/s.
>> >
>> > Moving to 10GbE end to end may get you a reasonable boost in
>> > performance, as you will be removing 1ms or so of latency from the
>> > network for each write. Also search the mailing list for small
>> > performance tweaks you can do, like disabling logging.
>> >
>> > Other than that, the only thing I have found that has a chance of
>> > giving you performance similar to other products and/or legacy SANs is
>> > to use some sort of RBD caching with something like
>> > flashcache/enhanceio/bcache on your proxy nodes. However this brings
>> > its own challenges, and I still haven't got to a point where I'm happy
>> > to deploy it.
>> >
>> > I'm surprised you are also not seeing LIO hangs, which several people
>> > including me experience when using RBD+LIO+ESXi, although I haven't
>> > checked recently to see if this is now working better. I would be
>> > interested in hearing your feedback on this. They normally manifest
>> > themselves when an OSD drops out and IO is suspended for more than
>> > 5-10s.
>> >
>> > Sorry I couldn't be of more help.
>> >
>> > Nick
>> >
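>> > As a rough illustration only, the bcache variant on a proxy node would
>> > look something like the below. The device names are placeholders (a
>> > local SSD plus the mapped RBD), and this is exactly the part I am not
>> > yet comfortable deploying:
>> >
>> > # format a local SSD as the cache device and the mapped RBD as the
>> > # backing device
>> > make-bcache -C /dev/sdb
>> > make-bcache -B /dev/rbd0
>> > # attach the backing device to the cache set (UUID taken from
>> > # "bcache-super-show /dev/sdb") and enable writeback caching
>> > echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
>> > echo writeback > /sys/block/bcache0/bcache/cache_mode
>> > # then export /dev/bcache0, rather than /dev/rbd0, through the iSCSI
>> > # target
>> >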
>> >> -----Original Message-----
>> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>> >> Of Timofey Titovets
>> >> Sent: 07 November 2015 11:44
>> >> To: ceph-users@xxxxxxxxxxxxxx
>> >> Subject: Ceph RBD LIO ESXi Advice?
>> >>
>> >> Hi List,
>> >> I'm searching for advice from somebody who uses a legacy client like
>> >> ESXi with Ceph.
>> >>
>> >> I'm trying to build high-performance, fault-tolerant storage with
>> >> Ceph 0.94.
>> >>
>> >> In production I have 50+ TB of VMs (~800 VMs) on 8 NFS servers, each:
>> >> 2x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
>> >> 12x Seagate ST2000NM0023
>> >> 1x LSI Nytro™ MegaRAID® NMR 8110-4i
>> >> 96 GB of RAM
>> >> 4x 1 GbE links in balance-alb mode (I don't have a problem with
>> >> network throughput)
>> >>
>> >> Now in the lab I have built a 3-node cluster, each node:
>> >> Kernel 4.2
>> >> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
>> >> 16 GB of RAM
>> >> 6x Seagate ST2000NM0033
>> >> 2x 1 GbE in balance-alb
>> >> i.e. each node is a MON plus 6 OSDs.
>> >>
>> >> Config:
>> >> osd journal size = 16384
>> >> osd pool default size = 2
>> >> osd pool default min size = 2
>> >> osd pool default pg num = 256
>> >> osd pool default pgp num = 256
>> >> osd crush chooseleaf type = 1
>> >> filestore max sync interval = 180
>> >>
>> >> To attach the RBD storage to ESXi I created 2 VMs:
>> >> 2 cores
>> >> 2 GB RAM
>> >> Kernel 4.3
>> >> Each VM maps a big RBD volume and proxies it via LIO to ESXi. ESXi
>> >> sees the VMs as an iSCSI target server in Active/Passive mode.
>> >>
>> >> The RBD image is created with the --image-shared and --image-format 2
>> >> options.
>> >>
>> >> My questions:
>> >> 1. Do I have an architecture problem?
>> >> 2. Maybe you have ideas?
>> >> 3. ESXi works with the iSCSI storage very slowly (30-60 MB/s
>> >> read/write), but this could be an ESXi problem; later I will test
>> >> this with a more modern hypervisor server.
>> >> 4. The proxy VMs work not too badly with the storage, but fio shows
>> >> numbers that are too low:
>> >> [global]
>> >> size=128g # File size
>> >> filename=/storage/testfile.fio
>> >> numjobs=1 # One thread
>> >> runtime=600 # 10m for each test
>> >> ioengine=libaio # Use async IO
>> >> # Pseudo-random data, can be compressed by 15%
>> >> buffer_compress_percentage=15
>> >> overwrite=1 # Overwrite data in file
>> >> end_fsync=1 # Do an fsync at the end of the test to sync OS buffers
>> >> direct=1 # Bypass OS cache
>> >> startdelay=30 # Pause between tests
>> >> bs=4k # Block size for IO requests
>> >> iodepth=64 # Number of IO requests that can be outstanding at once
>> >> rw=randrw # Random read/write
>> >> ####################################################
>> >> # IOMeter defines the server loads as the following:
>> >> # iodepth=1 # Linear
>> >> # iodepth=4 # Very Light
>> >> # iodepth=8 # Light
>> >> # iodepth=64 # Moderate
>> >> # iodepth=256 # Heavy
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-1]
>> >> rwmixread=50
>> >> iodepth=1
>> >> stonewall # Run each test separately
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-8]
>> >> rwmixread=50
>> >> iodepth=8
>> >> stonewall
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-64]
>> >> rwmixread=50
>> >> stonewall
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-256]
>> >> rwmixread=50
>> >> iodepth=256
>> >> stonewall
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-512]
>> >> rwmixread=50
>> >> iodepth=512
>> >> stonewall
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-1024]
>> >> rwmixread=50
>> >> iodepth=1024
>> >> stonewall
>> >> -- cut --
>> >>
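>> >> For clarity, the proxy wiring described above is roughly the
>> >> following; the pool/image name, size and IQN are just examples:
>> >>
>> >> # on the proxy VM: create and map the shared RBD image
>> >> rbd create rbd/esxi-lun0 --size 10485760 --image-format 2 --image-shared
>> >> rbd map rbd/esxi-lun0    # shows up as /dev/rbd0
>> >> # export the mapped device to ESXi through LIO
>> >> targetcli /backstores/block create name=esxi-lun0 dev=/dev/rbd0
>> >> targetcli /iscsi create iqn.2015-11.local.lab:rbd-proxy
>> >> targetcli /iscsi/iqn.2015-11.local.lab:rbd-proxy/tpg1/luns create \
>> >>     /backstores/block/esxi-lun0
>> >> # (plus the usual portal/ACL setup for the ESXi initiators)
>> >>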
>> >> RBD-LIO-PROXY:
>> >> -- cut --
>> >> Disk-4k-randomrw-depth-512: (groupid=4, jobs=1): err= 0: pid=10601:
>> >> Sat Nov 7 13:59:49 2015
>> >>   read : io=770772KB, bw=1282.1KB/s, iops=320, runt=600813msec
>> >>     clat (msec): min=141, max=8456, avg=715.87, stdev=748.55
>> >>   write: io=769400KB, bw=1280.7KB/s, iops=320, runt=600813msec
>> >>     clat (msec): min=158, max=9862, avg=878.73, stdev=905.47
>> >> -- cut --
>> >> One of the nodes, in RAID0:
>> >> Disk-4k-randomrw-depth-512: (groupid=4, jobs=1): err= 0: pid=4652:
>> >> Fri Oct 30 16:29:00 2015
>> >>   read : io=258500KB, bw=2128.4KB/s, iops=532, runt=121455msec
>> >>     clat (msec): min=1, max=3983, avg=484.80, stdev=478.39
>> >>   write: io=257568KB, bw=2120.8KB/s, iops=530, runt=121455msec
>> >>     clat (usec): min=217, max=3976.1K, avg=478327.33, stdev=480695.05
>> >> -- cut --
>> >>
>> >> Based on my experience with ScaleIO, I should be getting numbers like
>> >> ~1000 IOPS on the proxy node.
>> >>
>> >> I can provide the full fio config and logs if needed; I'm just trying
>> >> to fix a performance problem and looking for advice.
>> >>
>> >> 5. Maybe I need to change my fio config?
>> >> 6. Maybe I'm missing something?
>> >>
>> >> If someone has experience with similar solutions, stories and links
>> >> are welcome -.-
>> >>
>> >> --
>> >> Have a nice day,
>> >> Timofey.
>> >
>> >
>>
>>
>> --
>> Have a nice day,
>> Timofey.
>
>

--
Have a nice day,
Timofey.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com