Great thanks, Alex, you give me hope. I'll try SCST later in the
configuration you suggest.

2015-11-09 16:25 GMT+03:00 Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>:
> Hi Timofey,
>
> With Nick's, Jan's, RedHat's and others' help we have a stable and, in my
> best judgement, well-performing system using SCST as the iSCSI delivery
> framework. SCST allows the use of the Linux page cache when utilizing the
> vdisk_fileio backend. LIO should be able to do this too, using a FILEIO
> backstore with the block device name as the file name, but I have not
> tried that due to having switched to SCST for stability.
>
> The page cache will improve latency, since reads and writes first occur
> in RAM. Naturally, all the usual considerations apply as to the loss of
> dirty pages on a machine crash, so tuning the vm.dirty* parameters is
> quite important.
>
> This setting was critically important to avoid hangs and major issues due
> to some problem with XFS and the page cache on OSD nodes:
>
> sysctl vm.min_free_kbytes=1048576
>
> (reserved memory when using vm.swappiness = 1)
>
> 10 GbE networking seems to be helping a lot; it could be just the
> superior response of a higher-end switch.
>
> Using the blk_mq scheduler has been reported to improve performance on
> random IO.
>
> Good luck!
>
> --
> Alex Gorbachev
> Storcium
>
> On Sun, Nov 8, 2015 at 5:07 PM, Timofey Titovets <nefelim4ag@xxxxxxxxx>
> wrote:
>>
>> Big thanks anyway, Nick.
>> Now I'm catching hangs of both ESXi and the proxy =_=''
>> /* Proxy VM: Ubuntu 15.10 / Kernel 4.3 / LIO / Ceph 0.94 / ESXi 6.0
>> Software iSCSI */
>> I've moved to an NFS-RBD proxy and am now trying to make it HA.
>>
>> 2015-11-07 18:59 GMT+03:00 Nick Fisk <nick@xxxxxxxxxx>:
>> > Hi Timofey,
>> >
>> > You are most likely experiencing the effects of Ceph's write latency
>> > in combination with the sync write behaviour of ESXi. You will
>> > probably struggle to get much under 2ms write latency with Ceph,
>> > assuming a minimum of 2 copies in Ceph. This will limit you to around
>> > 500 IOPS for a QD of 1. Because of this you will also experience slow
>> > file/VM copies, as ESXi moves the blocks of data around in 64kb sync
>> > IOs. 500 x 64kb = ~30MB/s.
>> >
>> > Moving to 10GbE end to end may get you a reasonable boost in
>> > performance, as you will be removing 1ms or so of latency from the
>> > network for each write. Also search the mailing list for small
>> > performance tweaks you can do, like disabling logging.
>> >
>> > Other than that, the only thing I have found that has a chance of
>> > giving you performance similar to other products and/or legacy SANs is
>> > to use some sort of RBD caching with something like
>> > flashcache/enhanceio/bcache on your proxy nodes. However this brings
>> > its own challenges, and I still haven't got to a point where I'm happy
>> > to deploy it.
>> >
>> > I'm surprised you are also not seeing LIO hangs, which several people
>> > including me experience when using RBD+LIO+ESXi, although I haven't
>> > checked recently to see if this is now working better. I would be
>> > interested in hearing your feedback on this. They normally manifest
>> > themselves when an OSD drops out and IO is suspended for more than
>> > 5-10s.
>> >
>> > Sorry I couldn't be of more help.
>> >
>> > Nick
>> >
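>> > As a rough illustration only, the bcache variant on a proxy node would
>> > look something like the below. The device names are placeholders (a
>> > local SSD plus the mapped RBD), and this is exactly the part I am not
>> > yet comfortable deploying:
>> >
>> > # format a local SSD as the cache device and the mapped RBD as the
>> > # backing device
>> > make-bcache -C /dev/sdb
>> > make-bcache -B /dev/rbd0
>> > # attach the backing device to the cache set (UUID taken from
>> > # "bcache-super-show /dev/sdb") and enable writeback caching
>> > echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
>> > echo writeback > /sys/block/bcache0/bcache/cache_mode
>> > # then export /dev/bcache0, rather than /dev/rbd0, through the iSCSI
>> > # target
>> >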
>> >> -----Original Message-----
>> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>> >> Of Timofey Titovets
>> >> Sent: 07 November 2015 11:44
>> >> To: ceph-users@xxxxxxxxxxxxxx
>> >> Subject: Ceph RBD LIO ESXi Advice?
>> >>
>> >> Hi List,
>> >> I'm searching for advice from somebody who uses a legacy client like
>> >> ESXi with Ceph.
>> >>
>> >> I'm trying to build high-performance, fault-tolerant storage with
>> >> Ceph 0.94.
>> >>
>> >> In production I have 50+ TB of VMs (~800 VMs) on 8 NFS servers, each:
>> >> 2x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
>> >> 12x Seagate ST2000NM0023
>> >> 1x LSI Nytro™ MegaRAID® NMR 8110-4i
>> >> 96 GB of RAM
>> >> 4x 1 GbE links in balance-alb mode (I don't have a problem with
>> >> network throughput)
>> >>
>> >> Now in the lab I have built a 3-node cluster, each node:
>> >> Kernel 4.2
>> >> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
>> >> 16 GB of RAM
>> >> 6x Seagate ST2000NM0033
>> >> 2x 1 GbE in balance-alb
>> >> i.e. each node is a MON plus 6 OSDs.
>> >>
>> >> Config:
>> >> osd journal size = 16384
>> >> osd pool default size = 2
>> >> osd pool default min size = 2
>> >> osd pool default pg num = 256
>> >> osd pool default pgp num = 256
>> >> osd crush chooseleaf type = 1
>> >> filestore max sync interval = 180
>> >>
>> >> To attach the RBD storage to ESXi I created 2 VMs:
>> >> 2 cores
>> >> 2 GB RAM
>> >> Kernel 4.3
>> >> Each VM maps a big RBD volume and proxies it via LIO to ESXi. ESXi
>> >> sees the VMs as an iSCSI target server in Active/Passive mode.
>> >>
>> >> The RBD image is created with the --image-shared and --image-format 2
>> >> options.
>> >>
>> >> My questions:
>> >> 1. Do I have an architecture problem?
>> >> 2. Maybe you have ideas?
>> >> 3. ESXi works with the iSCSI storage very slowly (30-60 MB/s
>> >> read/write), but this could be an ESXi problem; later I will test
>> >> this with a more modern hypervisor server.
>> >> 4. The proxy VMs work not too badly with the storage, but fio shows
>> >> numbers that are too low:
>> >> [global]
>> >> size=128g # File size
>> >> filename=/storage/testfile.fio
>> >> numjobs=1 # One thread
>> >> runtime=600 # 10m for each test
>> >> ioengine=libaio # Use async IO
>> >> # Pseudo-random data, can be compressed by 15%
>> >> buffer_compress_percentage=15
>> >> overwrite=1 # Overwrite data in file
>> >> end_fsync=1 # Do an fsync at the end of the test to sync OS buffers
>> >> direct=1 # Bypass OS cache
>> >> startdelay=30 # Pause between tests
>> >> bs=4k # Block size for IO requests
>> >> iodepth=64 # Number of IO requests that can be outstanding at once
>> >> rw=randrw # Random read/write
>> >> ####################################################
>> >> # IOMeter defines the server loads as the following:
>> >> # iodepth=1 # Linear
>> >> # iodepth=4 # Very Light
>> >> # iodepth=8 # Light
>> >> # iodepth=64 # Moderate
>> >> # iodepth=256 # Heavy
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-1]
>> >> rwmixread=50
>> >> iodepth=1
>> >> stonewall # Run each test separately
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-8]
>> >> rwmixread=50
>> >> iodepth=8
>> >> stonewall
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-64]
>> >> rwmixread=50
>> >> stonewall
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-256]
>> >> rwmixread=50
>> >> iodepth=256
>> >> stonewall
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-512]
>> >> rwmixread=50
>> >> iodepth=512
>> >> stonewall
>> >> ####################################################
>> >> [Disk-4k-randomrw-depth-1024]
>> >> rwmixread=50
>> >> iodepth=1024
>> >> stonewall
>> >> -- cut --
>> >>
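>> >> For clarity, the proxy wiring described above is roughly the
>> >> following; the pool/image name, size and IQN are just examples:
>> >>
>> >> # on the proxy VM: create and map the shared RBD image
>> >> rbd create rbd/esxi-lun0 --size 10485760 --image-format 2 --image-shared
>> >> rbd map rbd/esxi-lun0    # shows up as /dev/rbd0
>> >> # export the mapped device to ESXi through LIO
>> >> targetcli /backstores/block create name=esxi-lun0 dev=/dev/rbd0
>> >> targetcli /iscsi create iqn.2015-11.local.lab:rbd-proxy
>> >> targetcli /iscsi/iqn.2015-11.local.lab:rbd-proxy/tpg1/luns create \
>> >>     /backstores/block/esxi-lun0
>> >> # (plus the usual portal/ACL setup for the ESXi initiators)
>> >>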
>> >> RBD-LIO-PROXY:
>> >> -- cut --
>> >> Disk-4k-randomrw-depth-512: (groupid=4, jobs=1): err= 0: pid=10601:
>> >> Sat Nov 7 13:59:49 2015
>> >>   read : io=770772KB, bw=1282.1KB/s, iops=320, runt=600813msec
>> >>     clat (msec): min=141, max=8456, avg=715.87, stdev=748.55
>> >>   write: io=769400KB, bw=1280.7KB/s, iops=320, runt=600813msec
>> >>     clat (msec): min=158, max=9862, avg=878.73, stdev=905.47
>> >> -- cut --
>> >> One of the nodes, in RAID0:
>> >> Disk-4k-randomrw-depth-512: (groupid=4, jobs=1): err= 0: pid=4652:
>> >> Fri Oct 30 16:29:00 2015
>> >>   read : io=258500KB, bw=2128.4KB/s, iops=532, runt=121455msec
>> >>     clat (msec): min=1, max=3983, avg=484.80, stdev=478.39
>> >>   write: io=257568KB, bw=2120.8KB/s, iops=530, runt=121455msec
>> >>     clat (usec): min=217, max=3976.1K, avg=478327.33, stdev=480695.05
>> >> -- cut --
>> >>
>> >> Based on my experience with ScaleIO, I should be getting numbers like
>> >> ~1000 IOPS on the proxy node.
>> >>
>> >> I can provide the full fio config and logs if needed; I'm just trying
>> >> to fix a performance problem and looking for advice.
>> >>
>> >> 5. Maybe I need to change my fio config?
>> >> 6. Maybe I'm missing something?
>> >>
>> >> If someone has experience with similar solutions, stories and links
>> >> are welcome -.-
>> >>
>> >> --
>> >> Have a nice day,
>> >> Timofey.
>> >
>> >
>>
>>
>> --
>> Have a nice day,
>> Timofey.
>
>

--
Have a nice day,
Timofey.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com