Alex, are you using ESXi? If yes, do you use the iSCSI software adapter? If
yes, do you use active/passive, fixed, or Round Robin MPIO? Do you tune
anything on the initiator side? If possible, can you give more details, please?

2015-11-09 17:41 GMT+03:00 Timofey Titovets <nefelim4ag@xxxxxxxxx>:
> Great thanks, Alex, you give me hope. I'll try SCST later in the
> configuration you suggest.
>
> 2015-11-09 16:25 GMT+03:00 Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>:
>> Hi Timofey,
>>
>> With Nick's, Jan's, RedHat's and others' help we have a stable and, in my
>> best judgement, well performing system using SCST as the iSCSI delivery
>> framework. SCST allows the use of the Linux page cache when utilizing the
>> vdisk_fileio backend. LIO should be able to do this too, using the FILEIO
>> backstore with the block device name as the file name, but I have not
>> tried that due to having switched to SCST for stability.
>>
>> The page cache will improve latency because reads and writes first occur
>> in RAM. Naturally, all the usual considerations apply as to the loss of
>> dirty pages on a machine crash, so tuning the vm.dirty* parameters is
>> quite important.
>>
>> This setting was critically important to avoid hangs and major issues due
>> to some problem with XFS and the page cache on the OSD nodes:
>>
>> sysctl vm.min_free_kbytes=1048576
>>
>> (reserved memory when using vm.swappiness = 1)
>>
>> 10 GbE networking seems to be helping a lot; it could also just be the
>> superior response of a higher end switch.
>>
>> The blk_mq scheduler has been reported to improve performance on random IO.
>>
>> Good luck!
>>
>> --
>> Alex Gorbachev
>> Storcium
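
For reference, the tuning Alex describes could look something like the
following on an OSD node. Only vm.min_free_kbytes and vm.swappiness come
from his message; the vm.dirty_* values are illustrative assumptions and
should be sized against the node's RAM and your tolerance for losing dirty
pages on a crash:

  # reserve memory and keep swapping minimal (values from Alex's message)
  sysctl -w vm.min_free_kbytes=1048576
  sysctl -w vm.swappiness=1
  # bound the amount of dirty page cache (example values, not from the thread)
  sysctl -w vm.dirty_background_ratio=5
  sysctl -w vm.dirty_ratio=10
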
>> On Sun, Nov 8, 2015 at 5:07 PM, Timofey Titovets <nefelim4ag@xxxxxxxxx>
>> wrote:
>>>
>>> Big thanks anyway, Nick.
>>> Now I am catching hangs of both ESXi and the proxy =_=''
>>> /* Proxy VM: Ubuntu 15.10 / Kernel 4.3 / LIO / Ceph 0.94 / ESXi 6.0
>>> Software iSCSI */
>>> I've moved to an NFS-RBD proxy and am now trying to make it HA.
>>>
>>> 2015-11-07 18:59 GMT+03:00 Nick Fisk <nick@xxxxxxxxxx>:
>>> > Hi Timofey,
>>> >
>>> > You are most likely experiencing the effects of Ceph's write latency in
>>> > combination with the sync write behaviour of ESXi. You will probably
>>> > struggle to get much under 2ms write latency with Ceph, assuming a
>>> > minimum of 2 copies in Ceph. This will limit you to around 500 IOPS for
>>> > a QD of 1. Because of this you will also experience slow file/VM copies,
>>> > as ESXi moves the blocks of data around in 64KB sync IOs. 500 x 64KB =
>>> > ~30MB/s.
>>> >
>>> > Moving to 10GbE end to end may get you a reasonable boost in
>>> > performance, as you will be removing 1ms or so of network latency from
>>> > each write. Also search the mailing list for small performance tweaks
>>> > you can do, like disabling logging.
>>> >
>>> > Other than that, the only thing I have found that has a chance of giving
>>> > you performance similar to other products and/or legacy SANs is to use
>>> > some sort of RBD caching with something like flashcache/enhanceio/bcache
>>> > on your proxy nodes. However, this brings its own challenges and I still
>>> > haven't got to a point where I'm happy to deploy it.
>>> >
>>> > I'm surprised you are not also seeing LIO hangs, which several people
>>> > including me experience when using RBD+LIO+ESXi, although I haven't
>>> > checked recently to see if this is now working better. I would be
>>> > interested in hearing your feedback on this. They normally manifest
>>> > themselves when an OSD drops out and IO is suspended for more than 5-10s.
>>> >
>>> > Sorry I couldn't be of more help.
>>> >
>>> > Nick
>>> >
>>> >> -----Original Message-----
>>> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>>> >> Timofey Titovets
>>> >> Sent: 07 November 2015 11:44
>>> >> To: ceph-users@xxxxxxxxxxxxxx
>>> >> Subject: Ceph RBD LIO ESXi Advice?
>>> >>
>>> >> Hi List,
>>> >> I am searching for advice from somebody who uses a legacy client like
>>> >> ESXi with Ceph.
>>> >>
>>> >> I am trying to build high-performance, fault-tolerant storage with
>>> >> Ceph 0.94.
>>> >>
>>> >> In production I have 50+ TB of VMs (~800 VMs) on 8 NFS servers, each with:
>>> >> 2x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
>>> >> 12x Seagate ST2000NM0023
>>> >> 1x LSI Nytro™ MegaRAID® NMR 8110-4i
>>> >> 96 GB of RAM
>>> >> 4x 1GbE links in balance-alb mode (I don't have a problem with network
>>> >> throughput)
>>> >>
>>> >> Now in the lab I have built a 3-node cluster like this:
>>> >> Kernel 4.2
>>> >> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
>>> >> 16 GB of RAM
>>> >> 6x Seagate ST2000NM0033
>>> >> 2x 1GbE in balance-alb
>>> >> i.e. each node is a MON and 6 OSDs
>>> >>
>>> >> Config like:
>>> >> osd journal size = 16384
>>> >> osd pool default size = 2
>>> >> osd pool default min size = 2
>>> >> osd pool default pg num = 256
>>> >> osd pool default pgp num = 256
>>> >> osd crush chooseleaf type = 1
>>> >> filestore max sync interval = 180
>>> >>
>>> >> To attach the RBD storage to ESXi I created 2 VMs:
>>> >> 2 cores
>>> >> 2 GB RAM
>>> >> Kernel 4.3
>>> >> Each VM maps a big RBD volume and proxies it via LIO to ESXi. ESXi sees
>>> >> the VMs as iSCSI target servers in Active/Passive mode.
>>> >>
>>> >> The RBD was created with the --image-shared and --image-format 2 options.
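
As an illustration, such a proxy export might look roughly like this. The
pool, image, and IQN names are invented, the size is an example, and the
FILEIO backstore is the variant Alex mentions above for involving the page
cache, not necessarily what was used here:

  # create a format-2 image with shared access (two proxy VMs map the same image)
  rbd create vmstore/esxi-lun0 --size 10240000 --image-format 2 --image-shared
  rbd map vmstore/esxi-lun0        # appears on the proxy VM as e.g. /dev/rbd0

  # export the mapped device over iSCSI with LIO; a FILEIO backstore on the
  # block device goes through the page cache (portal/ACL setup omitted)
  targetcli /backstores/fileio create name=esxi-lun0 file_or_dev=/dev/rbd0
  targetcli /iscsi create iqn.2015-11.lab.proxy1:esxi-lun0
  targetcli /iscsi/iqn.2015-11.lab.proxy1:esxi-lun0/tpg1/luns create /backstores/fileio/esxi-lun0
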
>>> >> My questions:
>>> >> 1. Do I have an architecture problem?
>>> >> 2. Maybe you have ideas?
>>> >> 3. ESXi works with the iSCSI storage very slowly (30-60 MB/s
>>> >> read/write), but this could be an ESXi problem; later I will test this
>>> >> with a more modern hypervisor server.
>>> >> 4. The proxy VMs work not too badly with the storage, but fio shows
>>> >> numbers that are too low:
>>> >> [global]
>>> >> size=128g # File size
>>> >> filename=/storage/testfile.fio
>>> >> numjobs=1 # One thread
>>> >> runtime=600 # 10m for each test
>>> >> ioengine=libaio # Use async IO
>>> >> # Pseudo-random data, can be compressed by 15%
>>> >> buffer_compress_percentage=15
>>> >> overwrite=1 # Overwrite data in file
>>> >> end_fsync=1 # Do an fsync at the end of each test, to sync OS buffers
>>> >> direct=1 # Bypass OS cache
>>> >> startdelay=30 # Pause between tests
>>> >> bs=4k # Block size for IO requests
>>> >> iodepth=64 # Number of IO requests that can be in flight asynchronously
>>> >> rw=randrw # Random read/write
>>> >> ####################################################
>>> >> # IOMeter defines the server loads as the following:
>>> >> # iodepth=1 # Linear
>>> >> # iodepth=4 # Very Light
>>> >> # iodepth=8 # Light
>>> >> # iodepth=64 # Moderate
>>> >> # iodepth=256 # Heavy
>>> >> ####################################################
>>> >> [Disk-4k-randomrw-depth-1]
>>> >> rwmixread=50
>>> >> iodepth=1
>>> >> stonewall # Run each test separately
>>> >> ####################################################
>>> >> [Disk-4k-randomrw-depth-8]
>>> >> rwmixread=50
>>> >> iodepth=8
>>> >> stonewall
>>> >> ####################################################
>>> >> [Disk-4k-randomrw-depth-64]
>>> >> rwmixread=50
>>> >> stonewall
>>> >> ####################################################
>>> >> [Disk-4k-randomrw-depth-256]
>>> >> rwmixread=50
>>> >> iodepth=256
>>> >> stonewall
>>> >> ####################################################
>>> >> [Disk-4k-randomrw-depth-512]
>>> >> rwmixread=50
>>> >> iodepth=512
>>> >> stonewall
>>> >> ####################################################
>>> >> [Disk-4k-randomrw-depth-1024]
>>> >> rwmixread=50
>>> >> iodepth=1024
>>> >> stonewall
>>> >> -- cut --
>>> >>
>>> >> RBD-LIO-PROXY:
>>> >> -- cut --
>>> >> Disk-4k-randomrw-depth-512: (groupid=4, jobs=1): err= 0: pid=10601:
>>> >> Sat Nov 7 13:59:49 2015
>>> >>   read : io=770772KB, bw=1282.1KB/s, iops=320, runt=600813msec
>>> >>     clat (msec): min=141, max=8456, avg=715.87, stdev=748.55
>>> >>   write: io=769400KB, bw=1280.7KB/s, iops=320, runt=600813msec
>>> >>     clat (msec): min=158, max=9862, avg=878.73, stdev=905.47
>>> >> -- cut --
>>> >> One of the nodes in RAID0:
>>> >> Disk-4k-randomrw-depth-512: (groupid=4, jobs=1): err= 0: pid=4652:
>>> >> Fri Oct 30 16:29:00 2015
>>> >>   read : io=258500KB, bw=2128.4KB/s, iops=532, runt=121455msec
>>> >>     clat (msec): min=1, max=3983, avg=484.80, stdev=478.39
>>> >>   write: io=257568KB, bw=2120.8KB/s, iops=530, runt=121455msec
>>> >>     clat (usec): min=217, max=3976.1K, avg=478327.33, stdev=480695.05
>>> >> -- cut --
>>> >>
>>> >> From my experience with ScaleIO, I would expect numbers like ~1000 IOPS
>>> >> on the proxy node.
>>> >>
>>> >> I can provide the full fio config and logs if needed; I am just trying
>>> >> to fix the performance problem and looking for advice.
>>> >>
>>> >> 5. Maybe I need to change my fio config?
>>> >> 6. Maybe I am missing something?
>>> >>
>>> >> If someone has experience with similar solutions, stories and links are
>>> >> welcome -.-
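
To relate these numbers to Nick's point above about ESXi issuing 64 KB sync
writes at a queue depth of 1, one extra fio run along these lines can
approximate that pattern (a sketch only; the file path is taken from the job
file above, the other values are assumed):

  # sequential 64k writes, queue depth 1, O_SYNC -- roughly the ESXi copy/clone pattern
  fio --name=esxi-64k-sync --filename=/storage/testfile.fio --size=8g \
      --bs=64k --rw=write --ioengine=libaio --iodepth=1 --direct=1 --sync=1 \
      --runtime=120 --time_based

If the measured bandwidth sits near 64 KB divided by the per-write latency
(about the ~30 MB/s Nick calculates for ~2 ms), the limit is the sync write
latency rather than the proxy VM itself.
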
>>> >>
>>> >> --
>>> >> Have a nice day,
>>> >> Timofey.
>>> >
>>>
>>> --
>>> Have a nice day,
>>> Timofey.
>>
>
> --
> Have a nice day,
> Timofey.

--
Have a nice day,
Timofey.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com