Hi Ilya,

We can change the sector size from 512 to 4096. This can reduce the number of writes.
I did a simple test: for 900G, mkfs.xfs -f
For default: 1m10s
Physical sector size = 4096: 0m10s

But if we change the sector size, the rbd metadata needs to record it.

Thanks!
Jianpeng

> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of huang jun
> Sent: Thursday, August 27, 2015 8:44 AM
> To: Ilya Dryomov
> Cc: Haomai Wang; ceph-devel
> Subject: Re: format 2TB rbd device is too slow
>
> hi, Ilya
>
> 2015-08-26 23:56 GMT+08:00 Ilya Dryomov <idryomov@xxxxxxxxx>:
> > On Wed, Aug 26, 2015 at 6:22 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
> >> On Wed, Aug 26, 2015 at 11:16 PM, huang jun <hjwsm1989@xxxxxxxxx> wrote:
> >>> hi, all
> >>> we created a 2TB rbd image and mapped it locally, then formatted it
> >>> as xfs with 'mkfs.xfs /dev/rbd0'. It took 318 seconds to finish,
> >>> but a local physical disk of the same size needs just 6 seconds.
> >>>
> >>
> >> I think librbd has two PRs related to this.
> >>
> >>> After debugging, we found there are two steps in the rbd module during formatting:
> >>> a) send 233093 DELETE requests to osds (number_of_requests = 2TB / 4MB),
> >>> this step took almost 92 seconds.
> >>
> >> I guess this (https://github.com/ceph/ceph/pull/4221/files) may help
> >
> > It's submitting deletes for non-existent objects, not zeroing. The
> > only thing that will really help here is the addition of rbd object
> > map support to the kernel client. That could happen in 4.4, but 4.5
> > is a safer bet.
> >
> >>
> >>> b) send 4238 messages like this: [set-alloc-hint object_size 4194304
> >>> write_size 4194304,write 0~512] to osds, which took 227 seconds.
> >>
> >> I think kernel rbd also needs to use
> >> https://github.com/ceph/ceph/pull/4983/files
> >
> > set-alloc-hint may be a problem, but I think a bigger problem is the
> > size of the write. Are all those writes 512 bytes long?
> >
>
> In another test formatting a 2TB rbd device, there were:
> 2 messages, each writing 131072 bytes
> 4000 messages, each writing 262144 bytes
> 112 messages, each writing 4096 bytes
> 194 messages, each writing 512 bytes
>
> the xfs info:
> meta-data=/dev/rbd/rbd/test2t    isize=256    agcount=33, agsize=16382976 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=0
> data     =                       bsize=4096   blocks=524288000, imaxpct=5
>          =                       sunit=1024   swidth=1024 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=256000, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> > Thanks,
> >
> >                 Ilya
>
> --
> thanks
> huangjun
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body
> of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
> http://vger.kernel.org/majordomo-info.html
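
For reference, the figures quoted in this thread can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation in Python: every input is copied from the messages above, the variable names are made up for illustration, and the image size is an assumption taken from the xfs data section (blocks * bsize), which may not match the exact rbd image size.

```python
# Back-of-the-envelope check of the numbers quoted in the thread.
# All inputs are copied from the messages; nothing here is measured.

OBJECT_SIZE = 4194304  # bytes, from "object_size 4194304" in the alloc hint

# (message count, bytes per write) from the second 2TB mkfs.xfs test
write_histogram = [
    (2, 131072),
    (4000, 262144),
    (112, 4096),
    (194, 512),
]

total_msgs = sum(count for count, _ in write_histogram)
total_bytes = sum(count * size for count, size in write_histogram)
sub_4k_msgs = sum(count for count, size in write_histogram if size < 4096)

# Assumption: the xfs data section (blocks=524288000, bsize=4096)
# covers the whole device, so this approximates the image size.
image_bytes = 524288000 * 4096
objects_in_image = image_bytes // OBJECT_SIZE

# Average wall-clock time per request in the first test, in milliseconds.
delete_ms = 92 / 233093 * 1000   # step a): DELETE requests
write_ms = 227 / 4238 * 1000     # step b): set-alloc-hint + write messages

print(f"second test: {total_msgs} writes, {total_bytes} bytes in total")
print(f"writes smaller than 4096 bytes: {sub_4k_msgs}")
print(f"4MB objects in the image (approx.): {objects_in_image}")
print(f"first test: {delete_ms:.2f} ms per delete, {write_ms:.2f} ms per write")
```

By this arithmetic, each write message in the first test cost far more wall-clock time on average than each delete, while in the second test only a small share of the writes were 512 bytes long. That is in line with the concern about write size raised above, but it is not proof of it, since the two tests were separate runs.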