> -----Original Message-----
> From: linux-nfs-owner@xxxxxxxxxxxxxxx [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx] On Behalf Of Boaz Harrosh
> Sent: Wednesday, November 30, 2011 5:34 AM
> To: Peng Tao
> Cc: Trond.Myklebust@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; bhalevy@xxxxxxxxxx
> Subject: Re: [PATCH 0/4] nfs41: allow layoutget at pnfs_do_multiple_writes
>
> On 12/02/2011 08:52 PM, Peng Tao wrote:
> > Issuing layoutget at .pg_init will drop the IO size information and ask for 4KB
> > layout every time. However, the IO size information is very valuable for MDS to
> > determine how much layout it should return to client.
> >
> > The patchset tries to allow the LD not to send layoutget at .pg_init but instead at
> > pnfs_do_multiple_writes, so that the real IO size is preserved and sent to MDS.
> >
> > Tests against a server that does not aggressively pre-allocate layout show
> > that the IO size information is really useful to block layout MDS.
> >
> > The generic pnfs layer changes are trivial to file layout and object as long as
> > they still send layoutget at .pg_init.
> >
>
> I have a better solution for your problem. It is a much smaller change and
> I think gives you much better heuristics.
>
> Keep the layout_get exactly where it is, but instead of sending PAGE_SIZE send
> the amount of dirty pages you have.
>
> If it is a linear write you will be exact on the money with a single lo_get. If
> it is a heavy random write then you might need more lo_gets and you might be getting
> some unused segments. But heavy random write is rare and slow anyway. As a first
> approximation it's fine. (We can later fix that as well)

I would say no to the above... For an objects/files MDS, it may not hurt much to hand out layout that ends up wasted. But for a blocklayout server, each layout allocation consumes far more resources than just giving out striping information as objects/files do. So helping the MDS make the correct decision is the right thing for the client to do.
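To make the trade-off under discussion concrete, here is a toy model of how many layoutget calls a sequential write needs for different request sizes. It is only a sketch under stated assumptions, not kernel code: the function name `count_layoutgets` and the `server_extra` parameter are hypothetical, and the server is assumed to grant exactly the requested length plus a fixed pre-allocation amount. It counts round trips only and says nothing about throughput.

```python
PAGE_SIZE = 4096  # bytes; the 4KB request size mentioned in the thread


def count_layoutgets(io_offset, io_length, request_length, server_extra=0):
    """Count layoutget calls needed to cover [io_offset, io_offset + io_length).

    Hypothetical model: each call asks for request_length bytes starting at
    the first byte not yet covered, and the server grants
    request_length + server_extra bytes (server_extra stands in for
    server-side pre-allocation).
    """
    if request_length <= 0:
        raise ValueError("request_length must be positive")
    covered_end = io_offset
    io_end = io_offset + io_length
    calls = 0
    while covered_end < io_end:
        covered_end += request_length + server_extra
        calls += 1
    return calls


one_mb = 1 << 20
# Ask for 4KB every time, server grants exactly what was asked.
per_page = count_layoutgets(0, one_mb, PAGE_SIZE)
# Ask for the real IO size in a single request.
full_size = count_layoutgets(0, one_mb, one_mb)
# Ask for 4KB, but the server pre-allocates 60KB extra per grant.
prealloc = count_layoutgets(0, one_mb, PAGE_SIZE, server_extra=60 * 1024)
print(per_page, full_size, prealloc)
```

In this model the number of calls depends only on how much each grant covers, which is why both sides of the thread care about what length the client puts in the request and how far the server is willing to pre-allocate beyond it.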
>
> The .pg_init is done after the .writepages call from VFS and all the to-be-written
> pages are already staged to be written. So there should be a way to easily extract
> that information.
>
> > iozone cmd:
> > ./iozone -r 1m -s 4G -w -W -c -t 10 -i 0 -F /mnt/iozone.data.1 /mnt/iozone.data.2 /mnt/iozone.data.3 /mnt/iozone.data.4 /mnt/iozone.data.5 /mnt/iozone.data.6 /mnt/iozone.data.7 /mnt/iozone.data.8 /mnt/iozone.data.9 /mnt/iozone.data.10
> >
> > Before patch: around 12MB/s throughput
> > After patch: around 72MB/s throughput
> >
>
> Yes Yes that stupid Brain dead Server is no indication for anything. The server
> should know best about optimal sizes and layouts. Please don't give me that stuff
> again.
>

Actually the server is already doing layout pre-allocation. It just doesn't know what the client really wants, so it cannot do so too aggressively. That's why I want the client to send the REAL IO size information to the server. From a performance perspective, dropping IO size information is always a BAD THING(TM) to do.

> But just do the above and you'll see that it is perfect.
>
> BTW don't limit the lo_segment size by the max_io_size. This is why you
> have .pg_test to signal when IO is maxed out.
>

Actually lo_segment size is never limited by max_io_size. The server is always entitled to send a larger layout than the client asks for.

> - The read segments should be as big as possible (i_size long)
> - The Write segments should ideally be as big as the Application
>   wants to write to. (Amount of dirty pages at time of nfs-write-out
>   is a very good first approximation).
>
> So I guess it is: I hate these patches, too much mess, too little goodness.

I'm afraid I can't agree with you...
Thanks,
Tao

>
> Thanks
> Boaz
>
> > Peng Tao (4):
> >   nfsv41: export pnfs_find_alloc_layout
> >   nfsv41: add and export pnfs_find_get_layout_locked
> >   nfsv41: get lseg before issue LD IO if pgio doesn't carry lseg
> >   pnfsblock: do ask for layout in pg_init
> >
> >  fs/nfs/blocklayout/blocklayout.c |   54 ++++++++++++++++++++++++++-
> >  fs/nfs/pnfs.c                    |   74 +++++++++++++++++++++++++++++++++++++-
> >  fs/nfs/pnfs.h                    |    9 +++++
> >  3 files changed, 134 insertions(+), 3 deletions(-)
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html