Re: exofs/ore: allocation of _ore_get_io_state()

On 05/24/2012 02:23 PM, Idan Kedar wrote:

> On Thu, May 24, 2012 at 12:00 AM, Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
>>> Is there any point to check if the memory is greater than 32MB?
>>>
>>>
>>
>>
>> In theory it can allocate 32MB, in slab. I'm not sure about slob and slub.
>>
>> But in practice, allocation of contiguous physical pages tends to fail very
>> fast on a system that has been up for a couple of hours. So we avoid it like
>> the plague.
>>
>> Past testing with tables bigger than PAGE_SIZE on the IO path gave
>> catastrophic results. (Again, once the system has been up for a while and
>> has had a chance to fragment the physical address space.)
> 
> What allocation sizes (of struct __alloc_all_io_state) are we talking
> about? How many devices per I/O did you encounter?
> 


Personally I hit it with a scsi-lib sg_table allocation bigger than
PAGE_SIZE (because of a bug). It is currently capped at PAGE_SIZE.
Other people reported the same failures, and great performance degradation,
when allocating BIOs and BIO_VECs larger than PAGE_SIZE.

It's simply the old and known page-fragmentation problem. It's
why virtual memory was invented in the first place.
kmalloc is not a virtual allocator.
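
As a rough illustration of the "stay within PAGE_SIZE" rule, here is a
user-space sketch. It is not the actual exofs/ore code: the struct layouts,
the names and the 4096-byte page size are all assumptions made up for the
example. It only shows the arithmetic of checking whether a header plus N
per-device slots still fits in a single page.

```c
#include <stddef.h>

/* Assumed page size for illustration; the kernel's PAGE_SIZE is arch-dependent. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-ins, NOT the real exofs/ore structures. */
struct sketch_hdr {
	unsigned numdevs;
	void *priv;
};

struct sketch_per_dev {
	void *req;
	void *bio;
	unsigned long offset;
	unsigned long length;
};

/* Bytes needed for one header plus numdevs per-device slots. */
size_t sketch_state_size(unsigned numdevs)
{
	return sizeof(struct sketch_hdr) +
	       (size_t)numdevs * sizeof(struct sketch_per_dev);
}

/* Non-zero when the whole state fits in one page-sized allocation. */
int sketch_fits_in_page(unsigned numdevs)
{
	return sketch_state_size(numdevs) <= SKETCH_PAGE_SIZE;
}

/* Largest device count whose state still fits in a single page. */
unsigned sketch_max_devs_per_page(void)
{
	return (unsigned)((SKETCH_PAGE_SIZE - sizeof(struct sketch_hdr)) /
			  sizeof(struct sketch_per_dev));
}
```

A caller would use sketch_fits_in_page() to decide between one allocation
and several smaller ones, each of them no bigger than a page.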

>>
>> The whole point of using sg-lists in the Kernel is to avoid allocating
>> contiguous physical pages and to avoid having to use virtual memory.
>>
>> This is done all over the Kernel. MAX_BIO_SIZE max-sg-table ...
> 
> Why not use virtual memory? Is this limitation imposed by the OSD
> initiator or by some other layer in the OSD stack?
> 


Welcome to Linux Kernel 101. vmalloc is tenfold slower than
kmalloc. And in principle the same thing will happen: multiple discrete
pages will be allocated and collected together, but now you will also need
to set up TLB entries and make sure they are mapped in when needed
(on every interrupt, on every context switch).
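
The alternative to one big virtually-mapped table is a chain of chunks, each
no bigger than a page, in the spirit of chained sg-lists. Below is a minimal
user-space sketch of that idea. It uses calloc() where kernel code would use
kmalloc, and the names, the entry type and the 4096-byte chunk size are
hypothetical; it is not the scsi-lib or ore implementation.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_BYTES 4096u	/* assumed page size, for illustration only */

/* One chunk: a link pointer plus as many entries as fit in CHUNK_BYTES. */
struct chunk {
	struct chunk *next;
	unsigned long entries[(CHUNK_BYTES - sizeof(struct chunk *)) /
			      sizeof(unsigned long)];
};

#define ENTRIES_PER_CHUNK \
	(sizeof(((struct chunk *)0)->entries) / sizeof(unsigned long))

/* Free every chunk in the chain. */
void chunk_list_free(struct chunk *head)
{
	while (head) {
		struct chunk *next = head->next;

		free(head);
		head = next;
	}
}

/*
 * Allocate enough zeroed chunks to hold nentries entries. No single
 * allocation exceeds CHUNK_BYTES. Returns NULL on failure (and also for
 * nentries == 0, which needs no chunks).
 */
struct chunk *chunk_list_alloc(size_t nentries)
{
	struct chunk *head = NULL, **tail = &head;
	size_t remaining = nentries;

	while (remaining > 0) {
		struct chunk *c = calloc(1, sizeof(*c));

		if (!c) {
			chunk_list_free(head);
			return NULL;
		}
		*tail = c;
		tail = &c->next;
		remaining -= remaining < ENTRIES_PER_CHUNK ?
			     remaining : ENTRIES_PER_CHUNK;
	}
	return head;
}

/* Walk the chain to entry number index; NULL if it is past the last chunk. */
unsigned long *chunk_list_entry(struct chunk *head, size_t index)
{
	while (head && index >= ENTRIES_PER_CHUNK) {
		index -= ENTRIES_PER_CHUNK;
		head = head->next;
	}
	return head ? &head->entries[index] : NULL;
}
```

Lookup is a walk over the chain instead of a single index operation, which
is the price paid for never asking the allocator for more than one page at
a time.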

This single fact, that "Linux Kernel code does not use VM", is a
10-fold speed gain over the Windows Kernel, measured.

>>
>> (BTW I saw this mail by chance. If you address it to me directly I will
>>  see it for sure)
>>
>> Cheers
>> Boaz
> 


Boaz