On Wed, 2009-06-03 at 12:19 -0400, James Bottomley wrote:
> On Wed, 2009-06-03 at 14:04 +0100, Catalin Marinas wrote:
> > On Tue, 2009-06-02 at 15:22 +0000, James Bottomley wrote:
> > > So what we're looking for is a proposal to discuss the issues
> > > most affecting embedded architectures, or preview any features
> > > affecting the main kernel which embedded architectures might need
> > > ... or any other topics from embedded architectures which might
> > > need discussion or debate.
> >
> > Some issues that come up on embedded systems (and not only):
> >
> >       * Multiple coherency domains for devices - the system may have
> >         multiple bus levels, coherency ports, cache levels etc. Some
> >         devices in the system (but not all) may be able to "see"
> >         various cache levels but the DMA API (at least on ARM)
> >         cannot handle this. It may be useful to discuss how other
> >         embedded architectures handle this and come up with a
> >         unified solution
>
> So this is partially what the dma_sync_for_{device|cpu} is supposed to
> be helping with. By and large, the DMA API tries to hide the
> complexities of coherency domains from the user. The actual API, as
> far as it goes, seems to do this OK.

Yes, the dma_sync_* API is probably OK. The actual implementation,
however, should become aware of the various coherency domains on the
same system (it could hold this information in one of the bus-related
structures). Currently, devices that can access the CPU (inner or
outer) cache have their drivers modified to avoid calling the
dma_sync_* functions (since other devices still need these functions).
If other embedded architectures face similar issues, it is worth
discussing them and maybe coming up with a common solution (of course,
like most topics, they could simply be discussed on the mailing lists
rather than at the KS).

> >       * Better support for coherent DMA mask - currently ZONE_DMA is
> >         assumed to be in the bottom part of the memory which isn't
> >         always the case. Enabling NUMA may help but it is overkill
> >         for some systems. As above, a more unified solution across
> >         architectures would help
>
> So ZONE_DMA and coherent memory allocation as represented by the
> coherent mask are really totally separate things. The idea of ZONE_DMA
> was really that if you had an ISA device, allocations from ZONE_DMA
> would be able to access the allocated memory without bouncing. Since
> ISA is really going away, this definition has been hijacked. If your
> problem is just that you need memory allocated on a certain physical
> mask and neither GFP_DMA nor GFP_DMA32 cut it for you, then we could
> revisit the kmalloc_mask() proposal again ... but the consensus last
> time was that no-one really had a compelling use case that couldn't be
> covered by GFP_DMA32.

Russell already commented on this. As an example, I have a platform
with two blocks of RAM - 512MB @ 0x20000000 and 512MB @ 0x70000000 -
but only the higher one allows DMA.
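To make the problem concrete: the only knob a driver has today is a
mask, which can only describe the bottom 2^n bytes of the physical
address space. A minimal (untested) sketch of the current idiom -
"example_probe" and the device are made up for illustration:

#include <linux/dma-mapping.h>
#include <linux/device.h>

/*
 * Sketch of how a driver declares its DMA addressing limits today.
 * A mask can only say "I can reach physical addresses below 2^n"; on
 * the platform above there is no way to express "only the 512MB block
 * at 0x70000000 is visible to the device".
 */
static int example_probe(struct device *dev)
{
	dma_addr_t dma;
	void *cpu;

	/* Streaming and coherent masks, both meaning "bottom 4GB". */
	if (dma_set_mask(dev, DMA_BIT_MASK(32)))
		return -EIO;
	dev->coherent_dma_mask = DMA_BIT_MASK(32);

	/*
	 * Nothing stops this from returning memory from the block at
	 * 0x20000000, which the device cannot reach.
	 */
	cpu = dma_alloc_coherent(dev, PAGE_SIZE, &dma, GFP_KERNEL);
	if (!cpu)
		return -ENOMEM;

	dma_free_coherent(dev, PAGE_SIZE, cpu, dma);
	return 0;
}

Something richer - per-bus or per-device knowledge of which RAM blocks
are reachable - is what would need discussing.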
> >       * PIO block devices and non-coherent hardware - code like
> >         mpage.c assumes that either the hardware is coherent or the
> >         device driver performs the cache flushing. The latter is
> >         true for DMA-capable devices but not for PIO. The issue
> >         becomes visible with write-allocate caches and the device
> >         driver may not have the struct page information to call
> >         flush_dcache_page(). A proposed solution on the ARM lists
> >         was to differentiate (via some flags) between PIO and DMA
> >         block devices and use this information in mpage.c
>
> flush_dcache_page() is supposed to be for making the data visible to
> the user ... that coherency is supposed to be managed by the block
> layer.

I'm referring to kernel<->user coherency issues and yes,
flush_dcache_page() is the function supposed to handle this. It's just
that it isn't always called in the block or VFS layers (for example, to
be able to use ext2 on a CompactFlash card over PATA, I had to add a
hack so that flush_dcache_page() is called from mpage_end_io_read()).
Some drivers, like Russell's mmci.c, use scatterlists, so they have
access to the page structure and perform the flushing themselves. I
noticed that for some block devices you can't easily retrieve the page
structure (I would need to check the code for more precise references).
But if the driver were somehow marked as PIO, the VFS layer could
ensure the coherency.

> >       * Mixed endianness devices in the same system - this may only
> >         need dedicated readl_be/writel_be etc. macros but it could
> >         also be done by having bus-aware readl/writel-like macros
>
> We have ioreadXbe for this exact case (similar problem on parisc)

OK, probably not worth a new topic. As was mentioned on linux-embedded
already, it may just need better documentation (there is no reference
to ioread* in Documentation/ and most drivers seem to use readl/writel
etc.).

> >       * Asymmetric MP:
> >               * Different CPU frequencies
> >               * Different CPU features (e.g. floating point only on
> >                 some CPUs): scheduler awareness, per-CPU hwcap bits
> >                 (in case user space wants to set the affinity)
> >               * Asymmetric workload balancing for power consumption
> >                 (it may be better to load 1 CPU at 60% than 4 at
> >                 15%)
>
> This actually just works(tm) for me on a voyager system running SMP
> with a mixed 486/586 set of processors ... what's the problem? The
> only issue I see is that you have to set the capabilities of the boot
> CPU to the intersection of the mixture otherwise setup goes wrong, but
> otherwise it seems to work OK.

You can set the capabilities to the intersection of the CPU features
but that's not the goal. We'll see multiprocessor systems where only
one (out of 2, 4 etc.) of the CPUs has some features (like media
processing instructions). That's common in embedded systems, where the
gate count is limited and battery life matters, but you still want to
use the extra features rather than mask them out. The code I currently
have for such a configuration traps the undefined instructions and sets
the CPU affinity of the faulting threads (the affinity could be reset
after some time) - roughly like the sketch below. Could it be done
better? I think that's worth discussing.
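Just to illustrate the idea, a minimal (untested) sketch; the hook into
the undefined instruction handler and "media_capable_mask" are made-up
names, not existing kernel interfaces:

#include <linux/sched.h>
#include <linux/cpumask.h>
#include <asm/ptrace.h>

/* CPUs implementing the optional instructions; populated during SMP
 * boot (illustrative, not an existing kernel symbol). */
static cpumask_t media_capable_mask;

/*
 * Called from the (arch-specific) undefined instruction handler when
 * the faulting instruction is one of the optional media instructions.
 * Returns 0 to re-execute the instruction after migration, or an
 * error if no CPU can run it.
 */
static int undef_migrate_task(struct pt_regs *regs)
{
	/* Already on a capable CPU: the instruction really is invalid. */
	if (cpumask_test_cpu(smp_processor_id(), &media_capable_mask))
		return -EFAULT;

	/*
	 * Restrict the task to the capable CPUs; the scheduler moves
	 * it there and the faulting instruction is retried.
	 */
	set_cpus_allowed_ptr(current, &media_capable_mask);
	return 0;
}

(Resetting the affinity after some time, as mentioned above, would need
a timer or a check at subsequent scheduling points - left out here.)

-- 
Catalin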