I would prefer not to put the cold blocks in a separate section, because that may require unnecessarily larger branch instructions. My primary concern is the L1 cache. For the L1 cache, most of the possible benefit is achieved by placing all the cold blocks right after the final return of the function; there is no need to push them to a separate section. Pushing them to a separate section would matter for virtual memory locality.
When I used __builtin_expect to tell the compiler about a cold block, it did move that cold block to right after the final return of the function. So, without any change to the command line options, it already knows what to do with cold blocks. The question seems to be whether it recognizes that a block is cold (based on that block calling a cold function). The option you suggest deals with what to do with a cold block, not with how to recognize one.
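To make the two cases concrete, here is a minimal sketch of what I am comparing. The names (UNLIKELY, report_bad_value, sum_with_expect, sum_with_cold_call) are made up for illustration and are not from my real code. The first function marks the block cold explicitly with __builtin_expect; the second gives no hint and relies on the compiler inferring coldness from the call to a function declared with the cold attribute. Both are GCC extensions, and whether the second form actually gets the same block placement is exactly the open question.

```c
#include <stdio.h>
#include <stdlib.h>

/* GCC extension: tell the compiler the condition is expected to be false. */
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)

/* Hypothetical error reporter, declared cold. noinline keeps the call
   visible as a call, so the block containing it can be judged by it. */
__attribute__((cold, noinline))
static void report_bad_value(int value)
{
    fprintf(stderr, "bad value: %d\n", value);
}

/* Case 1: the cold block is identified explicitly with __builtin_expect. */
long sum_with_expect(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0)) {
            report_bad_value(v[i]);   /* explicitly hinted as unlikely */
            continue;
        }
        total += v[i];
    }
    return total;
}

/* Case 2: no explicit hint. The block can only be recognized as cold
   because it calls a function carrying the cold attribute. */
long sum_with_cold_call(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0) {
            report_bad_value(v[i]);   /* coldness must be inferred here */
            continue;
        }
        total += v[i];
    }
    return total;
}
```

Compiling this with something like gcc -O2 -S and comparing where the call to report_bad_value ends up in each function should show whether the two cases are treated alike. I would expect the result to depend on the GCC version and target.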
Brian Dessent wrote:
@item -freorder-blocks-and-partition
@opindex freorder-blocks-and-partition
In addition to reordering basic blocks in the compiled function, in order to reduce number of taken branches, partitions hot and cold basic blocks into separate sections of the assembly and .o files, to improve paging and cache locality performance.