Over on the Pluralsight blog, Joe Hummel talks about supercomputing in 2007 and some of the worrying problems the chip manufacturers are starting to encounter as we move to dual-core, quad-core and beyond.
In a nutshell, the chip manufacturers seem to have hit a brick wall in terms of CPU speed (levelling off at around 3GHz) and are therefore focusing on the number of cores per chip. However, compiler optimisation has brought us to a point where, on current chip technology, optimised code needs between 16GB/s and 24GB/s of memory bandwidth, which simply doesn't exist (even in high-end corporate servers). As a result, CPUs spend a lot of time hanging around waiting for data to arrive from RAM or cache; factor in dual-, quad- or the new range of eight-core processors and you've got one massive waste of CPU cycles waiting for data to come from memory. (The cost of each memory level grows by roughly a factor of 10: CPU to L1 is 10 cycles, CPU to L2 is 100, CPU to L3 is 1,000, and CPU to RAM is 10,000 cycles.)
Hummel argues that optimising compilers shouldn't just look at reducing the number of cycles needed to accomplish the task, but should rather look at how best to use the multiple-core technology. One such trick might be to have one core reading data into the cache and a second performing compute functions, swapping roles once there is no more data. Initial calculations suggest that performance increases of between 1.5x and 1.7x are possible using this method.
If you’re interested in hardware and all that stuff, it’s well worth a read (hey, the Pluralsight blog is worth a read just for the great BizTalk content if nothing else!)