Gunnar Carlstedt

and 1 more

Manycore processors may generally be implemented as an array of small processing elements (PEs) interconnected by a communication mesh (NoC). This article describes a clock system for such chips, with many thousands of high-frequency PEs. Each PE contains a low-energy oscillator. It synchronizes with its four neighbors through an additional low-voltage wire that runs parallel to the communication links and carries a sinusoidal signal. This wire is part of a resonant circuit that extends to all PE oscillators. Theoretically, in an infinite mesh the oscillators will all be phase locked, but in a finite mesh there will be fringe effects. In a mesh with 25×25 oscillators, the maximum skew between neighboring regions is within 3.3 ps. By slightly adjusting the free-running frequency of the oscillators, the skew can be reduced to 1.2 ps. Because there is no central clock, both power consumption and clock frequency can be improved compared to a conventional clock distribution network. A PE of 150×150 μm² running at 6.7 GHz with 93 master-slave flip-flops is used as an example. The PE-internal clock skew is less than 2.3 ps, and the clock system consumes 807 μW per PE. This corresponds to an effective gate and wire capacitance of 509 aF, or 7.3 gate capacitances. Scheduling the local oscillators with a gradual phase offset along one of the grid’s axes reduces the power noise. In this way, surge currents, which generally have their peaks at the clock edges, are distributed evenly over a full clock cycle.
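The effect of spreading the clock edges over a full cycle can be illustrated with a toy model. The sketch below is not the authors' circuit or simulation: it assumes that each column of PEs draws a triangular current pulse centred on its clock edge, that the pulse lasts 20% of the period, and that the offset grows linearly from column to column. Only the 6.7 GHz clock and the 25-column mesh come from the text; every other name and number is a hypothetical choice made for illustration.

```python
# Toy model of supply-current smoothing by staggered clock phases.
# Assumptions (not from the article): triangular surge shape, pulse width of
# 20% of the period, linear phase offset per column, equal current per column.

F_CLK = 6.7e9                # clock frequency from the abstract, in Hz
PERIOD = 1.0 / F_CLK         # clock period, roughly 149 ps
N_COLS = 25                  # columns in the 25x25 example mesh
PULSE_WIDTH = 0.2 * PERIOD   # assumed surge duration (hypothetical)
SAMPLES = 1000               # time samples per clock cycle


def pulse(t):
    """Unit-height triangular pulse centred at t = 0, repeating every PERIOD."""
    # Wrap t into the interval [-PERIOD/2, PERIOD/2).
    t = (t + PERIOD / 2) % PERIOD - PERIOD / 2
    half = PULSE_WIDTH / 2
    return max(0.0, 1.0 - abs(t) / half)


def supply_current(offsets):
    """Summed current over one cycle for columns with the given time offsets."""
    times = [i * PERIOD / SAMPLES for i in range(SAMPLES)]
    return [sum(pulse(t - off) for off in offsets) for t in times]


def peak_to_mean(current):
    """Ratio of the largest sample to the average over the cycle."""
    return max(current) / (sum(current) / len(current))


# Case 1: every column switches at the same instant.
aligned = [0.0] * N_COLS
# Case 2: column k is delayed by k/N_COLS of a period along one axis.
staggered = [k * PERIOD / N_COLS for k in range(N_COLS)]

ratio_aligned = peak_to_mean(supply_current(aligned))
ratio_staggered = peak_to_mean(supply_current(staggered))

print(f"clock period:        {PERIOD * 1e12:.1f} ps")
print(f"aligned   peak/mean: {ratio_aligned:.2f}")
print(f"staggered peak/mean: {ratio_staggered:.2f}")
```

Under these assumptions the aligned case concentrates all current in a short window each cycle, while the staggered case yields a nearly flat total current. The size of the improvement depends on the assumed pulse shape and width, so the ratios should be read as qualitative only.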