Robert Kent - 21DOCS Test Area

We recently introduced a general system for building fast single-stage hardware N-sorters and N-filters, with N ≥ 3. When built with a design logic block in the FPGA at the core of the Amazon AWS EC2 F1 instance, N-sorters and N-filters designed with this system were shown to be significantly faster than the prior state-of-the-art sorting devices, which are networks of 2-sorters and 2-filters. Here we show that N-sorters and N-filters are faster still when implemented with carry chain design logic blocks, which feature lookahead acceleration and dedicated routing and are found in the same FPGA product. A new 32-bit carry chain 7-sorter operates in 1.469 ns, a speedup of 1.23 versus our original state-of-the-art 7-sorter. Faster and larger N-sorters, and much larger N-max/N-min filters, can be constructed when product term splitting and a new Sum-of-Products output multiplexer equation are added to the general design system and then combined with carry chain logic. While we were previously limited to sorting 10 or fewer input values, we now fully sort 16 32-bit values in 2.024 ns, a speedup of 4.61 versus a comparable 2-sorter network. Our largest prior max pooling 32-bit N-max filter was a 9-max 3x3 2D image filter, but we have now produced a 125-max 5x5x5 3D video filter that operates in 2.075 ns, a speedup of 3.43 versus a 2-max network. Using 32-max single-stage filters, a 32-bit 1024-max 2-stage network operates in 3.557 ns, which is less than the period of a 280 MHz clock (about 3.571 ns) and a speedup of 2.85 versus the equivalent 10-stage 2-max network.
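To make the single-stage idea concrete, the following is a minimal software sketch in Python, not the hardware design itself. It assumes a generic rank-order model: every pair of inputs is compared once, each input's rank is the number of comparisons it wins, and the rank selects the output position. The function names `n_sort`, `n_max`, and `two_stage_max` are hypothetical, and the sketch does not attempt to reproduce the Sum-of-Products output multiplexer equation, product term splitting, or carry chain mapping described above.

```python
from itertools import combinations


def n_sort(values):
    """Model of a single-stage N-sorter: compare all pairs, then place by rank."""
    n = len(values)
    rank = [0] * n
    # One comparison per input pair; in hardware these would all run in parallel.
    for i, j in combinations(range(n), 2):
        # Ties go to the later index, so every input gets a unique rank.
        if values[i] > values[j]:
            rank[i] += 1
        else:
            rank[j] += 1
    # Output multiplexing: the input with rank r drives output r (ascending order).
    out = [None] * n
    for i, r in enumerate(rank):
        out[r] = values[i]
    return out


def n_max(values):
    """Model of a single-stage N-max filter: the input that loses no comparison."""
    n = len(values)
    for i in range(n):
        if all(values[i] >= values[j] for j in range(n) if j != i):
            return values[i]
    raise ValueError("n_max needs at least one input")


def two_stage_max(values, fan_in=32):
    """Two-stage max network: fan_in-wide filters feeding one final filter."""
    groups = [values[k:k + fan_in] for k in range(0, len(values), fan_in)]
    stage1 = [n_max(g) for g in groups]  # 32 filters when len(values) == 1024
    return n_max(stage1)                 # one more filter over the stage-1 results


# Period of a 280 MHz clock in nanoseconds, for comparison with 3.557 ns.
CLOCK_PERIOD_NS = 1e3 / 280
```

With `fan_in=32`, 1024 inputs need only two stages because 32 × 32 = 1024, whereas a network of 2-max filters needs log2(1024) = 10 stages. This is the stage-count comparison behind the last result quoted above.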