Mehdi Safarpour - 21DOCS Test Area

Mehdi Safarpour

Public Documents 5

Algorithm Level Error Detection in Low Voltage Systolic Array

Mehdi Safarpour

and 2 more

November 08, 2021

An energy-efficient architecture for TPUs based on reduced-voltage operation. Errors are detected and corrected using Algorithm Based Fault Tolerance (ABFT), which makes aggressive voltage scaling possible.
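The abstract does not say which ABFT scheme is used. As a rough illustration only, the sketch below assumes the classic checksum form of ABFT for matrix multiplication: a column-checksum row is appended to A, a row-checksum column is appended to B, and the checksums of the product are compared against the sums of its rows and columns. The names `abft_matmul` and `inject_fault` are hypothetical, and the fault is simulated in software rather than caused by low voltage.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def abft_matmul(a, b, compute=matmul, tol=1e-9):
    """Multiply a by b with checksum-based error detection.

    `compute` stands in for the (possibly faulty) hardware multiplier.
    Returns (product, ok), where ok is False if any checksum disagrees.
    """
    # Extend A with a row holding its column sums, B with a column of row sums.
    a_ext = a + [[sum(col) for col in zip(*a)]]
    b_ext = [row + [sum(row)] for row in b]

    full = compute(a_ext, b_ext)
    n, m = len(a), len(b[0])
    product = [row[:m] for row in full[:n]]

    # In a fault-free run the last column holds the row sums of the product
    # and the last row holds its column sums.
    rows_ok = all(abs(sum(product[i]) - full[i][m]) <= tol for i in range(n))
    cols_ok = all(abs(sum(product[i][j] for i in range(n)) - full[n][j]) <= tol
                  for j in range(m))
    return product, rows_ok and cols_ok


def inject_fault(a, b):
    """Hypothetical faulty multiplier: corrupts one element of the result."""
    out = matmul(a, b)
    out[0][1] += 5
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
clean, clean_ok = abft_matmul(a, b)
faulty, faulty_ok = abft_matmul(a, b, compute=inject_fault)
```

In this sketch a failed check only signals that an error occurred. A real system would still need a recovery step, for example recomputing the result at a higher voltage.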

A High-Level Approach for Energy Efficiency Improvement of FPGAs by Voltage Trimming

Mehdi Safarpour

and 3 more

July 14, 2021

This paper proposes a solution that enables voltage scaling using only the High Level Synthesis (HLS) tools provided by the vendor, improving the energy efficiency of FPGAs by 2x. Chip manufacturers define voltage margins on top of the “best-case” operational voltage of their chips to ensure reliable worst-case functionality. The margins guarantee correctness of operation, but at the cost of performance and power efficiency. Violating the margins is tempting to save energy, but might lead to timing errors. This paper proposes an algorithmic solution that enables reliable removal of the margins by detecting errors on-the-fly. In contrast to previous approaches that require special hardware to detect timing errors, the proposed method is fully implementable using HLS tools without reliance on additional hardware. The approach is demonstrated using a 32×32 matrix-matrix multiplication and a simple multi-layer neural network implemented on two Xilinx ZC702 Field-Programmable Gate Array (FPGA) System-on-Chip (SoC) platforms, showcasing its utility in detecting errors that may originate from different sources such as logic circuitry, clock tree and memory. Results show that the energy dissipation is halved, while the implementation is clocked 2.5x faster than specified by the vendor's design tool.

LoFFT: Low-voltage FFT Using Lightweight Fault Detection for Energy Efficiency

Mehdi Safarpour

and 1 more

July 19, 2022

A simple method for enabling energy-efficient low-voltage operation, such as that provided by the near-threshold and sub-threshold voltage regions. The method allows a Fast Fourier Transform (FFT) to operate at very low voltage and detects whether any computational errors occur. When an error is detected, the voltage is adjusted until the errors are removed. In this manner, the lowest voltage and highest frequency of operation are always achieved. The paper demonstrates the utility of the approach on an experimental platform; other platforms are expected to give similar results.
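The abstract does not name the fault detection method or the voltage control policy. The sketch below is therefore only one possible reading of the detect-and-adjust loop: it assumes a Parseval energy check as the lightweight detector, and it replaces the hardware with a hypothetical `simulated_fft` that corrupts one output bin below an assumed threshold of 500 mV. All names, voltages and step sizes are illustrative.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def parseval_ok(x, spectrum, rel_tol=1e-9):
    """Check that time-domain and frequency-domain energies agree."""
    e_time = sum(abs(v) ** 2 for v in x)
    e_freq = sum(abs(v) ** 2 for v in spectrum) / len(x)
    return abs(e_time - e_freq) <= rel_tol * max(e_time, 1.0)


def simulated_fft(x, voltage_mv, v_min_mv=500):
    """Hypothetical hardware model: one bin is corrupted below v_min_mv."""
    out = fft(x)
    if voltage_mv < v_min_mv:
        out[1] += 1.0
    return out


def find_lowest_safe_voltage(x, start_mv=1000, step_mv=50, floor_mv=300):
    """Lower the supply step by step; stop at the last error-free setting."""
    v = start_mv
    while v - step_mv >= floor_mv:
        candidate = v - step_mv
        if not parseval_ok(x, simulated_fft(x, candidate)):
            break  # error detected: keep the previous, error-free voltage
        v = candidate
    return v


signal = [1, 2, 3, 4, 0, 0, 0, 0]
lowest_mv = find_lowest_safe_voltage(signal)
```

An energy check of this kind cannot see errors that happen to preserve the total energy of the spectrum, so it should be read as a cheap detector and not as complete coverage.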

Low-Voltage Energy Efficient Neural Inference by Leveraging Fault Detection Technique...

Mehdi Safarpour

and 5 more

November 18, 2021

This paper presents simple techniques to significantly reduce the energy consumption of DNNs. Operating at reduced voltages offers substantial energy efficiency improvement but at the expense of increasing the probability of computational errors due to hardware faults. In this context, we targeted Deep Neural Networks (DNNs) as emerging energy-hungry building blocks in embedded applications. Without an error feedback mechanism, blind voltage down-scaling will result in degraded accuracy or total system failure. To enable safe voltage down-scaling, two solutions based on Self-Supervised Learning (SSL) and Algorithm Based Fault Tolerance (ABFT) were developed in this paper. A DNN model trained on the MNIST data-set was deployed on a Field Programmable Gate Array (FPGA) that operated at reduced voltages and employed the proposed schemes. The SSL approach provides extremely low-overhead (≈0.2%) fault detection at the cost of lower error coverage and extra training, while ABFT incurs less than 8% overhead at run-time with close to 100% error detection rate. By using the solutions, substantial energy savings, up to 40.3%, were achieved without compromising the accuracy of the model.
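To give a feel for why an ABFT check on a neural network layer can be cheap, the sketch below applies a checksum to a single fully connected layer. It is an assumption about how such a check might look, not the scheme from the paper: the column sums of the weight matrix are computed once offline, so the run-time cost is one extra dot product per layer. The names `dense`, `checked_dense` and `faulty_dense` are hypothetical.

```python
def dense(weights, x):
    """Fully connected layer without bias or activation: y = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def checked_dense(weights, x, compute=dense, tol=1e-9):
    """Run a dense layer and verify it with a single checksum.

    The sum of all outputs must equal the dot product of the input with the
    column sums of W, which can be precomputed once after training.
    """
    checksum_row = [sum(col) for col in zip(*weights)]  # offline in practice
    y = compute(weights, x)
    expected = sum(w * v for w, v in zip(checksum_row, x))
    return y, abs(sum(y) - expected) <= tol


def faulty_dense(weights, x):
    """Hypothetical faulty layer: one output is off by one."""
    y = dense(weights, x)
    y[0] += 1
    return y


w = [[1, 2], [3, 4]]
x = [1, 1]
y, ok = checked_dense(w, x)
_, ok_faulty = checked_dense(w, x, compute=faulty_dense)
```

The check is applied before any non-linear activation, because the checksum identity only holds for the linear part of the layer.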

Transport Triggered Array Processor for Vision Applications: Near-threshold Performan...

Mehdi Safarpour

November 03, 2021

Operating at reduced voltages promises substantial energy efficiency improvement; the downside, however, is a significant down-scaling of clock frequency. This paper proposes vision chips as an excellent fit for low-voltage operation. Low-level sensory data processing in many Internet-of-Things (IoT) devices pursues energy efficiency by utilizing sleep modes or slowing the clock to the minimum. To curb the share of stand-by power dissipation in those designs, near-threshold/sub-threshold operating points or ultra-low-leakage fabrication processes are employed. These limit the clock rates significantly, reducing the computing throughput of individual processing cores. In this contribution we explore compensating for the performance loss of operating in the near-threshold region ($V_{dd} = 0.6$ V) through massive parallelization. The benefits of near-threshold operation and massive parallelism are optimum energy consumption per instruction and minimized memory round-trips, respectively. The Processing Elements (PEs) of the design are based on the Transport Triggered Architecture. The fine-grained programmable parallel solution allows for fast and efficient computation of learnable low-level features (e.g. local binary descriptors and convolutions). Other operations, including max-pooling, have also been implemented. The programmable design achieves excellent energy efficiency for Local Binary Patterns computations. Our results demonstrate that the inherent properties of the processor and of vision applications allow voltage and clock frequency to be scaled aggressively without having to compromise performance.
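For readers unfamiliar with Local Binary Patterns, the sketch below computes the basic 8-neighbour LBP code in plain Python. It is a software illustration of the operation only and says nothing about how the processing elements implement it. The neighbour ordering (clockwise from the top-left) and the `>=` comparison are common conventions assumed here, and the function names are hypothetical.

```python
# Neighbour offsets (row, column), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code of pixel (r, c): bit i is set if neighbour i >= centre."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(img):
    """LBP codes for all interior pixels (the one-pixel border is skipped)."""
    return [[lbp_code(img, r, c) for c in range(1, len(img[0]) - 1)]
            for r in range(1, len(img) - 1)]


patch = [[9, 1, 9],
         [1, 5, 1],
         [9, 1, 9]]
codes = lbp_image(patch)
```

Each output pixel depends only on its own 3×3 neighbourhood, so every code can be computed independently. This locality is what makes the operation a natural candidate for massive parallelization.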