Shanshan Liu - 21DOCS Test Area

Shanshan Liu

Public Documents 5

On the Dependability of Bidirectional Encoder Representations from Transformers (BERT...

Zhen Gao

and 7 more

January 29, 2024

Transformers are widely used in natural language processing and computer vision, and Bidirectional Encoder Representations from Transformers (BERT) is one of the most popular pre-trained transformer models for many applications. This paper studies the dependability and impact of soft errors on BERT implemented with different floating-point formats using two case studies: sentence emotion classification and question & answering. Simulation by error injection is conducted to assess the impact of errors on different parts of the BERT model and different bits of the parameters. The analysis of the results leads to the following findings: 1) in both single and half precision, there is a Critical Bit (CB) on which errors significantly affect the performance of the model; 2) in single precision, errors on the CB may cause overflow in many cases, which leads to a fixed result regardless of the input; 3) in half precision, the errors do not cause overflow but they may still introduce a large accuracy loss. In general, the impact of errors is significantly larger in singleprecision than half-precision parameters. Error propagation analysis is also considered to further study the effects of errors on different types of parameters and reveal the mitigation effects of the activation function and the intrinsic redundancy of BERT.

Analysis, Design and Evaluation of a High-Performance Stochastic Multilayer Perceptro...

Ziheng Wang

and 4 more

October 05, 2022

The mini-batch technique is widely used in neural network training with conventional arithmetic for its efficiency; however, its feasibility and performance in SC MLPs have rarely been studied. This paper analyzes by theory and simulation the performance of the mini-batch technique in SC MLPs; the results show that it potentially has a larger benefit in MLPs using SC than the conventional version. Moreover, a pipelined SC MLP implementation is also pursued in this paper for performing the inference process. All findings and designs provided in this paper leverage the advantages of the mini-batch technique and SC implementation to design high-performance MLPs.

Fault Tolerant Triplet Networks for Training and Inference

Ziheng Wang

and 5 more

October 05, 2022

This paper deals with the fault tolerance of Triplet Networks (TNs). Results based on extensive analysis and simulation by fault injection are presented for new schemes. As in accordance with technical literature, stuck-at faults are considered in the fault model for the training process. Simulation by fault injection shows that the TNs are not sensitive to this type of fault in the general case; however, an unexcepted failure (leading to network convergence to false solutions) can occur when the faults are in the negative subnetwork. Analysis for this specific case is provided and remedial solutions are proposed (namely the use of the loss function with regularized anchor outputs for stuck-at 0 faults and a modified margin for stuck-at 1/-1 faults). Simulation proves that false solutions can be very efficiently avoided by utilizing the proposed techniques. Random bit-flip faults are then considered in the fault model for the inference process. This paper analyzes the error caused by bit-flips on different bit positions in a TN with Floating-Point (FP) format and compares it with a fault- tolerant Stochastic Computing (SC) implementation. Analysis and simulation of the TNs confirm that the main degradation is caused by bit-flips on the exponent bits. Therefore, protection schemes are proposed to handle those errors; they replace least significant bits of the FP numbers with parity bits for both single- and multi-bit errors. The proposed methods achieve superior performance compared to other low-cost fault tolerant schemes found in the technical literature by reducing the classification accuracy loss of TNs by 96.76% (97.74%) for single-bit (multi-bit errors).

A Technique for Approximate Communication in Network-on-Chips for Image Classificatio...

Yuechen Chen

and 3 more

August 31, 2021

Approximation is an effective technique for reducing power consumption and latency of on-chip communication in many computing applications. However, existing approximation techniques either achieve modest improvements in these metrics or require retraining after approximation, such when convolutional neural networks (CNNs) are employed. Since classifying many images introduces intensive on-chip communication, reductions in both network latency and power consumption are highly desired. In this paper, we propose an approximate communication technique (ACT) to improve the efficiency of on-chip communications for image classification applications. The proposed technique exploits the error-tolerance of the image classification process to reduce power consumption and latency of on-chip communications, resulting in better overall performance for image classification computation. This is achieved by incorporating novel quality control and data approximation mechanisms that reduce the packet size. In particular, the proposed quality control mechanisms identify the error-resilient variables and automatically adjust the error thresholds of the variables based on the image classification accuracy. The proposed data approximation mechanisms significantly reduce packet size when the variables are transmitted. The proposed technique reduces the number of flits in each data packet as well as the on-chip communication, while maintaining an excellent image classification accuracy. The cycle-accurate simulation results show that ACT achieves 23% in network latency reduction and 24% in dynamic power reduction compared to the existing approximate communication technique with less than 0.99% classification accuracy loss.

Stochastic Dividers for Low Latency Neural Networks

Shanshan Liu

and 5 more

March 20, 2021

Stochastic computing (SC) is attractive for hardware implementation due to its low complexity in arithmetic unit design; therefore, SC has attracted considerable interest to implement Artificial Neural Networks (ANNs) for resources-limited applications, because ANNs must usually perform a large number of arithmetic operations. To attain a high computation accuracy in an SC-based ANN, extended stochastic logic is utilized together with standard SC units and thus, a stochastic divider is required to perform the conversion between these logic representations. However, as the most complex SC arithmetic unit, the conventional divider incurs in a large computation latency; this limits an SC implementation for ANNs used in applications needing high performance. Therefore, there is a need to design fast stochastic dividers for SC-based ANNs. Recent works (e.g., a binary searching and triple modular redundancy (BS-TMR) based stochastic divider) are targeting a reduction in computation latency, while keeping nearly the same accuracy compared with the traditional (conventional) design. However, this divider still requires N iterations to deal with 2N-bit stochastic sequences, and thus the latency increases in proportion to the sequence length. In this paper, a decimal searching and TMR (DS-TMR) based stochastic divider is initially proposed to further reduce the computation latency; it only requires two iterations to calculate the quotient, so regardless of the sequence length. Moreover, a second trade-off design between accuracy and hardware is also presented. An SC-based Multi-Layer Perceptron (MLP) is then considered to show the effectiveness of the proposed dividers; results show that when utilizing the proposed dividers, MLP achieves the lowest computation latency while keeping the classification results at the same accuracy. When using as combined metric the product of the latency and power dissipation, the proposed designs are also shown to be superior to the SC-based MLPs employing other dividers found in the technical literature as well as the commonly used 32-bit floating point implementation. This makes the proposed dividers very attractive compared with the existing schemes for SC-based ANNs.