2.4 Deep learning architectures
A widely used deep learning architecture in bioinformatics is the deep
convolutional neural network (CNN). Convolutional neural networks have
several distinctive advantages over traditional neural networks for
bioinformatics problems: (1) they can learn informative
representations directly from sequence features without requiring
segmentation (e.g. sliding windows) or dimension reduction (e.g.
principal component analysis) techniques; (2) the convolutional network
can learn both local and global features to discover complex patterns;
and (3) the architecture is independent of input size (i.e. length or
volume). In this work, we design a standard CNN and five advanced deep
learning architectures based on convolutional and other useful
operations, as shown in Figure 2.
Figure 2(A) illustrates our standard convolutional neural
network (CNN) for secondary structure prediction, which consists of a
sequence of convolutional blocks, each containing a convolutional
layer, a batch normalization layer, and an activation layer. The
original input is an L × K matrix (X), where L is the sequence length
and K is the number of features per residue position in the sequence.
For each convolutional block, the feature maps are obtained by applying the
convolution operation, which multiplies the weight matrices
(called filters, W) with a window of local features from the
previous layer and adds bias vectors (b) according to the
formula \(X^{l+1} = W^{l+1} * X^{l} + b^{l+1}\), where \(l\) is the layer number. A batch normalization layer is added to obtain a
Gaussian normalization of the convolved features coming out of each
convolutional layer. Then an activation function such as the rectified
linear unit (ReLU) is applied to extract non-linear patterns from
the normalized hidden features. To avoid overfitting, regularization
approaches such as dropout 53 can be applied in the
hidden layers. The final output node (also a filter) in the output cell
uses the softmax function to classify the input at each residue position
from its previous layer into one of three secondary structure states.
The output is an L × 3 matrix holding the predicted probabilities of the three
secondary structure states for each of the L positions in the sequence.
The
final optimal CNN architecture includes 6 convolutional blocks, in which
the filter size (window size) for each convolutional layer is 6, and the
number of filters (feature maps) in each convolution layer is 40.
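For illustration, the following is a minimal tf.keras sketch of such a stack of convolution-batch normalization-ReLU blocks using the tuned hyper-parameters above (6 blocks, filter size 6, 40 filters); it is not the exact DNSS2 implementation, and the dropout rate shown is an assumption.

```python
# Minimal sketch of the standard CNN described above (not the authors' code).
# Assumptions: tf.keras API, dropout rate of 0.2.
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn(K, n_blocks=6, n_filters=40, filter_size=6, dropout_rate=0.2):
    """1-D CNN over a length-L sequence with K features per residue.
    L is left as None so sequences of any length can be processed."""
    inputs = layers.Input(shape=(None, K))            # L x K input
    x = inputs
    for _ in range(n_blocks):
        # convolutional block: convolution -> batch normalization -> ReLU
        x = layers.Conv1D(n_filters, filter_size, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.Dropout(dropout_rate)(x)           # regularization against overfitting
    # per-residue softmax over the three secondary structure states
    outputs = layers.Conv1D(3, 1, activation="softmax")(x)  # L x 3 output
    return tf.keras.Model(inputs, outputs)
```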
The residual network (ResNet) was designed to make traditional
convolutional neural networks deeper without vanishing gradients. The
architecture constructs many residual blocks and stacks them up to form
a deeper network, as shown in Figure 2(B). In each residual
block, the input \(X^{l}\) is fed into a few convolutional layers to
obtain the non-linear transformation output \(G(X^{l+1})\). To make the network deeper, an
extra skip connection (i.e. short-cut) is added to copy the input \(X^{l}\) to the output of the non-linear transformation layer, so that the block output \(X^{\left(l+1\right)*}\) can be represented as \(X^{\left(l+1\right)*} = X^{l} + G(X^{l+1})\) before another ReLU
non-linearity is applied. Adding such shortcuts makes the neural network deeper,
facilitates gradient back-propagation during training, and
achieves better performance.
Residual blocks with different configurations can be stacked to achieve
higher accuracy. For instance, the final best architecture in DNSS2 is
made up of 13 residual blocks, each of which includes 3 convolutional
layers with filter sizes of 1, 3, and 1, respectively. The first three residual
blocks use 37 filters in each convolutional layer, the middle four blocks
use 74 filters, and the last six residual
blocks use 148 filters. In total, 39 convolutional layers are included
in the final residual network. Dropout and batch
normalization were also added to prevent the network from overfitting.
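A minimal tf.keras sketch of one such residual block (three convolutions with filter sizes 1, 3 and 1 followed by the skip connection) is given below; it is not the authors' implementation, and the dropout rate and the 1×1 projection used to match channel counts when the filter number changes are assumptions.

```python
# Sketch of one residual block as described above (not the authors' code).
from tensorflow.keras import layers

def residual_block(x, n_filters, dropout_rate=0.2):
    shortcut = x
    y = x
    for fsize in (1, 3, 1):                           # three convolutional layers per block
        y = layers.Conv1D(n_filters, fsize, padding="same")(y)
        y = layers.BatchNormalization()(y)
        y = layers.Activation("relu")(y)
        y = layers.Dropout(dropout_rate)(y)           # assumed dropout rate
    # skip connection: block output = input + transformed features;
    # a 1x1 convolution (an assumption) aligns channel counts when they differ
    if shortcut.shape[-1] != n_filters:
        shortcut = layers.Conv1D(n_filters, 1, padding="same")(shortcut)
    y = layers.Add()([shortcut, y])
    return layers.Activation("relu")(y)               # ReLU after the addition
```

Stacking 13 such blocks with 37, 74 and 148 filters, as described above, yields the 39 convolutional layers of the final residual network.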
The inception network is an advanced architecture that builds deeper
networks by repeating a series of inception modules, as shown in Figure 2(C). Instead of trying to determine the best values for
certain hyper-parameters (e.g. filter size, number of layers,
inclusion of a pooling layer), the inception network concatenates the
outputs of hidden layers with different configurations within an
inception module and trains the network to learn patterns from the
combination of diverse hyper-parameters. Despite its high computational
cost, the inception network has performed remarkably well in many
applications 38, 46. For secondary structure
prediction, a combination of three filter sizes, \(1\times K\), \(3\times K\) and \(5\times K\), was applied to convolve the input features,
where K is the number of original input features for each residue
position. The concatenation of the convolution outputs is fed into an
activation layer for non-linear activation. This kind of
inception module is repeated to make a deeper network. After
parameter tuning, the optimal inception network comprises three
inception blocks containing 24 convolutional layers in total.
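A minimal tf.keras sketch of one inception-style module with the three filter sizes mentioned above is shown below; it is not the authors' implementation, and the per-branch filter count is an assumption.

```python
# Sketch of one inception module (not the authors' code): parallel 1-D
# convolutions of width 1, 3 and 5 whose outputs are concatenated.
from tensorflow.keras import layers

def inception_module(x, n_filters):
    branch1 = layers.Conv1D(n_filters, 1, padding="same")(x)   # 1 x K filters
    branch3 = layers.Conv1D(n_filters, 3, padding="same")(x)   # 3 x K filters
    branch5 = layers.Conv1D(n_filters, 5, padding="same")(x)   # 5 x K filters
    # concatenate along the feature axis, then apply the non-linearity
    merged = layers.Concatenate(axis=-1)([branch1, branch3, branch5])
    merged = layers.BatchNormalization()(merged)
    return layers.Activation("relu")(merged)
```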
In addition, we designed three more deep learning architectures:
the recurrent convolutional neural network (RCNN) 42,
the convolutional residual memory network (CRMN) 44, and
the fractal network for secondary structure prediction. The recurrent
convolutional neural network (RCNN) was designed to model the sequential
dependencies hidden in the sequential features (Figure 2(D)).
It first extracts higher-level feature maps with a convolutional
block and then uses a recurrent neural network (i.e. a bi-directional
Long Short-Term Memory (LSTM) network) to model the inter-dependence
among the convolved features. In this work, such a recurrent convolutional block,
containing 4 convolutional layers, is repeated 5 times to build a deep
recurrent convolutional neural network for secondary structure
prediction. The CRMN network augments this architecture by
integrating convolutional residual networks with LSTMs (Figure
2(E)) (e.g., 2 residual blocks and 2 LSTM layers in the network). Both methods
advance the convolutional neural network by introducing the memory
mechanism of recurrent neural networks (RNNs).
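A minimal tf.keras sketch of one recurrent convolutional block (four convolutions followed by a bi-directional LSTM), in the spirit of the RCNN described above, is shown below; the filter count, filter size and number of LSTM units are illustrative assumptions rather than the exact DNSS2 settings.

```python
# Sketch of one recurrent convolutional block (not the authors' code).
from tensorflow.keras import layers

def recurrent_conv_block(x, n_filters=40, filter_size=3, lstm_units=40):
    # convolutional sub-block extracts higher-level local feature maps
    for _ in range(4):                                # 4 convolutional layers per block
        x = layers.Conv1D(n_filters, filter_size, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    # bi-directional LSTM models the inter-dependence among convolved features
    return layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True))(x)
```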
Moreover, inspired by ResNet and the inception network, we built a fractal network
that stacks different numbers of convolutional blocks in both a parallel and a hierarchical
fashion, adding several shortcut paths to connect lower-level
and higher-level layers, as shown in Figure 2(F). After
tuning, the fractal network was assembled with 16 convolutional layers in
one fractal block.
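A minimal tf.keras sketch of a fractal-style block, joining a shallow shortcut path with a recursively built deeper path, is given below; the join-by-averaging rule, depth, and filter counts are illustrative assumptions rather than the tuned DNSS2 configuration.

```python
# Sketch of a fractal-style block (not the authors' code): a shallow path and a
# recursively built deeper path are constructed in parallel and joined by averaging.
from tensorflow.keras import layers

def conv_unit(x, n_filters, filter_size=3):
    x = layers.Conv1D(n_filters, filter_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

def fractal_block(x, n_filters, depth):
    """depth=1 is a single convolutional unit; larger depths add a parallel,
    deeper path joined with the shallow shortcut path."""
    if depth == 1:
        return conv_unit(x, n_filters)
    shallow = conv_unit(x, n_filters)                 # shortcut path
    deep = fractal_block(fractal_block(x, n_filters, depth - 1), n_filters, depth - 1)
    return layers.Average()([shallow, deep])          # join the parallel paths
```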