GNN
A GNN is a connectionist model that captures the dependencies among graph
nodes through the edges that connect them13,14,15; GNNs can
be roughly divided into graph convolutional
networks16,17 and graph attention
networks18,19. In 2019, Yao et al. proposed the
text graph convolutional network (TextGCN)4, which
applied GNNs to the text classification task for the first time.
TextGCN first constructs a symmetric adjacency matrix from the given
graph, then fuses the representation of each node with those of its
neighbors through convolution operations, and finally feeds the node
representations into a softmax layer for classification.
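This propagation scheme can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration of a two-layer GCN with symmetric normalization and a softmax classifier, written under the usual assumptions (dense adjacency matrix, one-hot node features); it is not the authors' released implementation, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution step: fuse each node with its neighbours."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Symmetric normalisation with self-loops: A_hat = D^-1/2 (A + I) D^-1/2
        a = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = a.sum(dim=1).pow(-0.5)
        a_hat = d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)
        return self.linear(a_hat @ x)          # aggregate neighbours, then transform

class TextGCN(nn.Module):
    """Two GCN layers followed by softmax classification, as in TextGCN."""
    def __init__(self, num_nodes, hidden_dim, num_classes):
        super().__init__()
        self.gcn1 = GCNLayer(num_nodes, hidden_dim)   # identity (one-hot) node features
        self.gcn2 = GCNLayer(hidden_dim, num_classes)

    def forward(self, x, adj):
        h = F.relu(self.gcn1(x, adj))
        return F.log_softmax(self.gcn2(h, adj), dim=1)
```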
However, TextGCN assigns the same weight to each node, which is
inconsistent with each node's actual contribution to the final
classification. To solve this problem, Veličković et al. proposed the
graph attention network (GAT)18, which uses masked
self-attention to assign a different weight to each node according to
the features of its adjacent nodes.
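For illustration, a single-head GAT layer under simple assumptions (dense adjacency matrix that already contains self-loops, one attention head) might look as follows; this is a sketch of the masked self-attention mechanism, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention: weight each neighbour by a learned score."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # adj is a dense (N, N) adjacency matrix that includes self-loops.
        h = self.W(x)                                          # (N, out_dim)
        n = h.size(0)
        h_i = h.unsqueeze(1).expand(n, n, -1)                  # node i repeated over columns
        h_j = h.unsqueeze(0).expand(n, n, -1)                  # node j repeated over rows
        # Attention logits e_ij = LeakyReLU(a^T [h_i || h_j]).
        e = F.leaky_relu(self.a(torch.cat([h_i, h_j], dim=-1)).squeeze(-1))
        # Masked self-attention: non-edges are excluded before the softmax.
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)                        # per-node neighbour weights
        return F.elu(alpha @ h)                                # weighted neighbour aggregation
```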
GAT thus resolves the weighting problem that arises when graph
convolutional networks are used for text classification. However, these
earlier works consider only the text itself when constructing the graph
and ignore heterogeneous information such as text labels. In 2020,
Xin et al. established a GNN based on label
fusion20. This method incorporates label information by
adding "text-tag-text" paths while constructing the graph, through which
supervisory information can be transmitted more directly across the graph.
Chang et al. designed a local aggregation
function21, which is a shareable non-linear operation
for aggregating local inputs with disordered arrangement and unequal
dimensions over non-Euclidean domains. It can fit non-linear functions
without activation functions and can be easily trained using standard
backpropagation. In 2021, Wang et al. proposed a new GNN-based short
text classification method to better utilize the interaction between
nodes of the same type and capture the similarity between short
texts22. This method first models the short text
dataset as a hierarchical heterogeneous graph, then dynamically learns a
short document graph that makes label propagation between similar short
documents more effective.
With the emergence of large-scale pre-training models in recent years,
Devlin et al. proposed the pre-trained model BERT (Bidirectional
Encoder Representations from Transformers), which is based on the
self-attention mechanism6. BERT refines the representation of the
input at each encoder layer, applying multiple attention operations to
different parts of the sequence to obtain a text representation with
contextual information. Liu et al. made improvements on this
basis7, removing the next-sentence prediction task,
training on more diverse data, and achieving better results.
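As a usage sketch, contextual representations of this kind can be extracted with the Hugging Face transformers library and the public bert-base-uncased checkpoint (both assumptions here, not tools used in the cited works):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

texts = ["graph neural networks for text classification"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Every encoder layer refines the token representations with multi-head attention;
# the final-layer [CLS] vector is a common choice for a sentence-level representation.
cls_embedding = outputs.last_hidden_state[:, 0]    # shape: (batch_size, 768)
```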
Some recent studies combine GCN and BERT. Jeong et al. proposed a
citation graph model for paper recommendation tasks23,
which combines the outputs of GCN and BERT so that the interaction
between local and global information benefits downstream prediction
tasks. Lu et al. established a BERT model based on graph
embeddings24, which concatenates word embeddings with
node representations and lets local and global information
interact through BERT to determine the final text representation.
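One simple way to realise this kind of coupling is to concatenate a document's BERT [CLS] vector (local, contextual information) with its GCN node vector (global, graph-level information) before classification. The sketch below is illustrative only; the module name, dimensions, and fusion strategy are assumptions, not the architectures of refs. 23 or 24.

```python
import torch
import torch.nn as nn

class BertGcnFusion(nn.Module):
    """Illustrative fusion of local (BERT) and global (GCN) document representations."""
    def __init__(self, bert_dim=768, gcn_dim=128, num_classes=4):
        super().__init__()
        self.classifier = nn.Linear(bert_dim + gcn_dim, num_classes)

    def forward(self, bert_cls, gcn_node):
        # bert_cls: (batch, bert_dim)  contextual [CLS] vectors from BERT
        # gcn_node: (batch, gcn_dim)   document-node vectors from a GCN over the corpus graph
        fused = torch.cat([bert_cls, gcn_node], dim=-1)        # local + global information
        return self.classifier(fused)
```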