2. Materials and Methods
2.1. Data set and preprocessing. The image data of urinary
particles were obtained from the urine samples of 384 patients at the
Shenzhen Sixth People’s Hospital of Guangdong Province. Informed consent
was obtained from each patient before image collection, and the study was
approved by the Ethics Review Committee of Shenzhen University. The
database was established as follows. An appropriate amount of urine
sample was placed into the U-shaped area of
the urine smear device (Figure 1a). Then, the U-shaped area with the
urine sample was placed into the microscopic imaging system (Figure 1b)
for data acquisition. When the magnification of the microscope was too
low, the cells in the image were very small and thus not conducive to
network training and object detection. When the magnification was too
high, few cells appeared in a single image, which was likewise not
conducive to building the database. Therefore, we chose a 40× objective
lens for data collection. The acquired images (Figure 1c) had a
resolution of 1536 × 1024 pixels and were all three-channel RGB color
images; 20 cell morphology images were randomly acquired from each
sample under the 40× objective.
We invited three clinically experienced experts to label the cells in
the morphological images using LabelImg, a labeling tool commonly used
in deep learning, as shown in Figure 1d. Image data for 15 different
cell types were obtained. We randomly divided the data into training and
test sets at a 7:3 ratio of images. To improve the robustness of the
network model and its generalization to different test images, we
augmented the image data. The augmentation methods used were geometric
transformation, adding noise, and changing contrast and brightness.
After augmentation, the data could be used to train the network.
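As an illustration, the following is a minimal sketch of such an augmentation step using OpenCV and NumPy; the transform types follow the text above, but the specific parameter ranges (rotation angle, noise level, contrast and brightness jitter) are assumptions chosen for illustration rather than the settings used in this work.

```python
import cv2
import numpy as np

def augment(image, rng=None):
    """Apply the three augmentation families described above: geometric
    transformation, additive noise, and contrast/brightness changes."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]

    # Geometric transformation: random horizontal flip plus a small rotation
    # (the +/- 15 degree range is an illustrative assumption).
    if rng.random() < 0.5:
        image = cv2.flip(image, 1)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-15, 15), 1.0)
    image = cv2.warpAffine(image, M, (w, h))

    # Additive Gaussian noise (sigma = 5 chosen for illustration).
    image = image.astype(np.float32) + rng.normal(0, 5, image.shape)

    # Contrast (gain) and brightness (offset) jitter.
    image = rng.uniform(0.8, 1.2) * image + rng.uniform(-20, 20)
    return np.clip(image, 0, 255).astype(np.uint8)
```

Because the labels are bounding boxes, any geometric transform applied to an image must also be applied to its box coordinates; the sketch above shows only the image side.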
2.2. Network model. The RetinaNet network22 includes one backbone network and two sub-networks. The structure
diagram of the network model is shown in Figure 2. The backbone network
comprises the ResNet and FPN modules24, which are responsible for
feature extraction from the cells and generate feature maps of many
different sizes. The sub-networks comprise a classification sub-network
and a regression sub-network, responsible for the classification and
localization of objects, respectively.
The FPN network is the core module of the network, and its structure is
shown in Figure 3. It mainly comprises two pathways: bottom-up and
top-down. In the bottom-up pathway, the spatial dimensions of the
feature maps are gradually halved as the network deepens. In the
top-down pathway, the output of the corresponding bottom-up layer is
passed through a 1 × 1 convolution filter and added to the up-sampled
feature map from the level above (except at the top level). Finally, the
feature map of each level is obtained by a 3 × 3 convolution. By
combining feature information from multiple layers, the FPN enables the
network to better handle small objects such as cells.
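The following is a minimal Keras sketch of one top-down merge step as described above; the 256-channel width follows the original FPN design and is an assumption here, not a value stated in the text.

```python
from tensorflow.keras import layers

def fpn_merge(lateral_feature, top_feature, channels=256):
    """One top-down FPN step: a 1 x 1 lateral convolution, element-wise
    addition with the up-sampled higher-level map, then a 3 x 3 output
    convolution that produces this level's feature map."""
    lateral = layers.Conv2D(channels, 1, padding='same')(lateral_feature)
    top_up = layers.UpSampling2D(size=2)(top_feature)  # double spatial size
    merged = layers.Add()([lateral, top_up])
    return layers.Conv2D(channels, 3, padding='same')(merged)
```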
In the classification sub-network, each level of the feature pyramid
output by the FPN is passed through four 3 × 3 convolutional layers,
each followed by a ReLU activation, and then into a final 3 × 3
convolutional layer with K × A convolution kernels (K is the number of
categories and A is the number of anchors per location; in the
experiment, K = 15 and A = 9). Finally, a sigmoid activation function
produces the category outputs. The regression sub-network has the same
structure as the classification sub-network, but the two use separate
parameters.
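A hedged Keras sketch of the classification sub-network for a single pyramid level follows; the 256-filter width of the intermediate layers is the standard RetinaNet choice and an assumption here. The regression sub-network would be built the same way but with 4A kernels in the final layer (one box offset vector per anchor) and no sigmoid.

```python
from tensorflow.keras import layers

def classification_subnet(feature, K=15, A=9, filters=256):
    """Classification sub-network for one pyramid level: four 3 x 3
    convolutions with ReLU, then a 3 x 3 convolution with K*A kernels
    and a sigmoid for the per-anchor, per-category outputs."""
    x = feature
    for _ in range(4):
        # filters=256 is the standard RetinaNet width (an assumption here)
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.Conv2D(K * A, 3, padding='same', activation='sigmoid')(x)
```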
It is noteworthy that, to address the imbalance between the background
class (no object) and the foreground class (containing an object) in
one-stage object detection algorithms, the network introduces an
optimized loss function, shown in Equation 1. This loss function
down-weights easily classified samples (mostly the background class),
thereby focusing training on hard examples and improving the detection
performance of the model.
\(\mathrm{FL}\left(p_{t}\right)=-\alpha_{t}\left(1-p_{t}\right)^{\gamma}\log\left(p_{t}\right)\) (1)
where \(p_{t}\) is the model’s estimated probability for the
ground-truth class, \(\alpha_{t}\) is a weighting factor between 0 and
1, and \(\gamma\) is a modulation factor that controls the rate at which
easily classified samples are down-weighted. In the experiment,
\(\alpha_{t}\) was 0.25 and \(\gamma\) was 2.0.
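A minimal TensorFlow sketch of Equation 1 in element-wise form (reduction over anchors and normalization are omitted):

```python
import tensorflow as tf

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Element-wise focal loss of Equation 1 for binary per-class targets
    (y_true in {0, 1}, y_pred a sigmoid probability)."""
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    # p_t is the predicted probability of the ground-truth class, and
    # alpha_t applies the weighting factor symmetrically to both classes.
    p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    return -alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t)
```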
2.3. Model training. After network construction was completed, we
normalized the image data of the training set containing the 15 types of
urine cells and then input them into the network model for training in
batches. For initialization of the network parameters, we used Gaussian
weight initialization with a standard deviation of 0.01 and a bias of 0.
For the training parameters, we set the momentum to 0.9, the weight
decay to 0.0005, and the learning rate to 1e−4, and input eight images
per batch for network training. The model optimization method used was
Adam25. In the experiment, we used the Keras deep learning framework to
perform network training on a 64-bit Ubuntu 16.04.5 system. The deep
learning server had the following configuration: an Nvidia 1080 GPU, an
i7-6600 CPU, and 16 GB of memory. We iterated the entire model over the
training set; after each epoch, we evaluated the model parameters on the
validation set and saved the best model parameters.
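As an illustration, the initialization and learning-rate settings above map onto Keras roughly as follows; the momentum of 0.9 and weight decay of 0.0005 are reported in the text, but how they map onto Adam's arguments is not specified, so this sketch shows only the weight initialization, learning rate, and batch size.

```python
from tensorflow.keras import initializers, layers
from tensorflow.keras.optimizers import Adam

# Gaussian weight initialization (standard deviation 0.01) with zero bias,
# as reported in the text, applied to an example convolutional layer.
init = dict(
    kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.01),
    bias_initializer=initializers.Zeros(),
)
conv = layers.Conv2D(256, 3, padding='same', **init)

# Adam optimizer at the reported learning rate; images were fed to the
# network eight at a time (i.e., batch_size=8 when calling model.fit).
optimizer = Adam(learning_rate=1e-4)
```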
2.4. Model evaluation method. We used mAP, a metric commonly used in
deep learning, to evaluate the performance of the model. In addition,
the time taken by the computer to process a single image was considered
as an evaluation indicator. For a given cell type in an image (for
example, crystals, denoted by the letter C below), let x be the number
of instances of C the model correctly detects and y the total number of
instances of C; the accuracy of category C in this image can then be
expressed as P, and the average accuracy over n images as AP. Using this
method, we calculated the accuracy rates of the 15 types of urine cells
(AP1, AP2, … AP15). mAP is the average of the accuracy rates of all 15
types. It is calculated as follows.
\(P=x/y\) (2)
\(\mathrm{AP}=\left(\sum_{t=1}^{n}P_{t}\right)/n\) (3)
\(\mathrm{mAP}=\left(\sum_{t=1}^{m}\mathrm{AP}_{t}\right)/m\) (4)
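A minimal sketch of how Equations 2-4 could be computed; `correct` and `total` are hypothetical per-class, per-image count arrays, and this follows the simplified per-image definition above rather than the standard interpolated precision-recall AP.

```python
def mean_average_precision(correct, total):
    """mAP under Equations 2-4.

    correct[c][i]: correctly detected instances of class c in image i (x)
    total[c][i]:   ground-truth instances of class c in image i (y)
    """
    aps = []
    for x_per_image, y_per_image in zip(correct, total):
        # Per-image accuracy P = x / y (Eq. 2); images with no instances
        # of this class are skipped to avoid division by zero.
        p = [x / y for x, y in zip(x_per_image, y_per_image) if y > 0]
        aps.append(sum(p) / len(p))   # AP over n images (Eq. 3)
    return sum(aps) / len(aps)        # mAP over the m = 15 classes (Eq. 4)
```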