The number of learning samples required grows with the number of parameters that have to be estimated. In a deep convolutional neural network, the total number of parameters (weights and biases) grows with the number of synapses between neurons, but because each filter's weights are shared across all positions, the number of independent parameters can be many orders of magnitude smaller than the number of neurons. Furthermore, the early layers detect features, many instances of which appear in every sample sentence or picture, so each learning sample supplies many examples for estimating those few parameters. The deeper layers (with far fewer neurons) detect thoughts or objects that appear only rarely in the learning samples.
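To make the gap between synapses, neurons, and independent parameters concrete, here is a minimal sketch that counts all three for a single convolutional layer. The function name `conv_layer_counts` and the layer dimensions are illustrative assumptions, not taken from any particular network; the sketch also assumes stride 1 and zero padding that preserves the spatial size, and counts padded positions as connections.

```python
def conv_layer_counts(height, width, c_in, c_out, kernel):
    """Count neurons, synapses, and independent parameters for one
    convolutional layer (hypothetical helper; assumes stride 1 and
    'same' zero padding, so the output keeps the input's height and width)."""
    # One output neuron per spatial position per output channel.
    neurons = height * width * c_out
    # Each output neuron is connected to a kernel x kernel patch
    # across all input channels.
    fan_in = kernel * kernel * c_in
    synapses = neurons * fan_in
    # Weight sharing: every position reuses the same filter weights,
    # so only one filter (plus one bias) per output channel is independent.
    independent_params = fan_in * c_out + c_out
    return neurons, synapses, independent_params


if __name__ == "__main__":
    # Illustrative first layer: 224x224 RGB input, 64 filters of size 3x3.
    neurons, synapses, params = conv_layer_counts(224, 224, 3, 64, 3)
    print(f"neurons:            {neurons:,}")
    print(f"synapses:           {synapses:,}")
    print(f"independent params: {params:,}")
    print(f"neurons per independent param: {neurons / params:.0f}")
```

For these assumed dimensions the layer has 3,211,264 neurons and 86,704,128 synapses but only 1,792 independent parameters, so each filter weight is exercised at tens of thousands of positions within a single image.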