Existing convolutional neural networks (CNNs) have achieved remarkable performance on a wide range of real-world tasks, but the large number of parameters in their convolutional layers demands substantial storage and computational resources, making it difficult to deploy CNNs on memory-constrained embedded devices. In this paper, we propose a novel compression method that generates the convolution filters in each layer by linearly combining a set of learnable low-dimensional binary filter bases; stacking these linear combinations yields more compact convolution filters. Because the bases are binary, the compact filters can be represented with far fewer bits, so the network can be highly compressed. Furthermore, we enforce sparsity on the combination coefficients through L1-ball projection to avoid overfitting, and we analyze the compression performance of the proposed method in detail. Evaluations on four benchmark datasets with the VGG-16 and ResNet-18 architectures show that the proposed method achieves a higher compression ratio with comparable accuracy relative to existing state-of-the-art filter-decomposition and network-quantization methods.
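To make the construction concrete, below is a minimal PyTorch sketch of the core idea, not the authors' implementation: binary bases obtained by binarizing latent real-valued weights with a straight-through estimator, per-output-filter combination coefficients, and a row-wise Euclidean projection onto the L1 ball (Duchi et al., 2008) applied to the coefficients after each optimizer step. All names here (`BinaryBasisConv2d`, `project_to_l1_ball`, `num_bases`, `l1_radius`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def project_to_l1_ball(v, radius=1.0):
    """Row-wise Euclidean projection onto an L1 ball of the given radius
    (Duchi et al., 2008). Rows already inside the ball stay unchanged,
    because their threshold theta clamps to zero."""
    u, _ = torch.sort(v.abs(), dim=1, descending=True)
    cssv = u.cumsum(dim=1) - radius
    idx = torch.arange(1, v.size(1) + 1, device=v.device, dtype=v.dtype)
    cond = u * idx > cssv                      # True for a prefix of each row
    rho = cond.to(torch.int64).cumsum(dim=1).argmax(dim=1, keepdim=True)
    theta = (cssv.gather(1, rho) / (rho + 1).to(v.dtype)).clamp(min=0)
    return v.sign() * (v.abs() - theta).clamp(min=0)


class BinaryBasisConv2d(nn.Module):
    """Convolution whose filters are linear combinations of shared binary bases.

    Each of the `num_bases` bases has entries in {-1, +1}, obtained by
    binarizing latent real-valued weights with a straight-through estimator;
    `coeffs` holds one row of combination coefficients per output filter."""

    def __init__(self, in_channels, out_channels, kernel_size, num_bases,
                 stride=1, padding=0, l1_radius=1.0):
        super().__init__()
        self.stride, self.padding, self.l1_radius = stride, padding, l1_radius
        self.bases = nn.Parameter(
            torch.randn(num_bases, in_channels, kernel_size, kernel_size))
        self.coeffs = nn.Parameter(0.1 * torch.randn(out_channels, num_bases))

    def forward(self, x):
        # Forward pass uses sign(bases); the detach trick passes gradients
        # straight through to the latent real-valued bases (STE).
        binary = (self.bases.sign() - self.bases).detach() + self.bases
        # Each output filter is a linear combination of the binary bases.
        filters = torch.einsum('ok,kcij->ocij', self.coeffs, binary)
        return F.conv2d(x, filters, stride=self.stride, padding=self.padding)

    @torch.no_grad()
    def project_coefficients(self):
        # Call after each optimizer step to keep coefficient rows sparse.
        self.coeffs.copy_(project_to_l1_ball(self.coeffs, self.l1_radius))
```

A usage sketch: with 8 bases, a 3-to-64-channel 3x3 layer stores eight binary 3x3x3 bases (216 bits) plus a 64x8 coefficient matrix instead of 64 full-precision filters, which is where the bit-level compression comes from.

```python
layer = BinaryBasisConv2d(3, 64, kernel_size=3, num_bases=8, padding=1)
y = layer(torch.randn(2, 3, 32, 32))   # output shape: (2, 64, 32, 32)
layer.project_coefficients()           # e.g. right after optimizer.step()
```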