Machine learning has quickly become an integral part of modern business. Considerable effort goes into developing and deploying machine learning models used for estimation, prediction of economic factors, and even interaction with clients. This effort includes planning which tools and platforms to use for training and validating these models, as well as allocating the resources, time, and budget for these tasks. However, such planning remains largely dependent on human acumen and is expensive to carry out systematically with automated tools. Benchmarking is the process of efficiently running experiments to determine a system's performance requirements, among other characteristics, in order to aid planning and resource allocation. Benchmarking intelligent and data-intensive systems remains in its infancy and does not yet cover fully realistic or highly specific case studies. In this work, we propose SparkPerf, a benchmarking tool specifically designed for machine learning applications deployed with Apache Spark. SparkPerf focuses on longitudinal transactional workloads, a more realistic class of case studies for enterprises, and offers high customizability, allowing users to test their own applications with their own synthetically augmented datasets. Our experiments demonstrate the benchmark's reliability, consistency, portability, and customizability.