Aim. The multi-part Unified Parkinson’s Disease Rating Scale is the standard instrument in clinical trials. The quantity usually analyzed is the sum of scores for all items in one or more parts of the instrument. Because it does not account for the relative importance of individual items, this sum of scores may not make full use of the instrument's power. We aimed to compare item scores with the sum of scores in their ability to detect a drug effect in slowing the deterioration of motor function, as measured by Part III (motor examination) of the Scale. Methods. We used data from 423 patients in a Parkinson’s disease progression trial to estimate symptom severity by item response modelling; modelled symptom progression using both the severity and the sum of scores; and conducted simulations to compare the sensitivity of the two endpoints for detecting a broad range of hypothetical drug effects on progression. Results. The severity endpoint was far more sensitive than the sum of scores for detecting treatment effects, e.g., requiring 280 versus 570 patients per arm to achieve a 60% probability of success for detecting a range of potential effects in a 2-year trial. Items related to the left side of the body were the most informative, and the domain relevance of the tremor items was questionable. Conclusion. This analysis provides clear evidence that longitudinal modelling of item scores can enhance trial efficiency and success. It also highlights the need for consensus on the placement of the tremor items in the instrument.
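To illustrate why modelling item scores can be more sensitive than analyzing their sum, the sketch below assumes a graded response model, a common item response model for ordinal items. The abstract does not state which item response model was used, so the model choice, the function names, and all parameter values here are hypothetical. The sketch shows that the same change in latent severity shifts the expected score of a highly discriminating item more than that of a weakly discriminating one, a difference that an unweighted sum ignores.

```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one ordinal item under a graded response model.

    theta: latent symptom severity.
    discrimination: how strongly the item responds to severity (assumed value).
    thresholds: strictly increasing cut points b_1 < ... < b_K (assumed values).
    Returns K + 1 probabilities, for scores 0..K.
    """
    # Cumulative probabilities P(score >= k) for k = 1..K.
    cumulative = [
        1.0 / (1.0 + math.exp(-discrimination * (theta - b))) for b in thresholds
    ]
    # P(score >= 0) = 1 and P(score >= K + 1) = 0 close the sequence.
    bounds = [1.0] + cumulative + [0.0]
    # P(score = k) is the difference of adjacent cumulative probabilities.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


def expected_score(theta, discrimination, thresholds):
    """Expected item score at a given severity."""
    probs = grm_category_probs(theta, discrimination, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical thresholds for an item scored 0 to 4.
thresholds = [-1.0, 0.0, 1.0, 2.0]

# Change in expected score when severity rises from 0 to 1,
# for a weakly and a strongly discriminating item (illustrative values).
weak_item_shift = expected_score(1.0, 0.5, thresholds) - expected_score(0.0, 0.5, thresholds)
strong_item_shift = expected_score(1.0, 2.0, thresholds) - expected_score(0.0, 2.0, thresholds)

print(f"weak item shift:   {weak_item_shift:.3f}")
print(f"strong item shift: {strong_item_shift:.3f}")
```

In a sum of scores both items contribute with equal weight, whereas an item response model lets the more informative item carry more of the severity estimate.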