Introduction

Rensis Likert originally designed his rating scales to reflect the underlying construct through the level of agreement on a numerical continuum (Likert, 1974). His original idea was that the level of agreement could be summed or averaged and that each item contributes to the measured central dimension (Boone Jr & Boone, 2012). The assumption that aggregated responses represent the underlying construct is based on Classical Test Theory (C.T.T.) (South et al., 2022; Warmbrod, 2014). Critiques of the Likert-type scale include the assumption that it yields interval-level variables, doubts about whether parametric tests are appropriate for such formats, and the oversimplification of complex attitudes imposed by the response structure (Harpe, 2015; Sullivan & Artino Jr, 2013). A further critique, addressed in the current article, is that Likert scales can lead to response biases such as social desirability and central tendency responding (Kusmaryono et al., 2022).

Self-report data of any type can be vulnerable to acquiescence bias, social desirability and neutral responding, and Likert-type response options can potentially aggravate these tendencies (Anna Brown & Albert Maydeu-Olivares, 2018; Combrinck & Inglis, 2020; Wang et al., 2017). The interpretation of Likert options and ratings may also vary among respondents, causing further potential biases due to the ambiguities of language (Hancock & Volante, 2020). Using a questionnaire with acceptable to high reliability indices will not necessarily reduce over-reporting of “good” behaviour, and the halo effect might persist (Douglas & Tramonte, 2015; Kreitchmann et al., 2019). Respondents might be tempted to endorse statements they perceive as socially acceptable (Latkin et al., 2017; Stepan & Christian, 2020). Careless or inattentive survey responses could be due to several factors, including poor targeting, inappropriate sampling or poor item writing (Jaeger & Cardello, 2022). Berry et al. (2019) found that inattentive responding correlates with being male, with scoring low on personality factors such as conscientiousness and agreeableness, and with less sensitive item themes (whereas sensitive topics, for example crime, drew more attention from respondents). A larger number of categories in a Likert scale could also increase inattentive responding; Revilla et al. (2014) found that five categories were the maximum that should be employed. Even transient psychological states (for example, negative affect) could influence a respondent (Huang & Wang, 2021).

While we have little to no control over the respondent’s traits and feelings when completing a survey, the instrument’s quality is the researcher’s responsibility. This is why Rasch models are strongly advocated for assessing instruments of any format, including Likert-types (Knoch & McNamara, 2015; Retief et al., 2013). From a measurement perspective, many of the critiques levelled at Likert formats can be addressed by testing the hypothesis that a respondent’s ability to endorse more of the construct aligns with the difficulty of item endorsability, i.e. applying the Rasch rating scale model (sketched below), and by testing the claim that an underlying construct is consistently represented by all items (Combrinck, 2020; Fisher Jr, 2009). The researcher should be aware of the shortcomings of any particular format and examine the underlying assumptions of measurement (Fisher Jr, 2022). Likert item types remain immensely popular in survey design due to familiarity with the format and ease of analysis.
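For reference, the rating scale formulation within the Rasch family (Andrich's rating scale model) can be written as follows; the notation is the conventional one and is supplied here for clarity rather than taken from the cited sources:

$$
P(X_{ni} = x) = \frac{\exp\left[\sum_{k=0}^{x}\left(\beta_n - \delta_i - \tau_k\right)\right]}{\sum_{j=0}^{m}\exp\left[\sum_{k=0}^{j}\left(\beta_n - \delta_i - \tau_k\right)\right]}, \qquad \tau_0 \equiv 0,
$$

where $X_{ni} \in \{0, \dots, m\}$ is the rating person $n$ gives item $i$, $\beta_n$ is the person's location on the construct, $\delta_i$ is the item's endorsability (difficulty), and $\tau_k$ is the threshold between adjacent response categories. Under this model, higher person measures relative to item difficulty make higher categories more probable, which is precisely the hypothesis being tested when the model is applied to Likert-type data.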
Alternative formats have been suggested, such as dichotomous scales (Dolnicar et al., 2011), fuzzy rating scales (Ana et al., 2020), slider scales (Kemper et al., 2020) and rankings (Yannakakis & Martínez, 2015). Forced-choice questions present respondents with a list of statements or items from which they must choose the most suitable options (Anna Brown & Albert Maydeu-Olivares, 2018). The potential advantages of forced-choice questions over Likert-style formats have been explored in various studies (Chan, 2003; Cheung & Chan, 2002; Lee et al., 2019). The forced-choice format is popular in psychological research, especially in personality and career assessments (Brown & Maydeu-Olivares, 2013). Forced-choice formats also have disadvantages, such as the complexity of administration and analysis. Administering forced-choice items would be especially difficult if a paper-and-pen route is followed, as respondents may not understand what is required to answer the question (Buchanan & Morrison, 1985; Zhang et al., 2020). Analysing traditional Likert-type scales is also easier, with established techniques and parametric analysis options available after applying logarithmic transformations (Hontangas et al., 2015; Smyth et al., 2006). Deriving interpretations from forced-choice formats can be complicated, making it challenging to compare groups or conduct longitudinal assessments, and this places an additional burden on the researcher (Salgado et al., 2015; van Eijnatten et al., 2015). Some constructs and ratings do not work well with forced-choice formats, and here the Likert type is preferred as it is more versatile (Lee et al., 2019; Miller et al., 2018). Bäckström and Björklund (2023) found that Likert-type items and forced-choice options yield similar information. A mixed-format solution, which leaves researchers room to combine both item types, could be very beneficial and should be considered (Schulte et al., 2021). Using forced-choice might be valuable in certain instances, for example, for constructs that tend to elicit midpoint responding (Nadler et al., 2015). At the same time, using Likert-type formats may be advantageous when a response is required for each item, when respondents are more familiar with the format and when the analysis requires the numerical range of the items for comparison purposes (Hall et al., 2016; Nemoto & Beglar, 2014; Wu & Leung, 2017).

Despite the advantages of Likert formats, social desirability remains a serious concern. Therefore, the paper presents a case for forced-choice formats in social and human science settings to offer more usable data. Forced-choice questions also have a definite practical advantage: the format requires less time and can be less attention-intensive for respondents than rating each statement on a Likert-type scale (Brown, 2016; Brown & Maydeu-Olivares, 2011). The short attention span of humans has been well documented, and researchers want accurate answers, which respondents are more likely to give if they are not overburdened (Revilla & Ochoa, 2017). The usefulness of item response theory (I.R.T.) models, including the Rasch model, for analysing ipsative data is well documented (Brown & Maydeu-Olivares, 2013; Anna Brown & Albert Maydeu-Olivares, 2018; Anna Brown & Alberto Maydeu-Olivares, 2018); a minimal recoding sketch is given after this paragraph. Test-retest and comparative studies using Likert-type items have been used to validate ipsative results (Calderón Carvajal et al., 2021).
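To make the ipsative-data point concrete, the sketch below shows one common preparatory step: recoding each respondent's within-block rankings into pairwise dichotomous outcomes, which Rasch-type or related I.R.T. formulations for comparative data can then analyse. It is a minimal illustration only; the block and statement names are hypothetical, and the code is not drawn from the cited studies or from the instrument analysed later in this paper.

```python
from itertools import combinations

# Hypothetical forced-choice responses: within each block the respondent
# ranks the statements (1 = most like me). Names and structure are
# illustrative only, not taken from any instrument cited in this paper.
responses = {
    "respondent_01": {
        "block_1": {"helps_others": 1, "meets_deadlines": 2, "seeks_novelty": 3},
        "block_2": {"leads_discussions": 1, "stays_calm": 2, "plans_ahead": 3},
    }
}

def to_pairwise(blocks):
    """Recode within-block rankings into binary outcomes:
    1 if the first statement of the pair is preferred (ranked higher), else 0."""
    rows = []
    for block, ranking in blocks.items():
        for a, b in combinations(sorted(ranking), 2):
            rows.append({
                "block": block,
                "pair": (a, b),
                "outcome": int(ranking[a] < ranking[b]),  # lower rank number = preferred
            })
    return rows

for person, blocks in responses.items():
    for row in to_pairwise(blocks):
        print(person, row["block"], row["pair"], row["outcome"])
```

The resulting binary records are the kind of input that dichotomous Rasch-type analyses of comparative data typically require; the substantive modelling choices (which model, which software, how to handle the ipsative dependencies) remain with the researcher.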
However, the current paper argues that such external validation is unnecessary when using log-transformed measures such as those produced by the Rasch model (Rasch, 1960, 1993; Van Alphen et al., 1994). A valid and reliable representation of a construct requires a range of relevant aspects to be examined, and Rasch models offer strategies to evaluate instrument functioning (Bond et al., 2021; Boone, 2016). Instruments are valid if their results inform decisions, changes and growth (Boone et al., 2014). When responses lack variance, i.e. when values are skewed, data quality is severely impacted and only limited inferences can be drawn (Kreitchmann et al., 2019; Xiao et al., 2017). In the current study, I demonstrate how the Rasch model can be used to check the functioning of forced-choice questionnaires, which could increase awareness and application of the format, yield more valuable inferences and enhance meaningful measurement. I also offer guidelines for researchers who want to use alternative formats. In previous studies, Rasch models have been applied to ipsative data with promising results (Andrich, 1989; Van Zile-Tamsen, 2017; Wang et al., 2017).

Research questions

Research question 1: What does the evidence show us about forced-choice questions as viable alternatives to Likert-types?
Research question 2: How can we use Rasch models to evaluate the psychometric properties of forced-choice questions?
Research question 3: What does a Rasch analysis reveal about the usefulness of forced-choice questions compared to Likert-types?