In psychometrics, we generally check that (1) the scale is unidimensional and (2) its reliability is sufficient. Under special circumstances, you may be able to treat the responses as if they fell on an interval scale.
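As a rough illustration of the reliability check, here is a minimal sketch of Cronbach's alpha, one common reliability coefficient. The answer does not say which coefficient to use, so choosing alpha is an assumption, and the function name and the response data are made up for the example. It does not address unidimensionality, which needs a separate analysis (e.g., factor analysis or an IRT model).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; uses sample variances (n - 1).
    """
    k = len(rows[0])
    # Variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summated (total) score
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up responses: 5 respondents, 3 Likert items scored 1-5
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

With so few respondents the estimate is very unstable; the data are only there to make the sketch runnable.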
When comparing two such scale scores (from two different instruments), we might even consider using correlations corrected for attenuation instead of the classical Pearson correlation coefficient. To treat responses as interval, the respondents typically need to be in close agreement regarding the meaning of the scale responses, and the analysis (or the decisions made based on the analysis) should be relatively insensitive to the problems that may arise.
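For the comparison of two scale scores, the classical correction for attenuation (Spearman's formula) divides the observed correlation by the square root of the product of the two reliabilities. The sketch below assumes this is the correction meant above; the function name and the numeric inputs are hypothetical.

```python
from math import sqrt


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability of both scales.

    r_xy  : observed Pearson correlation between the two scale scores
    rel_x : reliability of the first scale (e.g., an alpha estimate)
    rel_y : reliability of the second scale
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)


# Made-up values: observed r = 0.42, reliabilities 0.80 and 0.70
r_corrected = disattenuated_correlation(0.42, 0.80, 0.70)
print(round(r_corrected, 3))
```

The corrected value is never smaller in magnitude than the observed one, and with low reliabilities it can exceed 1, which is usually taken as a sign that the reliability estimates are too low.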
Some fields are completely willing to treat a single Likert item as interval even though it is clearly ordinal. You may also have a look at Applications of latent trait and latent class models in the social sciences, from Rost & Langeheine, and W. When validating a psychometric scale, it is important to look at so-called ceiling/floor effects (large asymmetry resulting from participants scoring at the highest/lowest response category), which may seriously affect any statistics computed when treating the items as numeric variables (e.g., country aggregation, t-test).
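A quick way to screen for the ceiling/floor effects mentioned above is to compute the share of respondents in the lowest and highest response categories. This is only a sketch: the helper name and the data are invented, and what counts as "too many" responses at an extreme is a judgment call not settled here.

```python
def floor_ceiling_share(scores, lowest, highest):
    """Return the proportion of scores at the lowest and highest category."""
    n = len(scores)
    floor = sum(1 for s in scores if s == lowest) / n
    ceiling = sum(1 for s in scores if s == highest) / n
    return floor, ceiling


# Made-up responses to one 5-point item, piled up at the top category
item = [5, 5, 4, 5, 3, 5, 5, 2, 5, 4]
floor, ceiling = floor_ceiling_share(item, lowest=1, highest=5)
print(f"floor: {floor:.0%}, ceiling: {ceiling:.0%}")
```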
In all these cases, each aggregate measure (perhaps the mean) is based on many individual responses (e.g., n=50, 100, 1000, etc.). In these cases the original Likert item begins to take on properties that resemble an interval scale at the aggregate level. Likert items may be regarded as a true ordinal scale, but they are often used as numeric and we can compute their mean or SD. The intervals between positions on the scale are monotonic but never so well-defined as to be numerically uniform increments. This is often done in attitude surveys, although it is wise to report both the mean/SD and the % of responses in, e.g. When using summated scale scores (i.e., we add up the scores on each item to compute a "total score"), the usual statistics may be applied, but you have to keep in mind that you are now working with a latent variable, so the underlying construct should make sense! That said, the distinction between ordinal and interval is based on the specific demands of the analysis being performed.
So, multiple items serve as a measurement triangulation for construct scales?
If yes, what are the criteria for determining that a researcher has enough relevant data points (i.e., items) to use the scale as an interval measurement?