Measurement, reliability and scales construction in a view of classical true-score theory



Piotr Tarka


Keywords: reliability, methods of reliability, classical true-score theory, item response theory, scales


Summary: The main objective of this article was to describe issues related to the theory of measurement and to discuss the role of selected reliability methods from the perspective of Classical True-Score Theory (CTM). Accordingly, the emphasis was put on describing and comparing different methods of reliability assessment: test-retest, parallel-test, split-half and internal consistency. The author examined them primarily in the context of psychometrics and its general applications in marketing and customer research. In the second part of the article, a newer perspective on measurement, Item Response Theory (IRT), was discussed, and the two approaches to measurement, CTM and IRT, were then compared. Although researchers in psychometrics have paid considerable attention to measurement theory, in other fields (e.g. marketing) these topics have been somewhat neglected. At the same time, recent advances in statistical analysis have drawn increasing attention to these persistent problems of measurement. The author therefore decided to review the various concepts in reliability measurement and to stress their importance. The author hopes that, for researchers in marketing who wish to familiarize themselves with current debates over the choice of an appropriate measurement design and strategy, the article will be a good starting point for their own research and reliability estimation, especially when deciding how to develop a measurement scale and which reliability estimate to use for it. The description should also be useful for managers and marketers who want to study reliability and the problems of scale construction when examining customer behavior or market trends, and for those who want to integrate measurement theory into their aggregate models of the business.







Last modified on Thursday, 27 February 2014 10:45