Journal of Research in Science, Mathematics and Technology Education

Seeing Through the Eyes of Classroom Observers: The Case of Rating Contrasted Groups of Lessons With Classroom Observation Measures

Journal of Research in Science, Mathematics and Technology Education, Volume 7, Issue 2, May 2024, pp. 47-77
OPEN ACCESS. Publication date: 15 May 2024
Classroom observations are commonly employed to assess the quality of instruction in mathematics education research and practice. However, more remains to be learned about how sensitive classroom observation protocols are to exemplars of strong mathematics instruction, and about what refinements to observation protocols or rating processes may be warranted. In this study, we use the publicly released mathematics videos from the Third International Mathematics and Science Study (TIMSS) to examine how classroom observers, using two contemporary classroom observation instruments, rate a set of lessons whose instructional quality is expected in theory to differ, also referred to as contrasted groups. Descriptively, we find that the pattern of results differs from prior studies’ conclusions about the relative instructional quality reflected in the TIMSS video pool. We provide qualitative examples to illustrate the findings and discuss implications for future research. We point to the potential value of exploring classroom observation rubrics’ performance using ‘contrasted groups’ of lesson videos as a tool to broaden our understanding of how observation instruments function.
Keywords: mathematics teaching, classroom observations, TIMSS, mathematics instruction
Lynch, K. (2024). Seeing Through the Eyes of Classroom Observers: The Case of Rating Contrasted Groups of Lessons With Classroom Observation Measures. Journal of Research in Science, Mathematics and Technology Education, 7(2), 47-77.