Seeing Through the Eyes of Classroom Observers: The Case of Rating Contrasted Groups of Lessons With Classroom Observation Measures

Kathleen Lynch

doi:10.31756/jrsmte.723

Journal of Research in Science, Mathematics and Technology Education

Kathleen Lynch ¹ *

¹ Neag School of Education, University of Connecticut
* Corresponding Author

Journal of Research in Science, Mathematics and Technology Education, Volume 7, Issue 2, May 2024, pp. 47-77
OPEN ACCESS. Publication date: 15 May 2024

ABSTRACT

Classroom observations are commonly used to assess the quality of instruction in mathematics education research and practice. However, there is more to be learned about how sensitive classroom observation protocols are to exemplars of strong mathematics instruction, and about what refinements to observation protocols or rating processes may be warranted. In this study, we use the publicly released mathematics videos from the Third International Mathematics and Science Study (TIMSS) to examine how classroom observers, using two contemporary classroom observation instruments, rate a set of lessons whose instructional quality is expected in theory to differ, also referred to as contrasted groups. We find that, descriptively, the pattern of findings is distinct from prior studies’ conclusions about the relative instructional quality reflected in the TIMSS video pool. We provide qualitative examples to illustrate the findings and discuss implications for future research. We point to the potential value of examining how classroom observation rubrics perform on ‘contrasted groups’ of lesson videos, as a tool to broaden our understanding of how observation instruments function.

KEYWORDS

Mathematics teaching, classroom observations, TIMSS, mathematics instruction

CITATION (APA)

Lynch, K. (2024). Seeing Through the Eyes of Classroom Observers: The Case of Rating Contrasted Groups of Lessons With Classroom Observation Measures. Journal of Research in Science, Mathematics and Technology Education, 7(2), 47-77. https://doi.org/10.31756/jrsmte.723


LICENSE

This work is licensed under a Creative Commons Attribution 4.0 International License.