
IOMW
Thursday, February 4, 12:30-1:30 pm (PST)
Poster Session A: https://berkeley.zoom.us/j/94853069761
Zoom host: perman@berkeley.edu; 310-848-8991
Poster Session B: https://berkeley.zoom.us/j/91308331934
Zoom host: shruti_bathia@berkeley.edu; 713-899-3715
Poster Session C: https://berkeley.zoom.us/j/95968339453
Zoom host: yukie.toyama@berkeley.edu; 415-533-4508
Poster Session A: Theory and Method
https://berkeley.zoom.us/j/94853069761
PSA.1. Federiakin, Denis
Rasch modelling of learning patterns: getting more from repeated measures
Walstad and Wagner (2016) described an elegant approach for disaggregating value-added test scores when assessing learning outcomes. Their idea is based on comparing dichotomous item scores in pre- and post-tests to capture one of four learning patterns: zero learning (incorrect-incorrect), positive learning (incorrect-correct), negative learning (correct-incorrect), and retained learning (correct-correct). However, their analysis was grounded in Classical Test Theory, which (i) requires administration of the same item set in the pre- and post-test, and (ii) severely limits possibilities for skill-wise analysis instead of item-wise or test-wise analysis. Moreover, identification of learning patterns requires comparability of items across pre- and post-tests, which has not been taken into account in the literature to date. In this presentation, we illustrate on simulated datasets the advantages of Item Response Theory (IRT) modelling of learning patterns. In particular, we use a multidimensional Rasch model and the Random Weights LLTM (Rijmen & De Boeck, 2002) to model test-wise and skill-wise learning patterns, respectively. We discuss interpretation of the proposed modelling setup, as well as possibilities and disadvantages of the IRTree family of models (De Boeck & Partchev, 2012) for modelling learning patterns. We also discuss how an IRT setup can help identify learning patterns across different but equated pre- and post-tests.
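As a minimal illustration of the learning-pattern coding described above (a sketch, not the authors' implementation), the following Python snippet classifies paired dichotomous pre/post item scores into the four patterns; the small score arrays are invented for the example.

import numpy as np

# Hypothetical dichotomous item scores: rows are persons, columns are items.
pre = np.array([[0, 1, 0, 1],
                [1, 1, 0, 0]])
post = np.array([[1, 1, 0, 0],
                 [1, 1, 1, 0]])

# Walstad & Wagner (2016) patterns from (pre, post) score pairs:
# (0, 0) zero, (0, 1) positive, (1, 0) negative, (1, 1) retained learning.
labels = {(0, 0): "zero", (0, 1): "positive",
          (1, 0): "negative", (1, 1): "retained"}

patterns = np.empty(pre.shape, dtype=object)
for idx in np.ndindex(pre.shape):
    patterns[idx] = labels[(pre[idx], post[idx])]

print(patterns)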
PSA.2. Bishop, Kyoungwon; Seo, Daeryong; & Gocer-Sahin, Sakine
Evaluation and redesign of multistage adaptive testing in an English language test
Evaluating the current MST design of ACCESS, an English language proficiency assessment, in order to advance a new MST design will not only help in understanding WIDA's test design but also offer practical ways to identify limitations existing in other MST designs. The current study will also help discover how MST can maximize the advantages of computerized adaptive testing (CAT).
PSA.3. San Martin, Ernesto
On the Axiom of Local Independence
In this talk, I intend to discuss three topics related to the Axiom of Local Independence:
- The Rasch model is specified under a fixed-effects set-up. We show that the Axiom of Local Independence is not part of the model specification.
- Local independence makes sense in Lord's approach to IRT models. Nevertheless, the question is about its meaning: we show that local independence is an identification restriction that ensures that the empirical Bayes representation of the latent variable is meaningful, provided the axiom is written in a minimal form (a standard statement of the axiom is given after this list). This leads us to consider measurement from a geometric perspective (Greek tradition), not from an arithmetic perspective (Arabic tradition).
- Finally, we explore what happens if local independence is not written in a minimal form and, therefore, we look for partial identification results.
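For reference, the axiom discussed above is conventionally stated (in its standard, not necessarily minimal, form) as the factorization of the joint item response distribution conditional on the latent variable:

P(X_1 = x_1, \ldots, X_J = x_J \mid \theta) \;=\; \prod_{j=1}^{J} P(X_j = x_j \mid \theta).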
PSA.4. Melin, Jeanette; Fisher, William; Pendrill, Leslie
A Hierarchy of Construct Theories: Their Focus and Manifestations
Explanatory and predictive construct theories enable more fit-for-purpose, better targeted, and better administered measures. The significance of extending and adapting traditional metrological concepts and methods from the physical sciences to social, psychological, and health measurements is receiving growing attention in the scientific literature. Construct specification equations (CSEs) provide the highest level of construct theory in social, psychological, and health measurements and resemble ‘recipes for certified reference materials’ for traceability in chemistry. In this work we elaborate on construct theories for both item attributes and person characteristics as means of developing qualitative, ordinal, and confirmatory theories hand in hand with a quantitative theory, en route to experimentally validated unit standards.
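In generic terms (an assumed general form, not the authors' specific equation), a CSE predicts an item's Rasch difficulty from theoretically motivated item properties via a linear combination:

\hat{\delta}_i = \sum_{k=1}^{K} \beta_k \, x_{ik},

where x_{ik} is the value of explanatory property k for item i and \beta_k is its estimated weight; the closeness of the predicted \hat{\delta}_i to the empirically estimated difficulties then serves as evidence for the construct theory.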
PSA.5. Niu, Chunling; Toland, Michael; Duber, David; Li, Nan
A Simulation Study: Comparing FDR Correction Methods in Using Rasch Trees Modeling for DIF Detection
Compared to other measurement invariance tests within the Rasch framework, Rasch trees (RT), based on model-based recursive partitioning, can detect DIF resulting from multiple covariates among non-pre-specified groups. However, since the recursive partitioning steps involve multiple testing with multiple covariates, inflation of Type I error needs to be controlled by adjusting the raw p-values. Presently, the Bonferroni correction has been used as the default method; yet previous simulation studies show that it can be unnecessarily conservative. This simulation study therefore examines the comparative effects of six FDR p-value correction methods on the performance (i.e., Type I error, Type II error, and power) of RT modeling in detecting DIF simultaneously with multiple covariates. Preliminary results show that the Benjamini-Hochberg method demonstrates a low Type I error rate, the lowest Type II error rate, and the highest power in RT DIF detection.
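As a small illustration of the kind of adjustment being compared (a sketch with invented inputs, not the study's code), a Benjamini-Hochberg FDR correction of a set of raw p-values can be applied in Python as follows.

import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from testing several partitioning covariates.
raw_p = np.array([0.003, 0.041, 0.012, 0.200, 0.049])

# Benjamini-Hochberg FDR adjustment; compare with method="bonferroni".
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")

for p, q, r in zip(raw_p, adj_p, reject):
    print(f"raw p = {p:.3f}, BH-adjusted p = {q:.3f}, reject: {r}")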
Poster Session B: Instrument Development
https://berkeley.zoom.us/j/91308331934
PSB.1. Bahry, Louise Marie
Using Generalizability & Rasch Measurement Theory to Ensure Rigorous Measurement in an International Development Education Evaluation
Between the United States and Great Britain, over 30 billion USD was spent on international aid in 2018, over a billion of which was dedicated to education programs alone. Recently, there has been increased attention on the rigorous evaluation of aid-funded programs, moving beyond counting outputs to the measurement of educational impact. The current study uses two methodological approaches, Generalizability Theory (Brennan, 1992, 2001) and Rasch Measurement Theory (Andrich, 1978; Rasch, 1980; Wright & Masters, 1982), to analyze data from math and literacy assessments and self-report surveys used in an international evaluation of an educational initiative in the Democratic Republic of the Congo. These approaches allow the researcher to identify and select pertinent facets and examine them in relation to one another, attributing smaller or larger sources of variability to particular facets; using both provides additional insight for instrument development and validation efforts.
PSB.2. McNeil, Rebecca
Measuring Teacher Competency in Error Analysis: Instrument Development
Preparing teachers for effective mathematics instruction is vital to producing high-quality teachers and improving student learning, and error analysis is a pedagogical process that proves instrumental to making such improvements (McGuire, 2003; Morris et al., 2009). However, there is a lack of evidence that teachers are skilled in performing systematic error analysis, with pre-service teachers holding many of the same misconceptions that students hold, in addition to a tendency to focus on correcting factual errors rather than conceptual or procedural knowledge (Riccomini, 2005; Ryan & McCrae, 2005; Woodward, Baxter, & Robinson, 1999). As there are few measures to date that focus on mathematics teachers' competency in error analysis (TCEA), this study aims to develop an instrument for assessing this particular construct. The development of this instrument follows the four building blocks approach (Wilson, 2005) and is informed by multiple error analysis frameworks (Lannin, Townsend, & Barker, 2006; McGuire, 2003). This research is conducted with the intent of offering a valid and reliable instrument with uses both as a diagnostic assessment and as a learning tool for the training and professional development of pre-service and novice teachers; thus, empirical results of a pilot test of the instrument are also discussed.
PSB.3. Bradford, Allison
Development of a Measure for Self-Directed Reasoning from Evidence
This poster summarizes work in progress on developing an instrument to evaluate students' ability to reason with evidence in a self-directed manner, which includes students' ability to organize evidence, critically evaluate resources, and integrate ideas to develop a conclusion. This paper shares the initial construct conception and instrument development process. Further, results are included from a pilot study in which 50 sixth-grade science students engaged with the instrument after an online inquiry science unit. An initial set of 30 items was tested, but 13 were removed from analysis for poor performance. The current instrument returned an EAP reliability of 0.851 and a WLE reliability of 0.846, indicating relatively good separation. Other evidence for reliability and validity is discussed. The implications for construct revision, continued item development, and instrument use are considered.
PSB.4. Dejanipont, Bunyong; Suksiri, Weeraphat; Wilson, Mark
Reliability and Validity Evidence of the Humor as an Acceptingly Helpful Attitude Measure (HAHA) Using Item Response Theory
Improved mental health outcomes, lowered stress levels, and positive social interactions are some of the positive effects of humor training. In particular, perceptions of a potentially stressful situation or negative feelings can be changed and regulated by humorously reframing the situation. This humorous attitude is therefore an important and useful mindset; however, many instruments intended to measure humorous attitudes have unaddressed limitations, such as subjective response options and overly simplified definitions of humor.
We used item response theory to conceptualize the Humor as an Acceptingly Helpful Attitude (HAHA) construct, along with its construct map, and to create the HAHA Measure. The measure captures a good-natured and humorous attitude that accepts the imperfections and incompetence of self and others, while seeing their humorous aspects and feeling the need to alleviate the personal and interpersonal stress that arises from those imperfections or incompetence.
PSB.5. Donovan, Courtney; O'Brien, Shani; Forbes, Lisa; & Lamar, Margaret
Analysis of the Intensive Parenting Questionnaire Using Rasch Modeling
This report describes the analysis of the Intensive Parenting Attitudes Questionnaire (IPAQ) using Rasch modeling via the Winsteps software (Linacre, 2019). The IPAQ was developed in 2013 and subsequently modified and revalidated in 2017 using Classical Test Theory techniques (Liss, Schiffrin, Mackintosh, Miles-McLean, & Erchull, 2013; Loyal, Dallay, & Rascle, 2017). The current sample includes 525 mothers responding from across the United States of America. Although the IPAQ originally demonstrated five factors (Liss et al., 2013), our data supported three factors: Rewarding Parenting (the fulfilling aspects of parenting and the importance of intellectual stimulation, education, and play), Motherhood (mothers as better parents than fathers), and Challenges (the drawbacks of parenting). Three Rasch models are presented. We recommend a modified 4-point scale and note DIF on all but one item.
PSB.6. Junpeng, Putcharee; Marwiang, Metta; Chinjunthuk, Samruan; Suwannatrai, Prapawadee; Luanganggoon, Nuchwana; Chanayota, Kanokporn; Krotha, Jenrop; Tang, Keow Ngang; Wilson, Mark
Multidimensional Rasch Analysis for Validating a Measure of Mathematical Proficiency through Digital Technology
This study aimed to validate a measure of mathematical proficiency (MP) in the Number and Algebra strand, delivered through digital technology, for 1,504 Thai seventh-grade students. A construct modeling approach and a design-based research method were adopted to create a tool consisting of four components, namely a register system, input data, a process system, and a diagnostic feedback report. The researchers employed a multidimensional approach, an extension of the Rasch model, to examine the measure's quality. The Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM) was used to examine the internal structure through a comparison of model fit, to confirm that the two-dimensional MP measure fits better than a one-dimensional one. A Wright map was used to support validation of the tool. Low standard errors of measurement and acceptable infit and outfit mean values indicate whether the digital tool measures the multiple proficiencies with accuracy, consistency, and stability.
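Dimensionality comparisons of this kind are commonly summarized with information criteria (an assumption about the specific fit statistics used in this study), where the lower value indicates the better-fitting model:

\mathrm{AIC}_m = -2\log L_m + 2p_m, \qquad \mathrm{BIC}_m = -2\log L_m + p_m \log N,

with L_m the maximized likelihood, p_m the number of estimated parameters of model m, and N the sample size.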
Poster Session C: Analysis and Modeling
https://berkeley.zoom.us/j/95968339453
PSC.1. Sussman, Joshua, et al.
The Desired Results Developmental Profile (DRDP) Large Scale Assessment: A Case Study
This talk will explore issues related to the ongoing construction and implementation of the DRDP assessment. The DRDP is the current, third-generation version. It is an observational assessment that infant/toddler, preschool, and kindergarten teachers use to assess about 300,000 children yearly in five U.S. states. This presentation describes the DRDP as a large-scale, state-supported assessment of early childhood development from a test developer's (psychometrician's) perspective. The assessment is described first, and then three issues related to DRDP implementation are discussed: organizational issues, technical issues, and political issues. The discussion will reflect ongoing psychometric development and the use of DRDP results for both formative and summative assessment. Exploring certain achievements and lessons learned may offer general insights about test construction and use in educational settings.
PSC.2. Aramburo, Corrine; Starowicz, Renee
A Systematic Review of Teachers’ Attributions Toward Students with Disabilities: Integrating the Social and Medical Model of Disability with Attribution Theory
The current review provides an overview of 15 published articles on special and general educators' causal attributions for the academic success or failure of a student with a disability. The social and medical model of disability was seen as the underlying orientation for much of a teacher's attribution. This paper then argues that an integration of attribution theory and the social and medical model would yield a more comprehensive understanding of how teachers perceive the academic success or failure of students with intellectual disabilities. The development of a construct map using teacher discourse excerpts from the literature review provided an exploration of a new conceptualization of the social-medical model of disability. The construct map integrates the social-medical model of disability, orienting it as an ordinal, unidimensional model that considers how teachers attribute success or failure to a student with a disability via four teaching values and practices: academic expectations for a child with a disability, responsibility, relationship, and skills or pedagogical practices. The implications of this model and future research are discussed.
PSC.3. Ge, Yuan; Wind, Stefanie
Exploring the Psychometric Properties of a Self-Efficacy Measure for High School Students
In previous studies, researchers have focused on the development and interpretation of measurement tools related to self-efficacy. However, researchers have seldom investigated whether these instruments demonstrate acceptable psychometric properties, including similar item interpretations across subgroups of respondents. The purpose of this study is to explore the extent to which a self-efficacy measure has a consistent interpretation for two self-reported gender subgroups. The researcher utilized Rasch analysis to offer guidance for the design of self-efficacy-related surveys and questionnaires. Results suggested that gender differences were detected in certain self-efficacy items. Furthermore, suggestions are provided for instruction and for enhancing self-efficacy for future students.
PSC.4. Park, Sunhi
Explanatory Item Response Modeling of a Reading Comprehension Assessment
This research examined the association of textual and person factors with item difficulty and responses on a reading comprehension assessment, the Multiple-choice Online Causal Coherence Assessment (MOCCA). Under explanatory item response modeling, the linear logistic test model was applied to examine the effects of textual factors on MOCCA item difficulty, finding that text length, word knowledge, background knowledge, and goal identification were significant predictors of item difficulty. A latent regression analysis was applied to examine the effects of person characteristics, finding that socioeconomic status, special education status, and English learner (EL) status had significant relationships with MOCCA responses. A DIF analysis was conducted to detect whether any items functioned differently between ELs and non-ELs. Twelve items were flagged as showing DIF and were explored by means of the textual features used as predictors in the linear logistic test modeling.
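For reference, the linear logistic test model used here decomposes each item's Rasch difficulty into a weighted sum of item properties (the textual factors above); in generic notation,

P(X_{pi} = 1 \mid \theta_p) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}, \qquad \beta_i = \sum_{k=1}^{K} q_{ik}\,\eta_k,

where q_{ik} is the value of textual feature k for item i and \eta_k is that feature's contribution to item difficulty.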
PSC.5. Shannon, Nathaniel; Shi, Qingzhou; Ntoh Yuh, Honorine
An Investigation of the Combined Effects of Parent Involvement, Language Use and SES in Predicting HS Graduation for ELL Students
In the United States, students identified as English Language Learners (ELLs) are increasingly less likely to graduate from high school (HS). These students, typically non-native English speakers, are placed in remedial classes that are generally focused on improving language skills rather than on learning standard grade-level material. This results in lowered learning objectives and graduation rates for ELLs. While there is ample work investigating the role of socioeconomic status (SES) in predicting HS graduation among ELLs, there is less research investigating the role of parent involvement and students' use of language at home and in school. Using binary logistic regression, we will investigate the predictive power of parental involvement, language use at home (English/non-English), the type of school (public/private), and SES in predicting HS graduation among ELLs. The results of this study have implications for determining how to most efficiently allocate resources to the ELL population.
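A minimal sketch of the kind of model described above (simulated data and illustrative variable names, not the study's dataset or code) could look as follows in Python.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data; variable names are illustrative placeholders.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "parent_involvement": rng.integers(1, 6, n),   # 1-5 involvement scale
    "home_english":       rng.integers(0, 2, n),   # 1 = English used at home
    "private_school":     rng.integers(0, 2, n),   # 1 = private school
    "ses":                rng.normal(0.0, 1.0, n), # standardized SES
})

# Simulate graduation status from a known logistic relationship.
linpred = (-1.0 + 0.4 * df["parent_involvement"] + 0.5 * df["home_english"]
           + 0.3 * df["private_school"] + 0.6 * df["ses"])
df["graduated"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-linpred)))

# Binary logistic regression of HS graduation on the four predictors.
fit = smf.logit(
    "graduated ~ parent_involvement + home_english + private_school + ses",
    data=df,
).fit(disp=False)
print(fit.summary())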
PSC.6. Thompson, James
Network Analysis of Didactic Examination Outcomes: Interrelationships of Accuracy, Response Time, Pace, and Fluency (Speed-Accuracy Tradeoff)
With the advent of computerized testing, ordinary didactic exams can capture both answer accuracy and answer response time. From these raw data, person ability, person speed, question difficulty, question time intensity, pace, fluency (speed-accuracy tradeoff), person skill for fluency, and question load for fluency can be derived. It would be useful to have a comprehensive view of the interrelationships of these observables. This proposal suggests that the pairwise partial correlations between the observables can be considered to form a network. This network was predictive of the constituent variables at both the population and person levels. Interestingly, conventional person variables were not required for these predictions. Question difficulty was important to both global network strength and global structure impact. Question time intensity was also important for global strength, while fluency was a major contributor to structure impact.
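As a minimal sketch of how such a partial-correlation network can be formed (simulated placeholder data, not the author's variables or implementation), the pairwise partial correlations can be computed in Python from the inverse of the covariance (precision) matrix.

import numpy as np

# Placeholder person-level observables (columns), e.g. accuracy, response
# time, pace, and a fluency index; rows are persons.
rng = np.random.default_rng(1)
data = rng.normal(size=(300, 4))

# Partial correlation between variables i and j, given all others:
# rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the precision matrix.
precision = np.linalg.inv(np.cov(data, rowvar=False))
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)  # edge weights of the network

print(np.round(partial_corr, 3))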