Behavioral Formative Assessment: Direct Behavior Rating Item Construction

Published By

Meagan Medley, PhD – Nicholls State University

Kristin Johnson, PhD & Ayesha Kurshid, PhD – The Institute for Evidence-Based Reform (TIER)

Gary L. Cates, PhD – Illinois State University

Lit Review

Direct Behavior Rating (DBR): An umbrella term covering multiple tools that share similar designs and procedures (Chafouleas, 2011). Over the years, several themes have emerged that connect these tools and define each as part of the DBR category (e.g., Chafouleas, Riley-Tillman, & McDougal, 2002):

1. DBR includes at least one behavior rated by someone who interacts frequently with the target student in a school setting. Behaviors can be defined narrowly or defined more globally as clusters.

2. The DBR observer (e.g., teacher) rates immediately, one or more times during an instructional day. Rating periods vary from 10 minutes to half or whole instructional days (e.g., Chafouleas, Sanetti, Kilgus, & Maggin, 2012).

3. The DBR data are communicated across stakeholders, either within classrooms or other settings (e.g., playground, bus) or between school and home.

DBR Research & Use: Studies on the DBR have spanned nearly 50 years, with well over 70 research articles examining the efficacy of the DBR as an intervention. The DBRC has been widely used across age ranges (preschool to secondary), multiple settings (e.g., inpatient, home, and school), situations (Chafouleas, Riley-Tillman, Sassu, LaFrance, & Patwa, 2007), and behaviors (Vannest, Davis, Davis, Mason, & Burke, 2010). Only recently have researchers begun to examine the DBR as an assessment tool for formative evaluation.

The DBR merges aspects of systematic direct observation and Likert-type rating scales (3–10 points) by allowing frequent, interval-based rating-scale assessment of observed behavior. Like rating scales, DBRs yield a summary rating rather than an actual count.

Psychometric Properties: Two studies (summarized below) arrived at their items (behaviors) and measurement systems differently. Both used extremely small samples, and the procedures were vastly different in each study.

1. Item wording influenced the rating accuracy of DBR data for some, but not all, behavior targets. The findings suggested comparable rating accuracy across conditions for behaviors including compliance and disruption, but improved accuracy for ratings of academic engagement when items were worded positively rather than negatively (Riley-Tillman et al., 2009). Limitation: the raters were undergraduate students.

2. Negatively worded disruption items and positively worded academic performance items were rated accurately on a 6-point scale based on Chafouleas and colleagues (2009), using a DBR multiple-item scale (DBR-MIS; Volpe & Briesch, 2012). Limitation: the raters were 9 graduate students.

Summary & Purpose: DBR is a flexible method of assessment that can take a variety of forms, including single-item scales (e.g., Chafouleas, Christ, Riley-Tillman, Briesch, & Chanese, 2007) and multiple-item scales (e.g., Fabiano, Vujnovic, Naylor, Pariseau, & Robins, 2009). All of these studies report promising results for the DBR as an adequate, reliable, and defensible measure. However, no study to date has examined these methodologies in a larger, more diverse sample or compared the methodologies against one another.

Current Study: Examined item construction for single- and multiple-item scales using negatively versus positively worded academic performance items. School psychologists and teachers rated items on three dimensions (criterion relatedness, treatment validity, and observability; Volpe & Briesch, 2012).

Research Questions:

1. What behaviors do teachers and school psychologists perceive as socially important, observable, and measurable indicators of active engagement?

2. Do school psychologists and teachers rate positively and negatively worded active engagement items the same?

3. Do teachers at the elementary and secondary levels rate items the same?



Participants

Demographics are presented in table form. Participants were recruited via online solicitation using Survey Monkey. Southern states were primarily represented: ~75% of participants were from AL, MS, TN, LA, TX, SC, VA, KY, & FL. All other participants resided in 14 non-southern states.

Dependent Variables

Ratings on a 5-point Likert scale (1 = strongly disagree to 5 = strongly agree) for measurability, observability, and social importance of items related to active engagement. See table.

Independent Variables

Item Wording: Positive vs Negative
School Role: Teachers vs School Psychologists

Analyses

Paired-samples t-test between wordings
Repeated-measures ANOVA between school roles
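The wording comparison above can be sketched in Python with SciPy. The data below are simulated for illustration only; the sample size, seed, and rating distributions are assumptions, not the study's actual data.

```python
# Illustrative sketch of the paired-samples t-test between item wordings.
# All data here are simulated; nothing below reflects the study's results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_raters = 30  # hypothetical number of raters

# Simulated 5-point Likert ratings: each rater rates items under both wordings
positive = rng.integers(3, 6, n_raters).astype(float)  # positively worded items
negative = rng.integers(2, 5, n_raters).astype(float)  # negatively worded items

# Paired-samples t-test, appropriate because the same raters rate both wordings
t_stat, p_value = stats.ttest_rel(positive, negative)
print(f"t({n_raters - 1}) = {t_stat:.2f}, p = {p_value:.3f}")
```

The between-role comparison involves a mixed design (wording within raters, school role between raters), so a mixed between-within ANOVA or a linear mixed model would be needed rather than a purely within-subjects procedure.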



Measurability

Directionality was significant: positively worded items were rated higher than negatively worded items. However, teachers and school psychologists did not differ significantly in their ratings.

Observability

No significant findings.
Social Importance

There was no difference between the ratings of positively and negatively worded items. School psychologists' overall ratings were higher than teachers' (regardless of directionality). The directionality × position interaction was significant.
This study expands the literature base to include practitioners rather than graduate and undergraduate participants.
Limitations and Future Research

Limitations within the participants include:

lack of an urban sample,
minimal minority representation in the sample, and
limited previous use of the DBRC by the sample.