A summary of the book “Measurement
and Assessment in Teaching” by Robert L. Linn & M. David Miller,
Questions and answers
Chapter 1: Educational Testing and Assessment:
Context, Issues, and Trends
Chapter 2: The Role of Measurement and Assessment
in Teaching
1. Define
"assessment." Name and define its two main components. What are the
main questions that assessment seeks to answer?
Assessment
is the general (and overlying) term to describe the procedures by which
educators gain knowledge, insight, and information about student learning.
There are two major components of assessment: tests and measurement.
A test is a type of assessment that contains questions administered over a
fixed time period under controlled and objective conditions for all students
tested. Measurement is the assigning of numbers or scores to a test using a
specific set of rules for scoring. Assessment seeks to answer the question:
"How well does the individual perform?" A test seeks to answer the
question "How well does the individual perform—either in comparison with
others or in comparison with a domain of performance tasks?" Finally,
measurement seeks to answer the question: "How much did the student
score on the test?"
2. List
and discuss the five general principles of good assessment. What type of
error may occur when each of the principles is violated?
There are five general principles that govern good assessment. They are:
(1) Clearly specify what is to be assessed before any testing or measurement
takes place. This ensures that the assessment is meaningful and has a clear
purpose. Violating this principle runs the risk of assessing haphazardly or
rendering any use of the assessment meaningless.
(2) Make sure that the assessment procedures chosen are relevant to what is to
be measured. If this principle is violated, the information gathered may turn
out to be useless.
(3) Use a variety of assessment procedures, because no one test or type of
test can adequately cover all the types of learning required in school. To
violate this principle is to overgeneralize the results of an assessment or to
measure too narrow a sample of learning behavior.
(4) Follow the proper procedures for the assessment instruments used.
Assessments should be administered and scored objectively and fairly for all
students, and the educator must realize that all tests contain error.
Violating test procedures or ignoring testing error may lead to
overgeneralization, misinterpretation, or misuse of assessment results.
(5) Treat assessment as a means to an end: it must lead to a stated goal or
objective and be given for a purpose. To violate this principle is simply to
waste student and educator time and misuse funds.
3. How
should assessment be used to meet sets of stated educational achievement
goals? Name and define the five steps in this process.
The
main purpose of classroom instruction is to help students achieve a set of
intended learning goals. The first step in this process is to identify
exactly what those learning goals are going to be for the student. The next
step in the process is to preassess the student's level of current
competency. The purpose of this step is to guarantee that the student
possesses the prerequisite skills to benefit from instruction. The third step
is for the educator to provide relevant instruction. During this stage,
assessment should be ongoing to ensure that the learner is mastering content
and to diagnose any learning difficulties. The fourth step is to assess
learning outcomes. In this phase, assessment seeks to confirm that the child
has mastered the content and has reached the learning goal. The fifth stage
in the process is to use the assessment results in some relevant way. It
often involves feedback to educators, administrators, and the students
themselves of their progress and helps to set future learning goals.
4. Identify
and define maximum performance, typical performance, fixed-choice tests and
complex-performance assessments. What is an appropriate use or educational
example for each type of assessment?
Maximum
performance assessment occurs when the individual knows that he/she is going
to be assessed, has time to prepare for the assessment, and is motivated to
do well. An example of maximum performance assessment is a midterm
examination that counts toward one's final grade. Typical performance
assessment measures the types of behavior that an individual demonstrates
daily, when the stakes to do well are not high and there is no special time to
prepare. An example of typical performance might be an observation of a
child's paying attention in class during a normal school day. Fixed-choice
tests are those in which the individual must choose his/her answers from those
provided. Multiple-choice and true-false tests are examples of
fixed-choice tests. Finally, complex-performance tests are those requiring
extended answers that are produced by the student rather than from choosing
from a fixed set of alternatives. Examples of complex-performance assessment might
be essay assessment and completing science laboratory assignments.
5. Identify
and define the four types of assessment outlined in the chapter. What would
be an educationally appropriate use of each type?
The
four types of assessment are: placement, formative, diagnostic, and
summative. Placement assessment is used to determine student performance at
the beginning of instruction. An example of placement assessment would be
measuring a child's current reading skills before placing him/her in a
reading group or with a certain level of reading text. Formative assessment
is ongoing, day-to-day assessment of a child's educational progress. For
example, it might be used to make sure that the child understands a given
unit of math instruction before progressing on to further instruction.
Diagnostic assessment is related to formative assessment and is designed to
identify any difficulties that the child is having learning in day-to-day
instruction. An example would be to identify that a child has trouble in
two-digit addition before going on to three-digit addition. Summative
assessment is designed to occur at the end of a unit of instruction. Its
purpose is to assess if learning goals have been met. Examples of summative
assessment would be a unit test or a final exam.
Chapter 3: Instructional Goals and Objectives:
Foundation for Assessment
1. What
does it mean to say that an instructional objective should be expressed
behaviorally? Give an example of a behaviorally and non-behaviorally
expressed instructional objective. Contrast the terms "product" and
"process" and state which one is preferable in an instructional
objective.
An
instructional objective is expressed behaviorally when it contains an action
(or verb) that is directly observable and measurable. Two people viewing the
learner should agree that the target behavior has taken place. An example of
a behaviorally stated objective would be: "The student will list the
four largest cities in Ohio." An example of a non-behaviorally stated
objective would be: "The student will know his spelling words."
"Product" refers to what the student is going to do (stated in
behavioral terms) in order to indicate mastery of the objective.
"Process" indicates the way that the child is going to reach the
stated behavior goal. Instructional objectives are usually concerned with products.
2. Define
the three different domains in the taxonomy of educational objectives. Give
an example of each of the three domains.
The
three domains in the taxonomy of educational objectives are: cognitive,
affective, and psychomotor. The cognitive domain deals with the kinds of
knowledge and skills that one might learn in school and are related to facts,
knowledge, and its applications and uses. The affective taxonomy is concerned
with interests, attitudes, and values. The psychomotor domain deals with
physical movements and skills. An example of a cognitive skill would be
completing a set of arithmetic problems. An example of an affective skill
would be liking and appreciating classical music. An example of a psychomotor
skill would be hitting a baseball.
3. What
are the six levels that make up the cognitive domain? Give an example of
each.
The
six levels of the cognitive domain are: knowledge, understanding,
application, analysis, synthesis, and evaluation. Examples of each are:
Knowledge: The student will know/recite the letters of the alphabet.
Understanding: The student will explain in his own words the main causes of the Civil War.
Application: The child will correctly solve math problems.
Analysis: The student will outline the major themes of Hamlet.
Synthesis: The student will write an essay.
Evaluation: The student will critique a work of art.
4. What
happens when "unanticipated learning outcomes" occur? Give an
example of such an outcome. What type or form do these outcomes take? What
should the teacher do when such outcomes occur?
Unanticipated
learning outcomes occur when student learning or behavior takes place that
was not included in the learning goals or objectives. An example of such
an outcome might be when a concept is presented in algebra and a student
unexpectedly relates the concept to a particular problem she had the night before
while trying to figure out a recipe. Unexpected learning outcomes can be
positive (as described above) or negative as when a child is rude to another
child in class. Teachers should view unexpected learning outcomes as an
opportunity to enrich material or teach new and needed concepts to students.
5. Distinguish
"general instructional objectives" from "specific learning
outcomes." Give examples of acceptable verbs for each. Describe
when and how each are used.
General
instructional objectives should be specific enough to provide direction for
instruction but not so specific that instruction is reduced to training.
Specific learning outcomes refer to the specific way that students will
demonstrate that they have achieved the general objective. Verbs acceptable
for general instructional objectives are "know,"
"understand," and "interpret." Examples of acceptable
verbs for specific learning outcomes are "list,"
"identify," and "solve." In creating programming for
students, general instructional objectives are stated first and then specific
learning outcomes are created to detail how the general objectives will be
reached.
Chapter 4: Validity
1. What
is validity? What is reliability? What is usability? What is the relationship
between validity and reliability?
Validity
is the adequacy and appropriateness of the interpretations and uses of
assessment results. In common terms, a test is valid if it adequately
measures what it purports to measure. Reliability refers to consistency. The
idea is that a person taking the same test twice without intervening
variables (e.g., time to study) should score about the same on both of the
test administrations. Usability refers to the practicality of the test or
test procedures. In order for a test to be valid it must be reliable.
However, not all reliable tests are, by definition, valid.
2. What
are the four types of validity considerations? Define each.
The
four types of validity considerations are: content, construct,
assessment-criterion relationship, and consequences. Content validity refers
to the idea that a test adequately samples or measures a representative
sample of the content presented. Construct validity concerns how well a test
measures a hypothetical attribute that we believe exists and that is inferred
from behavior. The
assessment-criterion relationship refers to how well a test predicts future
behavior or adequately describes a group of people presently demonstrating
those behaviors. Consequences refer to the ways and purposes in which test
information is used.
3. Give
an example of the four types of validity considerations. What techniques
might be used to adequately measure that a test possesses a high degree of
validity in each of the four considerations?
A
test has content validity if the test adequately samples the content
presented. For example, a test might lack content validity if it only asked
questions from three of five book chapters assigned. Content validity is
often assessed by using a table of specifications that contains the content
presented over the six levels of the cognitive taxonomic objectives. An
example of construct validity would be a test of personality, self-esteem, or
reading comprehension. While these constructs cannot be directly seen they
are inferred by the behaviors that persons holding high levels of those
constructs may show. Assessment-criterion considerations measure either the
predictive or concurrent levels of validity. In predictive validity, the
attempt is made to see how a test given in the present can predict future
behavior. Concurrent validity is demonstrated when persons already known to
possess an attribute (e.g., musicians) score highly on a current test
designed to measure that attribute (a test of musical ability).
Consequence considerations are most concerned with the useful, ethical, and
appropriate use of assessment results. A test should be used for inclusive
rather than exclusive purposes.
4. What
is correlation? What does correlation measure? Give an example of
correlation. What is the relationship between correlation and
causation?
Correlation
is the degree of relationship between two events or phenomena. That is, it is
a measure of how well one variable predicts another. An
example of correlation would be the relationship between grade level and math
achievement--that as students advance in grade levels, the math complexity
presented to them also increases. Correlation does not imply causation.
Causation means that one variable directly produces a change in another; for
example, heat causes atoms to move faster. Correlations are rarely perfect. An
increase in grade level usually goes along with an increase in the complexity
of the math curriculum, but that need not always be the case.
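To make the idea concrete, here is a rough sketch (not from the book) of how a Pearson correlation coefficient r could be computed; the grade levels and math scores are hypothetical values for illustration.
```python
# A rough sketch (not from the book): computing a Pearson correlation coefficient r.
# The grade levels and math scores below are hypothetical values for illustration.
import math

grade_level = [3, 4, 5, 6, 7, 8]
math_score = [410, 435, 452, 470, 488, 505]

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# r near +1.0 means the two variables rise together; it still says nothing about cause.
print(round(pearson_r(grade_level, math_score), 3))
```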
5. What
are some of the factors affecting validity? How or why are they important?
Give an example of each.
A
number of factors affect validity. These include factors of the test itself,
factors of the task or teaching procedure, factors of administration and
scoring, factors of student responses, and the nature of the group and the
criterion. Examples of factors of the test itself are unclear directions or
words or vocabulary which are too complex for test takers. An example of the
teaching procedure is the teacher teaching in such a way as to actually
interfere with good achievement on the test. An example of factors of
administration or scoring is a teacher invalidating a standardized test by
his giving his own students extra time to complete the exam. Factors of
student responses include factors such as illness or high anxiety that keep
the child from achieving the score he or she would otherwise earn on the
test. Finally, an example of factors of the group and criterion is a test
being given to a group of individuals for whom the test was not designed and
then misinterpreting these students' achievement based on these factors.
Chapter 5: Reliability and Other Desired
Characteristics
1. What
is reliability? What is its nature and major properties?
Reliability
refers to the consistency of measurement, that is, how consistent test scores
or other assessment results are from one measurement to another. Reliability
can be estimated by the correlation between the scores of the same people on two similar assessments. As
correlation, it is measured by the Pearson Product Moment statistic (r).
Reliability can range from 0.00 to 1.00.
2. What
are the six major types of reliability? How are they measured or assessed?
The
six major types of reliability are: test-retest, equivalent forms,
test-retest with equivalent forms, split half, coefficient alpha, and
interrater. Test-retest reliability is assessed by giving the same person the
same test twice with as short a time interval between the two test
administrations as possible. Equivalent forms reliability refers to the idea
that when a test producer creates two forms of a test (e.g., Form A and Form
B) the two forms should measure the same material in the same way but with
different questions. If this is the case then the reliability between the two
forms will be high. Test-retest with equivalent forms occurs when the same
person is given a test twice (as in test-retest) but instead of administering
the exact same test, equivalent forms of the test are administered.
Split-half reliability does not require a test being administered twice.
Rather, the single test is split in half with odd numbered items being
compared against even-numbered items (as if two tests existed). Coefficient
alpha extends this internal-consistency idea: the test is not divided in two,
but all items are analyzed at once (the KR-20 formula is the version used
when items are scored simply right or wrong). Finally, interrater
reliability is used in behavioral or performance assessments and refers to
the concept that agreement should occur between raters as to whether and to
what extent a given behavior did or did not occur.
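As an illustration of one of these methods, the following rough sketch (not from the book) computes a split-half (odd versus even items) estimate and applies the standard Spearman-Brown correction; all of the item data are hypothetical.
```python
# A rough sketch (not from the book): a split-half reliability estimate with the
# Spearman-Brown correction. Each row is one student's item scores (1 = right,
# 0 = wrong); all of the data are hypothetical.
import math

scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 1],
]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

odd_totals = [sum(row[0::2]) for row in scores]    # score on the odd-numbered items
even_totals = [sum(row[1::2]) for row in scores]   # score on the even-numbered items

half_test_r = pearson_r(odd_totals, even_totals)
full_test_r = (2 * half_test_r) / (1 + half_test_r)   # Spearman-Brown correction
print(round(half_test_r, 3), round(full_test_r, 3))
```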
3. What
is error as it applies to reliability? How is error assessed? What are
confidence bands, and what relationship do they have with error?
Error
as it applies to reliability acknowledges that no test is 100% reliable
(i.e., r never equals 1.00). To the extent that perfect reliability is never
reached, there is error in the test or test situation. Error is measured by
the Standard Error of Measurement statistic. The Standard Error of
Measurement statistic allows a confidence band or interval to be placed
around a single test score. The lower to upper limits of that test score
represent the range of the person's "true" score taking error into
account.
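A worked sketch of the idea, with hypothetical numbers (not the book's): the standard error of measurement is commonly computed as the standard deviation of the scores times the square root of one minus the reliability, and a band of plus or minus one SEM around the observed score gives a rough confidence interval.
```python
# A rough sketch (not from the book) of the standard error of measurement and a
# confidence band around one observed score. The numbers are hypothetical.
import math

observed_score = 75
sd = 10             # standard deviation of the test's score distribution
reliability = 0.91  # e.g., a test-retest or split-half coefficient

sem = sd * math.sqrt(1 - reliability)      # 10 * sqrt(0.09) = 3.0
lower, upper = observed_score - sem, observed_score + sem
print(f"SEM = {sem:.1f}; ~68% confidence band: {lower:.1f} to {upper:.1f}")
```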
4. What
are four factors that influence reliability measures? How do they provide
such influences?
The
four factors influencing reliability measures are: number of assessment
tasks, spread of scores, objectivity, and methods of estimating reliability.
In most cases, as the number of assessment tasks is increased, the reliability of
the overall assessment also increases. Regarding spread of scores, as the
range or spread of scores in a distribution increases, the reliability of the
assessment also increases. Reliability increases when the assessment becomes
more objective. That is, as independent judges can more readily agree on
whether behaviors have taken place and to what extent they have occurred,
the reliability of the assessment increases. Finally, the method used to
estimate reliability can affect the value obtained: the different methods
differ in how liberal or conservative the resulting r statistic tends to be.
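The effect of the first factor, the number of assessment tasks, is often summarized with the Spearman-Brown prophecy formula; the rough sketch below (with a hypothetical starting reliability) shows how lengthening an assessment with similar tasks raises the estimated reliability.
```python
# A rough sketch (not from the book): the Spearman-Brown prophecy formula, which
# estimates how reliability changes as an assessment is lengthened with similar tasks.
# The starting reliability of 0.60 is hypothetical.

def spearman_brown(r_original, length_factor):
    return (length_factor * r_original) / (1 + (length_factor - 1) * r_original)

r = 0.60
for k in (1, 2, 3, 4):   # assessment lengthened to k times its original number of tasks
    print(k, round(spearman_brown(r, k), 2))   # 0.60, 0.75, 0.82, 0.86
```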
5. What
is the concept of "acceptable reliability"? Under what
circumstances must the reliability be relatively high for assessment
decisions to take place with confidence?
Reliability
can vary from a low of r = 0.00 (no reliability) to r = 1.00 (perfect
reliability). Virtually no assessment is perfectly reliable; every assessment contains some
error. The degree of acceptable error depends on the use and decision-making
process to be made from the test results. Reliability should be relatively
high when decisions to be made are important, when the decision to be made is
final, when the decision is irreversible, when it is unconfirmable, when the
decision concerns individuals, and when the decision has lasting
consequences.
Chapter 6: Planning Classroom Tests and
Assessments
1. What
are the three purposes or types of classroom assessment and testing? When and
how should each type be used by teachers?
The
three purposes and/or types of classroom assessment and testing are:
pretesting, testing and assessment during instruction, and post-testing.
Pretesting (as its name implies) occurs before instruction has begun. The
purpose of pretesting is to determine whether students have the prerequisite
skills needed for the instruction (to determine readiness). Pretesting is
also used to assess to what extent students have already achieved the
objectives of the planned instruction (to determine student placement or
modification of instruction). A second purpose of assessment is testing
during instruction. This type of testing is used to monitor the progress of
students and to see if there are areas in which material needs to be
explained to aid learning. Additional purposes of assessment during instruction
also include encouraging students to study, and providing feedback to
students and teachers. Finally, post-testing is used to assess whether the
learning goals for the student have been achieved.
2. What
are the major steps involved in building a table of specifications for a unit
of instruction? What is involved in each step?
There
are three main steps in building a table of specifications for a unit of
instruction. These are (a) preparing a list of instructional objectives, (b)
outlining the course content, and (c) preparing the two-way chart. In
preparing a list of instructional objectives, separate lists of general
objectives and specific learning outcomes are listed. In the second step, the
content to be covered is broken down (outlined) into major topic areas with
each topic area further parsed into subtopics. Finally, in preparing a
two-way chart the topics and subtopics are listed down the y-axis while the
objectives are carried across the x-axis. In this two-way table, it is
important that the objectives cover the span of the taxonomy of educational
objectives ranging from knowledge through evaluation.
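A rough sketch (not from the book) of what such a two-way chart might look like; the topics, taxonomy levels, and item counts are all hypothetical.
```python
# A rough sketch (not from the book) of a two-way table of specifications.
# The topics, taxonomy levels, and item counts are all hypothetical.

levels = ["Knowledge", "Understanding", "Application", "Analysis"]
table_of_specs = {
    "Fractions: concepts":   {"Knowledge": 4, "Understanding": 3, "Application": 2, "Analysis": 1},
    "Fractions: operations": {"Knowledge": 2, "Understanding": 3, "Application": 5, "Analysis": 2},
    "Word problems":         {"Knowledge": 1, "Understanding": 2, "Application": 4, "Analysis": 3},
}

# Print the chart with row and column totals to check the balance of the planned test.
print("Topic".ljust(24) + "".join(l.rjust(14) for l in levels) + "Total".rjust(8))
for topic, row in table_of_specs.items():
    print(topic.ljust(24) + "".join(str(row[l]).rjust(14) for l in levels) + str(sum(row.values())).rjust(8))
col_totals = [sum(row[l] for row in table_of_specs.values()) for l in levels]
print("Total".ljust(24) + "".join(str(t).rjust(14) for t in col_totals) + str(sum(col_totals)).rjust(8))
```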
3. What
are the three main types of test items? How are they defined? Describe the
categories of test items that might fall in each type.
4. What
are four main concerns that a teacher should keep in mind when creating a
test for students? What might be the result if each of these concerns is not
addressed?
The
four main concerns that should be kept in mind when creating a teacher-made
test are: matching items and tasks to intended outcomes, obtaining a
representative sample of items and tasks, eliminating irrelevant barriers to
the performance, and avoiding unintended clues in objective test items.
Violation of these four considerations would lead to their own special types
of error and such errors would make proper interpretation of the test results
problematic.
Chapter 7: Constructing Objective Test Items:
Simple Forms
1. What
are the three types of objective test items discussed in Chapter Seven? How
are they defined? How are they "simple form"?
The
three types of test items discussed in Chapter Seven are short answer,
true-false, and matching. Short answer is defined as requiring the test taker
to supply the answer by a word, phrase, number, or symbol. True-false is
defined by having the test taker choose from two alternatives as to the
veracity of a declarative statement. Finally, matching items require the test
taker to link related concepts by pairing each entry in one column with the
entry in a parallel column that "goes with" or corresponds to it.
All three item types are considered
"simple form" because they require a minimum (as opposed to an
extended) response from the test taker and because they measure basic
knowledge level and/or factual skills.
2. What
are the basic uses of short-answer questions? How do they differ from
true-false and matching?
Short-answer
items are suitable for measuring a wide variety of relatively simple learning
outcomes. These include assessing whether the student knows the definition of
given terminology in a topic of instruction, and/or the ability to solve
numerical or scientific problems. They differ from true-false and matching
exercises in that short answer is a supply item while the other two are
selection items.
3. What
are the basic advantages and disadvantages of true-false items? How might the
disadvantages be avoided?
The
major advantage of true-false questions is that they are time-efficient.
The student can answer more true-false questions than any other type of item.
By including more items in a test, the teacher can increase certain types of
content validity as well as increasing reliability (see Chapters Five and
Six). Disadvantages are that they are harder to construct than they first
appear, that they cannot adequately measure higher level learning outcomes
such as analysis or synthesis, and they have a high susceptibility to
guessing. Some of these disadvantages can be reduced during the construction
of the items by not assessing trivial content and avoiding giving clues that
will increase guessing and increase error.
4. When
are matching exercises best used? What are the recommendations given in the
chapter as they refer to the number of items in each column and the number of
times an alternative might be chosen? Why are these recommendations
made?
Matching
items are best used in situations in which the teacher wants to see if the
student understands the relationship between two variables, concepts or
events. It is recommended that the two lists not be of the same length and
that the instructions state that any alternative may be used once, more than
once, or not at all. This is to prevent any wrong answer counting as two
since if a student believes that an alternative can be used only once, a
wrong alternative for one item would not be available for use for its correct
partner in a corresponding match.
Chapter 8: Constructing Objective Test Items:
Multiple-Choice Forms
1. What
are the components of multiple-choice questions? How might multiple-choice
questions be posed?
A
multiple-choice item consists of a problem and a list of suggested solutions.
The problem is called the stem of the item. The suggested solutions
are called alternatives. The correct alternative in each item is called the
answer, and the remaining alternatives are called distracters (also called
decoys or foils). Multiple-choice questions may be posed as a direct question
or as an incomplete statement.
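A rough sketch (not from the book) of the vocabulary, using a hypothetical item: the stem, the alternatives, the answer, and the distracters.
```python
# A rough sketch (not from the book) of the parts of a multiple-choice item:
# stem, alternatives, answer, and distracters. The item content is hypothetical.

item = {
    "stem": "Which type of assessment is given at the end of a unit to judge whether "
            "the learning goals were met?",
    "alternatives": ["placement", "formative", "diagnostic", "summative"],
    "answer": "summative",
}
item["distracters"] = [a for a in item["alternatives"] if a != item["answer"]]  # decoys/foils
print(item["distracters"])   # ['placement', 'formative', 'diagnostic']
```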
2. What
are the basic uses of multiple-choice items? Give a brief description of each
use.
The
most common use of multiple-choice items is to measure verbal information.
These items are good for measuring knowledge outcomes and measuring outcomes
at higher taxonomic levels. Included in measuring knowledge outcomes are:
knowledge of terminology, knowledge of specific facts, knowledge of
principles, and knowledge of methods and procedures. Outcomes at higher
taxonomic levels include the ability to identify applications of facts and
principles and the ability to justify methods and procedures.
3. Identify
the major advantages and limitations of multiple-choice items. Be as specific
as possible.
Multiple-choice
items work best when they are assessing achievement information. Likewise,
their use should be restricted to verbal materials. One advantage of
multiple-choice questions is their flexibility. Within the parameters of
verbal and achievement material they can be used to test virtually any
subject matter. They are usually less vague than short-answer questions and
are more objective in their scoring. Compared to true-false items, the
student encountering a multiple-choice item must not only identify a wrong
statement, but must know what the correct alternative is. Also compared to
true-false questions, multiple-choice items are more resistant to guessing
and more resistant to response set. The major limitation to multiple-choice
questions is that they are of the selection format and thus less "real
life" than supply item questions.
4. What
are some of the ways that multiple-choice questions can be written to
strengthen them, reduce their liabilities and create fairer and more
objective questions? Be as specific as possible.
A
major consideration in the construction of multiple-choice questions is the
stem. A stem should be able to stand alone without the alternatives and pose
a clear question or problem. As much as possible, the multiple-choice stem
should be free of irrelevant material not needed in the stem. Such irrelevant
material may confuse the reader and place an unnecessary burden on
short-term memory. When possible, the stem
should not be stated in the negative. Certainly double negatives should not
be used. In order not to give clues, all alternatives should agree
grammatically with the item stem. Each item should have one and only one
correct answer. Items with two correct answers should be discarded. All
alternatives including the distracters should be plausible. Implausible or
silly foils give clues to the correct answer by reducing plausible
alternatives and encourage guessing.
Chapter 9: Measuring Complex Achievement: The Interpretive
Exercise
1. What
are interpretive exercises? What are the components of interpretive
exercises? How are interpretive exercises posed?
Interpretive
exercises are intended to measure those learning outcomes based on the higher
mental processes, such as understanding, thinking skills, and various
problem-solving abilities. An interpretive exercise consists of a series of
objective items based on a common set of stimuli. The stimuli may be in the
form of written materials, tables, charts, graphs, maps, or pictures. The
series of related test items may take various forms but are most commonly
multiple-choice or true-false items with multiple-choice items the most
widely used.
2. What
are the basic uses of interpretive exercises? (List at least three.) What
types and levels of educational outcomes do they assess best?
Interpretive
exercises are used to assess the ability to recognize inferences, recognize warranted and
unwarranted generalizations, recognize assumptions, recognize the relevance
of information, apply principles, and use pictorial materials. Interpretive
exercises are usually used to assess higher order levels of learning that
include understanding and application.
3. Identify
the major advantages and limitations of interpretive exercises. Be as
specific as possible.
There
are multiple advantages to using interpretive exercises. One is that the
stimulus materials used make it possible to measure the ability to interpret
written materials, charts, graphs, maps, pictures, and other media
encountered in everyday situations. Another is that the interpretive exercise
makes it possible to measure more complex learning outcomes than can be
measured with the single objective item. Thirdly, by having a series of
related test items based on a common set of data, greater depth and breadth
can be obtained in the measurement of achievement skills. The interpretive
exercise minimizes the influence of irrelevant factual information on the
measurement of complex learning outcomes. Finally, students may be unable to
demonstrate their understanding of a principle simply because they do not
know some of the facts concerning the situation to which they are to be
applied. The interpretive exercise remedies this.
There
are also a number of limitations. It is difficult to construct sound
exercises. A second limitation is that when introductory material is in
written form, there is a heavy demand on reading skills. Finally, because the
interpretive exercises usually use selection items, they are confined to
learning outcomes at the recognition level.
4. What
are some of the ways that interpretive exercises can be written to strengthen
them, reduce their liabilities, and create fairer and more objective
questions? Be as specific as possible.
It
is essential that interpretive exercises be as strong and valid as possible.
This means that the set of stimuli must be appropriate and that the objective
questions that are drawn from the stimuli also be strong. In relation to the
set of stimuli, it is important that the material is relevant to the
objectives of the course. The set of stimuli should also be at an appropriate
reading level for the students taking the test. The stimulus material should
be new to students, not something they have encountered before. It should be
brief but meaningful. It should be clear and concise. Regarding the questions
themselves, they should reflect the stimulus materials. They should also
conform to all of the rules for constructing sound objective items such as
multiple choice.
Chapter 10: Measuring Complex Achievement: Essay
Question
1. What
is an essay? What is the distinctive feature of the essay question? What
types of learning are essay questions best able to assess?
An
essay is an extended-response, supply-type assessment. In it the student
replies with connected prose to a question or series of questions. The
distinctive feature of an essay question is its extended response form.
Students are free to construct, relate, and present ideas in their own words.
Essay questions are best used to measure higher order learning objectives
such as analysis, synthesis, and evaluation.
2. What
are the two types of essay question format? How do they differ? Which type is
usually preferable? Why?
The
two types of essay question formats are restricted response and extended
response. Restricted response asks specific questions with specific
instructions on how to answer the question and requires an answer that
conforms to those instructions. The extended-response essay question is more open-ended
and gives the student more latitude in answering the item.
Restricted-response essays are usually preferable since they are more
objective to score and lend themselves to greater reliability.
3. What
are the major advantages and limitations to essay questions? Be as specific
as possible.
Among
the advantages of essay questions is the fact that the essay allows for the
measurement of complex learning outcomes that cannot be measured by other
means. A second advantage of the essay is its emphasis on the integration and
application of thinking and problem-solving skills. Finally, the potentially
most important advantage of the essay question is its contribution to student
learning.
Disadvantages
of the essay question are that good essay questions can be difficult to
construct and they can also be difficult to score. Regarding scoring,
interrater reliability is often a problem. They are also time consuming to
score. Finally, since only a few essays can be asked on an exam, material may
not be adequately sampled and the test can lack content validity.
4. How
might the interrater reliability of essay scoring be improved? What are the
two ways of scoring an essay? All things being equal, which type of scoring
should be employed and why?
Interrater
reliability in scoring essays can be improved by using scoring rubrics or
plans to which responses must adhere to receive good grades. The teacher may
also wish to write out a well-answered essay and use it as his/her rubric.
Scoring of essays may be holistic or analytic. Holistic scoring requires
reading the entire essay and giving it an overall grade. Analytic scoring
involves rating separate components or criteria of the answer and then basing
the essay grade on those component scores. All things being equal, analytic scoring leads to more
objectivity and more reliable scoring and gives students feedback on what
sections of the essay they did well and poorly on.
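A rough sketch (not from the book) of the two approaches, using hypothetical criteria and point values: analytic scoring combines ratings on separate components, while holistic scoring assigns one overall rating.
```python
# A rough sketch (not from the book): analytic scoring sums ratings on separate
# criteria, while holistic scoring is a single overall rating. Criteria and points
# are hypothetical.

analytic_ratings = {"thesis": 4, "use of evidence": 3, "organization": 5, "mechanics": 4}  # each out of 5
analytic_total = sum(analytic_ratings.values())   # 16 out of 20 possible points

holistic_rating = 4                               # one overall judgment on a 1-5 scale

print(f"analytic: {analytic_total}/20, holistic: {holistic_rating}/5")
```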
5. Is
it a sound educational procedure to give students the option of which essays
to answer? Is it a sound educational procedure to grade handwriting and
spelling as part of the essay grade? Defend your positions.
It
is not a sound educational procedure to allow students to have options as to
what essay questions to answer. If such options are given, students are
answering different questions (actually taking different tests), and so the
common basis for evaluating their achievement is lost. It is probably not
advisable to score handwriting and spelling unless the essay test covers
handwriting and spelling. If handwriting and spelling are counted on essays
in other curricular areas, it is possible that a child could answer the essay
adequately and show achievement in learning goals but nevertheless score
poorly on the essay because of irrelevant scoring characteristics.
Chapter 11: Measuring Complex Achievement:
Performance-Based Assessments
1. Besides
essay tests, what are other types of performance assessments? Give some
examples of performance assessment. How do these assessments differ from
essays?
Essay
tests are the most common example of a performance-based assessment. However,
there are others, including artistic productions, experiments in science,
oral presentations, and the use of mathematics to solve real-world problems.
Examples might include creating an art or music product, or completing
projects in vocational or industrial education courses such as auto repair,
woodworking, or word processing. Performance assessment is also useful for mathematics, science,
social studies, and foreign languages. While essay tests are based on written
responses, the above examples require the student to "do" something
or engage in some specific behaviors.
2. Define
process, product, restricted-performance assessment and extended-performance
assessment. Give an example of each.
Performance
assessments provide a basis for teachers to evaluate both the effectiveness
of the process or procedure used and the product resulting from performance
of a task. Unlike simple tests of factual knowledge, performance tasks are
unlikely to have a single right or best answer. Restricted-response performance tasks are
usually relatively narrow in definition. The instructions are generally more
focused than extended-response performance tasks and the limitations on the
types of performance expected are likely to be indicated. The
extended-performance task may require students to seek information from a
variety of sources beyond those provided by the task itself. An example of
process is a student showing the procedures used to complete a science
experiment. An example of product is an apple pie that the student has baked.
An example of a restricted-response performance assessment is the
construction of graphs of the average amount of rainfall per month for two
cities. An example of extended-response performance assessment is preparing
and delivering a speech to persuade people to take actions to protect the
environment.
3. Identify
two advantages and two limitations of performance assessments. Explain why
they are an advantage or a limitation.
A
major advantage of performance assessments is that they can clearly
communicate instructional goals that involve complex performances in natural
settings in and outside of school. By using tasks that require performances
that correspond as closely as is feasible to major instructional objectives,
they provide instructional targets and thereby can encourage the development
of complex understandings and skills. A second advantage of performance
assessments is that they can measure complex learning outcomes that cannot be
measured by other means. They measure "real world" outcomes.
The
most commonly cited limitation of performance assessments is the
unreliability of ratings of performances across teachers or across time for
the same teacher. Reliability can be greatly increased by clearly defining
the outcomes to be measured, properly framing the tasks, and carefully
defining and following rubrics for scoring performances. Another limitation
of extended performance assessments is their time-consuming nature. This limitation
may not be easily overcome. However, the need for fair and valid assessment
may outweigh the time needed to create and score those assessments.
4. Identify
ways for creating sound and useful performance assessments. Why are they
important?
A
number of suggestions are given in the chapter for creating valid performance
assessments. These include focusing on learning outcomes that require complex
cognitive skills and student performances. Time spent constructing performance
assessments should probably not be devoted to lower-order knowledge
objectives. Tasks should be selected that represent both the content and the
skills that are central to important learning outcomes. Assessments should
stress the interdependence of content and skills. Assessments should minimize
the dependence of task performance on skills that are irrelevant to the
intended purpose of the assessment task. It is important that only the
most relevant material be assessed. The teacher should provide the necessary
scaffolding for students to be able to understand the task and what is
expected. Students should have the necessary prerequisite skills to complete
the task. Teachers should construct task directions so that the student's
task is clearly indicated. The students should clearly understand what
is expected of them so that the assessment is valid and accurate. Finally,
the teacher should clearly communicate performance expectations in terms of
the scoring rubrics by which the performances will be judged. Explaining the
criteria that will be used in rating performances provides students with
guidance on how to focus their efforts and helps convey priorities for
learning outcomes.
5. Define
rating scales, scoring rubrics, and checklists.
A
scoring rubric typically consists of verbal descriptions of a performance or
aspects of student responses that distinguish between advanced, proficient,
partially proficient, and beginning levels of performance. Both analytic and
holistic scoring rubrics may be employed. A checklist is similar in
appearance and use to the rating scale. The basic difference between them is
in the type of judgment needed. On a rating scale, one can indicate the
degree to which a characteristic is present or the frequency with which a
behavior occurs. The checklist, on the other hand, calls for a simple yes-no
judgment.
Chapter 12: Portfolios
1. What
is a portfolio? What qualifies as a portfolio of student work?
A
student portfolio is a purposeful collection of pieces of student work.
However, it possesses several special attributes. A portfolio is a collection
of student work selected to serve a particular purpose, such as the
documentation of student growth. Unlike other examples of student work, a
portfolio does not contain all the work a student does. Instead, a portfolio
may contain examples of "best" works or typical examples from each
of several categories of work.
2. What
are some of the advantages and limitations of portfolios?
There
are a number of strengths and limitations of portfolios. An important
advantage is that they can be readily integrated with instruction. Another is
that they give students important opportunities to show what they can do. In
doing this, they also help students become more reflective and critical of
their work allowing them to adjust and improve. It also gives students a
sense of responsibility and self-efficacy for collecting and submitting their
work. Finally, portfolios give teachers products to use in communicating with
parents as to their child's work and what goes on in the classroom.
Among
the disadvantages of portfolios are that they take considerable time to
construct and score. They can also lead to problems with interrater
reliability in scoring and are not easily convertible to summative evaluation
grades.
3. What
are some of the purposes of portfolios?
Portfolios
can be used in a variety of ways. Perhaps it is best to view their uses as
poles along four main dimensions. One dimension is the use of
portfolios as a means of instruction or assessment. A second dimension of use
is whether the portfolio is used to show current accomplishments or works in
progress. A third dimension is whether it shows the student's best work or a
demonstration of typical work. Finally, portfolio use can be seen along the
dimension of whether the portfolio contains finished work or works in
progress.
4. Should
students evaluate and/or select the material in their portfolios? If so, what
are some guidelines for the process?
One
legitimate use of portfolios is to have students evaluate and/or select the
material that goes into their portfolios. However, some guidelines are
necessary if such evaluation and choice of material is to take place. To some
extent, the guidelines are dependent on the type of portfolio or its purpose.
It is usually advisable that the student be given particular (and written)
guidelines as to what is to go into the portfolio and how they are to critique
their portfolios. Thus, evaluation and item choice should not be
"open-ended." There should also be prompts given which are intended to
encourage students to think about what they planned to do and what they
actually did, and to evaluate the strong and weak points of the entry. By
asking students to say what they might do differently next time, students are
encouraged to think about how their work might be improved.
5. How
should portfolios be evaluated by teachers? Be as specific as possible.
To
evaluate portfolios, a teacher must be clear in his or her mind about the
instructional goals for individual portfolio entries and for the portfolio as
a whole. Teachers must know in advance whether they are going to score the
portfolio analytically or holistically. Analytic scoring rubrics on
individual portfolios are useful for formative evaluation purposes. Holistic
scoring rubrics may be more appropriate for summative evaluations. The types
of rating scales used to score performance assessments are for the most part
also appropriate for scoring portfolios. In order to gain objective scoring,
it is good practice to conceal the identity of the student. Biases such as
the halo effect should be guarded against as much as possible.
Chapter 13: Assessment Procedures: Observational
Techniques, Peer Appraisal, and Self-Report
1. What
types of settings are best suited for observational techniques? What kinds of
behaviors are best assessed by observational techniques?
Observational
techniques are best suited for assessment in naturalistic environments. These
would include natural interactions in the classroom, on the playground, or in
the lunchroom. Behaviors well suited for assessment by informal observation
include important noncognitive outcomes, such as attitudes, appreciations,
and personal-social development.
2. What
are anecdotal records? How do they differ from random observations made by
teachers? What are some of the uses of anecdotal records?
Anecdotal
records are factual descriptions of the meaningful incidents and events that
the teacher has observed. Each incident is written down shortly after it
happens. Anecdotal records differ from random observations in that they are
both purposeful and systematic in collection and in scoring. The use of
anecdotal records has frequently been limited to the area of social
adjustment. Although they are especially appropriate for this type of
reporting, they can usually be applied to any area of learning.
3. What
are the advantages and limitations of anecdotal records?
Probably
the most important advantage of anecdotal records is that they depict actual
behavior in natural situations. Records of actual behavior provide a check on
other assessment methods and also enable us to determine the extent of change
in the student's typical patterns of behavior. Another advantage of anecdotal
records is that they help gather evidence on events that are exceptional but
significant. Anecdotal records can be used with very young students and with
students who have limited basic communication skills. They are especially
valuable in situations where paper-and-pencil tests, performance assessments,
self-report techniques, and peer appraisals are likely to be impractical or
of limited use. One limitation of anecdotal records is the amount of time
required to maintain an adequate system of records. Another serious
limitation of anecdotal records is the difficulty of being objective when
observing and reporting student behavior. A third limitation is obtaining an
adequate sample of behavior. This limitation can affect validity.
4. What
are the uses of peer appraisal and self-report scales? What forms can the two
techniques take? What are Likert scales?
Peer
appraisal and self-report scales are useful when assessment is not easily
carried out by the teacher or when many of the behaviors to be assessed are
conducted in more naturalistic environments like the playground or after
school. In peer appraisal, the guess-who technique can be used as can peer
nominations. In self-assessment, rating scales and interviews are both
appropriate techniques. A Likert scale is a rating scale that has a number of
choice points which differ along a continuum. An example of such a scale
would be a five-point scale containing the choices: strongly disagree,
disagree, don't know, agree, and strongly agree.
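A rough sketch (not from the book) of how such a five-point scale might be scored; the mapping and responses are hypothetical.
```python
# A rough sketch (not from the book): scoring a five-point Likert item by mapping
# the verbal choices onto numbers. The responses are hypothetical.

scale = {"strongly disagree": 1, "disagree": 2, "don't know": 3, "agree": 4, "strongly agree": 5}
responses = ["agree", "strongly agree", "don't know", "agree", "disagree"]

values = [scale[r] for r in responses]
print(sum(values) / len(values))   # mean rating for the item, here 3.6
```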
5. What
are interest inventories? What is their connection to aptitude? What are some
techniques of personality assessment?
As
the name implies, interest inventories measure a student's interest,
willingness, or enthusiasm to engage in some activity. They differ from
aptitude tests in that a student may be interested in an activity but lack
the necessary skills or talents to be successful in that activity. Some
techniques of personality assessment are interviews, rating scales, and
projective techniques.
Chapter 14: Assembling, Administering, and
Appraising Classroom Tests and Assessments
1. When
reviewing objective test items, what are some of the key things that a
teacher should look for and/or correct?
There
are a number of questions that the teacher should ask him/herself when
reviewing objective test items being considered for inclusion in a test. The
first is whether the format is appropriate for the learning outcome being
measured. For example, if the student is expected to be able to produce a
definition, a short-answer item should be used instead of true-false.
Another important question is whether the level of the behavior required in
the test item matches the taxonomic level of the objective. If the student is
expected in the objective to apply knowledge then a test item that requires
rote knowledge is inappropriate. A third requirement is that the point of the
item is clear to the student. A fourth requirement is that the item be as
short as possible and be free from excess verbiage. Another requirement is
that the projected answer of the item would be agreed upon by experts in the
field of inquiry. The item should also be free from clues and technical
errors. Finally, the item should be free of ethnic, racial, or gender bias.
2. What
are some of the ways that test items can be arranged on a test?
One
way that items may be arranged is by the type of items being used. In this
system, for example, all of the multiple-choice items would be arranged first
followed by true-false, etc. A second method is to arrange items by the
goals, objectives, or learning outcomes that the test measures. For example,
if four learning outcomes were being measured, all of the first objective's
test questions would appear first followed by the second objective's test
questions, etc. A third way would be by the difficulty of the items. In order
to increase test motivation, the easiest items would appear first. Finally,
test items may be arranged by the subject matter that the test covers.
3. What
are some of the ways that a teacher can reduce test anxiety before and during
testing?
There
are a number of ways that teachers can reduce test anxiety. One way is not to
use tests or the threat of tests as punishment for classroom misbehavior or
for not completing school assignments. Another technique is by not stating
that students need to do their best because this test is crucial for some
aspect of their future life such as getting into a good college, etc. A third
technique is to avoid telling students that they must work fast in order to
finish on time. Finally, the teacher should not warn students of harsh
consequences if they do not do well or fail the test.
4. Describe
the concepts of item discrimination and difficulty. What should be the ideal
levels of each concept on a test item?
Item
discrimination refers to the idea that students who do well on the entire
test should answer a particular item correctly while students who score
poorly on the entire test should answer a given item incorrectly. Item
difficulty looks at the percentage of students in the entire class who
answered an item correctly or incorrectly. Perfect item discrimination (1.00)
occurs if all of the top 27% of test scorers answer the item correctly and
none of the bottom 27% answer it correctly. Items should be of moderate
difficulty (i.e., .75) with item discrimination being as close to 1.00 as
possible.
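A rough sketch (not from the book) of the upper-lower method of item analysis, with hypothetical data: difficulty is the proportion answering the item correctly, and discrimination is the difference between the upper and lower groups.
```python
# A rough sketch (not from the book) of the upper-lower method of item analysis.
# 1 = answered the item correctly, 0 = answered it incorrectly; the data are hypothetical.

upper_group = [1, 1, 1, 0, 1]   # responses of the top ~27% of scorers on the whole test
lower_group = [0, 1, 0, 0, 0]   # responses of the bottom ~27% of scorers

p_upper = sum(upper_group) / len(upper_group)   # 0.80
p_lower = sum(lower_group) / len(lower_group)   # 0.20

# Difficulty estimated from the two extreme groups; discrimination is their difference.
difficulty = (sum(upper_group) + sum(lower_group)) / (len(upper_group) + len(lower_group))
discrimination = p_upper - p_lower              # 1.00 would be perfect discrimination

print(f"difficulty p = {difficulty:.2f}, discrimination D = {discrimination:.2f}")
```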
5. What
is a correction for guessing? When does it apply? Should it be used for most
tests?
A
correction for guessing is based on the assumption that some students will
answer all questions (especially multiple-choice questions) even if they are
guessing at some answers while some students will leave items blank and not
guess. Students who guess will get some of the items correct by guessing
while students who do not guess at items will automatically get those items incorrect.
Thus, the correction for guessing is an attempt to compensate for the
different modes of test taking. A correction for guessing is superfluous when
all students answer all items. Thus it is better to make sure that all
students answer all items rather than try to compensate after the fact with a
mathematical correction-for-guessing procedure.
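A rough sketch (not from the book) of the usual correction formula, rights minus wrongs divided by the number of choices minus one; the numbers are hypothetical.
```python
# A rough sketch (not from the book) of the usual correction-for-guessing formula:
# corrected score = rights - wrongs / (number of choices - 1). The numbers are hypothetical.

def corrected_score(rights, wrongs, choices_per_item):
    return rights - wrongs / (choices_per_item - 1)

# 40 right, 12 wrong, 8 left blank on a 4-option multiple-choice test;
# blanks are neither rewarded nor penalized.
print(corrected_score(rights=40, wrongs=12, choices_per_item=4))   # 36.0
```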
Chapter 15: Grading and Reporting
1. What
are the functions of grading and reporting systems? Why are these functions
important or useful?
School
grading and reporting systems are designed to serve a variety of functions in
the school. These include instructional uses, reports to parents, and
administrative and guidance uses. The main function of grades should focus on
learning and student development. This function is strengthened when grades
clarify instructional objectives, indicate the student's strengths and
weaknesses in learning, provide information concerning the student's
personal-social development, and contribute to the student's motivation.
Informing parents (or guardians) of their child's school progress is a basic
function of grading and reporting systems. These reports should help parents
understand the objectives of the school and how well their children are
achieving the intended learning outcomes of their particular program.
Finally, grades and progress reports serve a number of administrative
functions. They are used for determining promotion and graduation, awarding
honors, determining athletic eligibility, and reporting information to other
schools and prospective employers.
2. What
are the main types of grading and reporting systems? What is an advantage and
limitation to each system?
The
traditional grading system has been letter grades. This system is concise and
convenient. The grades are easily averaged, and they are useful in predicting
future achievement. However, they have several shortcomings including that
they typically are a combination of achievement, effort, work habits, and
good behavior, that the proportion of students assigned each letter grade
varies from teacher to teacher, and that they do not indicate a student's
specific strengths and weaknesses in learning.
Pass-fail
is a two-category system in which the person receives either a passing or a failing
grade with no gradations in between. An advantage is that it permits students
to take some courses, usually elective courses, under a pass-fail option that
is not included in their grade-point average. A limitation is that it offers
very little information as to the extent of learning.
Checklists
are ratings of progress toward the major objectives in each subject-matter
area. An advantage of checklists is that they provide a detailed analysis of
the student's strengths and weaknesses so that constructive action can be
taken to help improve learning. Difficulties encountered with such reports
are in keeping the list of statements down to a workable number and in
stating them in such simple and concise terms that they are readily
understood by all users of the reports.
Another
method of grading and reporting is sending letters home to parents or
guardians. Letters make it possible to report on the unique strengths,
weaknesses, and learning needs of each student and to suggest specific plans
for improvement. Among the limitations are that they require an excessive
amount of time and skill, that descriptions of a student's learning
weaknesses are easily misinterpreted by parents, and that letters fail to
provide a systematic and cumulative record of student progress toward the
objectives of the school.
Portfolios
can be an effective means of showing student progress, illustrating
strengths, and identifying areas where greater effort is needed. Portfolios
must be systematic and conform to all of the guidelines for maintaining good
portfolios.
The
parent-teacher conference is a flexible procedure that provides for two-way
communication between home and school. The parent-teacher conference is an
extremely useful tool, but it shares two important limitations with the
informal letter. First, it requires a substantial amount of time and skill.
Second, it does not provide a systematic record of student progress.
|
|
|
||
3.
|
What
is a multiple grading and reporting system? What components would multiple
grading systems contain? What is the advantage of adopting such a system?
|
|
|
Rather
than replace letter grades, many educators have advocated trying to improve
the letter-grade system and supplement it with more detailed and meaningful
reports of student learning progress. The typical multiple reporting system
retains the use of traditional letter grades and supplements the grades with
checklists of objectives. In some cases, two grades are assigned to each
subject: one for achievement and the other for effort, improvement, or
growth. When letter grades are supplemented by these other methods of
reporting, the grades become more meaningful.
|
|
|
||
4.
|
What
are some of the questions that a teacher should answer for him/herself before
adopting a letter system of grading and assigning letter grades to students?
|
|
|
A
number of questions and issues must be resolved before the teacher adopts a
letter grading system and begins assigning letter grades to students. These
include: determining what should be included in a letter grade, deciding
how achievement data should be combined in assigning letter
grades, determining the frame of reference to be used in grading, and
deciding how the distribution of letter grades should be
determined.
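As one illustration of the "how achievement data should be combined" question, the sketch below (hypothetical data and weights, not the text's prescription) converts each component to T-scores so that components with different spreads carry comparable weight, and then applies teacher-chosen weights:

    # Illustrative sketch: forming a weighted composite for grading by first
    # converting each component to T-scores (comparable spread), then weighting.
    from statistics import mean, pstdev

    def t_scores(scores):
        m, sd = mean(scores), pstdev(scores)
        return [50 + 10 * (x - m) / sd for x in scores]

    # Hypothetical components for five students and teacher-chosen weights.
    tests = [72, 80, 85, 90, 95]
    homework = [60, 88, 70, 95, 85]
    composite = [0.7 * t + 0.3 * h
                 for t, h in zip(t_scores(tests), t_scores(homework))]
    print([round(c, 1) for c in composite])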
|
|
|
||
5.
|
What
is the benefit of parent teacher conferences? When and how often should
parent-teacher conferences be held?
|
|
|
The
face-to-face conference makes it possible to share information with parents
or guardians. It helps to overcome any misunderstanding between home and
school, and to plan cooperatively a program of maximum benefit to the
student. At the elementary school level, conferences with parents are
regularly scheduled. At the secondary level, the parent-teacher conference is
typically used only when some special problem situation arises.
|
Chapter 16: Achievement Tests
1.
|
What
are the major types of published achievement tests? How are they similar? How
do they differ?
|
|
|
There
are a variety of achievement tests. These include achievement test batteries,
achievement tests in individual subject areas, and individual achievement
tests. These are alike in that they are all commercially available and are
standardized. This means that they have standardized rules for administration
and scoring, a test manual, and established reliability and validity. Most are
norm referenced and have been normed on a national group or groups of
students. Most have equivalent forms. Achievement test batteries measure a
number of different curricular areas. Standardized achievement tests in
individual subject areas measure achievement in only one curricular area such
as reading. Individual achievement tests, unlike most standardized achievement
tests, are given in a one-on-one setting.
|
|
|
||
2.
|
What
are the major differences between standardized achievement tests and informal
tests?
|
|
|
The
main differences between standardized achievement tests and informal
classroom tests are in the nature of the learning outcomes and content
measured. A second difference is in the quality of the test items. Third, they
differ in their established reliability and validity. Fourth, they
differ in the procedures for administering and scoring. Finally, they differ
in the interpretation of scores.
|
|
|
||
3.
|
What
are standardized achievement test batteries? Why are they useful? What is a
limitation or disadvantage in using them?
|
|
|
Standardized
achievement tests are frequently used in the form of survey test batteries. A
battery consists of a series of individual tests all standardized on the same
national sample of students. This makes it possible to compare test scores on
the separate tests and thus determine the students' relative strengths and
weaknesses in the different areas covered by the test. One limitation of test
batteries is that all parts of the battery are usually not equally
appropriate for measuring a particular school's objectives.
|
|
|
||
4.
|
Compare
standardized achievement test batteries to standardized achievement tests in
a specific area. How do they compare in terms of reliability?
|
|
|
There
are literally hundreds of separate tests designed to measure achievement in
specific areas or single curricular topics such as reading or math. The
majority of these can be classified as tests of course content or reading
tests of the general survey type. Tests also have been developed for use in
determining learning readiness. Since they can ask more questions in a given
curricular area (e.g., reading) than a standardized test battery, they tend
to have greater reliability in that particular area being assessed.
|
|
|
||
5.
|
What
is a customized achievement test? Why is it useful to the teacher? What
precautions should the teacher take in using such a test?
|
|
|
Banks
of objectives and related test items are maintained by most large test
publishers and by some other organizations. These item banks are used for
computer generation of customized tests. In some cases, the test publisher
prepares the tests. In others, the publisher will sell or lease computer
software that includes banks of items keyed to objectives and a program for
constructing and printing locally prepared customized tests. The advantage of
these customized tests is that they allow the teacher to choose questions
particularly suited to, or in conformance with, classroom objectives. A
limitation is that enough items keyed to each objective must appear on the
test for the resulting scores to be reliable. Regular achievement and customized
achievement tests both measure what a student has learned, and both are
useful for predicting success in learning new tasks. The main differences lie
in the types of learning measured by each test and the types of prediction
for which each test is most useful.
|
Chapter 17: Aptitude Tests
1.
|
What
are tests of aptitude? What are their uses? What are the limitations of
aptitude tests?
|
|
|
Aptitude
tests are designed to predict future performance in some activity. Aptitude
tests can provide information that is useful in determining learning
readiness, individualizing instruction, organizing classroom groups,
identifying underachievers, diagnosing learning problems, and helping
students with their educational and vocational plans. Contrary to popular
belief, aptitude tests do not measure a fixed capacity nor can they predict
future behavior with 100% accuracy.
|
|
|
||
2.
|
What
is the difference between aptitude and achievement? How is aptitude conceptualized
or aptitude viewed today?
|
|
|
Historically,
aptitude was viewed as potential for acquiring some trait (e.g., learning)
while achievement was viewed as past learning that occurred as a function of
instruction or experience. More recently, this view has been modified: a
student's present level of learned abilities is used to predict future
performance. Performance on aptitude tests is influenced by previous learning
experiences, but it is less directly dependent on specific courses of
instruction than is performance on achievement tests. The various types of
learning measured by achievement and aptitude tests can best be depicted by
arranging them along a continuum. This continuum classifies the various
types of tests according to the degree to which the test content depends on
specific learning experiences. At one extreme is the content-oriented
achievement test that measures knowledge of specific course content. At the
other extreme is the culture-oriented nonverbal aptitude test that measures a
type of learning not influenced much by typical school experiences. As one
moves through the different levels of the spectrum, the test content becomes
less dependent on any particular set of learning experiences.
|
|
|
||
3.
|
What
is the relationship between scholastic aptitude, intelligence, and learning
ability?
|
|
|
Tests
designed to measure learning ability traditionally have been called
"intelligence tests." Many people have historically equated
learning aptitude and intelligence as the same construct. This terminology is
still used for some individually administered tests and for some group tests,
but its use is declining. Today the terms learning ability tests,
school ability tests, cognitive ability tests, andscholastic
aptitude tests are used rather than intelligence tests. All
these terms emphasize the fact that these tests measure developed abilities
useful in learning and not innate capacity or undeveloped potential.
|
|
|
||
4.
|
What
are group learning ability tests? What are the two types of group learning
ability tests and how do they differ? Name one major group test of learning
ability from each type.
|
|
|
The
majority of tests of learning ability administered in the schools are group
tests. These are tests that, like standardized achievement tests, can be
administered to many students at one time by persons with relatively little
training in test administration. Some group tests yield a single score;
others yield two or more scores based on measures of separate aspects of
ability. An example of a group ability test that yields a single score is the
Otis-Lennon School Ability Test. An example of a group ability test that
yields separate scores is the Cognitive Abilities Test.
|
|
|
||
5.
|
What
are individual tests of ability? Why and with whom are they used? What are
the two most often used individual tests of ability?
|
|
|
Learning
abilities may be measured by individual tests. Sometimes these tests are
called intelligence tests. Individual tests are administered to one examinee
at a time in a face-to-face situation. The examiner presents the problems
orally, and the examinee responds by pointing, giving an oral answer, or
performing some manipulative task. The administrator of the test must usually
be a licensed school psychologist. Because the individual test is
administered to one student at a time, it is possible to control more
carefully such factors as motivation and to assess more accurately the extent
to which disabling behaviors are influencing the score. The influence of
reading skill is deemphasized because the tasks are presented orally to the
student. In addition, clinical insights concerning the student's method of
attacking problems and persistence in solving them are more readily obtained
with individual testing. These advantages make the individual test especially
useful for testing young children, for retesting students whose scores on
group tests are questionable, and for testing students with special problems.
The two most popular individual tests of ability are the Stanford-Binet
Intelligence Scale and the Wechsler Scales.
|
Chapter 18: Test Selection, Administration, and
Use
1.
|
Where
might an educator go to obtain information about published tests? What types
of information are contained in these sites?
|
|
|
There
are a variety of places that an educator may go to obtain information about
published tests. The two most helpful are probably the Buros Mental
Measurements Yearbook and Tests in Print. Both of these
resources contain information about the test publisher, test cost, intended
uses of the tests, technical information and independent reviews. Another
source is the Educational Testing Service Test Collection. This
resource contains information and abstracts about thousands of tests. Still
another resource is the publishers themselves. Information may be obtained
from their catalogues although that information may not be completely
objective. Finally, test information may be obtained both from textbooks and
educational and psychological professional journals.
|
|
|
||
2.
|
What
are the steps involved in selecting appropriate tests?
|
|
|
The
first step in selecting a test is to decide the purpose for which the test
will be used. Defining testing needs should be the chief determinant in
choosing a test. Another step is using available information in narrowing the
choice of possible candidate tests. Next, one should locate suitable tests
and obtain a specimen copy of the tests under consideration. Finally, the
tests should be reviewed and evaluated before a final choice is made.
|
|
|
||
3.
|
How
should a test be administered? What are some fair and unfair practices in
administering a published test?
|
|
|
The
main requirement is that the testing procedures prescribed in the test manual
be rigorously followed. Teachers do not have leeway in making special
considerations in test administration for their students regardless of how
much they personally want their students to succeed. However, teachers may
try to reduce students' test anxiety before the test is given. Teaching
specifically to the test or giving long practice sessions with past
tests, however, is generally an inappropriate practice.
|
|
|
||
4.
|
What
are some permissible uses of published test results?
|
|
|
The
main aim of using published test results is to improve educational
planning for students. This includes identifying the current level of student
achievement, including strengths and weaknesses. Also, any discrepancies
between perceived student ability and test results should be noted and
addressed. Another legitimate use of test information is sharing test
results with parents to help them understand how their child is progressing in
meeting learning goals. Finally, test results may be used to help guide
educational and/or vocational choices, but they should
not be the only criterion used in making these decisions.
|
|
|
||
5.
|
What
are some unwise uses of using published test information?
|
|
|
Published
test results should not be used to assign course grades. Teacher-made tests
are better designed for that purpose and yield more valid grades.
Assignment to a remedial track, or even retention in a grade, should not be
based on published tests alone. Rather, a variety of data and information should
be used in these decisions. Finally, published test results should not be
used to judge teacher effectiveness, dismiss teachers, or award raises. Children
differ widely in their abilities and in the environmental contributors to
their learning, and a teacher's effectiveness should not be judged on
the basis of a single published test.
|
Chapter 19: Interpreting Test Scores and Norms
1.
|
What
are two differences between educational/psychological tests and tests in the
natural sciences?
|
|
|
There
are two primary differences between educational/psychological tests and tests
in the natural sciences. The first is the issue of the true zero point. While
it is possible, for example, to identify a point of no length when measuring
something concrete such as a table, there is no
comparable point in learning that represents a true zero where no learning at
all has occurred. The second, related issue is that because there is no true
zero point in educational tests, one cannot be sure that the differences
between scores are exactly equal and comparable. For example, while we can be
sure that the distance between one inch and two inches and between two inches
and three inches is exactly the same (one inch), we cannot be sure that an IQ of 100
represents precisely twice the ability of an IQ of 50.
|
|
|
||
2.
|
What
is the difference between criterion-referenced and norm-referenced test
scores? What are different types of each variety?
|
|
|
Criterion-referenced
tests measure the extent to which a student has learned a set of specified
objectives. Norm-referenced tests measure how well a student has done
compared to other students in the norm group. Scores used to report
criterion-referenced performance include raw scores, percentage-correct
scores, and expectancy tables. Scores used to report norm-referenced
performance include raw scores and derived scores such as percentile ranks
and grade equivalents.
|
|
|
||
3.
|
What
are percentile ranks? How are they derived? What is the difference between
percentile ranks and percentages?
|
|
|
A
percentile rank indicates the percentage of students in a reference group who
scored at or below a given raw score. It is derived by ranking the scores of
the group and determining, for each raw score, the percentage of students
falling at or below it. Percentile ranks differ from percentages: a percentile
rank compares a student with other students (a norm-referenced
interpretation), whereas a percentage score reports the proportion of items
the student answered correctly (a criterion-referenced interpretation).
|
|
|
||
4.
|
What
is the normal curve? How is the standard deviation related to the normal
curve?
|
|
|
The
normal curve is a symmetrical bell-shaped curve that has many useful
mathematical properties. One of the most useful from the viewpoint of test
interpretation is that when it is divided into standard deviation units, each
portion under the curve contains a fixed percentage of cases. The standard
deviation is related to the mean and indicates how scores disperse themselves
around the mean. Practically all scores fall within three standard deviations
above and below the mean, so the normal curve spans approximately six equal
standard deviation units.
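For illustration, the approximate percentage of cases in each standard-deviation band can be computed directly from the cumulative normal distribution (a Python sketch, not figures quoted from the text):

    # Sketch: percentage of cases in each standard-deviation band of the normal curve.
    from math import erf, sqrt

    def normal_cdf(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    for lo, hi in [(-3, -2), (-2, -1), (-1, 0), (0, 1), (1, 2), (2, 3)]:
        pct = (normal_cdf(hi) - normal_cdf(lo)) * 100
        print(f"{lo:+d} to {hi:+d} SD: about {pct:.1f}% of cases")
    # Prints roughly 2%, 14%, 34%, 34%, 14%, 2% -- which is why the six units
    # from -3 SD to +3 SD contain practically all of the scores.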
|
|
|
||
5.
|
What
are standard scores? What are some of the standard score measures? Describe
them.
|
|
|
Standard
scores express test performance in terms of standard deviation units from the
mean. The basic types of standard scores are z-scores, T-scores, normalized
standard scores, stanines, normal-curve equivalents, and standard age scores.
z-scores express test performance simply and directly as the number of
standard deviation units a raw score is above or below the mean. T-scores are
derived from z-scores for the purpose of making each T-score a positive
integer. Normalized standard scores are z-scores or T-scores obtained from a
conversion table based on the normal curve, so that the transformed scores are
distributed normally. Stanines are single-digit scores on a
nine-point scale ranging from 1 to 9. The normal-curve equivalent (NCE)
is another normalized standard score that was introduced in order to avoid
some of the pitfalls of grade-equivalent scores. Finally, another widely used
standard score for ability tests is the standard age score (SAS). With these
scores the mean is set at 100 and the standard deviation at 16.
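A brief Python sketch of these conversions (hypothetical raw scores; the formulas z = (X − mean)/SD, T = 50 + 10z, and SAS = 100 + 16z are standard, and the stanine line uses the common round(2z + 5) approximation rather than the exact percentage bands):

    # Sketch: converting a raw score to z-score, T-score, standard age score,
    # and an approximate stanine.
    from statistics import mean, pstdev

    raw_scores = [62, 70, 75, 75, 80, 83, 85, 88, 92, 95]  # hypothetical class data
    m, sd = mean(raw_scores), pstdev(raw_scores)

    def standard_scores(x):
        z = (x - m) / sd
        t = 50 + 10 * z                        # T-score
        sas = 100 + 16 * z                     # standard age score (mean 100, SD 16)
        stanine = min(9, max(1, round(2 * z + 5)))
        return round(z, 2), round(t, 1), round(sas, 1), stanine

    print(standard_scores(92))  # a score a little more than one SD above the mean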
|