The English Writing SOL tests, which are given in 5th, 8th, and 11th grades, are administered in early March. The writing test is cumulative, assessing writing skills developed over several years of instruction. The tests consist of a multiple-choice section and an essay section (students write in response to a prompt). The Department of Education uses an outside grading service, based in Kansas and Arizona, to grade the writing portion of the tests. Test scorers are trained according to Virginia standards.

These writing tests are not graded by our children’s teachers. Parents don’t get to see their child’s essays. The essays are graded by $11/hour essay graders who are required only to have a B.A. (not to have been English majors, much less English teachers). These graders are trained by the test publisher’s employees to apply its scoring rubric to our kids’ essays. And if the scoring is done as in other states, these graders are most likely given bonuses based on how fast they can go (i.e., how many they can score per day).

Keep in mind that the 11th graders who are taking this writing portion of the English SOLs must pass to graduate.  This is not a dry run; this, for them, is for real and for diplomas in 2004.  Hope those scorers in Kansas and Arizona keep that in mind. 
 

Scorers needed!

This ad ran in the Richmond Times-Dispatch in 2002.

NCS Pearson is the nation’s largest commercial processor of student assessments serving over 40 statewide K-12 testing programs. Qualified candidates must have a minimum of a four-year degree and be able to follow a scoring guide. A background in writing is preferred but not required.

Current project begins March 12

Long-term temporary positions

FT Days: M-F 8:00 a.m. to 4:30 p.m.

Positions start at $10.00/hour

Call 1-866-JOIN NCS or log onto www.quikscreen.com/joinncs to schedule and (sic) interview.

Walk-in interviews welcome daily from 9:00 a.m. – 11 a.m. and 2:00 p.m. – 4:00 p.m.

Please bring proof of degree.

THIS ISN’T ABOUT WRITING; IT’S ABOUT NOT GETTING YELLED AT

 An interview with Daniel Ferri on NPR

Who's scoring those high-stakes tests? Poorly trained temps

By Cameron Fortner
 


Growing up in California's public schools, I took more standardized tests than I can remember: Teachers at every grade level stressed their importance. I didn't want to let anyone down, so I approached each test with all the solemnity and effort a child can muster.

I never questioned that obedience. As a child, I imagined my test answers being flown across the country to a room of educated, professional test scorers who possessed a zeal for essays written on such topics as "A moment that changed my life."

My summer as a test-scorer disabused me of that notion. As a recent college graduate, I worked in a Boston testing company, and instead of the professionals I'd envisioned painstakingly grading exams, I found a room full of temporary employees who had little respect for - and minimal investment in - their jobs.

 

Letters from a scorer

Dear Scott,

As I told you on the phone, I am working temporarily in a "bubble-factory" where we compete with each other to fill (yes, not make) as many bubbles as possible. Our bubbles, unlike soap bubbles, do not pop in seconds; they have a long-lasting and far-reaching impact on the lives of Americans. Our bubbles can change the mood and perhaps sometimes the destiny of many students and their "bestest" dear teachers. They can also financially doom one school district and bless another.

As you may guess, I am a scorer (what?); I grade the standard reading and listening assessment tests conducted on elementary, middle and high-school students all across America. I am impressed by the company’s sophisticated training and management style. The company’s accomplishments are due as much to her hard work as to her creativity. Considering that the main thing employees consume in this company is pencils, you do not have to be a scorer to understand that the natural warranty on our thumbs and index fingers will soon run out because of the mileage we put on them in just a few months. (You do not need to know how this warranty protects the manufacturer; you are not being scored, Scott.) In the course of filling the bubbles, my neighbor’s fragile fingers, which are made up of lightweight bones with spongy stuff, are bent like a herd of steers (a bad analogy, of course) marked with paper cuts.

Am I better off or worse off with or without this tedious job? Well, yes and no, and I make one point. Some think that the increase in the toxic boron (from boredom) level in the atmosphere of the working area is due to the non-resident supervisor’s humor-deficiency syndrome. I disagree. Last week, in just two minutes he proved otherwise. He can enrich the soil and eat dead animals. And believe me, he can smile too.

Shifting from a sarcastic to a serious approach, I think the real atmospheric danger is biological, not psychological. The low level of oxygen and the high level of carbon dioxide, and perhaps sulfur dioxide, have targeted the scorer’s most precious organ: the brain. A common scorer is perhaps losing several thousand brain cells per second (bcps), and thus regressing to his or her elementary years. This regression, in turn, may work well for the company’s clandestine policy to keep order, to feed the stinking rubrics without objection, and to suppress critical minds.

***

Dear Scott,

Our table is currently the highest producer of filled bubbles. We 8.5 people (I am the half!) work like termites and score an average of 1,100 papers a day. Feeling sorry for them and getting bored with the repetitive task, I started sharing some amusing and interesting papers with them. It has now become a reciprocal act of generosity among the gang of fast-paced scorers. Furthermore, I started creating side projects for the few who were interested. I thought this would reduce the speed of my neighbors. Wrong. On the contrary, it increased their production. Just last Friday, the girl next to me scored 152 papers and the guy across the table scored a record 183 papers. Contrary to what our dear employers believe, the more we laugh the more we score. Nevertheless, I am determined not to exceed my self-imposed goal of 90 papers per day. I know I am the slave of employers until I become a millionaire (well, afterwards, I will become the slave of my money), but I will try to keep my itsy-bitsy self-control over my work by limiting myself to 90 ppd as my maximum speed.

Don’t have a cow, Scott; I am going to tell you about our extra scoring projects. You might think that we are addicted to scoring or just bored like the brave and clever girl in the story. Whatever the reason, we came up with project after project, and projects within projects.

The first project started when I noticed that the scorer next to me was noticing those who frequently went to the bathroom. Though she was a high-speed scorer, she could keep track of everyone going to the bathroom. Meanwhile, she was able to chew her gum and answer every question I asked (perhaps one question every four minutes). She also had the courtesy to laugh or smile at every amusing and dull paper I shared with her. Realizing that I was sitting next to a prodigy, I suggested that she carry out our first project, on people’s bathroom habits.

Here is the record of scorers who scored the most in bathroom-visitation. I called them satellites, since they had a temporal pattern in their orbits tangential to the bathroom.

Mr. Tall 12 times

Mr. Brown 11 times (mostly fountain)

Mr. Blonde 7 times

Mr. Green 7 times

Ms. Pony-tail 7 times

Ms. Spy 7 times

In my opinion, they were the smartest employees, since they were relaxing their eyes, cartilage and muscles in this unhealthy work environment. (These and other frequent walkers need not worry, since we have terminated the bathroom-tracking project. In fact, I am planning to join their club.)

Embarrassed about tracking people on their way to the bathroom (ultimate regression!), for the next project I suggested checking for a possible relationship between their description of the girl who saves the sheep and their description of the bone. Well, we came up with an interesting correlation. Out of 139 papers:

Description of girl   Solid   Light   Rubbery
Brave                   11      20       5
Clever                   2      14       1
Caring, etc.            11       3       5
Determined               0       5       1
Grown up                 5       0       0
"Weak Descr."           19      19       8

And there was one paper for each of the unconventional descriptions, such as Indian, Farmer, Short, Tall, and Beautiful. There was an obvious correlation. Those who described the girl as "clever" were clever indeed; most of them got the "right" answer. Those who found her "caring" ended up with a "solid" but losing score.
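As a rough check of that correlation, the short Python sketch below computes, for each description of the girl, the share of papers that described the bone as "Light". It assumes "Light" is the expected answer (the letter says so a little further on) and uses only the counts from the tally; the names `tally` and `light_share` are invented for this illustration.

```python
# Counts transcribed from the tally above: each row is a description of the
# girl, and the tuple holds the bone descriptions (Solid, Light, Rubbery).
tally = {
    "Brave": (11, 20, 5),
    "Clever": (2, 14, 1),
    "Caring, etc.": (11, 3, 5),
    "Determined": (0, 5, 1),
    "Grown up": (5, 0, 0),
    "Weak Descr.": (19, 19, 8),
}


def light_share(row):
    """Fraction of a row's papers that described the bone as 'Light'."""
    solid, light, rubbery = row
    return light / (solid + light + rubbery)


# Share of 'Light' answers for every description of the girl.
shares = {name: light_share(row) for name, row in tally.items()}

for name, share in shares.items():
    print(f"{name:<14} {share:.0%}")
```

On these counts, about 82% of the "Clever" papers gave the "Light" answer, against about 16% of the "Caring" papers, which is consistent with the pattern described above.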

This led me to a theory: girls must do better in describing the bones. This assumption was based on the common belief that in the elementary school years, girls learn reading and writing faster than boys do. Well, I was proved wrong. For every 100 boys who described the bone with the expected right answer, lightweight, there were only 70 girls who did likewise. My neighbor came up with a quick explanation: in scientific issues boys are better than girls. Indeed, I should have taken a clue from those who described the Indian girl as "caring." Perhaps most of them were girls. (I know the problem with sex stereotypes and I consider myself 76% feminist. But it seems that there were really some differences between Adam and Eve.)

Well, the third project was more exciting and meaningful, and another scorer joined us for it. This one involved one of the clearly wrong rubrics. In the story, a fox with kittens escapes from the harsh winter into a farmer’s barn. When the farmer sees her, he gets angry and wants to chase her away. However, when she leads the farmer to the barn, the farmer notices the fire and puts it out. The farmer feels safe. He checks his barn and sees the fox with her family in the corner. The farmer smiles, and walks away.

The question solicits the reason for the farmer’s change of feelings. The story provides two equally right answers:

1. The fox warned the farmer about the fire, and the farmer could feel safe with the fox in his barn.

2. The farmer gave up chasing the fox because he saw her kittens. So, the farmer allowed the fox family to pass the harsh winter there.

Unfortunately, the test-makers had decided that the phantasmagoric farmer was a pragmatic capitalist, not a compassionate, animal-loving person. Many students were relying on the farmer’s smile just after seeing the fox family. Thus, they considered the farmer a compassionate person. Well, those students lost (and continue to lose) up to three points, because of the redundancy in these faulty questions.

Soon I wondered about the gender differences. I came up with this theory: most likely the girls are the biggest victims of this faulty and biased rubric, since they might focus on the compassionate side of the farmer, rather than his businessman side. Assuming that the numbers of girl and boy students were equal, we started keeping the record of answers and the sexes.

After one day of scoring, the results were no surprise. While 99 students answered the question with "fire" and received credit, 149 students answered with "family" and received nothing. In other words, sixty percent of students lost up to three points (12% of their total scores!). But the real losers, as far as the sexes are concerned, were girls. Here is our count:

Answer           Boys   Girls
Kitten/family     57     92
Fire              50     49

As you may infer, boys were more creative than girls in finding irrelevant answers. Therefore, you see fewer boys in this record. That means 63% of girls lost points for their justifiable perception of the farmer as a compassionate person! This is of course bad news for parents of children who are more focused on love and compassion, and especially bad news for those who have daughters in the State of Washington.
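The arithmetic behind these percentages can be retraced with a small Python sketch. It is only a rough check, assuming that the count above covers just the "family" and "fire" answers; the names `counts` and `family_share` are invented for this illustration. On that assumption the girls' share works out to roughly 65% rather than the 63% quoted, so the letter may be using a denominator that the tally does not show (for example, one that also includes irrelevant answers).

```python
# Counts transcribed from the one-day tally above.
counts = {
    "boys": {"family": 57, "fire": 50},
    "girls": {"family": 92, "fire": 49},
}


def family_share(group):
    """Fraction of a group's recorded answers that were 'family' (no credit)."""
    return group["family"] / (group["family"] + group["fire"])


# Totals across both groups: 149 "family" answers and 99 "fire" answers.
total_family = sum(group["family"] for group in counts.values())
total_fire = sum(group["fire"] for group in counts.values())

# Share of all recorded answers that received no credit.
overall = total_family / (total_family + total_fire)

boys = family_share(counts["boys"])
girls = family_share(counts["girls"])

print(f"overall {overall:.1%}, boys {boys:.1%}, girls {girls:.1%}")
```

The overall share of about 60% matches the "sixty percent" in the letter, and the gap between roughly 53% of boys and roughly 65% of girls is the difference the count is meant to show.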

While conducting each project alongside our main project, we found the need to play word games. The lady next to me started shaking her pigtails hard. She was a nightingale in a purple veil. It was not true that she was not playing, but it was true that she was playing. As for me, if my chair could speak one day, it would sue me for constantly moving like a child bitten by a swarm of yellow jackets. (If I follow the information provided in the "wanted" list and call a doctor, I won’t get a point.)

For instance, we alternately generated 31 meaningful (no names) words from "BOREDOM." It ended when my neighbor exhausted all meaningful words and wrote down "boo." Of course, I did not continue with a "moo." She won. We also generated 45 meaningful words from "CREATIVITY." Our creativity reached its zenith and simultaneously its death when I resorted to a "tat" and she responded with a "tit." She won again.

On top of all these projects, several of my table-mates started learning a few Turkish and Arabic words, such as "Merhaba" ("Hello" in Turkish) and "Kayfa Haluk" ("How are you" in Arabic). In our next incarnation, if we end up scoring again, we are considering adding Persian and Kurdish to this list.

Scott, you may wonder why I am telling you all this gibberish. Hold on and keep reading. The issue is much more serious than you may expect.

All states, except a few, spend millions of dollars and thousands of work hours to conduct these partially multiple-choice and partially open-ended tests. During my employment, I realized that the tests are not reliable for assessing the aptitude of many students. I would have worried if my son had received a low score on one of these tests, but after scoring more than 6,000 papers, I will take the results less seriously. Here are the factors that I believe make the tests unreliable, especially for approximately 30% of students.

1. Poorly written questions. Some questions in the tests are terribly vague. Many college-educated adults can easily fail some of the questions directed to 4th graders. This, I believe, is one of the biggest problems with standardized tests.

2. Difficulty in evaluation. To quantify answers for consistent scoring, scorers are trained with rubrics and "anchor" samples. Nevertheless, this process ends up penalizing many students who come up with unique, creative or unexpected answers. It was very painful not to credit very original and thoughtful answers because they did not fit the pre-determined rubrics.

3. Human error. Though I am impressed with the management, training, and quality control of the company, the inevitable human error is another, though less serious, factor in reducing the reliability of scoring. There is a competition among scorers to score as many papers as possible. This race reduces the quality of fair evaluation, since our eyes start just looking for some key words, ignoring potentially credible answers.

The documents are all "secured" and according to my employment contract I cannot reveal them. However, knowing the importance of these exams and their cost to society in terms of money and emotion, I decided to bring the issue to your attention. I argue that either multiple-choice tests should be designed to entirely eliminate these highly costly and unreliable open-ended exams, or the qualifications of those who write the questions and determine the rubrics must be strictly monitored by lawyers.