This blog is my effort to consolidate and organize some back-and-forth I have been having about evaluation use. It was spurred by a piece on NPR about the Administration’s position on an after-school program. (Trump’s Budget Proposal Threatens Funding For Major After-School Program.) In large measure the piece dealt with whether the program was effective. Arguments abounded about stated and unstated goals, and about the messages contained in a variety of evaluations. Needless to say, the political inclinations of different stakeholders had a lot to do with which evaluations were cited. Below are the notions that popped into my head as a result of hearing the piece and talking to others about it.
Selective Use of Data
Different stakeholders glommed onto different evaluations to make their arguments. Is this OK? On some level it is not, because of course people should consider all evidence, not just the evidence that supports their point of view. But there is a flaw in the statement I just made. In fact it would be very poor practice to consider “all the evidence”. Why? Because all evidence is not equal. Here are two examples.
Methodological selectivity: Some evaluations will have better methodology than others. Do we want to automatically include all the evaluations? Of course not. Think of how meta-analysis is done. There is a careful step of deciding which studies are in and which are out.
Policy selectivity: Suppose a program has evaluation data on 4 outcomes, 2 of which are highly related to the mission of the sponsor, and 2 of which are remote from the mission. Should all 4 be automatically considered, or should some be ignored? Or suppose all 4 are relevant, but some have greater consequence for the social good than others. Do we want to give all 4 findings equal weight with respect to merit, worth, and so on? There is nothing obvious about the proper decisions for questions like these.
So if selectivity is desirable, what determines “good selectivity”? There is no objective answer to this question because of differing opinions about methodology and because values drive beliefs about policy relevance. So really, it is naïve to say things like: “Of course people should consider all evidence, not just the evidence that supports their point of view.” I think the most problematic word in that statement is “all”. I think the statement should be rephrased as the following instructions:
1. Identify the evaluations whose methodologies you would usually take seriously.
2. Try to respect methodologies you do not usually favor, and pick a few that you can muster the nerve to take somewhat seriously.
3. From within the group derived from steps 1 and 2, pick evaluations that support your philosophical inclinations and ones that run counter to those inclinations.
4. Stretch yourself by paying attention to the whole set of evaluations that you came up with.
Stated and Unstated Goals
As the debate about the program unfolded, it became clear that there was a limited set of intended program goals. Moreover, those goals were of the long-range, difficult-to-achieve variety, e.g. “reduce dropout rate”. As the argument over the program played out, various supporters came up with other advantages of the program, e.g. after-school child care. Quite an argument ensued as to which goals should be counted.
What about the goals that people claimed were important, but which were not originally identified? Did the original designers of the program do a poor job of articulating the desired outcomes? To some degree they certainly did miss some important ones. On the other hand, the program was set up by a government entity that had a clear mission, and that agency did indeed articulate a set of outcomes that were aligned with their mission. That seems pretty good to me.
Of course they could have articulated a fuller set, but my complexity antennae are wiggling. I’d say it would be impossible to identify all of them because: 1) It might be impossible for some goals to be identified until after the program was in operation. 2) Even up front, it would be impossible to bring in the diversity of opinion that would be needed to identify many more desirable outcomes.
Look what happened. Once a (presumably credible) evaluation showed that the program did not meet its stated objectives, people said two things. 1) “Wait a minute. That’s not what we meant. What we really meant were all these other outcomes as well, and we do have evaluations showing those other outcomes are being met.” 2) “And in any case, it’s OK because these other outcomes can be plausibly connected to the long-term outcomes that any reasonable person could see cannot be met with a program like this alone.”
What Outcomes Are Legitimate?
Now we have the question of which of those unforeseen outcomes are legitimate for a program like this. Take the child care outcome. I like that result and I’d like to see more programs that serve that function. More is better. On the other hand, is child care (and its consequences for families) a reasonable and legitimate outcome for a program coming out of this part of the education department? Some (including me) would say “yes”, but people who would say “no” are not crazy or irrational. So should “child care” be considered in the public policy debate, whether that outcome is achieved or not? After all, the more we give credence to goals not aligned with long range objectives, the lower the probability of ever designing programs that will achieve those objectives.
The reason to do evaluation (one of them, anyway) is to inform a public policy debate. “Debate” is the operative word. Different stakeholders have different ideological views about whether the government should sponsor programs like this, the legitimacy of different outcomes, and so on. So naturally they attach a different valence to different findings. What I find so interesting is who gloms onto what findings, and the arguments people construct around them.
Who are the good guys and the bad guys in this? My inclination is to say that anyone who insists on justifying the program in terms of things like dropout rates is, well, I’ll be kind and say “misinformed”. On the other hand, we do want to decrease dropout rates, and this program did not do that.
Program Theory (or, at least, a decent explanation of why a program may work)
If we really cared about dropout rates, do we really want to keep funding a program that fails to accomplish that goal? My view is that we do, if two conditions are in place.
First, we have a plausible (and maybe even defensible) theory as to what outcomes would lead to fewer dropouts, and we make sure to include them in the evaluation. (I’d even be comfortable saying that “theory” is a bit too strong a word. I’d settle for “a pretty good explanation”.) That is quite different from looking at what the program actually accomplished, and back-fitting those accomplishments into a logic chain from program to long-term outcome.
Second, we need to kick the program up to a higher level and identify the constellation of other programs that need to be operating in order to get the desired outcomes. We don’t need all of them, but we need the most important and proximate ones. (I’m leaving out the far from trivial question of whether it’s possible to identify what these programs are.) I don’t think we need to evaluate those programs, but we at least need a sense of whether enough is happening that we can expect our program to do what we want it to do.
Realpolitik of Program Design
As for why some of those outcomes were left unstated: I touched on some of the reasons above, but there is another. As an example, let’s assume that the child care outcome is legitimate. I can easily see the program designers saying: “Yes, this is a reasonable and acceptable outcome to state. But if we did, the program would never get funded, so we are going to leave it out.” So what’s the best course of action? Leave it out, and one of the easier-to-demonstrate positive outcomes disappears, thus leaving a higher percentage of difficult-to-accomplish outcomes on which to pin the program’s fate. Dangerous, but maybe a necessary gamble.
The Challenge to Evaluation Use Will Increase in the Near Future
We have an Administration that is fundamentally opposed to many of the kinds of programs that I and my colleagues have been dealing with over the course of our careers. What this means is that we will find more and more legitimate stakeholders looking at evaluation data to show that programs do not work. Leaving aside our personal beliefs about whether these programs should be funded, we are faced with more of a challenge than we have had for a long time. Namely, the range and diversity of stakeholder opinion that will attend to evaluation will increase. I think that is a good thing. I don’t like the idea that only people who agree with my personal beliefs should use the results of my technocratic output. On the other hand, the greater the diversity of opinion, the greater the challenge we will face in explaining what we have found, and what the information is good for.