Discussion in the first session of the Azenet Tucson book club: the “Theory; using explanatory power” section, with the introduction to life cycle behavior (p. 49). The most common evaluation activity among our members is evaluation of state or federally funded programs (DOE, SAMHSA, OJJDP, BJA). Common characteristics:
- programs have a few years to implement an ‘evidence-based practice’
- evaluation is closely structured around performance measures, often with online reporting requirements
- some projects require comparison groups or other hard-to-implement designs
While programs are ‘start ups’ and are supposed to mature, the rigidity of the reporting requirements means that evaluation often leaps right over implementation issues to outcomes (substituting ‘fidelity’ measures, specific program monitoring of the dichotomous ‘did they or didn’t they do X?’ kind, or ‘sustainability’). Even the Strategic Prevention Framework stuff (SAMHSA, which is trying to build coalitions to address local substance abuse issues) doesn’t use implementation theory or program life cycle ideas, at least here in Arizona. So, (surprise!) when we don’t have any theory, the plan doesn’t include any way to measure or record what happened as the program or coalition weathered maturation changes, staff turnover, etc., and there’s no way to report on it. Do you have specific recommendations for social and educational program implementation-stages theory and/or program maturation? I think that this discussion will continue throughout our reading.
Introduction
This is the kind of question that is hard to answer in the abstract because specifics matter so much. I have dealt with problems like this in my practice, and I always need to learn the gory details. For instance, some of the suggestions I’m about to give make the assumption that the evaluators have a close relationship with their project officers and with a variety of other stakeholders. I have no idea if this assumption is correct in your situation. But I never let little things like ignorance stop me. I’ll try anyway.
It seems to me that the kinds of problems you are describing fit into my discussion of “incorrect assumptions early in the evaluation life cycle” (pages 149–154). The most relevant case is #1, where the sponsor insisted on an inappropriate outcome measure.
Stages and Life Cycles
Personally, I’m not knowledgeable about implementation stage theories for the kinds of programs you are dealing with. None of the references I have on page 59 are about social or educational programs, although I’m sure there is a big social science literature on this topic. Actually, I see it as a weakness in my book that I don’t delve into what the various life cycle stages are or discuss how they relate to evaluation surprise. I may deal with this lack if Guilford ever sees the wisdom of asking me for a second edition.
I talk about concepts such as “start up phases” and “stability” using the “I’ll know it when I see it” measurement criterion. Not the best measurement system for a guy in my business. Still, I can think of intuitive criteria that may be useful. For instance: Did they just receive funding and are they still getting organized? Are they hiring a lot of staff, or have the folks who work there been around for a while? Do they have working relationships with closely related systems? (For instance, if schools are supposed to refer to health care clinics, is the referral mechanism in place?) Have the services provided stayed more or less consistent for some time? Has the content of the services been changing a lot even though there is no obvious change in the operating environment?
All that said, I don’t think your problem is lack of an implementation theory. I’m sure that if you went shopping you would find one. The real problem is whether you have any opportunity to collect the necessary data.
Data
Assuming that you are comfortable with a measure of maturity, how would the various programs stack up? Are they all in the start-up phase, or are there some that are legitimately ready for some kind of outcome measure? It might be worth looking at this because the kinds of problems they have may be different. For instance, a mature program may have a problem because you are boxed into collecting inappropriate outcome measures, while for a start-up the whole idea of measuring outcome may make no sense.
I’d bet that in addition to the funder, there are other stakeholders who have an interest in the evaluation, and that their interests align more with what you know needs to be done than do the funder’s requirements. Can you design the evaluation in such a way as to try to meet those needs? That would legitimize other kinds of data collection. I don’t know what position your project officers would take on this, but it’s hard for me to believe that they would be unsympathetic. I realize that it’s easy to suggest more data collection and hard to pay for it. But it may be possible, particularly if you could justify it as a needed precursor to getting the data the funder is asking you for. Or, maybe you could get some of those other stakeholders to kick a bit of money into the pot. You may not need a lot because you already have the evaluation mechanism up and running.
Maybe another possibility is to get your funders to support you in the collection of other data. After all, your project officers must know that they are putting you in a tough position. They are probably trapped themselves in a web of regulations and policy that is making them pass those evaluation requirements on to you. Especially if you could bring other stakeholders to the table asking for other types of data collection, you might get support for getting that data.
Comparison Groups and Other Hard to Implement Designs
I don’t know what you mean by “hard to implement” but I am sympathetic to the difficulty of implementing comparison group studies. (I am also a big fan of comparison groups. Having some in an evaluation I am doing is saving us because there are some strong confounds in the experimental group.) Still, there are designs you may be able to use. What about a no-control group interrupted time series? Or even if you can’t go back far in time, a simple pre-post comparison with valid and reliable data can go a long way to making some reasonable causal inference.
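To make the no-control-group interrupted time series idea a bit more concrete, here is a minimal sketch in Python. It is only one simple way to do it, not a recommended analysis plan: fit a straight line to the observations before the program started, project that line forward, and see how far the later observations sit from the projection. The function names and the quarterly referral counts are made up for illustration, and the sketch assumes a linear pre-program trend with no seasonality or autocorrelation, which real data may well violate.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def its_effect(series, start):
    """Average gap between observed post-program values and the projected pre-program trend.

    series: outcome values in time order (hypothetical data here).
    start:  index of the first observation after the program began;
            at least two pre-program observations are needed.
    """
    pre_times = list(range(start))
    slope, intercept = fit_line(pre_times, series[:start])
    gaps = [series[t] - (intercept + slope * t) for t in range(start, len(series))]
    return mean(gaps)


# Hypothetical quarterly referral counts: four quarters before, four after.
referrals = [10, 12, 14, 16, 23, 25, 27, 29]
effect = its_effect(referrals, start=4)
print(f"Average shift above the projected pre-program trend: {effect:.1f}")
```

With only a handful of time points the estimate is fragile, so a sketch like this is better for setting expectations with stakeholders than for making strong causal claims.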
Is there any chance that you are measuring outcomes for which there is national data to compare against? (For instance, I’m thinking of the standardized tests that are given in all the states.) I realize that your funders may not like these designs, but they can be worthwhile. How much wiggle room do you have in negotiating the design?
Is the Evidence Based Practice Any Good?
I hate to bring this up, but it may be worth pondering. How much faith do people have that the “evidence based practice” is based on enough evidence to believe the program should have the desired outcomes? If nothing else, looking at this question may help set expectations. In the best of all worlds though, the programs you are evaluating would add to the knowledge of evidence based practice, and thus have a longer term impact on the kind of programming your funders are pushing.