Recently a friend of mine asked about my understanding of what role complexity can play in Evaluation, and how I would further that role. Below is an edited version of what I sent her.
My goal for the role of complexity in Evaluation
Discussions of complexity in evaluation circles contain a great deal of information that is either wrong or ill-chosen with respect to the elements of complexity that can be useful in Evaluation. Those discussions do not appreciate the broad and deep knowledge about complexity that has roots in many different scientific disciplines. Much (most, really) of that knowledge is not applicable in the field of Evaluation, but some of it is. My goal is to influence the field to appreciate and use the knowledge that is applicable.
As I see it, the critical issue revolves around program theory, i.e. people’s beliefs about what consequences a program will have, and why it will have those consequences. The problem is not methodology because for the most part our standard arsenal of quantitative and qualitative tools is more than adequate. The problem is that evaluators do not choose appropriate methodologies because their designs are customized to test incorrect program theories.
I have an ancient textbook on Social Psychology that tried and failed to come up with an unambiguous definition of “Social Psychology”. The author (Roger Brown, as I remember) made the point that having a good definition is the mark of maturity in a field, not a requirement for beginning investigation.
That’s how I feel about “complexity”. Those who claim to do research on “complexity” come from many different fields – meteorology, physics, political science, biology, economics, and so on. That eclecticism is also true for the behaviors of nature on which they pin the label “complex”. Some of those behaviors began in one field and diffused across disciplinary boundaries. Others had parallel invention in more than one field. Much of the cross-disciplinary borrowing is technical, i.e. actual scientific knowledge is picked up from one intellectual domain and used in another. Some of the borrowing is metaphorical. An idea is applied to help understand or describe something, even though the “technical” aspects of that idea are not applicable. “Chaos” is a good example. “Chaos” has a technical meaning that is not appropriate in many of the settings where the idea of “chaos” is invoked. Nevertheless, something worthwhile is gained in the effort to apply the concept.
“Complex systems” are all the rage in Evaluation these days. But as far as I can tell there is very little evaluation in which someone made operational decisions based on complex systems. It’s hard to point to an example in which someone said: “Because this is a complex system, I will make such-and-such decisions about the methodology I will use, or the data I will collect, or how I will analyze and interpret that data, or what I will say to stakeholders.” I think the reason for the lack of operational application of “complex systems” in Evaluation is that the concept is too ambiguous to provide guidance on the practical concerns of methodology and data.
What can provide that guidance is knowledge of what complex systems will do. Here are two examples:
Outcome distribution: In my experience most funders make the assumption (at least implicitly) that benefits will be more or less symmetrically distributed. That’s a comfortable belief when public money is being invested through a political process. But an evaluator who appreciated complexity might look at that program and say: “I know that complex systems often generate outcomes that are log-linearly distributed, and from what I know of this program, I can see how such a distribution might manifest itself. I will try to get my stakeholders to think in these terms. I will prepare to use appropriate statistical methods. I will set my expectations as to when outcomes might first become noticeable.” And so on and so forth.
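To make that concrete, here is a toy simulation (my own illustration, not drawn from any actual program) of why the symmetric-benefits assumption can mislead. When gains compound multiplicatively – a dynamic common in complex systems – the resulting outcome distribution is heavily right-skewed: the mean is pulled above the median and a small fraction of participants captures a large share of the total benefit.

```python
import random
import statistics

random.seed(42)

def simulate_outcomes(n_participants=10_000, n_steps=20):
    """Toy model: each participant's benefit compounds multiplicatively
    (each gain makes the next gain easier or harder), rather than adding
    up symmetric increments.  Returns the final benefit levels."""
    outcomes = []
    for _ in range(n_participants):
        benefit = 1.0
        for _ in range(n_steps):
            # multiplicative shock: gains build on prior gains
            factor = max(0.0, 1.0 + random.gauss(0.05, 0.25))
            benefit *= factor
        outcomes.append(benefit)
    return outcomes

outcomes = simulate_outcomes()
mean = statistics.mean(outcomes)
median = statistics.median(outcomes)
top_1pct_share = sum(sorted(outcomes)[-100:]) / sum(outcomes)

print(f"mean benefit:   {mean:.2f}")
print(f"median benefit: {median:.2f}")
print(f"share of total benefit held by top 1%: {top_1pct_share:.1%}")
```

An evaluator who expected a symmetric distribution would report the mean and badly misdescribe the typical participant's experience; one who expected a heavy tail would know to report the median and to look hard at the upper tail.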
Efficiency and robustness: Fractal structures represent the best compromise between efficiency and the ability to keep functioning when links are broken. For most evaluation that is irrelevant. But if you were evaluating anything infrastructure-like – roads, health care facilities that refer to each other, and so on – you would do well to draw on knowledge about fractals when devising program theory and deciding what data to collect.
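As a toy illustration of that trade-off (not actual fractal geometry – just the two extreme network shapes that fractal, hierarchical structures sit between), compare a hub-and-spoke referral network with a ring. The hub-and-spoke network gives short paths between any two facilities but loses a facility entirely when a single link fails; the ring survives any single link failure at the cost of longer paths.

```python
from collections import deque

def shortest_paths(n, edges, start):
    """BFS distances from `start` over an undirected edge list."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def avg_path_length(n, edges):
    """Efficiency: mean number of hops between connected node pairs."""
    total = pairs = 0
    for i in range(n):
        d = shortest_paths(n, edges, i)
        total += sum(d.values())
        pairs += len(d) - 1
    return total / pairs

def connected_pair_fraction(n, edges):
    reachable = sum(len(shortest_paths(n, edges, i)) - 1 for i in range(n))
    return reachable / (n * (n - 1))

def robustness(n, edges):
    """Worst-case fraction of pairs still connected after one link fails."""
    return min(
        connected_pair_fraction(n, edges[:k] + edges[k + 1:])
        for k in range(len(edges))
    )

N = 8
star = [(0, i) for i in range(1, N)]         # hub-and-spoke
ring = [(i, (i + 1) % N) for i in range(N)]  # loop

print(f"star: avg path {avg_path_length(N, star):.2f}, "
      f"robustness {robustness(N, star):.2f}")
print(f"ring: avg path {avg_path_length(N, ring):.2f}, "
      f"robustness {robustness(N, ring):.2f}")
```

On eight nodes the hub-and-spoke network averages 1.75 hops but falls to 75% pair-connectivity after its worst single-link failure, while the ring averages about 2.29 hops and stays fully connected. The same two measures – path efficiency and connectivity under failure – are exactly the kind of data an infrastructure evaluation could collect.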
A major problem in applying complexity in Evaluation is lack of a coherent framework. To explain this I’ll give a counter example from statistics.
If I gave a briefing on evaluation findings and said that I analyzed the data with logistic regression, most people would have no clue as to what I did. Explaining it would not be easy. But I could give a lecture on statistical reasoning that focused on a few key topics: the idea of a general linear model in which true score and error are separate, the fact that error averages to zero, what a representative sample means, Type I and Type II error, and maybe a few other concepts. If I did that, people would understand the logic behind what I did, even if they knew nothing about the ins and outs of logistic regression. That would let me cite other statistical tests as well, because even if they did not understand the particular test, they would understand the logic I was applying to my data interpretation.
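The true-score-plus-error logic at the heart of that lecture can be sketched in a few lines (a hypothetical illustration with simulated data, not a real analysis): each observation is a true score plus an error term, the errors average to zero, and that is why the sample mean recovers the true effect.

```python
import random
import statistics

random.seed(0)

TRUE_EFFECT = 5.0  # the "true score" component of the linear model

# Each observation = true score + error, with error drawn around zero
observations = [TRUE_EFFECT + random.gauss(0.0, 2.0) for _ in range(5_000)]
errors = [obs - TRUE_EFFECT for obs in observations]

print(f"average error:    {statistics.mean(errors):+.3f}  (hovers near 0)")
print(f"estimated effect: {statistics.mean(observations):.3f}  "
      f"(true = {TRUE_EFFECT})")
```

Every test in the standard arsenal, logistic regression included, elaborates on this same separation of signal from zero-averaging noise – which is why the lecture generalizes even when the particular test does not.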
Alas, no analogous lecture can be given on complexity. There is no unifying logic. There is only a body of work, spread across different disciplines that share some combination of: 1) conceptual similarities and 2) some mutual agreement that the concept belongs in the domain of the “complex”.
There are three domains of knowledge that need to be transferred into the field of Evaluation. (They are related, but it’s easier to talk about them separately.)
This involves cherry-picking from complexity science – choosing particular complex behaviors that have special relevance in Evaluation. The video I did at MSU contained my favorite list as of a few months ago, but the list has expanded since.
As above, I’ll revert to my favorite example – statistics. Most social scientists use statistics, but some also serve as boundary spanners between their field and the field of statistics. Either they develop statistical methods and apply them in their home field, or they actively read the statistical literature. Either way, they constitute a group of people who are recognized by their peers as social scientists, and who serve the role of making cutting edge statistics an accepted part of social science work. We need people like that in Evaluation, and we don’t have them. I’ll give you an example.
Scaling: There is a lot of very interesting work in Urban Planning involving scaling. For instance, how many gas stations does a city need? The answer is that the larger the population, the more gas stations. But it’s not a 1:1 relationship. I forget the exponent, but it is a power-law relationship in which the number of stations per capita gets smaller as the population increases. Where is the work in Evaluation that looks for scaling relationships for primary care clinics in particular geographic or cultural settings?
Or look at this a slightly different way. Let’s say the research has been done, and we know the scaling factor. (Which I bet is the case. It’s hard for me to believe that the subject has not been thoroughly researched.) If I were evaluating a program to improve health care infrastructure, you can bet I’d think about adding this inquiry to whatever else I was doing. That would provide some insight on the effectiveness and efficiency of the health center distribution.
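Mechanically, that inquiry is not hard. Scaling relationships of this kind are usually modeled as a power law, Y = c·N^β, and the exponent β is estimated by regressing log Y on log N across cities. The sketch below uses invented data and an invented exponent of 0.8 purely for illustration – I am not asserting the real value for clinics.

```python
import math
import random

random.seed(1)

# Hypothetical data: (population, number of clinics).  The exponent is
# an assumption chosen for illustration, not an empirical value.
TRUE_BETA, TRUE_C = 0.8, 0.05
cities = []
for _ in range(200):
    pop = 10 ** random.uniform(4, 7)  # towns of 10k to cities of 10M
    clinics = TRUE_C * pop ** TRUE_BETA * math.exp(random.gauss(0, 0.1))
    cities.append((pop, clinics))

# Fit Y = c * N^beta by ordinary least squares on log-log axes
xs = [math.log(p) for p, _ in cities]
ys = [math.log(c) for _, c in cities]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
       sum((x - mx) ** 2 for x in xs)

print(f"estimated scaling exponent: {beta:.2f}")  # recovers ~0.8, sublinear
```

With the exponent in hand, an evaluator could ask whether a region's actual clinic count sits above or below the fitted line for its population – a direct, quantitative reading on the efficiency of the distribution.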
When we speak of the very small, the very large, or the very fast, we accept that the workings of the world do not make common sense. Space is curved? Time contracts? Particles can be waves? We don’t understand it, but we believe it.
When it comes to the human scale, however, we have trouble escaping the constraints of our common sense beliefs. Common sense drives our beliefs about how programs work, and so that is how we fashion our programs and our evaluations. I don’t want to imply that common sense is wrong or that we should not use it. Of course we should. But it is also true that despite appearances, much of our world does not conform to common sense. That realization is an important theme in complexity science. If that theme could be incorporated into thinking about program design and evaluation, we would end up with better program designs and better evaluations to improve those programs.
My vision for bringing complexity into the mainstream of evaluation has three dimensions – 1) focus on specific evaluations, 2) process for boundary spanning, and 3) expanding the knowledge base.
There are three stages to this activity, each building on the one before.
This activity would begin with a series of discussions focused on completed evaluations, preferably of programs that have been around for a while so that long-term consequences could be observed. Using these evaluations, a dialogue would take place among members of three groups:
- a variety of complexity experts,
- the evaluators who did the work, and
- evaluators not involved in the evaluation.
The goal of the exercise would be to address five questions:
1. What were the assumptions (aka program theory) on which the program was designed?
2. In light of complexity dynamics, how might the program theory have been devised?
3. Given the discussion about questions 1 and 2, what might have been worthwhile changes in the methodology?
4. Given what is known about complexity, how might the data be reinterpreted?
5. If further evaluation is done, what should it look like?
The next step would be to move from a focus on an already completed evaluation to a new one. The complexity experts used in the previous exercise should be used here as well, as they would have already gained experience in what evaluation is and how to conduct a dialogue with evaluators.
This step moves from design to execution. The intent would be to conduct an evaluation, from design to data interpretation, with input from the complexity experts used above. Over and above evaluating the program in question, this activity should be designed as research on the incorporation of complexity into the conduct of evaluation. The research design would be a longitudinal, mixed-methods case study. It would be ideal to use the evaluation designed above, but this would not be a requirement.
Success in the activities described above will require careful attention to sampling, as there is a vast range of choice for the evaluators and complexity researchers who would be involved. The evaluators must meet two criteria. They must:
- represent a variety of evaluation approaches, and
- have an open mind to methodologies and paradigms other than their favorites.
As a very preliminary choice for complexity experts, four groups are worth considering. These would be experts in:
- evolutionary dynamics (e.g. genetic algorithm),
- agents, emergence, and attractor/strange attractors
- diversity (e.g. Scott Page at the University of Michigan), and
- complexity as it is researched at the Santa Fe Institute and their ideological kin.
The knowledge base needs to be expanded in terms of both nurturing evaluators who have deep knowledge about the role that complexity can play in evaluation, and complexity-specific knowledge that is of special value in evaluation.
Here I’m thinking of something like the AEA’s GEDI program, but with a focus on complexity rather than ethnic or gender diversity. The idea would be to recruit a small group of early-career evaluators who would work with more established evaluators on evaluations that incorporated complexity. In addition to that work, they would get training and a chance to interact with each other.
Most of the research that has been done on complexity has been done by people whose primary affiliation is in a specific discipline – political science, meteorology, economics, and so on. That research was not done to advance knowledge of complexity. It was done to advance knowledge in the researchers’ home discipline. Given the theories that have sprung up staking a claim to Evaluation as its own discipline, there should be opportunities for people to research the implications of complexity within that discipline. As an example, Developmental Evaluation is highly oriented toward the arc of a program’s change over time. That perspective seems ideally suited for research that applies principles of evolutionary biology to program development. What would we learn if we looked at a program as an organism climbing a fitness landscape populated by other organisms (aka programs) in a complex ecosystem?
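To close with a sketch of that question (a toy model in which everything about the landscape is invented): an NK-style rugged fitness landscape, with a “program” adapting by flipping one design choice at a time and keeping any change that does not hurt. Each design choice interacts with a neighboring choice, which is what makes the landscape rugged rather than a single smooth hill.

```python
import random

random.seed(7)

N = 12  # number of program "design choices" (binary for simplicity)

def make_landscape(n, seed=7):
    """Rugged fitness landscape: each design choice contributes a random
    amount that also depends on one neighboring choice (NK-style, K=1)."""
    rng = random.Random(seed)
    table = {}
    def fitness(design):
        total = 0.0
        for i in range(n):
            key = (i, design[i], design[(i + 1) % n])
            if key not in table:
                table[key] = rng.random()
            total += table[key]
        return total / n
    return fitness

fitness = make_landscape(N)

# A "program" adapts by trying one small change at a time and keeping
# whatever does not lower its fitness -- hill climbing.
design = [random.randint(0, 1) for _ in range(N)]
start_fitness = fitness(design)
for _ in range(500):
    i = random.randrange(N)
    trial = design.copy()
    trial[i] ^= 1  # flip one design choice
    if fitness(trial) >= fitness(design):
        design = trial

print(f"fitness climbed from {start_fitness:.3f} to {fitness(design):.3f}")
```

The interesting evaluation questions live in the parts this toy omits: other programs moving on the same landscape, landscapes that shift as the ecosystem changes, and programs that get stuck on local peaks.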