This is the first of three blog posts I am writing to help me understand how “complexity” can be used in evaluation. If it helps other people, great. If not, at least it helped me.

Common Introduction to all Three Posts
Practicality and Theory

The Value and Dangers of Using Evaluation Program Theory
Complexity as an Aspect of Evaluation Program Theory

Appropriate but Incorrect Application of Scientific Concepts to Achieve Practical Ends

Common Introduction to all Three Posts
Part 1: Complexity in Evaluation and in Studies on Complexity
In this section I talk about using complexity ideas as practical guides and inspiration for conducting an evaluation, and how those ideas hold up when looked at in terms of what is known from the study of complexity. It is by no means necessary that there be a perfect fit. It’s not even a good idea to try to make it a perfect fit. But the extent of the fit can’t be ignored, either.

Part 2: Complexity in Program Design
The problems that programs try to solve may be complex. The programs themselves may behave in complex ways when they are deployed. But the people who design programs act as if neither their programs, nor the desired outcomes, involve complex behavior. (I know this is an exaggeration, but not all that much. Details to follow.) It’s not that people don’t know better. They do. But there are very powerful and legitimate reasons to assume away complex behavior. So, if such powerful reasons exist, why would an evaluator want to deal with complexity? What’s the value added in the information the evaluator would produce? How might an evaluation recognize complexity and still be useful to program designers?

Part 3: Turning the Wrench: Applying Complexity in Evaluation
This is where the “turning the wrench” phrase comes from in the title of this blog post. Considering what I said in the first two blog posts, how can I make good use of complexity in evaluation? In this regard my approach to complexity is no different than my approach to ANOVA or to doing a content analysis of interview data. I want to put my hands on a tool and make something happen. ANOVA, content analysis and complexity are different kinds of wrenches. The question is which one to use when, and how.

Complex Behavior or Complex System?
I’m not sure what the difference is between a “complex system” and “complex behavior”, but I am sure that unless I try to differentiate the two in my own mind, I’m going to get very confused. From what I have read in the evaluation literature, discussions tend to focus on “complex systems”, complete with topics such as parts, boundaries, part/whole relationships, and so on. The complexity literature I read, however, makes scarce use of these concepts. I find myself getting into trouble when talking about complexity with evaluators because their focus is on the “systems” stuff, and mine is on the “complexity” stuff. In these three blog posts I am going to concentrate on “complex behavior” as it appears in the research literature on complexity, not on the nature of “complex systems”. I don’t want to belabor this point because the boundaries are fuzzy, and there is overlap. But I will try to draw that distinction as clearly as I can.

Practicality and Theory
I am torn. Part of me is a very practical fellow with a promiscuous view of evaluation. I do what works. I grab the methodologies that I can and I grab the data that I can. I shake them all together to provide the best insight I can about how programs are implemented, how they operate, and what they achieve. I suppose that makes me a technologist. I care about practical action, not about truth. (For more on evaluation as technology, see: Evaluation as Social Technology). I know about the dual wave-particle nature of light. I also know that my glasses are designed based on the theory that light travels in waves. So what if the theory is incorrect? It is good enough. I can see.

Another part of me does care about truth, or at least, about faithful adherence to principles that have a defensible theoretical and empirical base. To extend the example, I can live with the lens maker ignoring the particle aspect of light. I’d feel queasy if the lens maker told me that light travels through aether. I’d get my glasses in either case, so why would I care? Because in one scenario there is a deliberate use of a partial truth to achieve a practical purpose. In the other there is an assertion of what we know to be untrue. As long as desired purposes are being achieved, the distinction is merely aesthetic. But aesthetics matter. To me, at least.

The particle/wave example is nice because it provides a short and unambiguous illustration of how scientific theory can be legitimately misapplied to achieve practical ends. But that is Physics. What is the relevance to Evaluation?

The Value and Dangers of Using Evaluation Program Theory
Program theory is often an important tool for designing evaluation and interpreting data. It can provide guidance on what data to collect, what relationships to look at, how to interpret data, and what recommendations to make. I’m not arguing that we always need program theory, or even that program theory is an unalloyed good. I’m only claiming that it is often useful.

Unfortunately, using program theory can also lead one astray. How so? A program theory may result in a methodology that looks in the wrong places, or misses some of the places where the evaluation should be looking. A program theory may be “locally correct”, but not correct for other implementations of the program in other contexts. A program theory may guide an evaluation that provides useful results but does not reflect why the program actually works as it does, or achieves what it achieves.

Complexity as an Aspect of Evaluation Program Theory
One of the many reasons why program theory can be problematic in evaluation is that we often either ignore complex behavior when we formulate program theory, or misapply complexity ideas when we do attend to complexity. Two examples follow:

Example #1: Phase shift behavior in networks:
Network behavior is a common and important topic in the research work of complexity scientists. A noteworthy characteristic of network behavior is that networks are often subject to phase shift behavior. Increasing connections among nodes in a network can have minimal consequence for the overall shape of the network until a critical level of connectedness is reached, at which point there is a sudden and dramatic jump in overall connectedness. (If you want to see a great demo of this phenomenon, download NetLogo and run the “giant component” model.) See the picture below for one of the runs. Check out the graph that plots fraction of connected nodes against connections per node.

[Figure: one run of the NetLogo “giant component” model]
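For readers who would rather poke at this in code than in NetLogo, here is a minimal Python sketch of the same idea. It is not the NetLogo model’s code. It assumes links are added one at a time between two randomly chosen distinct nodes (a duplicate link is possible and simply changes nothing), and the node count, the seed, and the function name giant_component_curve are illustrative choices. Below roughly one connection per node the largest cluster should stay tiny; somewhat past that point it jumps to a large share of the network.

```python
import random


def giant_component_curve(num_nodes=2000, max_links_per_node=1.5, seed=1):
    """Add random links one at a time. After each link, record the average
    connections per node and the fraction of nodes in the largest connected
    component. A union-find structure tracks which nodes are connected."""
    rng = random.Random(seed)
    parent = list(range(num_nodes))  # each node starts as its own component
    size = [1] * num_nodes           # component sizes, valid at root nodes
    largest = 1

    def find(node):
        # Walk up to the component's root, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    curve = []
    num_links = int(max_links_per_node * num_nodes / 2)
    for link in range(1, num_links + 1):
        a, b = rng.sample(range(num_nodes), 2)  # two distinct nodes
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Merge the smaller component into the larger one.
            if size[root_a] < size[root_b]:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a
            size[root_a] += size[root_b]
            largest = max(largest, size[root_a])
        # Each link adds one connection to each of its two endpoints.
        curve.append((2 * link / num_nodes, largest / num_nodes))
    return curve


if __name__ == "__main__":
    curve = giant_component_curve()
    for target in (0.25, 0.5, 0.75, 1.0, 1.25, 1.5):
        # Report the first recorded point at or beyond each level.
        x, frac = next(p for p in curve if p[0] >= target)
        print(f"{x:.2f} connections per node: "
              f"{frac:.1%} of nodes in the largest component")
```

Union-find is used because it cheaply answers “are these two nodes already connected?” as links accumulate. Rerunning with another seed or network size changes the details, not the general shape of the curve.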

How might knowledge of network phase shift behavior play a role in program theory? Well, evaluators often deal with programs that exhibit network-like connections, for instance: 1) community development, 2) improving civil society in a developing country, or 3) nurturing tech startups. In all of these it would be reasonable for the evaluation to include a look at how relationships among the groups formed, how many relationships there were, and how the richness of relationships changed over time. Thus in all these cases any program theory that was formulated might do well to consider the possibility of phase shift behavior. Doing so would affect data collection schedules, resources devoted to tracking linkage formation, hypotheses about the nature of “success”, and the setting of stakeholder expectations.

Example #2: Symmetrical versus power law distributions of outcomes
This is another topic that shows up frequently in studies of complexity. In almost all of the evaluations I have ever done, I have assumed a normal distribution of impact. Whether it was test scores in school, or health status, or numbers of accidents, I’d think about a mean and a more or less symmetrical distribution around that mean. (In some cases I might think of a Poisson distribution if the mean were near the zero point and numbers could not go negative, but that would change my statistical techniques, not my fundamental understanding of how the program worked.) Essentially, the program theory says: “Most impact is around the mean, and there is about an even falloff of large and small values around that mean.”

I have come to appreciate, however, that many programs operate on a program theory that says: “Expect a few very large successes, a reasonable number of moderate successes, and a very large number of cases with no, or very small, success.” These are power law distributions. (Or at least they may be.) They require a different set of beliefs about what the program will accomplish, a different definition of success, adjustments to beliefs about what a fair or acceptable outcome may be, and different analytical methods. Here are two examples: 1) business success of start-ups brought into a novel incubator program, and 2) citations of work by graduate students who were enrolled in a mentoring program. See the picture below for an illustration of symmetrical and power law distributions.

[Figure: symmetrical versus power law distributions]

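A small Python sketch can make the contrast concrete. The parameters are assumptions chosen only for illustration, not estimates from any program: the symmetric case is a normal distribution with mean 50 and standard deviation 10, and the power law case is a Pareto distribution with shape 1.5 (one heavy-tailed choice among many). Comparing the mean with the median, and the share of the total outcome produced by the top 10% of cases, shows why the two program theories call for different definitions of success.

```python
import random
import statistics


def summarize(outcomes):
    """Return the mean, the median, and the share of the total outcome
    produced by the top 10% of cases."""
    ranked = sorted(outcomes, reverse=True)
    top = ranked[: len(ranked) // 10]
    return {
        "mean": statistics.mean(ranked),
        "median": statistics.median(ranked),
        "top_10pct_share": sum(top) / sum(ranked),
    }


rng = random.Random(42)
n = 10_000
# Hypothetical symmetric outcomes: most cases near the mean, even falloff.
symmetric = [rng.gauss(50, 10) for _ in range(n)]
# Hypothetical power law outcomes: Pareto with shape 1.5 (minimum value 1).
power_law = [rng.paretovariate(1.5) for _ in range(n)]

for name, data in (("symmetric", symmetric), ("power law", power_law)):
    s = summarize(data)
    print(f"{name:>10}: mean={s['mean']:.2f} median={s['median']:.2f} "
          f"top 10% share={s['top_10pct_share']:.0%}")
```

With the symmetric outcomes the mean and median nearly coincide and the top tenth of cases accounts for only a modest share of the total. With the Pareto outcomes the mean is pulled well above the median and a small fraction of cases accounts for a large share of the total.
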
Appropriate but Incorrect Application of Scientific Concepts to Achieve Practical Ends
In this section I’ll get back to what I said earlier about appropriate but incorrect application of scientific concepts to achieve practical ends. The more I think about complex behavior being embedded in program theory, the more dissatisfied I become with how complexity is frequently used in program evaluation. I find the application too misleading for the work I do. “I” is the operative word. I’m not trying to convert anyone. What is practical for me does not have to be practical for anyone else.

In the remainder of this post I’ll provide two examples of why I’m not comfortable with how complexity is commonly applied in evaluation. By “comfortable” I mean an application that, to my satisfaction: 1) recognizes what is known about complexity, 2) minimizes any misrepresentation, and 3) furthers the work I do. By “furthers” I mean that it leads me to do more useful evaluation than I otherwise would.

Example #1: Simple, Complicated, Complex
“Simple – complicated – complex” is becoming a staple framework in our business. I can easily see how this distinction might be useful as a guide for conducting evaluation. For instance, imagine that I were evaluating the relationship between training teachers to use a new curriculum (presumably a proven best practice), the extent to which teachers used the curriculum, and student achievement. Let’s also toss in “organizational support” as defined by the degree to which principals encourage their staff to use the new curriculum. Finally, let’s throw in one single feedback loop. If the kids learn well, the teachers will use the curriculum more. (And for “simplicity” let’s assume this program took place in a stable environment, with students who were similar in background and intellectual capabilities, uniformly high levels of competence among the teachers, and so on.) I don’t know what a formal definition of “simple” is, but this example seems pretty simple to me.

And yet we may have a problem. An important insight from the field of complexity is that seemingly simple systems can behave in chaotic ways when the end result of a nonlinear operation becomes the input for the next iteration of that operation, and further, that the chaotic behavior can arise even if no external influences are operating. I suppose we could wiggle out of the difficulty by defining any system with a feedback loop as not being “simple”, but that seems to trivialize what “simple” means. I can’t see defining a system with a few steps and a single feedback loop as being “not simple”.
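The textbook illustration of this insight is the logistic map, sketched below in Python. It is not a model of the teacher program. It only shows that a one-line nonlinear rule, whose output becomes its next input and which receives no outside influence, can either settle down or behave chaotically depending on a single parameter r. The values r = 2.8 (settles) and r = 3.9 (chaotic), the starting points, and the helper names are illustrative choices.

```python
def logistic_trajectory(r, x0, steps):
    """Iterate x_next = r * x * (1 - x). Each output becomes the next input,
    and nothing outside the rule ever intervenes."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def max_gap(a, b):
    """Largest absolute difference between two equally long runs."""
    return max(abs(p - q) for p, q in zip(a, b))


for r in (2.8, 3.9):
    # Two starting points that differ by one part in a million.
    a = logistic_trajectory(r, 0.200000, 60)
    b = logistic_trajectory(r, 0.200001, 60)
    print(f"r={r}: final values {a[-1]:.4f} vs {b[-1]:.4f}, "
          f"largest gap over the last 20 steps {max_gap(a[-20:], b[-20:]):.4f}")
```

With r = 2.8 the two runs settle on the same value (about 0.643) and the gap between them shrinks toward zero. With r = 3.9 a difference of one part in a million in the starting point grows until the two runs bear little resemblance to each other.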

I am by no means asserting that the program I described would behave chaotically. In fact I am quite sure it would not because somebody would be watching that feedback loop. If strange and disturbing things began to happen, someone would do something about it. In any case there are natural inhibitors operating, e.g. there is an upper limit to how much time teachers can spend using the new curriculum.

My problem is that if I were to apply the “simple – complicated – complex” framework, I would miss an insight from complexity that has important implications for my evaluation design. If I built on the insight about small changes causing chaotic behavior, I might be sensitized to add some design elements to my evaluation. Here are a few. 1) I might want to include a data collection process that would detect control processes that may be operating, e.g., who is monitoring the program? How effective are corrective actions? 2) I might pay more attention to those “natural limits”, e.g., how much time can teachers devote to using that new curriculum? 3) I would not assume that if the program behaved in unexpected ways, the culprit must lie in some influence from the setting in which the program is embedded. If I really thought the program were simple, my attention would not turn to these aspects of evaluation design.

Example #2: Agreement x Certainty
The agreement x certainty (AxC) matrix depicts an oft-cited theory in our field. It hypothesizes a two-dimensional space made up of: 1) agreement on a course of action, and 2) certainty with respect to what the course of action will accomplish. The theory is applied in many different areas, but we tend to use it as a way of understanding the conditions under which systems are simple, complicated, complex, or chaotic, with those distinctions having a major influence on how we do evaluation.

[Figure: the agreement x certainty matrix]

To understand the AxC theory it is necessary to reverse our normal understanding of how graphs are laid out. In the AxC framework, values get higher as they approach the origin, and lower as they move further away. Scenarios near the origin are simple. They become complicated, and then complex, as they move out, and chaotic as they move out further. Many people I know use this theory to good advantage in their evaluation work. As long as the “good advantage” remains, I think people should keep using the AxC approach. But I have trouble doing it. Actually, I have two kinds of trouble, one pragmatic and one aesthetic.

From a pragmatic point of view, in my own evaluations I deal with scenarios where situations far from the origin can be exceedingly stable. How so? Think of all the reasons a program may not change. Maybe nobody knows how to do anything better, no matter how little agreement and certainty there is. Maybe some kind of Nash equilibrium is operating: nobody is happy, but given what everyone else is doing, nobody can do better by changing course alone. Maybe the current situation is awful, but there is legislation or regulation holding it in place. Maybe any alternate solution is prohibitively expensive. Maybe the alternate is cheap but the switching costs are high. And on and on. So for the work I do, if I assumed that low AxC led to chaos, I would make some very bad decisions about evaluation theory, evaluation design, and data interpretation.
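A toy two-player game shows how a situation nobody likes can still be perfectly stable. The payoffs below are hypothetical: two organizations each choose whether to keep the current arrangement or switch to a better one, switching pays off only if both switch, and a lone switcher bears the switching costs. The function pure_nash_equilibria is an illustrative helper that checks every pair of choices for a pure-strategy Nash equilibrium, that is, a pair where neither player can gain by changing only their own choice.

```python
from itertools import product

# Hypothetical payoffs (row player, column player) for two organizations.
STRATEGIES = ("keep", "switch")
PAYOFFS = {
    ("keep", "keep"): (2, 2),      # nobody is happy, but it is familiar
    ("keep", "switch"): (2, 0),    # the lone switcher pays the switching costs
    ("switch", "keep"): (0, 2),
    ("switch", "switch"): (4, 4),  # better for both, if they could coordinate
}


def pure_nash_equilibria(payoffs, strategies):
    """Return the strategy pairs where neither player gains by changing
    only their own choice."""
    equilibria = []
    for row, col in product(strategies, repeat=2):
        row_pay, col_pay = payoffs[(row, col)]
        row_best = all(payoffs[(alt, col)][0] <= row_pay for alt in strategies)
        col_best = all(payoffs[(row, alt)][1] <= col_pay for alt in strategies)
        if row_best and col_best:
            equilibria.append((row, col))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, STRATEGIES))
```

It prints [('keep', 'keep'), ('switch', 'switch')]: the unsatisfying status quo is an equilibrium alongside the better arrangement, so nothing inside the game pushes the players out of it. The stability comes from the payoff structure, not from agreement or certainty.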

My second difficulty is aesthetic. It’s the aether problem I used earlier with respect to lens making. AxC strays far from what we know about complexity. “Chaos” in its way can be very predictable and patterned. “Agreement” and “certainty” are psychological and social psychological notions that do not appear in the complexity literature. “Simple” systems can exhibit chaotic behavior. The obvious response to me is: “Get over it. It’s only metaphor”. But I can’t. For me it’s a metaphor that is just too far from what we know about complexity.



4 thoughts on “Drawing on Complexity to do Hands-on Evaluation (Part 1) – Complexity in Evaluation and in Studies in Complexity”

  1. Jonny, I have a different understanding of the “simple-complicated-complex” heuristic:
    + “simple” means linear or additive, i.e., superpositioning works well and the Generalized Linear Model is predictive. All parameters are knowable and uncertainty is small.
    + “complicated” means proportionate but not linear, or perhaps a system of linear equations (?) All parameters are knowable and uncertainty is small–or maybe a little higher than before.
    + “complex” means behavior depends on boundary conditions, i.e., differential equations, and emergence happens (not sure how that is modeled mathematically). Some parameters or their values are not knowable and thus uncertainty is high.
    + “chaos” means…I don’t know.

    Note these are ways of thinking about the system (epistemology), not assertions about the state of the system (ontology). This could explain how it’s possible to usefully model a system with (ontological) feedback loops as an (epistemologically) “simple” linear system at certain points in its history. I think that’s what GLM statisticians do all the time.

    I also think that folks who like the AxC matrix are considering the stakeholder engagement process that leads up to defining the boundaries of a system or what “counts as success.” Maybe your idea of complexity assumes this negotiation has already happened and been settled? If so, then this debate is between two different conceptions of “system”– the evaluation system vs. the program system–which would explain why some of us seem to be talking past each other or using “inaccurate” metaphors.

    Would you buy this understanding? And if so, would you feel more comfortable with these heuristics?

  2. These conversations demonstrate to me that Jonny is way ahead of the common evaluation complexity discourse and we’d all (well, certainly I) benefit from taking his thoughts and ideas far more seriously. What I especially like about his thoughts is that they are grounded both in established complexity theory (which much of the complexity discourse in evaluation is not) and in evaluation practice (ditto).

    Which is not to say that I agree with everything he says, nor is everything he says new to the evaluation discussions (which he wouldn’t claim anyway). For instance, the (excellent) idea of basing or at least informing program theory on the basis of network theory has been written about extensively by Rick Davies.

    The blog was written in three stages and like Chad I’m going to comment in parts, rather than write a huge email.

    In Part One, after his excellent description of ideas from his corner of the complexity field, he comments on two very fashionable concepts in evaluation. One is Brenda Zimmerman’s (and others’) notion of simple, complicated and complex; the other is what is called Stacey’s Agreement/Certainty matrix.

    I agree with him that these applications don’t come from classic complex adaptive systems theory – however they do come from other parts of the complexity field. He takes to the two concepts with an axe, which is fair enough except that his description of them is open to debate. Zimmerman never talked about simple, complicated and complex systems – she talked about simple, complicated and complex problems. In other words, they are epistemological concepts not ontological states. Jonny’s discussion treats them as if they were actual systemic states. This confusion has been reflected in other critiques (notably in the journal Evaluation), and there is a substantial and long standing debate within the systems field about whether there are such things as simple systems or even whether you can treat certain aspects of a complex system as if they were simple … it is known as the Contingency Debate. Note that I’m not talking about whether you can treat an entire complex system as if it were simple, but whether it is possible to take certain aspects of complex situations and treat them as if they were simple (eg crossing the road is a highly complex situation but we can treat it as if it were simple by a set of established rules). Furthermore there are others within the broader complexity area, like David Snowden, who claim to have found a way to juggle the various contradictions (ironically based more on network theory than complexity theory).

    Jonny is kinder on the Agreement/Certainty matrix than I am, but his critique is spot on and I encourage those who use it as a basis for their understanding of ‘complexity’ to look at Jonny’s comments. Jonny fails to mention that the original diagram was buried deep in one of Stacey’s books almost as a passing reference; Stacey didn’t even bother to describe what he meant by one axis, and in any case he now argues strongly against the diagram and gets very annoyed when people call it the Stacey Matrix.

    But on the whole I think Jonny’s final comment sums up the problems with both of these common frameworks. They are metaphors, and metaphors are themselves complex beasts deliberately designed to be imprecise: to ground the unknown and unknowable in the known, to describe the indescribable. We need to be very cautious about using them as precision instruments.
