A Model for Evaluation of Transformation to a Green Energy Future

I just got back from the IDEAS global assembly, which carried the theme: Evaluation for Transformative Change: Bringing experiences of the Global South to the Global North. The trip prompted me to think about how complexity can be applied to evaluating green energy transformation efforts. I have a longish document (~2000 words) that goes into detail, but here is my quick overview.

Because transformation is a complex process, any theory of change used to understand or measure it must be steeped in the principles of complexity.

The focus must be on the behavior of complex systems, not on “complex systems”. (Complex systems or complex behavior?)

In colloquial terms, a transformation to reliance on green energy can be thought of as a “new normal”. In complexity terms, “new normal” connotes an “attractor”, i.e. an equilibrium condition where perturbations settle back to the equilibrium. (Why might it be useful to think of programs and their outcomes in terms of attractors?)

A definition of a transformation to green energy must specify four measurable elements: 1) geographical boundaries, 2) level of energy use, 3) time frame, and 4) level of precision. For instance: “We know that transformation has happened if in place X, 80% of energy use comes from green sources, and has remained at about that level for five years.”
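The “new normal as attractor” idea can be sketched numerically. Here is a minimal, purely illustrative Python sketch (every number is invented, including the 0.8 attractor value echoing the 80% example above): an outcome relaxes toward a stable value, absorbs a one-time shock, and settles back.

```python
# Purely illustrative sketch of attractor behavior. The attractor value
# (0.8, echoing the 80% green-energy example) and the relaxation rate
# are invented numbers, not estimates of anything real.

def step(x, attractor=0.8, rate=0.3):
    """Move the outcome part of the way toward the attractor."""
    return x + rate * (attractor - x)

x = 0.5                      # starting condition
for t in range(40):
    if t == 20:
        x -= 0.2             # one-time perturbation (e.g., a policy shock)
    x = step(x)

print(round(x, 3))           # → 0.8: the perturbation has been absorbed
```

The point of the sketch is the qualitative behavior, not the numbers: any starting condition in the basin ends up near 0.8, which is what “perturbations settle back to the equilibrium” means.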

Whether or not that definition is a good one is an empirical question for evaluators to address. What matters is whether the evaluation can provide guidance as to how to improve efforts at transformation.

Knowing if a condition obtains is different from knowing why a condition obtains. To address the “why”, evaluation must produce a program theory that recognizes three complexity behaviors – attractors, sensitive dependence, and emergence.

Because of sensitive dependence, unambiguous relationships among variables may not continue over time or across contexts. Because of emergence, transformation does not come about as a result of a fixed set of interactions among well-defined elements. Still, the combined result of sensitive dependence and emergence may be outcomes that exist within identifiable boundaries, i.e. within an attractor space. If they do, that is akin to “predicting an outcome”. If they do not, that is akin to showing that a program theory is wrong.
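Sensitive dependence can be illustrated with a standard textbook example, the logistic map in its chaotic regime (a generic illustration, not a model of any program): two trajectories that begin almost identically end up far apart.

```python
# Generic illustration of sensitive dependence using the logistic map
# at r = 4 (a standard chaotic example; nothing here models a program).

def logistic(x, r=4.0):
    return r * x * (1 - x)

a, b = 0.2, 0.2001           # nearly identical starting conditions
gap = 0.0
for _ in range(50):
    a, b = logistic(a), logistic(b)
    gap = max(gap, abs(a - b))

print(gap)                   # orders of magnitude larger than 0.0001
```

This is why an “unambiguous relationship among variables” observed in one context gives no guarantee about another: a difference too small to measure can come to dominate the long-run trajectory.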

Models with many elements and connections cannot be used for prediction, or even for understanding transformation as a holistic construct. Small parts of a large model, however, can be useful for designing research and for understanding the transformation process.

Six tactics can be used for evaluating progress toward transformation: 1) develop a TOC that recognizes complex behavior, 2) measure each individual factor in the model, 3) consider how much change took place in each element of the model, 4) focus on parts of the model, but not the model as a whole, 5) use computer-based modeling, 6) employ a multiple-comparative case study design.

As all the analysis takes place, interpret the data with respect to the limitations of models, and the implications of emergence, sensitive dependence, and attractor behavior.

Preliminary Notes on The Application of Concepts from Evolutionary Biology and Ecology to Evaluation

Introduction
I’m working on the notion that there are circumstances when evaluators should think of programs as species of organisms adapting in an ecological niche. This document contains some preliminary thoughts on that topic. I’m groping toward an article, a series of blog posts, and some YouTube videos. I’m looking for any suggestions anyone might have to help me along.

Inapplicability to Evaluation
One thing I need to be careful about is that evolution is agnostic as to the outcome; it only cares about species viability. We care about goals.

Evolution does not mean “progress” in the sense that we humans think of making life better for people. It’s not hard to imagine a dystopian, but highly sustainable, evolutionarily successful world. (In general, people talk about “sustainability” as if it is an unalloyed good. It’s not. It is neutral with respect to being “desirable” or “undesirable” concerning desired ends. In my business the problem is that systems are too sustainable. You can beat them over the head with data until the cows come home, and still they do not change.)

When is an evolutionary biological perspective needed?
People get turned on when I give my complexity workshops and come away thinking that they have to apply principles of complex behavior in everything they do. Hooey. It’s one thing to say there is complex behavior operating. It’s quite something else to say that one has to go to the trouble of dealing with it. There is a large and legitimate need for evaluation of single programs with respect to first-order outcomes. No overwhelming need to deal with complexity there. Ditto evolutionary biology.

Toward the end of Superforecasting: The Art and Science of Prediction, Tetlock has a nice discussion of when incremental forecasting is useful, given the ubiquity of log-linearly distributed rare occurrences that can change the course of events. (See Rumsfeld memo to Bush, Cheney, and Rice.) There is an analogous argument to make about evaluation. If it’s true for evaluation in general, it’s certainly the case for using knowledge of evolutionary biology to shape an evaluation. (Actually I can make a good case that we should only evaluate short-term, proximate outcomes. But that’s another story.)

Also, evolution does not care about whether an organism lives or dies. It cares about whether a species thrives or goes extinct. So applying evolutionary biology to evaluation is only appropriate when that which is being evaluated is a class of programs. It’s the viability of the class that matters, not the individual programs within the class.

How can an evolutionary biological perspective be used in evaluation?
An evolutionary biology perspective can be used in a few different ways.

  • Technical, as Hannan and Freeman do in “Organizational Ecology”. They actually apply Lotka-Volterra equations to the birth and death of types of organizations. Using more familiar methodologies, I can see evaluators doing things like estimating how quickly a program’s environment is changing, or the diversity of similar programs in the same ecological niche.
  • As a vocabulary and a set of constructs that can help in developing models, devising methodologies, and interpreting data. Some examples: 1) Fitness landscape: If a set of programs begins to evolve in a particular direction, what are the consequences of small changes for the fitness of that set of programs?

2) Co-evolution of species and environment: In the U.S. at least, Uber is a great example. It could only exist because the environment was conducive – IT infrastructure, GPS, weaknesses in current taxi services, availability of venture and human capital, and so on. But once the “species” began to thrive, the environment had to adapt to it, e.g. rules about traffic congestion, public conveyance regulations in various cities, dedicated waiting spaces at airports, and so on. Depending on the nature and direction of the adaptation, the species may or may not thrive. (It’s an open question. Uber is losing heaps and gobs of money.)
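For readers curious about the Lotka-Volterra machinery mentioned above, here is a sketch of the classic two-species competition model with invented parameters (the growth rates, carrying capacities, and competition coefficients are made up for illustration, not estimated from organizational data):

```python
# Sketch of Lotka-Volterra competition between two "species" of
# programs occupying the same niche. All parameters are invented.

def lv_step(n1, n2, dt=0.01,
            r1=0.8, r2=0.6,      # growth rates of each program type
            k1=100, k2=80,       # carrying capacities of the niche
            a12=0.5, a21=0.7):   # competition coefficients
    """One Euler step of the two-species competition equations."""
    dn1 = r1 * n1 * (1 - (n1 + a12 * n2) / k1)
    dn2 = r2 * n2 * (1 - (n2 + a21 * n1) / k2)
    return n1 + dt * dn1, n2 + dt * dn2

n1, n2 = 5.0, 5.0                # both populations start small
for _ in range(10000):           # integrate to t = 100
    n1, n2 = lv_step(n1, n2)

print(round(n1), round(n2))      # settles near a coexistence equilibrium
```

With these particular made-up parameters the model settles into stable coexistence; other parameter choices drive one “species” of program extinct, which is exactly the kind of qualitative question organizational ecologists ask.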

Birth and extinction
If evaluation is going to take an evolutionary biology perspective, it has to take the concept of species birth and extinction seriously. We care about spurring innovation and stifling ineffective programs.

One very juicy example is the advent of the mail order business, as invented by Sears. This required the establishment of Rural Free Delivery in the late 19th century, high local prices, and a market pull for a broad range of goods. It had a truly profound effect: it brought a variety of goods to a large percentage of the population at lower prices, allowed African Americans to buy, and get credit for, purchases that were not available locally, and badly hurt the income of local merchants, some of whom sponsored book burnings of the Sears catalogue.

My problem is that I cannot think of as good an example for the type of stuff that evaluators would evaluate. It’s easy enough to think of examples, but not big, interesting ones. For instance, there are STEM programs that did not exist before Sputnik and did not exist for girls until 20 or so years ago. There were always private schools in the US, but not charter schools in their present incarnation. How long ago was it that there were no programs in environmental education, or climate change mitigation efforts? In terms of extinction, think of big state mental hospitals in the US, and specialized hospital wards for AIDS patients.

Links with other types of evaluation
I would do well to make the case that an evolutionary biological perspective has ties to other trends in evaluation. I can think of three: complexity, developmental evaluation, and sustainability. I have the first one pretty well worked out. Not so much the other two.

Examples I’m looking for
I’m looking for examples that make the transition from evaluating a program to evaluating a group of similar programs, which is what an evolutionary perspective would require. My difficulty is finding an example that evaluators would recognize as something they might get paid to do.

As of now I’m pondering two possibilities.

  • Sustainability (See NDE Summer 2019.)
  • Telemedicine/telehealth. This has lots of elements I can use. Ancestors (back to plain old telephone service), rapid evolution, adaptation to a changing environment (costs of health care, docs leaving rural areas, etc.), co-evolution as the innovation affects its environment, “species” nested in “genus” (e.g. maternal health and surgical consulting), competition, and much else besides.

To build on the example of telemedicine, someone might get paid to evaluate a telehealth counseling program for nursing mothers in Australia, i.e. a program that had an identifiable source of funding coming from some small corner of the Ministry of Health. But getting paid to evaluate the overall consequences of having a telehealth infrastructure and set of services in the country? A nice piece of social science research to be sure, but I’m not sure how many of our brethren would see it as an “evaluation”. I have a feeling that Foundations might do this at a program level, but I’m not sure.

Evaluating for complexity when programs are not designed that way
Part 10 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 10 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Post status
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes up
7 Why should evaluators care about emergence? up
8 Why might it be useful to think of programs and their outcomes in terms of attractors? up
9 A few very successful programs, or many, connected, somewhat successful programs? up
10 Evaluating for complexity when programs are not designed that way up

Evaluating for complexity when programs are not designed that way

There are good reasons to design programs with complex behavior in mind, and good reasons not to. (For the reasons not to, see Part 3 Ignoring complexity can make sense, which makes a case for the rationality of letting the sleeping complexity dog sleep.)

The fact that programs are not designed in ways that recognize complexity does not mean that evaluation should ignore complexity. My reasoning is that even if programs are not designed with complex behavior in mind, knowing about complex behavior can still be useful to stakeholders. Figure 1 illustrates what I have in mind.

Figure 1: Overlay of Complex Model on a non-Complex Program Design

Blue region of model
Blue represents the original program model. It has much in it that is oblivious to complex behavior (and to common sense, for that matter).

Green region of model
Green represents a program model that recognizes some of the complex behaviors that may be operating. To make my point, I superimposed it on the original model.

  • Network effects are included.
  • Undesirable consequences are acknowledged.
  • Data are collected and analyzed with respect to groups of outcomes, without regard to any unique outcome within the group.
  • The social implications of distribution shapes are considered over and above the technical aspects of doing statistical analysis.

If I had my choice, I’d evaluate with respect to the green model exclusively, but I acknowledge that stakeholders may need more fine-grained information. In any case, as you know by now, I am a big supporter of using more than one model in any single evaluation. All models are wrong, but many different models can be both wrong and useful in different ways.

Assumptions and Risk

Jane Buckley, August 9th, 2019

Assumptions make up a significant percentage of every person’s everyday thinking. Most are subconscious, implicit, and go without recognition (when I leave work, my car will be where I left it this morning). Others rise to the surface of our awareness, and we can use that awareness to our advantage by checking the validity of that assumption; for example, I may ask my partner if my assumption that he purchased milk while at the grocery store is correct.

All assumptions represent some amount of risk, suggesting the following questions:

  • Why do some assumptions emerge from our subconscious to be checked while others remain hidden?
  • How can we best surface assumptions and risks for the purpose of program planning and evaluation?

In order to answer these questions, we must first acknowledge that assumptions are an adaptive mechanism the brain uses to manage the sheer volume of stimuli and information that it takes in every day. If I had to constantly monitor my car’s position, I wouldn’t be able to write this blog or do much of anything else. Assumptions are the cognitive short-cuts that allow us to be so productive and creative in our thinking.

It makes sense that our brains are always looking for ways to consolidate and use assumptions as much as possible to move us through our environment and lives more efficiently. Following this reasoning, the only reason we would consciously identify and address an assumption is because the risk associated with that assumption may supersede the advantages it offers. This is, in part, an answer to the first question:

  • Assumptions naturally become explicit when their associated risk is a) known and b) understood to be greater than the advantage of maintaining the assumption

To understand how this assumption-risk dynamic plays out in program work, we need to unpack the idea of risk a bit more.

When we are thinking about community development programs (programs focused on health, economic development, youth development, education, etc.), risks include program conditions or components that might:

  1. inhibit positive outcomes
  2. do harm to beneficiaries
  3. waste resources

These three types of risk (among other more context-specific risks) endanger program success and do supersede any advantage gained by maintaining an associated assumption. Therefore, it is critical that programs uncover assumptions associated with the various types of risk so that they can be addressed (checked or consciously accepted) by program staff and leaders.

It can be hard for program staff to uncover assumptions that underlie the programs that they work on day-in and day-out. Even when program staff understand the role and typology of assumptions (paradigmatic, prescriptive, causal), it can be hard to identify them without a more targeted path into the subconscious. For some program teams, using the idea of risk is an effective way of facilitating this brainstorm. For example, one might ask, “What are the outcomes, outside our formal program plans, that have to hold true in order for us to meet our objectives?” Or, “Why do we think this is the best approach given our available resources?” This brings us to an answer to the second question:

  • It is critical that program professionals, who have context and program specific expertise, intentionally engage in surfacing implicit program assumptions in preparation for planning, evaluation and learning. This can be accomplished either by brainstorming assumptions directly and identifying associated risks or by brainstorming possible program risks and identifying the associated assumptions.

 

A few very successful programs, or many, connected, somewhat successful programs?
Part 9 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 9 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Post status
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes up
7 Why should evaluators care about emergence? up
8 Why might it be useful to think of programs and their outcomes in terms of attractors? up
9 A few very successful programs, or many, connected, somewhat successful programs? up
10 Evaluating for complexity when programs are not designed that way 8/15

A few very successful programs, or many, connected, somewhat successful programs?

This is the third blog post in this series that touches on the notion that having a less successful program may be better than having a more successful program. (The first was Part 4: Complex behavior can be evaluated using comfortable, familiar methodologies. The second was Part 6: Joint optimization of unrelated outcomes.) A common theme in these posts is that unpredictable changes are likely when multiple isolated changes connect.

If these kinds of effects can result from networking, one could argue that resources are better invested in multiple programs, each of which is somewhat successful, than in a few programs, each of which is highly successful. (Of course, there is no guarantee whatsoever that these changes will be desirable, nor is it certain that the networking effects will result in a greater total of positive change. But for now, I’m looking on the bright side and assuming that both the direction and the amount of change will be desirable.)

To illustrate, consider the two scenarios depicted in Figure 1. In both, red arrows depict the short-term success of the program on a ten-point scale. The scenario on the left shows four very successful programs, each of which works in isolation from the others. The scenario on the right shows six somewhat successful programs, each of which has dense connections with the others.

Figure 1: Comparison of Few Non-networked Very Successful Programs and Many Networked Moderately Successful Programs

In the short term, the scenario on the left is the most beneficial. After all, just counting outcome points shows a total of 35, versus 19 on the right. But what might happen over time? I have no idea, but I’ll spin three not too farfetched possibilities.

Changes specified in the model: Note that in Figure 1 there are three direct connections with girls’ education: 1) SME capacity, 2) civic skills, and 3) crop yield. The SME and crop yield relationships are plausible because of two constructs not shown in the model, namely family income and affordability of school fees. The model is: crop yield and/or SME capacity –> family income –> affordability of school fees. The civic skills relationship makes sense because as people learn to participate in civic life, the quality of education is likely to get better.

The direct relationships in the previous paragraph are augmented by an indirect relationship: civic skills –> SME capacity. Why might this relationship make a difference? Because SME capacity leads to family income, any program that builds civic skills will also, indirectly, influence girls’ education.

Changes not specified in the model:    It is not hard to imagine that SME owners will be among those who participate in civic skills training programs, and that such participation will bring business owners together who did not previously know each other.  Between the new connections and the new skills, is it too hard to imagine that novel business opportunities will develop, or that creative ways to solve community problems will be revealed?

Sustainability consequences of collective change across multiple programs: All the changes described above can be thought of as generalized improvements in a higher-level construct called “community functioning”. Over time, “community functioning” may circle back to the original programs, and thus improve their outcomes as well.

To take just one small possibility as an example, imagine that the highly competent administrator of the malaria prevention program left for another job. What is the likelihood of an equally competent administrator being available to take his or her place? Those odds are better if the networking effects shown on the right in Figure 1 have been operating. Why? Because increased wealth in the community may result in a salary that would make the job desirable, and because the overall quality of life in the community may make it a desirable place to live.

Figure 2 is a graphical view of a possible set of consequences of implementing the models shown in Figure 1.

Figure 2: Graphical Depiction of Networked and Non-networked Program Outcomes Over Time
  • Initial outcomes in the non-networked scenario exceed the initial outcomes for the same programs in the networked scenario. (Black stripes for initial non-networked, blue stripes for initial networked.)
  • In the non-networked scenario, the level of program outcomes remains constant over time. (Black stripes versus black solid.)
  • In the networked scenario, outcomes grow so that over time, in five of the six cases, the networked outcomes exceed their initial values. (Blue striped versus blue solid.)
  • In three of the six cases, networked values grow to exceed non-networked values. (Blue solid versus black solid.)
  • A, B, and C represent desirable changes that resulted from network effects that could not have been envisioned for each of the programs in isolation. These cases do not have accompanying black lines because they were never part of the outcome models for the original programs.
  • D is there to remind us that we should never assume that all program outcomes (especially when not part of the program model) will be desirable. If I wanted to be pessimistic, but maybe not too farfetched, I could have included E, F, G, and H, all of which I would have depicted as negative.
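The arithmetic of the tradeoff can be sketched in a few lines (every number here is invented): four isolated programs hold steady at a total of 35 outcome points, while six networked programs start at 19 and compound through spillover from their neighbors.

```python
# Toy sketch of networked vs. non-networked outcomes. The scores and
# the 5% spillover rate are invented for illustration.

isolated = [9, 9, 9, 8]                       # four isolated programs, total 35
networked = [4.0, 3.0, 3.0, 3.0, 3.0, 3.0]    # six networked programs, total 19
spillover = 0.05                              # fraction of network mean fed back

for period in range(1, 31):
    mean = sum(networked) / len(networked)
    # each program gains a small fraction of the network's mean outcome,
    # capped at the ten-point scale
    networked = [min(10.0, x + spillover * mean) for x in networked]
    if sum(networked) > sum(isolated):
        print("networked total passes isolated total in period", period)
        break
```

With these numbers the networked total grows by 5% per period and overtakes the isolated total in period 13. The toy leaves out everything interesting, including the possibility of undesirable outcomes like D, but it shows why short-term outcome counting can mislead.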

An implicit assumption in the above is that networking relationships have either been designed into the programs or implemented in such a way that they evolve naturally. (For example, all the programs are in the same geographical area or administrative unit.) It is by no means certain that networking relationships will develop. Whether they do, and what they do, are empirical questions that need to be addressed as part of the evaluation.

The above paragraph implies that program designers have considered the possibility of connections among their programs. Part 3 (Ignoring complexity can make sense) makes the point that designers may not be doing their jobs well if they did try to build those connections. But even if designers ignore those connections, evaluation can still provide valuable insight about the complex behavior that the program designers are ignoring. That is the subject of the next (and thankfully final) section of this series of blogs (Part 10: Evaluating for complexity when programs are not designed that way).

How can the concept of “attractors” be useful in evaluation?
Part 8 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 8 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Approximate post date
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes up
7 Why should evaluators care about emergence? up
8 Why might it be useful to think of programs and their outcomes in terms of attractors? up
9 A few very successful programs, or many, connected, somewhat successful programs? 8/9
10 Evaluating for complexity when programs are not designed that way 8/19

Why might it be useful to think of programs and their outcomes in terms of attractors?

Exercises to understand the historical behavior of a program (or a class of programs) are worthwhile in any evaluation. People should always do them. My hope in this blog post, however, is to make a convincing case that the concept of an “attractor” provides a richer way to think about a program’s history, its likely behavior in the future, and its outcomes.

The Wikipedia definition of an attractor is:

In the mathematical field of dynamical systems, an attractor is a set of numerical values toward which a system tends to evolve, for a wide variety of starting conditions of the system. System values that get close enough to the attractor values remain close even if slightly disturbed.

With a definition like that, it helps to know what Wikipedia thinks a dynamical system is.

In mathematics, a dynamical system is a system in which a function describes the time dependence of a point in a geometrical space. Examples include the mathematical models that describe the swinging of a clock pendulum, the flow of water in a pipe, and the number of fish each springtime in a lake.

Figure 1 shows some more examples of attractors. #1 is a map of watersheds in the United States. #2 shows multiple animal species at a watering hole. #3, well let’s just say that this one is my favorite. #4 shows planetary orbits.

The pictures in Figure 1 have a limitation, namely that the constructs they depict are real objects in physical space. It’s not just that certain elevations describe water flow. It’s that physical water flows through those elevations. It’s not just that playgrounds appeal to kids. It’s that real kids inhabit the playground space. This coincidence of attractor and physical object does not apply to all attractors. A particularly shaped space can also be used to describe less tangible constructs. To telegraph an example I’ll use later, an attractor space in the shape of a pendulum can be used to describe fluctuations in government policy.

Figure 1: Examples of Attractors

Why bother to think in terms of attractors?
Why might it be worth thinking of programs and their outcomes in terms of attractors? Why not just rely on observing history and be done with it? Because thinking in terms of attractors:

  • Provides insightful visualization of how outcomes may change over time.
  • Kicks program theory up a level of abstraction, thus revealing similarities and differences among seemingly different activities.
  • Can reveal likely system behavior even without good historical data.

To illustrate, I’ll use the example of a federal regulatory agency. Between a lot of evaluation I have done, a lot of reading of the relevant literature, and quite a bit of talking to close observers, this is a subject I know something about.

There is some very interesting social science research and theory on the question of maintaining safety by emphasizing rule compliance or engaging in cooperative activity between government and industry. It’s a complicated story, but the bottom line is that neither strategy can be entirely successful by itself, and that getting the mix right at any given time is a tricky problem. In fact, the most effective mix moves back and forth with circumstances. A dated, but good, explanation of this phenomenon is Shapiro, S. A., & Rabinowitz, R. S. (1997). Punishment versus cooperation in regulatory enforcement: A case study of OSHA. Administrative Law Review, 49(4). For a more general discussion, try: Sparrow, M. K. (2000). The Regulatory Craft: Controlling Risks, Solving Problems, and Managing Compliance.

The image of a pendulum comes to mind. Looking back, it seems as if the emphasis that regulatory agencies place on enforcing compliance and working cooperatively oscillates, moving in one direction and then swinging back in the other. One way to look at this is to say that the attractor space traces the arc of a pendulum. Figure 2 shows what I have in mind. Figure 2 can be thought of as a form of program model that deliberately omits a great deal of information in order to highlight the oscillatory behavior of the program. (For much more on models, see Part 5 – A pitch for sparse models.)

Figure 2: Pendulum as a Descriptor of Cooperation and Enforcement in a Regulatory Agency
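The oscillation that Figure 2 depicts can be sketched as a damped pendulum (the units and coefficients below are hypothetical; positive values lean toward enforcement, negative toward cooperation):

```python
# Hypothetical sketch of the policy pendulum: x > 0 is an enforcement
# emphasis, x < 0 a cooperative emphasis. A restoring pull toward the
# balance point plus momentum produces the swing; mild damping keeps
# the swings bounded, i.e., within the attractor space.

x, v = 1.0, 0.0                   # start with a strong enforcement emphasis
dt, k, damping = 0.1, 1.0, 0.05   # step size, restoring force, damping
trace = []
for _ in range(400):
    accel = -k * x - damping * v  # restoring force plus mild damping
    v += accel * dt               # semi-implicit Euler step
    x += v * dt
    trace.append(x)

print(round(min(trace), 2), round(max(trace), 2))
```

The value of the sketch is the shape, not the numbers: knowing that the emphasis oscillates within a bounded arc, and roughly where an agency sits on it, is informative even without precise historical data.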

Considering the argument I made above for thinking in terms of attractors and not exclusively in terms of program history, what’s the advantage here?

Visualization of how outcomes may change over time: Sometimes pictures help understanding, and I think this is one of those cases, particularly with respect to sustainability. Imagine evaluating a program designed to minimize a safety problem that was based on a cooperative program theory. No matter how successful that program, I’d assess the likelihood of long-term sustainability as lower in the scenario on the left than in the scenario on the right. Why? Because I know the shape of the attractor. Of course the model leaves a lot out. It does not account for relationships between labor and management in the industry, or the inclinations of agency leadership, or developments in safety technology, or any of the myriad factors that may affect sustainability. All I know is the shape of the attractor and the location of the agency on it. Figure 2 is a sparse model that abstracts the attractor shape from all the other factors that may affect sustainability. But it’s a sparse model that provides a lot of insight that would be obscured if I tried to build a model that included labor/management relationships, the inclinations of policy makers, or developments in safety technology.

Level of abstraction: Figure 2 may be useful for understanding sustainability for one particular program in one particular regulatory agency, but it is also useful for revealing that the same attractor can be applied to any setting where the mission inclines toward enforcing compliance, but where the social reality of ensuring safety also calls for cooperation.

Historical data: It is always important to have empirical data on how an agency has behaved in the past. It’s one thing to draw a picture such as Figure 2; it is something else to have empirical knowledge of how much cooperation and enforcement is taking place, how long the trend has been going in one or another direction, what the balance point is, and so on. Unfortunately, in most instances an evaluator will not have such data. As examples of the difficulties involved, consider a few of the issues that spring to mind. 1) How obvious is it that a program falls into the categories of enforcement or cooperative emphasis? 2) How much activity is informal, and thus cannot be labeled as a “program” or a “policy” that can be identified and assessed? 3) What’s the metric for cooperation and enforcement? In the absence of good data, an evaluator may have to rely on expert judgment. That might provide a valid assessment, but it’s not very satisfying, either. It is exactly because precise historical data may be lacking that attractor shapes provide insight. It may not take too much experience and historical knowledge to identify the general shape of an attractor and to form a reasonable sense of where an agency stands on it.

Before I move on to the next topic, I feel a need to make a point about the compliance/cooperation issue. The example I gave above springs from my own experience in evaluating cooperative programs. Please do not take this as an endorsement of a cooperation-only approach to improving safety. I do not believe that such a policy can be effective.

In the spirit of all models being wrong

I believe that the simple pendulum attractor model is useful because by ignoring many relevant issues, it succeeds at revealing some important program behaviors. But appreciating how the model is wrong is also revealing.

There are really two bobs: Figure 2 ignores the fact that there are always elements of cooperation and enforcement at play. Figure 3 is a more accurate portrayal of the situation.

Figure 3: Another Attractor Model for a Regulatory Agency

Smooth versus discontinuous movement: The smooth, predictable movement of a pendulum bob does not characterize how regulatory agencies behave. Cooperation with industry can change very quickly in light of a plane crash, a railroad derailment that leaks toxic chemicals, a pipeline explosion, or an interstate bus that runs off the road. Programs and policies can be easily canceled in the light of legislative and public outcry. Implementing programs and policies, however, takes time. Movement in a cooperative direction is incremental.

These are the kinds of dynamics that make me such a strong advocate of employing multiple models when evaluating the same program. Figure 2 has the advantage of clearly portraying the attractor that governs the sustainability of cooperative programs. It provides that simplicity, however, by ignoring the model depicted in Figure 3, which recognizes the co-existence of enforcement and cooperation. If my sole interest were in sustainability, I’d stick with Figure 2. But if I were also interested in whether agency staff would accept the cooperative program, I might also look at Figure 3, because I’d posit that the balance of cooperative and enforcement efforts may be relevant to acceptance. “Also” is the operative word in the previous sentence. Both models would be useful, each for a different part of the evaluation.

I can also imagine doing evaluation in a regulatory agency that needed only the model in Figure 3. For example, imagine a program designed to enhance communication and cooperation among factions within the agency that had different opinions about the value of enforcement and cooperation. If I ever got lucky enough to do an evaluation like that, I’d start by invoking the model in Figure 3. It’s not that one model is correct and the other is incorrect. It’s a matter of each model being useful for different reasons.

Why should evaluators care about emergence? Part 7 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 7 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part, title, and approximate post date:

1. Complex systems or complex behavior? (up)
2. Complexity has awkward implications for program designers and evaluators (up)
3. Ignoring complexity can make sense (up)
4. Complex behavior can be evaluated using comfortable, familiar methodologies (up)
5. A pitch for sparse models (up)
6. Joint optimization of unrelated outcomes (up)
7. Why should evaluators care about emergence? (up)
8. Why might it be useful to think of programs and their outcomes in terms of attractors? (7/19)
9. A few very successful programs, or many, connected, somewhat successful programs? (7/24)
10. Evaluating for complexity when programs are not designed that way (7/31)

Why should evaluators care about emergence?

What’s the difference between an automobile engine (Figure 1) and a beehive (Figure 2)? After all, in each the whole is larger than the sum of its parts.

The answer is that for the engine, it is possible to explain what each part is and what role that part plays in the functioning of the engine. I can tell you the shape of a cylinder, how it is constructed, why it is needed to contain combustible material, how it moves up and down and is attached to the crankshaft, and so on. When I finished, you would know how an internal combustion engine works and how a cylinder contributes to the overall functioning of the engine.

I could not give you such an explanation for how any single bee contributes to the construction or functioning of a beehive. The beehive materializes when all those bees interact with each other as they do their simple bee things. That is emergence.

Figure 2: Beehive

Change happens when the parts of an engine are assembled. Change happens when bees do bee things. But the type of change is different. Only the latter is an emergent phenomenon.

Bees are bees, but what of emergence at the human scale of people, organizations, social groups, and political entities? It’s easy to find many examples. Some of the ones I like are: 1) The fractal nature of market fluctuations cannot be explained by the behavior of individual buyers and sellers. 2) The number of patents per capita in a city (when plotted logarithmically) increases more than the increase in a city’s population. 3) Traffic jams move in a direction opposite the direction of the flow of traffic. 4) The collective consequences of people, policy, business, and infrastructure yield specialized districts in cities. In each of these cases, the behavior of the larger unit cannot be explained by breaking it down into its constituent parts.
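The patent example can be written as a scaling law: total output grows as a power of population with an exponent greater than 1, so the per-capita rate rises with city size even though no individual inventor changes. The Python sketch below is a toy illustration; the constant c and the exponent beta are invented for the example, not estimated from data.

```python
# Toy sketch of superlinear scaling: total_patents = c * population ** beta.
# c and beta are invented for illustration, not estimated from real data.
c, beta = 0.01, 1.15

def patents_per_capita(population):
    total = c * population ** beta
    return total / population  # equals c * population ** (beta - 1)

# With beta > 1, the per-capita rate climbs as the city grows.
for pop in (100_000, 1_000_000, 10_000_000):
    print(f"population {pop:>10,}: patents per capita = {patents_per_capita(pop):.4f}")
```

Note what the arithmetic says: the whole (the city) produces disproportionately more than its parts (its residents) would suggest, which is why a per-person analysis misses the phenomenon.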

Emergence matters in evaluation because it implies a program theory that acknowledges that some phenomena cannot be understood in terms of their constituent parts. Take the example of specialized districts in cities. Any effort to understand the consequences of such districts for the city’s appeal to outsiders needs to be framed in terms of the impact of the district on city life. It would not help to do such an analysis by researching the individual people, policies, businesses, and infrastructure that comprise the district. Those do not, and cannot, “add up” to “district appeal”. To extend the example, the appeal of the city to outsiders probably has to do with the entire group of specialized districts, and how those districts affect each other. Districts might be a meaningful unit of analysis, but the constituent parts of districts would not be.

Why not try to do the analysis in terms of the constituent parts? The reason has nothing to do with our analytical capabilities or our access to data. The reason is that because these are emergent phenomena, it is no more possible to understand behavior in terms of its parts than it is to understand a beehive in terms of individual bees.
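A toy simulation makes the point concrete. The Python sketch below runs a one-dimensional majority-rule cellular automaton; it is an invented illustration, not a model of cities or bees. Each cell follows a trivial local rule, yet the ring organizes itself into homogeneous stretches, and the number of boundaries between those "districts" never increases. The pattern belongs to the whole; no single cell encodes it.

```python
import random

random.seed(1)

# Toy illustration of emergence (invented rule): each cell adopts the
# majority value of itself and its two neighbors, on a ring of 60 cells.
SIZE = 60
initial = [random.choice('AB') for _ in range(SIZE)]

def step(state):
    """One synchronous update: every cell takes its local majority."""
    return ['A' if (state[i - 1], state[i], state[(i + 1) % SIZE]).count('A') >= 2
            else 'B'
            for i in range(SIZE)]

def boundaries(state):
    """Count adjacent unlike pairs; fewer boundaries means larger 'districts'."""
    return sum(state[i] != state[(i + 1) % SIZE] for i in range(SIZE))

cells = initial
for _ in range(10):
    cells = step(cells)

print('before:', ''.join(initial), '| boundaries =', boundaries(initial))
print('after: ', ''.join(cells), '| boundaries =', boundaries(cells))
```

Inspecting any one cell's rule tells you nothing about the striped pattern that appears; the clustering exists only at the level of the whole ring, which is the sense in which the parts do not "add up" to the outcome.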

Figure 3: Emergent Versus Discrete Causality

The challenge for evaluation centers on program theory, as illustrated in Figure 3. The difference between the model at the top and the model at the bottom does not seem all that dramatic. In fact, the difference is profound. The model at the top states that it is possible to understand “appeal” in terms of the individual contributions of people, policy, business, and infrastructure. The model at the bottom acknowledges that however one might understand “appeal”, it is not in terms of the unique contributions of people, policy, business, and infrastructure. The two models have very different consequences for

  • Methodology
  • Data requirements
  • Stakeholder expectations, and
  • What we can say about impact.