Evaluating for complexity when programs are not designed that way – Part 10 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 10 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Post status
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes up
7 Why should evaluators care about emergence? up
8 Why might it be useful to think of programs and their outcomes in terms of attractors? up
9 A few very successful programs, or many, connected, somewhat successful programs? up
10 Evaluating for complexity when programs are not designed that way up

Evaluating for complexity when programs are not designed that way

There are good reasons to design programs with complex behavior in mind, and good reasons not to. (For the reasons not to, see Part 3 Ignoring complexity can make sense, which makes a case for the rationality of letting the sleeping complexity dog sleep.)

The fact that programs are not designed in ways that recognize complexity does not mean that evaluation should ignore complexity. My reasoning is that even if programs are not designed with complex behavior in mind, knowing about complex behavior can still be useful to stakeholders. Figure 1 illustrates what I have in mind.

Figure 1: Overlay of Complex Model on a non-Complex Program Design

Blue region of model
Blue represents the original program model. It has much in it that is oblivious to complex behavior (and to common sense, for that matter).

Green region of model
Green represents a program model that recognizes some of the complex behaviors that may be operating. To make my point, I superimposed it on the original model:

  • Network effects are included.
  • Undesirable consequences are acknowledged.
  • Data are collected and analyzed with respect to groups of outcomes, without regard to any unique outcome within the group.
  • The social implications of distribution shapes are considered over and above the technical aspects of doing statistical analysis. (See the sketch below.)
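
To make the last bullet concrete, here is a minimal sketch, with invented numbers, of why distribution shape can matter socially even when the usual summary statistic looks fine. Two hypothetical outcome distributions have the same mean, but one leaves a sizable group far behind.

```python
# A hedged illustration: invented data, equal means, very different shapes.
import numpy as np

rng = np.random.default_rng(1)
even_gains = rng.normal(50, 5, 1000)                     # everyone improves modestly
skewed_gains = rng.lognormal(3.5, 0.9, 1000)             # a few improve a great deal
skewed_gains *= even_gains.mean() / skewed_gains.mean()  # force the means to match

for label, x in [("even", even_gains), ("skewed", skewed_gains)]:
    print(label,
          "mean:", round(float(x.mean()), 1),
          "median:", round(float(np.median(x)), 1),
          "share below 25:", round(float((x < 25).mean()), 2))
```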

If I had my choice, I’d evaluate with respect to the green model exclusively, but I acknowledge that stakeholders may need more fine-grained information. In any case, as you know by now, I am a big supporter of using more than one model in any single evaluation.  All models are wrong, but many different models can be both wrong and useful in different ways.

A few very successful programs, or many, connected, somewhat successful programs? Part 9 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 9 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Approximate post date
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes up
7 Why should evaluators care about emergence? up
8 Why might it be useful to think of programs and their outcomes in terms of attractors? up
9 A few very successful programs, or many, connected, somewhat successful programs? up
10 Evaluating for complexity when programs are not designed that way 8/15

A few very successful programs, or many, connected, somewhat successful programs?

This is the third blog post in this series that touches on the notion that having a less successful program may be better than having a more successful program. (The first was Part 4: Complex behavior can be evaluated using comfortable, familiar methodologies. The second was Part 6: Joint optimization of unrelated outcomes.) A common theme in these posts is that unpredictable changes are likely when multiple isolated changes connect.

If these kinds of effects can result from networking, one could argue that resources are better invested in multiple programs, each of which is somewhat successful, than in a few programs, each of which is highly successful. (Of course, there is no guarantee whatsoever that these changes will be desirable, nor is it certain that the networking effects will result in a greater total of positive change. But for now, I’m looking on the bright side – I’ll assume that both the direction and the amount of change will be desirable.)

To illustrate, consider the two scenarios depicted in Figure 1. In both, red arrows depict the short-term success of the program on a ten-point scale.  The scenario on the left shows four very successful programs, each of which works in isolation from the others. The scenario on the right shows six somewhat successful programs, each of which has dense connections with the others.

Figure 1: Comparison of Few Non-networked Very Successful Programs and Many Networked Moderately Successful Programs

In the short term, the scenario on the left is the more beneficial. After all, just counting outcome points shows a total of 35, versus 19 on the right. But what might happen over time? I have no idea, but I’ll spin three not too farfetched possibilities.

Changes specified in the model: Note that in Figure 1 there are three direct connections with girls’ education: 1) SME capacity, 2) civic skills, and 3) crop yield. The SME and crop yield relationships are plausible because of two constructs not shown in the model, namely family income and affordability of school fees. The model is: crop yield and/or SME capacity –> family income –> affordability of school fees. The civic skills relationship makes sense because as people learn to participate in civic life, the quality of education is likely to get better.

The direct relationships in the previous paragraph are augmented by an indirect relationship: civic skills –> SME capacity. Why might this relationship make a difference? Because SME capacity leads to family income, so any program that affects SME capacity will also influence girls’ education.
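
To make the path logic concrete, here is a minimal sketch that treats the relationships described above as a directed graph and checks which programs can reach girls’ education. The nodes and edges simply transcribe the chains in the two preceding paragraphs; the code is illustrative, not part of any actual evaluation.

```python
# Directed graph of the relationships described above (illustrative only).
edges = {
    "crop yield": ["family income"],
    "SME capacity": ["family income"],
    "civic skills": ["SME capacity", "girls' education"],
    "family income": ["affordability of school fees"],
    "affordability of school fees": ["girls' education"],
}

def reaches(start, target):
    """Depth-first search: can 'start' influence 'target' through any path?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return False

for program in ["crop yield", "SME capacity", "civic skills"]:
    print(program, "-> girls' education:", reaches(program, "girls' education"))
```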

Changes not specified in the model:    It is not hard to imagine that SME owners will be among those who participate in civic skills training programs, and that such participation will bring business owners together who did not previously know each other.  Between the new connections and the new skills, is it too hard to imagine that novel business opportunities will develop, or that creative ways to solve community problems will be revealed?

Sustainability consequences of collective change across multiple programs: All the changes described above can be thought of as generalized improvements in a higher-level construct called “community functioning”. Over time, “community functioning” may circle back to the original programs, and thus improve their outcomes as well.

To take just one small possibility as an example, imagine that the highly competent administrator of the malaria prevention program left for another job. What is the likelihood of an equally competent administrator being available to take his or her place? Those odds are better if the networking effects shown on the right in Figure 1 have been operating. Why? Because increased wealth in the community may result in a salary that would make the job desirable, and because the overall quality of life in the community may make it a desirable place to live.

Figure 2 is a graphical view of a possible set of consequences of implementing the models shown in Figure 1.

Figure 2: Graphical Depiction of Networked and Non-networked Program Outcomes Over Time
  • Initial outcomes in the non-networked scenario exceed the initial outcomes for the same programs in the networked scenario. (Black stripes for initial non-networked, blue stripes for initial networked.)
  • In the non-networked scenario, the level of program outcomes remains constant over time. (Black stripes versus black solid.)
  • In the networked scenario, outcomes grow so that over time, in five of the six cases, the networked outcomes exceed their initial values. (Blue striped versus blue solid.)
  • In three of the six cases, networked values grow to exceed non-networked values. (Blue solid versus black solid.)
  • A, B, and C represent desirable changes that resulted from network effects that could not have been envisioned for each of the programs in isolation. These cases do not have accompanying black lines because they were never part of the outcome models for the original programs.
  • D is there to remind us that we should never assume that all program outcomes (especially when not part of the program model) will be desirable. If I wanted to be pessimistic, but maybe not too farfetched, I could have included E, F, G, and H, all of which I would have depicted as negative.
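
For readers who like to see the arithmetic, here is a minimal simulation of the dynamic Figure 2 describes. The starting levels echo Figure 1’s point totals (35 versus 19), but the spillover rate, the number of periods, and the cap are invented for illustration; nothing here predicts what real programs would do.

```python
# Hypothetical illustration of Figure 2: isolated programs hold their initial
# outcome levels, while networked programs gain a small spillover from the
# average level of their neighbors each period.
isolated = [9, 9, 9, 8]                  # four very successful, unconnected programs
networked = [3, 3, 3, 3, 3, 4]           # six somewhat successful, connected programs
spillover = 0.05                         # invented spillover rate per period

for period in range(20):
    mean_level = sum(networked) / len(networked)
    networked = [min(10, x + spillover * mean_level) for x in networked]

print("isolated total:", sum(isolated))               # stays at 35
print("networked total:", round(sum(networked), 1))   # grows well past 35 over time
```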

An implicit assumption in the above is that networking relationships have either been designed into the programs or implemented in such a way that they evolve naturally. (For example, all the programs are in the same geographical area or administrative unit.) It is by no means certain that networking relationships will develop. Whether they do, and what they do, are empirical questions that need to be addressed as part of the evaluation.

The above paragraph implies that program designers have considered the possibility of connections among their programs. Part 3 (Ignoring complexity can make sense) makes the point that designers may not be doing their jobs well if they did try to build those connections. But even if designers don’t build those connections, evaluation can still provide valuable insight about the complex behavior they are ignoring. That is the subject of the next (and thankfully final) section of this series of blogs (Part 10: Evaluating for complexity when programs are not designed that way).

How can the concept of “attractors” be useful in evaluation? Part 8 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 8 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Approximate post date
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes up
7 Why should evaluators care about emergence? up
8 Why might it be useful to think of programs and their outcomes in terms of attractors? up
9 A few very successful programs, or many, connected, somewhat successful programs? 8/9
10 Evaluating for complexity when programs are not designed that way 8/19

Why might it be useful to think of programs and their outcomes in terms of attractors?

Exercises to understand the historical behavior of a program (or a class of programs) are worthwhile in any evaluation. People should always do them. My hope in this blog post, however, is to make a convincing case that the concept of an “attractor” provides a richer way to think about a program’s history, its likely behavior in the future, and its outcomes.

The Wikipedia definition of an attractor is:

In the mathematical field of dynamical systems, an attractor is a set of numerical values toward which a system tends to evolve, for a wide variety of starting conditions of the system. System values that get close enough to the attractor values remain close even if slightly disturbed.

With a definition like that, it helps to know what Wikipedia thinks a dynamical system is.

In mathematics, a dynamical system is a system in which a function describes the time dependence of a point in a geometrical space. Examples include the mathematical models that describe the swinging of a clock pendulum, the flow of water in a pipe, and the number of fish each springtime in a lake.
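
As a concrete, if hypothetical, illustration of those two definitions, the sketch below integrates a damped pendulum, one of the textbook dynamical systems, and shows that a wide variety of starting conditions all end up near the same resting point. That resting point is the attractor. The parameter values are arbitrary.

```python
# A minimal sketch: a damped pendulum has a fixed-point attractor at
# (angle = 0, angular velocity = 0). Parameters are illustrative only.
import math

def simulate(theta0, omega0, damping=0.5, g_over_l=9.8, dt=0.01, steps=5000):
    theta, omega = theta0, omega0
    for _ in range(steps):
        # Euler integration of theta'' = -damping * theta' - (g/l) * sin(theta)
        alpha = -damping * omega - g_over_l * math.sin(theta)
        omega += alpha * dt
        theta += omega * dt
    return theta, omega

# Very different starting conditions all end up close to the attractor (0, 0).
for theta0, omega0 in [(2.0, 0.0), (-1.0, 3.0), (0.5, -2.0)]:
    print(simulate(theta0, omega0))
```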

Figure 1 shows some more examples of attractors. #1 is a map of watersheds in the United States. #2 shows multiple animal species at a watering hole. #3, well let’s just say that this one is my favorite. #4 shows planetary orbits.

The pictures in Figure 1 have a limitation, namely that the constructs they depict are real objects in physical space. It’s not just that certain elevations describe water flow. It’s that physical water flows through those elevations. It’s not just that playgrounds appeal to kids. It’s that real kids inhabit the playground space. This coincidence of attractor and physical object does not apply to all attractors. A particular shape in physical space can also be used to describe less tangible constructs. To telegraph an example I’ll use later, an attractor space in the shape of a pendulum can be used to describe fluctuations in government policy.

Figure 1: Examples of Attractors

Why bother to think in terms of attractors?
Why might it be worth thinking of programs and their outcomes in terms of attractors? Why not just rely on observing history and be done with it? Because thinking in terms of attractors:

  • Provides insightful visualization of how outcomes may change over time.
  • Kicks program theory up a level of abstraction, thus revealing similarities and differences among seemingly different activities.
  • Can reveal likely system behavior even without good historical data.

To illustrate, I’ll use the example of a federal regulatory agency. Between a lot of evaluation I have done, a lot of reading of the relevant literature, and quite a bit of talking to close observers, this is a subject I know something about.

There is some very interesting social science research and theory on the question of maintaining safety by emphasizing rule compliance or by engaging in cooperative activity between government and industry. It’s a complicated story, but the bottom line is that neither strategy can be entirely successful by itself, and that getting the mix right at any given time is a tricky problem. In fact, the most effective mix moves back and forth with circumstances. A dated, but good, explanation of this phenomenon is Shapiro, S. A., & Rabinowitz, R. S. (1997). Punishment versus cooperation in regulatory enforcement: A case study of OSHA. Administrative Law Review, 49(4). For a more general discussion, try: Sparrow, Malcolm K. (2000). The Regulatory Craft: Controlling Risks, Solving Problems, and Managing Compliance.

The image of a pendulum comes to mind. Looking back, it seems as if the emphasis that regulatory agencies place on enforcing compliance and working cooperatively oscillates, moving in one direction and then swinging back in the other. One way to look at this is to say that the attractor space traces the arc of a pendulum. Figure 2 shows what I have in mind. Figure 2 can be thought of as a form of program model that deliberately omits a great deal of information in order to highlight the oscillatory behavior of the program. (For much more on models, see Part 5 – A pitch for sparse models.)

Figure 2: Pendulum as a Descriptor of Cooperation and Enforcement in a Regulatory Agency

Considering the argument I made above for thinking in terms of attractors and not exclusively in terms of program history, what’s the advantage here?

Visualization of how outcomes may change over time: Sometimes pictures help understanding, and I think this is one of those cases, particularly with respect to sustainability. Imagine evaluating a program designed to minimize a safety problem that was based on a cooperative program theory. No matter how successful that program, I’d assess the likelihood of long-term sustainability as lower in the scenario on the left than in the scenario on the right. Why? Because I know the shape of the attractor. Of course the model leaves a lot out. It does not account for relationships between labor and management in the industry, or the inclinations of agency leadership, or developments in safety technology, or any of the myriad factors that may affect sustainability. All I know is the shape of the attractor and the location of the agency on it. Figure 2 is a sparse model that abstracts the attractor shape from all the other factors that may affect sustainability. But it’s a sparse model that provides a lot of insight that would be obscured if I tried to build a model that included labor/management relationships, the inclinations of policy makers, or developments in safety technology.

Level of abstraction: Figure 2 may be useful for understanding sustainability for one particular program in one particular regulatory agency, but it is also useful for revealing that the same attractor can be applied to any setting where the mission inclines toward enforcing compliance, but where the social reality of ensuring safety also calls for cooperation.

Historical data: It is always important to have empirical data on how an agency has behaved in the past. It’s one thing to draw a picture such as Figure 2; it is something else to have empirical knowledge of how much cooperation and enforcement is taking place, how long the trend has been going in one or another direction, what the balance point is, and so on. Unfortunately, in most instances an evaluator will not have such data. As examples of the difficulties involved, consider a few of the issues that spring to mind. 1) How obvious is it that a program falls into the category of enforcement or of cooperative emphasis? 2) How much activity is informal, and thus cannot be labeled as a “program” or a “policy” that can be identified and assessed? 3) What is the metric for cooperation and enforcement? In the absence of good data, an evaluator may have to rely on expert judgement. That might provide a valid assessment, but it’s not very satisfying, either. It is precisely because good historical data may be lacking that attractor shapes provide insight. It may not take too much experience and historical knowledge to identify the general shape of an attractor and to get a reasonable sense of where an agency stands on it.

Before I move on to the next topic, I feel a need to make a point about the compliance/cooperation issue. The example I gave above springs from my own experience in evaluating cooperative programs. Please do not take this as an endorsement of a cooperation-only approach to improving safety. I do not believe that such a policy can be effective.

In the spirit of all models being wrong

I believe that the simple pendulum attractor model is useful because by ignoring many relevant issues, it succeeds at revealing some important program behaviors. But appreciating how the model is wrong is also revealing.

There are really two bobs: Figure 2 ignores the fact that there are always elements of cooperation and enforcement at play. Figure 3 is a more accurate portrayal of the situation.

Figure 3: Another Attractor Model for a Regulatory Agency

Smooth versus discontinuous movement: The smooth, predictable movement of a pendulum bob does not characterize how regulatory agencies behave. Cooperation with industry can change very quickly in light of a plane crash, a railroad derailment that leaks toxic chemicals, a pipeline explosion, or an interstate bus that runs off the road. Programs and policies can be easily canceled in the light of legislative and public outcry. Implementing programs and policies, however, takes time. Movement in a cooperative direction is incremental.

These are the kinds of dynamics that make me such a strong advocate of employing multiple models when evaluating the same program. Figure 2 has the advantage of clearly portraying the attractor that governs the sustainability of cooperative programs. It provides that simplicity, however, by ignoring the model depicted in Figure 3, which recognizes the co-existence of enforcement and cooperation. If my sole interest were in sustainability, I’d stick with Figure 2. But if I were also interested in whether agency staff would accept the cooperative program, I might also look at Figure 3, because I’d posit that the balance of cooperative and enforcement efforts may be relevant to acceptance. “Also” is the operative word in the previous sentence. Both models would be useful, each for a different part of the evaluation.

I can also imagine doing evaluation in a regulatory agency that needed only the model in Figure 3. For example, imagine a program designed to enhance communication and cooperation among factions within the agency that had different opinions about the value of enforcement and cooperation. If I ever got lucky enough to do an evaluation like that, I’d start by invoking the model in Figure 3. It’s not that one model is correct and the other is incorrect. It’s a matter of each model being useful for different reasons.

Why should evaluators care about emergence? Part 7 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 7 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Approximate post date
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes up
7 Why should evaluators care about emergence? up
8 Why might it be useful to think of programs and their outcomes in terms of attractors? 7/19
9 A few very successful programs, or many, connected, somewhat successful programs? 7/24
10 Evaluating for complexity when programs are not designed that way 7/31

Why should evaluators care about emergence?

What’s the difference between an automobile engine (Figure 1) and a beehive (Figure 2)? After all, in each the whole is larger than the sum of its parts.

The answer is that for the engine, it is possible to explain what each part is and what role that part plays in the functioning of the engine. I can tell you the shape of a cylinder, how it is constructed, why it is needed to contain combustible material, how it moves up and down and is attached to the crankshaft, and so on. When I finished, you would know how an internal combustion engine worked and how a cylinder contributes to the overall functioning of the engine.

I could not give you such an explanation for how any single bee contributes to the construction or functioning of a beehive. The beehive materializes when all those bees interact with each other as they do their simple bee things. That is emergence.

Figure 2: Beehive

Change happens when the parts of an engine are assembled. Change happens when bees do bee things. But the type of change is different. Only the latter is an emergent phenomenon.

Bees are bees, but what of emergence at the human scale of people, organizations, social groups, and political entities? It’s easy to find many examples. Some of the ones I like are: 1) The fractal nature of market fluctuations cannot be explained by the behavior of individual buyers and sellers. 2) The number of patents in a city grows faster than its population, so patents per capita rise with city size (the superlinear relationship is easiest to see when both are plotted logarithmically). 3) Traffic jams move in a direction opposite to the flow of traffic. 4) The collective consequences of people, policy, business, and infrastructure yield specialized districts in cities. In each of these cases, the behavior of the larger unit cannot be explained by breaking it down into its constituent parts.
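
As an illustration of the second example, here is a minimal sketch using synthetic data with an assumed scaling exponent of 1.15 (a commonly cited ballpark, not a finding of this series). A log-log fit recovers an exponent above 1, which is what “superlinear” means: total patents grow faster than population, so patents per capita rise with city size.

```python
# Synthetic, hedged illustration of superlinear urban scaling.
import numpy as np

rng = np.random.default_rng(0)
population = np.logspace(4, 7, 50)                                # cities of 10^4 to 10^7 people
patents = 0.001 * population ** 1.15 * rng.lognormal(0, 0.2, 50)  # noisy power law, assumed exponent

# Fit log(patents) = beta * log(population) + c
beta, c = np.polyfit(np.log(population), np.log(patents), 1)
print(f"estimated scaling exponent: {beta:.2f}")                  # close to 1.15, i.e. > 1
print("patents per capita, smallest vs largest city:",
      patents[0] / population[0], patents[-1] / population[-1])
```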

Emergence matters in evaluation because it implies program theory that acknowledges that some phenomena cannot be understood in terms of their constituent parts. To take the example of specialized districts in cities: any effort to understand the consequences of such districts for the city’s appeal to outsiders needs to work at the level of the district’s impact on city life. It would not help to do such an analysis by researching the individual people, policies, businesses, and infrastructure that comprise the district. Those do not, and cannot, “add up” to “district appeal”. To extend the example, the appeal of the city to outsiders probably has to do with the entire group of specialized districts, and how those districts affect each other. Districts might be a meaningful unit of analysis, but the constituent parts of districts would not.

Why not try to do the analysis in terms of the constituent parts? The reason has nothing to do with our analytical capabilities or our access to data. The reason is that because these are emergent phenomena, it is no more possible to understand behavior in terms of its parts than it is to understand a beehive in terms of individual bees.

Figure 3: Emergent Versus Discrete Causality

The challenge for evaluation centers on program theory, as illustrated in Figure 3. The difference between the model at the top and the model at the bottom does not seem all that dramatic. In fact, the difference is profound. The model at the top states that it is possible to understand “appeal” in terms of the individual contributions of people, policy, business, and infrastructure. The model at the bottom acknowledges that however one might understand “appeal”, it is not in terms of the unique contributions of people, policy, business, and infrastructure. The two models have very different consequences for:

  • Methodology
  • Data requirements
  • Stakeholder expectations, and
  • What we can say about impact.


Joint optimization of unrelated outcomes – Part 6 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 6 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Approximate post date
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes up
7 Why should evaluators care about emergence? 7/16
8 Why might it be useful to think of programs and their outcomes in terms of attractors? 7/19
9 A few very successful programs, or many, connected, somewhat successful programs? 7/24
10 Evaluating for complexity when programs are not designed that way 7/31

Joint optimization of unrelated outcomes

This blog post is one of two in the series that discuss the possible advantages of having a less successful program over having a more successful program. The other is Part 9: A few very successful programs, or many, connected, somewhat successful programs?

Figure 1: Typical Model for an AIDS Program

Figure 1 is a nice traditional model of an AIDS prevention/treatment program. The program is implemented, and services are provided. Because of careful planning, the quality of service is high. The combined amount and quality of service decreases the incidence and prevalence of AIDS. Decreased incidence and prevalence lead to improvements in quality of life and other similar measures. Because incidence and prevalence decrease, the amount of service provided goes down. However, at whatever level, the quality of service remains high. All these changes can be measured quantitatively. Change in the outcomes also affects the activities of the program, but for the most part, understanding those changes requires qualitative analysis.

There is nothing wrong with this model and this evaluation. I would dearly love to have a chance to do a piece of work like that. Note, however, an aspect of this program that characterizes every program I have ever seen. All the outcomes are highly correlated with each other. Because of the ways in which change happens, this can have some unpleasant consequences.

The unpleasant consequences can be seen by casting the AIDS program within a model that recognizes that the AIDS program is but one of many organisms in a diverse ecosystem of health activities (Figure 2). (For a really good look at the subject of diversity and change, see Scott Page’s Diversity and Complexity.)

Figure 2: Casting the AIDS Model into an Ecosystem Based Health Care Model

Looked at in those terms, the way change happens can have widespread, and probably not very desirable, consequences. Table 1 explains the details in the model shown in Figure 2.

Table 1: Explanation of Figure 2
Upper right
  • The ecosystem of health services is arranged in a radar chart to show how much each service contributes to overall health in the community. There is no scale on the chart because the absolute value of each service’s quality does not matter. All that matters is that, under the circumstances, each service is about as good as it can be.
Upper left
  • This is the model of our very successful AIDS prevention and treatment program, as shown in Figure 1.
Lower left
  • This is a chart of what can happen to health system resources when an overriding priority is put on AIDS, to the exclusion of everything else. Resources flow from the rest of the system to AIDS. When I say “resources” I do not mean just money. I mean everything about a health care system that is needed for the system to function well.
Lower right
  • This radar chart shows the status of the system after the AIDS effort has been in operation for a while. Indeed, the AIDS measures improve. But what of the other services? How do they accommodate to nurses choosing to move to AIDS care, or to policy makers’ time and intellectual effort being pointed in the AIDS direction, and so on? It seems reasonable to posit that whatever happens to those other services, it will not be to their advantage. Their environment has become resource poor.

This is what happens when a single objective is pursued in a system composed of diverse entities with diverse goals. You will get what you worked for, but the system as a whole may be worse off for it. What is the solution? The solution is to work at jointly optimizing multiple somewhat unrelated outcomes. “Somewhat” is an important qualifier because the range of objectives cannot be too diverse. In the AIDS example, all health care objectives certainly have some overlap and relationship to each other. It’s not as if the goals to be jointly optimized were as far apart as AIDS and girls’ schooling. Some coherence of focus is needed.
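
Here is a minimal sketch of that trade-off, under two loudly flagged assumptions: each service turns resources into outcomes with diminishing returns (a square-root payoff), and the total resource budget is fixed. The service names and numbers are invented. Under those assumptions, piling everything onto one objective maximizes that objective while lowering the system-wide total; spreading resources produces a lower AIDS score but a better-functioning system overall.

```python
# Hypothetical illustration of single-objective versus joint optimization.
import math

services = ["AIDS", "maternal health", "malaria", "tertiary care"]
budget = 100.0

def outcomes(allocation):
    # Diminishing returns: each additional unit of resource helps less than the last.
    return {s: math.sqrt(r) for s, r in allocation.items()}

single = {s: (budget if s == "AIDS" else 0.0) for s in services}   # everything to AIDS
joint = {s: budget / len(services) for s in services}              # spread evenly

for label, allocation in [("single objective", single), ("joint optimization", joint)]:
    scores = outcomes(allocation)
    print(label, {s: round(v, 1) for s, v in scores.items()},
          "| system total:", round(sum(scores.values()), 1))
```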

The above advice can be excruciatingly difficult to follow. One problem is that there is nothing obvious about what “joint optimization” means. AIDS prevention, tertiary care, and women’s health – imagine drawing a logic model for the goals of each of these programs. Then imagine the interesting conversations that would ensue on the topic of how much achievement of each goal was appropriate.

Indeed, one way to look at the simple model depicted by Figure 1 is that it is a program operating within an organizational silo. And as I tried to show in Part 3 (Ignoring complexity can make sense), operating within silos can be rational and functional. I am by no means arguing that the model in Figure 2 is in any way better than the model in Figure 1, or that programs must be designed and evaluated with respect to one or the other. My only point in this blog post is to show that there is complex system behavior, in the form of evolutionary adaptation, that is likely to cause unintended undesirable consequences when efforts are made to pursue a set of highly correlated outcomes.

Finally, I know many people take a dim view of the dark scenario I painted above, namely, that the most likely unintended consequences of pursuing a single objective are negative. But I think I’m right. For an explanation, see the section “Why are Unintended Consequences Likely to be Undesirable?” in From Firefighting to Systematic Action: Toward A Research Agenda for Better Evaluation of Unintended Consequences.

A pitch for sparse models – Part 5 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 5 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Approximate post date
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes 7/8
7 Why should evaluators care about emergence? 7/16
8 Why might it be useful to think of programs and their outcomes in terms of attractors? 7/19
9 A few very successful programs, or many, connected, somewhat successful programs? 7/24
10 Evaluating for complexity when programs are not designed that way 7/31

Models

I’ll start with my take on the subject of “models”. I do not think of models exclusively in terms of traditional evaluation logic models, or in terms of the box and arrow graphics that we use to depict program theory. Rather, I think in terms of how “models” function in the process of scientific inquiry. Table 1 summarizes how I engage models when I do evaluation. [Some writings that influenced my thinking about this topic: 1) Evaluation as technology, not science (Morell), 2) Models in Science (Frigg and Hartmann), 3) The Model Thinker: What You Need to Know to Make Data Work for You (Page), and 4) Timelines as evaluation logic models (Morell).]

Table 1: How Jonny Thinks About Models

Simplification: A model is a simplification of reality that deliberately omits some aspects of a phenomenon’s functioning in order to highlight others. Simplification is required because without it, no methodology could cover all relevant factors.
Ubiquity: Because evaluation is an analytical exercise, there is always a need for some kind of a model. That model may be implicit or explicit, detailed or sparse, comprised of qualitative or quantitative concepts, and designed to drive any number of qualitative or quantitative ways of understanding a program. Also, models can vary in their half-lives. Some will remain relatively constant over an entire evaluation. Some may change with each new piece of data or each new analysis. But there will always be more going on than can be managed in any analysis. There will always be a need to decide what to strip out in order to discern relationships among elements of what is left.
Ignorance: No matter how smart we are, we will never know what all the relevant factors are. We cannot have a complete model no matter how hard we try.
Choice: Models can be cast in different forms and at different levels of detail. The appropriate form is the one that works best for a particular inquiry.
Multiple forms: There is no reason to restrict an inquiry to only one model, or one form of model. In fact, there are many good reasons to use multiple models.
Wrong but useful: George Box was right: “All models are wrong, but some are useful.” (Go here for a dated but public version, or here for the journal version.)
Outcome focus: I use models to guide decisions about what methodology I should employ, what data I should collect, and how I should interpret the data. I tend not to use models to explain a program; if I did, I would include more detail than I could handle in an evaluation exercise. I do not use models for program advocacy, but if I did, I would use less detail.

A common view of models in evaluation

Considering the above, what should evaluation models look like? This question is unanswerable, but I do have a strong opinion as to what a model should not look like. It should not look like almost all the models I have ever seen. It should not look like Figure 1. I know that no model used by evaluators looks exactly like this, but almost all models I have ever seen have a core logic that is similar. Qualitatively, they are all the same. I do not like these models.

Figure 1: Common, way over specified model

One reason I do not like these models is that they do not recognize complex behavior. Here are some examples of complex behaviors that these kinds of models miss.


  • Even a single feedback loop can result in non-linear behavior. (See the sketch after this list.)
  • Small perturbations in any part of the model’s behavior may result in a major change in a model’s trajectory.
  • The model as a whole, or regions of it, may combine to generate effects that are not attributable to any single element in the model.
  • Models as depicted in Figure 1 are cast as networks, but the model is not treated as a network that can exhibit network behavior.
  • The model asserts that intermediate outcomes can be identified, as can paths through those outcomes. It is entirely possible that the precise path cannot be predicted, but that the long-term outcome can.
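
As a minimal sketch of the first bullet, the logistic map below contains a single feedback loop: next period’s value depends only on this period’s value. Yet depending on one parameter it settles to a fixed point, oscillates, or becomes chaotic. The parameter values are standard textbook choices, not anything drawn from a program or an evaluation.

```python
# One feedback loop, three qualitatively different behaviors.
def logistic_step(x, r):
    return r * x * (1 - x)          # x(t+1) depends only on x(t): a single feedback loop

for r in (2.8, 3.5, 3.9):           # fixed point, oscillation, chaos
    x = 0.2
    for _ in range(100):            # let transient behavior die out
        x = logistic_step(x, r)
    trajectory = []
    for _ in range(6):
        x = logistic_step(x, r)
        trajectory.append(round(x, 3))
    print(f"r={r}: {trajectory}")
```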

Another reason I do not like these models is that they are not modest. Read on.

Recognizing ignorance

Give all the specific detail in Figure 1 a good look. Give it the sniff test. Is it plausible that we know enough about how the program works to specify it at that level of detail? I suppose it’s possible, but I bet not.

Figure 2: Models with successive degrees of ignorance

As an aside, I also think that if models like this are used, they should include information that they always lack. Here are two examples. 1) Are all those multiple arrows equally important? 2) Do those multiple connections represent “and” or “or” relationships? It makes a difference because too many “and” requirements almost certainly portend that the program will fail. These are my favorites from a long list I developed for an analysis of implicit assumptions. If you want them all, go to: Revealing Implicit Assumptions: Why, Where, and How?

My preference is to use models along the lines of those in Figure 2. From top to bottom, they capture a greater sense of what we do not know because we have not done enough research, or what we cannot know because of the workings of complex behaviors.

Blue model: The story in this model is that there are outcomes that matter, but whose precise relationships cannot be identified. (See the ovals in the “later” column.) The best we can do is think of these outcomes in groups such that if something happens in one group, something will happen in the subsequent group. We cannot specify relationships among single outcomes within each group, or between specific outcomes across groups. Also, it is possible that for each replication of the program, the 1:1 relationships within and across groups may differ. Or there may be no 1:1 relationships at all; rather, there is emergent behavior in one group that is affecting the other. Put more simply, the best we can say is that “if stuff happens here, stuff will happen there”.

Green model: The story in the middle acknowledges an even greater degree of ignorance. The intermediate outcomes are still there, but the model acknowledges that much else not related to the program might be affecting the long-range outcome. Still, that long-range outcome can be isolated and identified. This may seem odd, but I believe it is quite possible. (See Part 8: How can the concept of “attractors” be useful in evaluation?)

Yellow model: The story at the bottom acknowledges more ignorance still. There, not only are the intermediate outcomes tangled with other activity, but the long-range outcome is as well.

I have no a priori preference for any of these models. The choice would depend on how much we know about the program, what the outcomes were, how much uncertainty we could tolerate, what data were available, what methodologies were available, the actual numbers for “later” and “much later”, and the needs of the stakeholders. What matters, though, is that thinking of models in this way acknowledges the effects of complex behavior on program outcomes, and that it recognizes how little we know about the details of why a program will do what it does. Also, I do not claim that these models are the only ones possible. They are, as they say, for illustrative purposes only. Evaluators can and should be creative in fashioning models that serve the needs of their customers.

Locally right but not globally right

Figure 3: Models can be everywhere correct locally but wrong globally

Models can have the odd characteristic of being everywhere locally correct but not globally correct. I tried to illustrate this with the green rectangle in Figure 3. Imagine moving that rectangle over the model. The relationships shown within the rectangle may well behave as the model depicts them, but as the size of the rectangle grows to overlap with the entire model, the fit between model and reality may fade. Several aspects of complex behavior explain why this is so.

  • Multiple interacting elements may exhibit global behavior that cannot be explained in terms of the sum of its parts. This is the phenomenon of emergence. (Part 7 Why should evaluators care about emergence?)
  • The model is a network, and networks can adapt and change as communication runs along their edges.
  • Because of sensitive dependence, small changes in any part of a system can result in long-term change as the system evolves. The direction of that evolution cannot be predicted. To know it, the system must be run and its behavior observed.
  • All those feedback loops can result in non-linear change.
  • Collections of entities and relationships like this can result in phase shift behavior, a phenomenon where the characteristics of a system can change almost instantaneously.

Summary of common themes

There are two common themes that run through everything I have said in this post.

  • The models limit detail, either by removing specific element-to-element relationships, or by limiting the number and range of elements under investigation.
  • They portray scenarios in which complex behavior is affecting program outcome.

These two themes are related. One of the reasons we should use sparse models is that complex behavior makes it inappropriate to specify too much detail.


Complex behavior can be evaluated using comfortable, familiar methodologies – Part 4 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common introduction to all sections

This is part 4 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Approximate post date
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models 7/1
6 Joint optimization of unrelated outcomes 7/8
7 Why should evaluators care about emergence? 7/16
8 Why might it be useful to think of programs and their outcomes in terms of attractors? 7/19
9 A few very successful programs, or many, connected, somewhat successful programs? 7/24
10 Evaluating for complexity when programs are not designed that way 7/31

This blog post will give away much of what is to come in the other parts, but that’s OK. One reason it’s OK is that it’s never a bad thing to cover the same material twice, each time in a somewhat different way. The other reason it’s OK is that before getting into the details of complex behavior and its use in evaluation, an important message needs to be internalized, namely, that the title of this blog post is in fact correct. Complex behavior can be evaluated using comfortable, familiar methodologies.

Figure 1 illustrates why this is so. It depicts a healthy eating program whose function is to reach out to individuals and teach them about dieting and exercise. Secondary effects are posited because attendees interact with friends and family. It is thought that because of that contact, four kinds of outcomes may occur.

  • Friends and family pick up some of the information that was transmitted to program attendees, and improve their personal health related behavior.
  • Collective change occurs within a family or cohort group, resulting in desirable health improvements, even though the specific changes cannot be identified in advance.
  • There may be community level changes. For instance, consider two examples: 1) An aggregate improvement in the health of people in a community may change their energy for engaging in volunteer behavior. The important outcome is not the number of hours each person puts in. The important outcome is what happens in the community because of those hours. 2) Better health may result in people working more hours and hence earning more money. Income is an individual level outcome, but the consequences of increased wealth in the community are a community level outcome.
  • To cap it all off, there is a feedback loop between the accomplishments of the program and what services the program delivers. So over time, the program’s outcomes may change as the program adapts to the changes it has wrought.
Figure 1: Evaluating Complex Behavior With Common, Familiar Methodologies

Even without a formal definition of complexity, I think we would all agree that this is a complex system. There are networks embedded in networks. There are community-level changes that cannot be understood by “summing” specific changes in friends and family. There are influences among the people receiving direct services. Program theory can identify health changes that may occur, but it is incapable of specifying any of the other changes that may occur. There is a feedback loop whereby the effects of the program influence the services the program delivers. And what methodologies are needed to deal with all this complexity? They are in Table 1. Everything there is a method that most evaluators can either use themselves or can easily recruit colleagues who can.

Table 1: Familiar Methodologies to Address Complex Behaviors
Program Behavior Methodology
Feedback between services and impact
  • Service records
  • Budgets and plans
  • Interviews with staff
Community level change
  • Monitoring
  • Observation
  • Open ended interviewing
  • Content analysis of community social media
Direct impact on participants
  • Interviews
  • Exercise logs
  • Food consumption logs
  • Blood pressure / weight measures

There are two exceptions to the “comfortable, familiar methodology” principle. The first would be cases where formal network structure mattered. For instance, imagine that it were not enough to show that network behavior was at play in the healthy eating example, but that the structure of the network and its various centrality measures were important for understanding the program outcomes. In that case one would need specialized expertise and software. The second case would be a scenario where it would further the evaluation if the program were modeled in a computer simulation. Those kinds of models are useless for predicting how a program will behave, but they are very useful for getting a sense of the program’s performance envelope, and for testing assumptions about relationships between program and outcome. If any of that mattered, one would need specialized expertise in system dynamics or agent-based modeling, depending on one’s view of how the world works and what information one wanted to know.
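
For the first exception, here is a minimal sketch of what that specialized software does, using networkx and a tiny invented network of program attendees and their friends and family. The names and ties are hypothetical; the point is only that centrality measures require the network’s actual structure, not merely evidence that networking is occurring.

```python
# Hypothetical network for the healthy eating example; structure is invented.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("attendee_1", "friend_A"), ("attendee_1", "friend_B"),
    ("attendee_2", "friend_B"), ("attendee_2", "family_C"),
    ("friend_B", "family_C"), ("family_C", "friend_D"),
])

# The kinds of centrality measures the paragraph refers to.
print(nx.degree_centrality(G))
print(nx.betweenness_centrality(G))
```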