Complex behavior can be evaluated using comfortable, familiar methodologies – Part 4 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common introduction to all sections

This is part 4 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Approximate post date
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models 7/1
6 Joint optimization of unrelated outcomes 7/8
7 Why should evaluators care about emergence? 7/16
8 Why might it be useful to think of programs and their outcomes in terms of attractors? 7/19
9 A few very successful programs, or many, connected, somewhat successful programs? 7/24
10 Evaluating for complexity when programs are not designed that way 7/31

This blog post will give away much of what is to come in the other parts, but that’s OK. One reason it’s OK is that it’s never a bad thing to cover the same material twice, each time in a somewhat different way. The other reason is that before getting into the details of complex behavior and its use in evaluation, an important message needs to be internalized: the title of this blog post is in fact correct. Complex behavior can be evaluated using comfortable, familiar methodologies.

Figure 1 illustrates why this is so. It depicts a healthy eating program whose function is to reach out to individuals and teach them about dieting and exercise. Secondary effects are posited because attendees interact with friends and family. It is thought that because of that contact, four kinds of outcomes may occur.

  • Friends and family pick up some of the information that was transmitted to program attendees, and improve their personal health related behavior.
  • Collective change occurs within a family or cohort group, resulting in desirable health improvements, even though the specific changes cannot be identified in advance.
  • There may be community-level changes. Consider two examples: 1) An aggregate improvement in the health of people in a community may change their energy for engaging in volunteer behavior. The important outcome is not the number of hours each person puts in; it is what happens in the community because of those hours. 2) Better health may result in people working more hours and hence earning more money. Income is an individual-level outcome, but the consequence of increased wealth in the community is a community-level outcome.
  • To cap it all off, there is a feedback loop between the accomplishments of the program and what services the program delivers. So over time, the program’s outcomes may change as the program adapts to the changes it has wrought.
Evaluating Complex Behavior With Common, Familiar Methodologies

Even without a formal definition of complexity, I think we would all agree that this is a complex system. There are networks embedded in networks. There are community-level changes that cannot be understood by “summing” specific changes in friends and family. There are influences among the people receiving direct services. Program theory can identify health changes that may occur, but it cannot specify any of the other changes that may occur. There is a feedback loop whereby the effects of the program influence the services the program delivers. And what methodologies are needed to deal with all this complexity? They are in Table 1. Everything there is a method that most evaluators can either apply themselves or for which they can easily recruit colleagues who can.

Table 1: Familiar Methodologies to Address Complex Behaviors

Program behavior: Feedback between services and impact
  • Service records
  • Budgets and plans
  • Interviews with staff

Program behavior: Community-level change
  • Monitoring
  • Observation
  • Open-ended interviewing
  • Content analysis of community social media

Program behavior: Direct impact on participants
  • Interviews
  • Exercise logs
  • Food consumption logs
  • Blood pressure / weight measures

There are two exceptions to the “comfortable, familiar methodology” principle. The first would be cases where formal network structure mattered. For instance, imagine that it were not enough to show that network behavior was at play in the healthy eating example, but that the structure of the network and its various centrality measures were important for understanding the program outcomes. In that case one would need specialized expertise and software. The second would be a scenario where it would further the evaluation if the program were modeled in a computer simulation. Those kinds of models are useless for predicting how a program will behave, but they are very useful for getting a sense of the program’s performance envelope and for testing assumptions about relationships between program and outcome. If any of that mattered, one would need specialized expertise in system dynamics or agent-based modeling, depending on one’s view of how the world works and what information one wants to know.
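Even in the first exceptional case, the mechanics are not exotic. Here is a purely illustrative sketch, assuming interview data on who talks with whom about diet and exercise in the healthy eating example; the people, ties, and choice of measures are invented, and the networkx library stands in for whatever specialized software an evaluator might actually use:

```python
# Hypothetical sketch: centrality measures for a participant / friends-and-family
# network in the healthy eating example. Names and ties are invented for illustration.
import networkx as nx

# Each edge represents a reported "talks about diet/exercise with" tie.
ties = [
    ("attendee_1", "spouse_1"), ("attendee_1", "friend_1"),
    ("attendee_2", "spouse_2"), ("attendee_2", "friend_1"),
    ("friend_1", "neighbor_1"), ("spouse_1", "neighbor_1"),
]

g = nx.Graph(ties)

# Two common structural measures: who is well connected (degree centrality) and
# who sits on the paths along which information travels (betweenness centrality).
degree = nx.degree_centrality(g)
betweenness = nx.betweenness_centrality(g)

for node in g.nodes:
    print(f"{node:12s} degree={degree[node]:.2f} betweenness={betweenness[node]:.2f}")
```

If, say, a friend rather than a program attendee turned out to have the highest betweenness, that would matter for understanding how the secondary effects spread.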

Embracing Uncertainty: The case for mindful development

 

Guy Sharrock, Catholic Relief Services

There is a growing awareness that many aspects of economic and social development are complex, unpredictable, and ultimately uncontrollable. Governments, non-governmental organizations, and international agencies have realized the need for a change in emphasis; a paradigm shift is taking place away from predominantly linear and reductionist models of change to approaches that signal a recognition of the indeterminate, dynamic and interconnected nature of social behavior.

Over the last few years many international NGOs have been adopting a more adaptive approach to project management, often with reference to USAID’s ‘Collaborating, Learning and Adapting’ (CLA) framework and model. In the case of Catholic Relief Services this work builds on earlier and not unrelated capacity-strengthening interventions – still ongoing – in which projects are encouraged to embed ‘evaluative thinking’ (ET) (Buckley et al., 2015) into their modus operandi.

Ellen Langer, in her excellent book The Power of Mindful Learning (Langer, 1997), introduces the notion of ‘mindfulness’. This concept, underpinned by many years of research, can be understood as being alert to novelty – intentionally “seeking surprise” (Guijt, 2008) – introducing in a helpful manner a sense of uncertainty to our thinking and thereby establishing a space for ‘psychologically safe’ learning (Edmondson, 2008) and an openness to multiple perspectives. This seems to me very applicable to the various strands of CLA and ET work in which I’ve been recently engaged; Langer’s arguments for mindful learning seem as applicable to international development as they are to her own sector of research interest, education. To borrow the language of Lederach (2007), Langer seems to “demystify” the notion of mindfulness whilst at the same time offering us the chance to “remystify” the practice of development work that seeks to change behavior and support shifts in social norms. This is both essential and overdue for development interventions occurring in complex settings.

A mindful approach to development would seek to encourage greater awareness in the present of how different people on the receiving end of aid adapt (or not) their behavior in response to project interventions; in short, a willingness to go beyond our initial assumptions through a mindful acceptance that data bring not certainty but ambiguity. According to Langer, “in a mindful state, we implicitly recognize that no one perspective optimally explains a situation…we do not seek to select the one response that corresponds to the situation, but we recognize that there is more than one perspective on the information given and we choose from among these” (op. cit.: 108). Mindful development encourages a learning climate in which uncertainty is embraced and stakeholders intentionally surface and value novelty, difference, context, and perspective to generate nuanced understandings of the outcomes of project interventions. Uncertainty is the starting point for addressing complex challenges, together with a willingness to “spend more time not knowing” (Margaret Wheatley, quoted in Kania and Kramer, 2013) before deciding on course corrections if needed. As Kania and Kramer (ibid.: 7) remark, “Collective impact success favors those who embrace the uncertainty of the journey, even as they remain clear-eyed about their destination.”

References

Buckley, J., Archibald, T., Hargraves, M. and W.M. Trochim. (2015). ‘Defining and Teaching Evaluative Thinking: Insights from Research on Critical Thinking’. American Journal of Evaluation, pp. 1-14.

Edmondson, A. (2014). Building a Psychologically Safe Workplace. Retrieved from: https://www.youtube.com/watch?v=LhoLuui9gX8

Guijt, I. (2008). Seeking Surprise: Rethinking Monitoring for Collective Learning in Rural Resource Management. Published PhD thesis, Wageningen University, Wageningen, The Netherlands.

Kania, J. and M. Kramer. (2013) ‘Embracing Emergence: How Collective Impact Addresses Complexity’. Stanford Social Innovation Review. Stanford University, CA.

Langer, E.J. (1997). The Power of Mindful Learning. Perseus Books, Cambridge, MA.

Lederach, J.P., Neufeldt, R. and H. Culbertson. (2007). Reflective Peacebuilding. A Planning, Monitoring, and Learning Toolkit. Joan B. Kroc Institute for International Peace Studies, University of Notre Dame, South Bend, IN, and Catholic Relief Services, Baltimore, MD.

 

Ignoring complexity can make sense – Part 3 of a 10-part series on how complexity can produce better insight on what programs do, and why 

Common Introduction to all sections

This is part 3 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Approximate post date
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes up
7 Why should evaluators care about emergence? up
8 Why might it be useful to think of programs and their outcomes in terms of attractors? up
9 A few very successful programs, or many, connected, somewhat successful programs? up
10 Evaluating for complexity when programs are not designed that way up

Ignoring complexity can make sense

Complexity is in large measure about connectedness. It is about what happens when processes and entities combine or interact. I believe that understanding complex connectedness will make for better models, and hence for more useful methodologies and data interpretation. Of course I believe this. Why else would I be producing all these blog posts and videos?

Still, I would be remiss if I did not advance a contrary view, i.e. that avoiding the implications of complexity can be functional and rational. In fact, it is usually functional and rational. I don’t think evaluators can do a good job if they fail to appreciate why this is so. It’s all too easy to jump to the conclusion that program designers “should” build complex behavior into their designs. I can make a good argument that they should not.

The difference between Figure 1 and Figure 2 illustrates what I mean. Every evaluation I have been involved with comes out of the organizational structure depicted in Figure 1. A program has internal operations (blue). Those operations produce consequences (pink). There is a feedback loop between what the program does and what it accomplishes. Real world cases may have many more parts, but qualitatively they are all the same picture. Figure 2 illustrates how programs really operate. The core of the program is still there, color coded in the same pink and blue. However, that program contains embedded detail (dark blue and dark red), and is connected to a great deal of activity and organizational structure outside of its immediate boundaries (green, gray, yellow, and white).

Figure 1
Figure 2

The people working in the program certainly know about these complications. They also know that those complications affect the program they are managing. So why not act on that knowledge? There are good reasons. Think about what would be involved in taking all those relationships into account.

  • Different stakeholders will have different priorities.
  • Different organizational cultures would have to work with each other.
  • Goals among the programs may conflict and would have to be negotiated.
  • Different programs are likely to have different schedules for decision making.
  • The cost of coordination in terms of people, money, and time would increase.
  • Different time horizons for the different activities would have to be reconciled.
  • Interactions among the programs would have to be built into program theory and evaluation.
  • Program designers would have to interact with people they don’t know personally, and don’t trust.
  • Each program will have different contingencies, which instead of affecting a narrow program, would affect the entire suite of programs.

That’s the reality. I’d say it’s rational to work within narrow constraints, no matter how acutely aware people are of the limitations of doing so.

Can Knowledge of Evolutionary Biology and Ecology Inform Evaluation?

I posted a longish piece (~3,500 words) on my website with the same title as this post. Section headings are:

  • Case: Early childhood parent support
    • Program design
    • Evaluation design
  • Some useful concepts from evolutionary biology and ecology
    • Population
    • Coevolution
    • Birth/death rates
    • Selection pressure
    • Species and species variation
  • What would the evaluation look like if its design were informed by knowledge of evolutionary biology and ecology?
    • Populations, and birth/death rates
    • Coevolution and population
    • Selection pressure
    • Species and species variation
  • Do we gain anything from applying an evolutionary lens?
    • Paradigmatic concepts
    • Methodology

I’m always interested in having people point out the error of my ways.

A Model for Evaluation of Transformation to a Green Energy Future

I just got back from the IDEAS global assembly, which carried the theme “Evaluation for Transformative Change: Bringing Experiences of the Global South to the Global North”. The trip prompted me to think about how complexity can be applied to evaluating green energy transformation efforts. I have a longish document (~2,000 words) that goes into detail, but here is my quick overview.

Because transformation is a complex process, any theory of change used to understand or measure it must be steeped in the principles of complexity.

The focus must be on the behavior of complex systems, not on “complex systems”. (Complex systems or complex behavior?)

In colloquial terms, a transformation to reliance on green energy can be thought of as a “new normal”. In complexity terms, “new normal” connotes an “attractor”, i.e. an equilibrium condition where perturbations settle back to the equilibrium. (Why might it be useful to think of programs and their outcomes in terms of attractors?)

A definition of a transformation to green energy must specify four measurable elements: 1) geographical boundaries, 2) level of energy use, 3) time frame, and 4) level of precision. For instance: “We know that transformation has happened if in place X, 80% of energy use comes from green sources, and has remained at about that level for five years.”
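A definition like that is concrete enough to be checked mechanically. Here is a minimal sketch, assuming yearly data on the green share of energy use in “place X”; the yearly figures, the 2% tolerance, and the function name are all invented for illustration:

```python
# Hypothetical sketch: has "place X" met the transformation definition above?
# Criterion: green share of energy use stays at about 80% or more for five years.
# The yearly figures below are invented.
green_share_by_year = {
    2018: 0.74, 2019: 0.79, 2020: 0.81, 2021: 0.83,
    2022: 0.80, 2023: 0.82, 2024: 0.81,
}

THRESHOLD = 0.80      # "80% of energy use comes from green sources"
TOLERANCE = 0.02      # "about that level" -- an assumed level of precision
YEARS_REQUIRED = 5    # "has remained at about that level for five years"

def transformed(shares, threshold, tolerance, years_required):
    """True if the share stayed at or above (threshold - tolerance)
    for at least `years_required` consecutive years."""
    run = 0
    for year in sorted(shares):
        if shares[year] >= threshold - tolerance:
            run += 1
            if run >= years_required:
                return True
        else:
            run = 0
    return False

print(transformed(green_share_by_year, THRESHOLD, TOLERANCE, YEARS_REQUIRED))
```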

Whether or not that definition is a good one is an empirical question for evaluators to address. What matters is whether the evaluation can provide guidance as to how to improve efforts at transformation.

Knowing if a condition obtains is different from knowing why a condition obtains. To address the “why”, evaluation must produce a program theory that recognizes three complexity behaviors – attractors, sensitive dependence, and emergence.

Because of sensitive dependence, unambiguous relationships among variables may not continue over time or across contexts. Because of emergence, transformation does not come about as a result of a fixed set of interactions among well-defined elements. Still, sensitive dependence and emergence may produce outcomes that remain within identifiable boundaries, i.e. within an attractor space. If they do, that is akin to “predicting an outcome”. If they do not, that is akin to showing that a program theory is wrong.

Models with many elements and connections cannot be used for prediction, or even for understanding transformation as a holistic construct. Small parts of a large model, however, can be useful for designing research and for understanding the transformation process.

Six tactics can be used for evaluating progress toward transformation: 1) develop a TOC that recognizes complex behavior, 2) measure each individual factor in the model, 3) consider how much change took place in each element of the model, 4) focus on parts of the model, but not the model as a whole, 5) use computer-based modeling, 6) employ a multiple-comparative case study design.
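As for tactic 5, computer-based modeling does not have to mean elaborate software. The following is a minimal sketch, not a model of any real energy system: it assumes an invented reinforcing feedback between accumulated green capacity and the rate of adoption, and its only purpose is to show how varying one assumption explores a performance envelope rather than predicting an outcome:

```python
# Hypothetical stock-and-flow sketch: the green share of energy use grows faster
# as green capacity accumulates (a reinforcing loop), but saturates near 100%.
# All parameters are invented; the point is exploration, not prediction.
def simulate(years=30, share=0.10, base_rate=0.02, feedback=0.15):
    history = [share]
    for _ in range(years):
        growth = (base_rate + feedback * share) * (1 - share)  # logistic-style growth
        share = min(1.0, share + growth)
        history.append(share)
    return history

# Vary one assumption (the strength of the feedback) and see how sensitive
# the trajectory is to it.
for feedback in (0.05, 0.15, 0.30):
    trajectory = simulate(feedback=feedback)
    print(f"feedback={feedback:.2f} -> green share after 30 years = {trajectory[-1]:.2f}")
```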

As all the analysis takes place, interpret the data with respect to the limitations of models, and the implications of emergence, sensitive dependence, and attractor behavior.

Preliminary Notes on The Application of Concepts from Evolutionary Biology and Ecology to Evaluation

Introduction
I’m working on the notion that there are circumstances when evaluators should think of programs as species of organisms adapting in an ecological niche. This document contains some preliminary thoughts on that topic. I’m groping toward an article, a series of blog posts, and some YouTube movies. I’m looking for any suggestions anyone might have to help me along.

Inapplicability to Evaluation
One thing I need to be careful about is that evolution is agnostic as to outcomes; it only cares about species viability. We care about goals.

Evolution does not mean “progress” in the sense that we humans think of making life better for people. It’s not hard to imagine a dystopian, but highly sustainable, evolutionarily successful world. (In general, people talk about “sustainability” as if it were an unalloyed good. It’s not. It is neutral with respect to whether the ends being sustained are desirable or undesirable. In my business the problem is that systems are too sustainable. You can beat them over the head with data until the cows come home, and still they do not change.)

When is an evolutionary biological perspective needed?
People get turned on when I give my complexity workshops and think that they have to apply principles of complex behavior in everything they do. Hooey. It’s one thing to say there is complex behavior operating. It’s quite something else to say that one has to go to the trouble of dealing with it. There is a large and legitimate need for evaluation of single programs with respect to first-order outcomes. No overwhelming need to deal with complexity there. Ditto evolutionary biology.

Toward the end of Superforecasting: The Art and Science of Prediction, Tetlock has a nice discussion of when incremental forecasting is useful, given the ubiquity of log-linearly distributed rare occurrences that can change the course of events. (See the Rumsfeld memo to Bush, Cheney, and Rice.) There is an analogous argument to make about evaluation. If it’s true for evaluation in general, it’s certainly the case for using knowledge of evolutionary biology to shape an evaluation. (Actually I can make a good case that we should only evaluate short-term, proximate outcomes. But that’s another story.)

Also, evolution does not care about whether an organism lives or dies. It cares about whether a species thrives or goes extinct. So applying evolutionary biology to evaluation is only appropriate when that which is being evaluated is a class of programs. It’s the viability of the class that matters, not the individual programs within the class.

How can an evolutionary biological perspective be used in evaluation?
An evolutionary biology perspective can be used in a few different ways.

  • Technical, as the authors of “Organizational Ecology” do when they apply Lotka-Volterra equations to the birth and death of types of organizations. Using more familiar methodologies, I can see evaluators doing things like estimating how quickly a program’s environment is changing, or the diversity of similar programs in the same ecological niche. (A minimal sketch of the Lotka-Volterra idea appears after this list.)
  • As a vocabulary and a set of constructs that can help in developing models, devising methodologies, and interpreting data. Some examples: 1) Fitness landscape: If a set of programs begins to evolve in a particular direction, what are the consequences of small changes for the fitness of that set of programs?

2) Co-evolution of species and environment: In the U.S. at least, Uber is a great example. It could only exist because the environment was conducive – IT infrastructure, GPS, weaknesses in current taxi services, availability of venture and human capital, and so on. But once the “species” began to thrive, the environment had to adapt to it, e.g. rules about traffic congestion, public conveyance regulations in various cities, dedicated waiting spaces at airports, and so on. Depending on the nature and direction of the adaptation, the species may or may not thrive. (It’s an open question. Uber is losing heaps and gobs of money.)
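Here is the promised sketch of the Lotka-Volterra idea from the first bullet above. It treats two types of programs as competing “species” in the same funding niche; every parameter is invented, and the point is only to show what the equations look like in practice, not to describe any real population of programs:

```python
# Hypothetical sketch: competitive Lotka-Volterra dynamics for two "species" of
# programs (e.g., two program models competing for the same funding niche).
# n1, n2 = number of active programs of each type; all parameters are invented.
def competitive_lv(n1, n2, years=50, dt=0.1,
                   r1=0.4, r2=0.3,       # intrinsic growth rates
                   k1=100, k2=80,        # carrying capacities of the niche
                   a12=0.9, a21=0.6):    # competition coefficients
    steps = int(years / dt)
    for _ in range(steps):
        # Standard competitive Lotka-Volterra equations, integrated with Euler steps.
        dn1 = r1 * n1 * (1 - (n1 + a12 * n2) / k1)
        dn2 = r2 * n2 * (1 - (n2 + a21 * n1) / k2)
        n1 = max(0.0, n1 + dn1 * dt)
        n2 = max(0.0, n2 + dn2 * dt)
    return n1, n2

n1, n2 = competitive_lv(n1=5, n2=5)
print(f"After 50 years: type 1 ≈ {n1:.0f} programs, type 2 ≈ {n2:.0f} programs")
```

With these made-up coefficients the two types coexist; crank up the competition coefficients and one type drives the other extinct, which is the kind of question an “organizational ecology” framing puts on the table.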

Birth and extinction
If evaluation is going to take an evolutionary biology perspective, it has to take the concept of species birth and extinction seriously. We care about spurring innovation and stifling ineffective programs.

One very juicy example is the advent of the mail order business, as popularized by Sears. This required the establishment of Rural Free Delivery in the late 19th century, high local prices, and a market pull for a broad range of goods. It had a truly profound effect: it brought a variety of goods to a large percentage of the population at lower prices, allowed African Americans to buy, and get credit for, purchases that were not available locally, and badly affected the income of local merchants, some of whom sponsored book burnings of the Sears catalogue.

My problem is that I cannot think of as good an example for the type of stuff that evaluators would evaluate. It’s easy enough to think of examples, but not big, interesting ones. For instance, there are STEM programs that did not exist before Sputnik and did not exist for girls until 20 or so years ago. There were always private schools in the US, but not charter schools in their present incarnation. How long ago was it that there were no programs in environmental education, or climate change mitigation efforts? In terms of extinction, think of big state mental hospitals in the US, and specialized hospital wards for AIDS patients.

Links with other types of evaluation
I would do well to make the case that an evolutionary biological perspective has ties to other trends in evaluation. I can think of three: complexity, developmental evaluation, and sustainability. I have the first one pretty well worked out. Not so much the other two.

Examples I’m looking for
I’m looking for examples that make the transition from evaluating a program to evaluating a group of similar programs, which is what an evolutionary perspective would require. My difficulty is finding an example that evaluators would recognize as something they might get paid to do.

As of now I’m pondering two possibilities.

  • Sustainability (See NDE Summer 2019.)
  • Telemedicine/telehealth. This has lots of elements I can use. Ancestors (back to plain old telephone service), rapid evolution, adaptation to a changing environment (costs of health care, docs leaving rural areas, etc.), co-evolution as the innovation affects its environment, “species” nested in “genus” (e.g. maternal health and surgical consulting), competition, and much else besides.

To build on the example of telemedicine, someone might get paid to evaluate a telehealth counseling program for nursing mothers in Australia, i.e. a program that had an identifiable source of funding coming from some small corner of the Ministry of Health. But getting paid to evaluate the overall consequences of having a telehealth infrastructure and set of services in the country? A nice piece of social science research to be sure, but I’m not sure how many of our brethren would see it as an “evaluation”. I have a feeling that Foundations might do this at a program level, but I’m not sure.


Evaluating for complexity when programs are not designed that way – Part 10 of a 10-part series on how complexity can produce better insight on what programs do, and why

Common Introduction to all sections

This is part 10 of 10 blog posts I’m writing to convey the information that I present in various workshops and lectures that I deliver about complexity. I’m an evaluator so I think in terms of evaluation, but I’m convinced that what I’m saying is equally applicable for planning.

I wrote each post to stand on its own, but I designed the collection to provide a wide-ranging view of how research and theory in the domain of “complexity” can contribute to the ability of evaluators to show stakeholders what their programs are producing, and why. I’m going to try to produce a YouTube video on each section. When (if?) I do, I’ll edit the post to include the YT URL.

Part Title Post status
1 Complex systems or complex behavior? up
2 Complexity has awkward implications for program designers and evaluators up
3 Ignoring complexity can make sense up
4 Complex behavior can be evaluated using comfortable, familiar methodologies up
5 A pitch for sparse models up
6 Joint optimization of unrelated outcomes up
7 Why should evaluators care about emergence? up
8 Why might it be useful to think of programs and their outcomes in terms of attractors? up
9 A few very successful programs, or many, connected, somewhat successful programs? up
10 Evaluating for complexity when programs are not designed that way up

Evaluating for complexity when programs are not designed that way

There are good reasons to design programs with complex behavior in mind, and good reasons not to. (For the reasons not to, see Part 3, Ignoring complexity can make sense, which makes a case for the rationality of letting the sleeping complexity dog lie.)

The fact that programs are not designed in ways that recognize complexity does not mean that evaluation should ignore complexity. My reasoning is that even if programs are not designed with complex behavior in mind, knowing about complex behavior can still be useful to stakeholders. Figure 1 illustrates what I have in mind.

Figure 1: Overlay of Complex Model on a non-Complex Program Design

Blue region of model
Blue represents the original program model. It has much in it that is oblivious to complex behavior (and to common sense, for that matter).

Green region of model
Green represents a program model that recognizes some of the complex behaviors that may be operating. To make my point, I superimposed it on the original model:

  • Network effects are included.
  • Undesirable consequences are acknowledged.
  • Data are collected and analyzed with respect to groups of outcomes, without regard to any unique outcome within the group.
  • The social implications of distribution shapes are considered over and above the technical aspects of doing statistical analysis.

If I had my choice, I’d evaluate with respect to the green model exclusively, but I acknowledge that stakeholders may need more fine-grained information. In any case, as you know by now, I am a big supporter of using more than one model in any single evaluation. All models are wrong, but many different models can be both wrong and useful in different ways.