Four facets of AEA. Four questions. No answers provided.

A friend of mine and I were discussing the nature of AEA. I have come to quite a few conclusions about this, but my current thinking is more in the way of questions than answers. As I see it, what’s needed is an exploration of four questions.

  1. Where does the evaluation that we do fit within all the evaluation that is done?
  2. How does what we call “evaluation theory” fit within the greater domain of social science?
  3. What is the nature of AEA’s professional support network?
  4. What are the implications of AEA’s social values for the field of evaluation?

Where does the evaluation that we do fit within all the evaluation that is done?
Every once in a while, in the popular press, I read about the evaluation of a social program. As far as I can tell, the people doing these evaluations have no involvement in AEA. I don’t see this as problematic. It just means that AEA attracts a breed of evaluator that does not work on the kinds of programs that the popular press likes to write about. But it does mean that a lot of evaluation takes place outside the ambit of AEA members’ activity. More important, it is outside the ambit of anything AEA claims for evaluation. So, considering the population of evaluations that are done, what subset is covered by AEA members? Why that particular subset?

How does what we call “evaluation theory” fit within the greater domain of social science?
Over the decades there has been plenty of writing about how evaluation differs from other types of social science with respect to how evaluation should be designed and executed, and how the products of evaluation should be used. It is largely this body of work that sets evaluation apart from other social science. Still, evaluation theory does not exist in isolation. Some of it is a novel extension of what went before. Some of it is a new expression of what already exists, that new expression needed to orient evaluation in unique ways.

What is the nature of AEA’s professional support network?
We know that one function of AEA (as with all professional organizations) is to embed its members in a professional support network. But what is the structure of that network? How does it function and influence the work lives of its members? How does it further their professional interests? How does it develop and evolve? What sub-networks does it contain?

What are the implications of AEA’s social values for the field of evaluation?
AEA as an organization pursues a particular set of values with respect to the dissemination of data, the use of data, the value of data, and the social/organizational processes that should drive the development and execution of evaluation. There are consequences to pursuing these values for the kinds of programs that are evaluated, the nature of the information that is produced, and the range of stakeholders who might look to evaluation as a source of knowledge.


Complexity as a Trend in Evaluation: Similarities and Differences with Classical Statistics

Evaluators have always been concerned about the weakness of their efforts to drive more effective programs, more desirable outcomes, and fewer unintended negative consequences. Broadly speaking, these concerns fall into three categories.

  • evaluation methodology,
  • “evaluation use”, i.e., the dynamics by which evaluation works its way into decision making, and
  • limits on what evaluation can say about why programs operate as they do and what consequences they have.

 Complexity as a Trend in Evaluation and Planning

In recent years, the evaluation community has been looking to “complexity” as a source for addressing these difficulties.

A Complex System View of Technology Acquisition Choice

I am involved in a project that helps people make a single choice among multiple technologies. They must commit to one, so there is no waffling. This is one more of many such exercises that I have been involved in over the course of my career, and I have never been fully satisfied with any of them. On an intuitive level, everyone knows they cannot make the best choice, but everyone thinks that they should be able to. I finally figured out why they cannot. I don’t mean that people are not smart enough. I mean that it is impossible. The behavior of complex systems makes it impossible.

A Workable, Effective Solution
If there is a technology choice with a very few criteria, and it is absolutely clear what criterion is truly critical, and there is good data on performance, then yes,  it is possible to make the best choice. But how many situations like that are there? So, what to do in the majority of all the other cases?

Before I get into a longish esoteric discussion, I’ll jump to a simple, practical method for making a technology choice. The answer is that we accept the reality of how human beings make decisions. We satisfice within a context of bounded rationality. As Herbert Simon put it, “decision makers can satisfice either by finding optimum solutions for a simplified world, or by finding satisfactory solutions for a more realistic world”.

With respect to technology choice, satisficing dictates two decision making strategies which can be used alone or combined.

  • Find a few acceptable technology choices and pick the one you are most comfortable with.
  • Aggregate the requirements into broad enough categories and accept the imprecision that such aggregation requires.
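
The first strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate names, scores, thresholds, and "comfort" ratings are all hypothetical, and the point is simply that the procedure filters for "good enough" and then picks by comfort, rather than searching for a joint optimum.

```python
# Illustrative only: candidate names, scores, and thresholds are hypothetical.
candidates = [
    {"name": "Tech A", "detection": 0.9, "usability": 0.4, "cost": 120},
    {"name": "Tech B", "detection": 0.7, "usability": 0.8, "cost": 90},
    {"name": "Tech C", "detection": 0.8, "usability": 0.7, "cost": 100},
]

def is_acceptable(c, min_detection=0.65, min_usability=0.6, max_cost=110):
    """A candidate is 'good enough' if it clears every aspiration level."""
    return (c["detection"] >= min_detection
            and c["usability"] >= min_usability
            and c["cost"] <= max_cost)

def satisfice(options, comfort):
    """Keep the acceptable options, then pick the one we are most comfortable with."""
    acceptable = [c for c in options if is_acceptable(c)]
    if not acceptable:
        return None
    return max(acceptable, key=lambda c: comfort.get(c["name"], 0))

# Hypothetical comfort ratings from the decision makers.
comfort = {"Tech A": 3, "Tech B": 5, "Tech C": 4}
choice = satisfice(candidates, comfort)
print(choice["name"])
```

Note that nothing here ranks the criteria against each other. Each criterion only has to clear its own threshold, which is what makes the approach workable when a strict ordering is not available.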

And now for my explanation of why this simple solution is more than an efficient convenience, but a necessity. Of course, there is too much going on in the world for our humble intellect to find and understand. But it is more than the volume of information and our limited capacities. It is how that information is structured.

What are the System-based Reasons why an Optimal Choice Cannot be Made?
To begin, I need to define what I mean by “best choice”. I mean it in a technical optimization sense, where there is a true joint optimization of all relevant criteria. “Best” can also have a social psychological meaning, i.e., a situation where most interested parties are as satisfied as they can be with the collective choice that was made. But although I am a social psychologist, I’ll stick to a definition of “best” that is near and dear to the hearts of my engineer friends.

Choice Criteria are Networked
Why can’t a best choice be made? The answer is that choice criteria are networked and that the nodes of the network are subject to environmental influences. The result is sensitive dependence and emergent behavior. To illustrate with an example, see Table 1. It contains a list of choice criteria that I adapted from a project I’m working on.

Table 1: Technology Choice Criteria
High level                     Detailed
Signal detection capability    1.  Data analysis capability
                               2.  Number of signal types detected
                               3.  Signal resolution for each data type
Human Factors                  4.  Usability for operator
                               5.  Training requirements
                               6.  Visual presentation quality of output
Interoperability               7.  Data export formats
                               8.  Data import formats
Operating environment          9.  Time of day
                               10. Temperature
                               11. Weather conditions
Market                         12. Competing technologies
                               13. Market demand
                               14. Initial cost
                               15. Life cycle cost
                               16. Compatibility with technology trends
                               17. Synergy with other technologies in places where implemented

Pull just four elements from the list (see the picture): 1) number of signal types detectable, 2) weather,  3) cost, and 4) training requirements. A wider range of detection needs, the ability to work in bad weather, and low training requirements will all increase cost. The ability to work in adverse weather conditions may affect the types of signal detection that can be used. The greater the diversity of information, the greater the training requirements. What would happen to all the tangled dependencies if new hiring drove up the burden on training, or if a need for higher resolution imaging asserted itself, or if requirements for operation in adverse weather conditions were relaxed? Scale this up to dependencies among the seventeen choice criteria, and even a casual look at the dependencies makes it obvious why a strict ordering of criteria is impossible.

Network Behavior
Node relationships in networks are prone to sensitive dependence. This means that local differences in any one node, (or in a small number of nodes) might ripple through the system and affect relationships among many of the nodes. And the nature of those large-scale changes may be different as a function of different local changes. Moreover, networks can be adaptive in the sense that as influence is transmitted across edges, node and edge relationships can rearrange. I am not claiming that sensitive dependence or network adaptivity will always be at play in networks, only that they often are. Given what I know about interactions among technology requirements, it’s hard for me to believe that they are not at play in networked technology choice requirements.
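
To make the "ripple" concrete, here is a minimal Python sketch using the four criteria discussed above. The influence weights are invented for illustration, and the model is deliberately linear, so it shows only how one local change reaches other criteria through indirect paths. It does not capture sensitive dependence or network adaptivity, which would require nonlinear or rewiring behavior.

```python
# Hypothetical influence weights among four criteria from Table 1.
# influence[a][b] = how strongly a change in a shifts b (illustrative values).
influence = {
    "weather": {"signal_types": 0.5, "cost": 0.4},
    "signal_types": {"training": 0.6, "cost": 0.5},
    "training": {"cost": 0.3},
    "cost": {},
}

def propagate(shock, steps=3):
    """Spread an initial change along the edges and total the effect per node."""
    total = dict.fromkeys(influence, 0.0)
    current = dict(shock)
    for _ in range(steps + 1):
        nxt = dict.fromkeys(influence, 0.0)
        for node, delta in current.items():
            total[node] += delta
            for target, weight in influence[node].items():
                nxt[target] += delta * weight
        # Carry forward only the nodes that actually received some change.
        current = {k: v for k, v in nxt.items() if v}
    return total

effects = propagate({"weather": 1.0})
for name, value in effects.items():
    print(f"{name}: {value:.2f}")
```

In this toy network the direct effect of the weather change on cost is 0.4, but the total effect is larger once the paths through signal types and training are added in. With seventeen criteria the number of such indirect paths grows quickly.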

There is yet another network phenomenon that I am convinced makes a strict ordering of criteria impossible, but which I won’t push too hard because I can’t make a strong case for it. I suspect that the “best” technology choice is not an additive function of its component requirements. Rather, “best” is an emergent characteristic of network behavior. Or put differently, each requirement loses its unique identity.

Influences on Network Nodes
If local change in a network of choice criteria can have such profound effects, how certain can we be that those kinds of changes will occur? Very certain. Consider just a few of the endless possibilities that may affect one or a few choice criteria.

  • Technology costs may rise or fall with market conditions.
  • A competing technology standard may become ascendant.
  • Funds for technology acquisition may increase or decrease.
  • The importance of the reasons for the technology choice may change.
  • Domains (location, business conditions, etc.) where the technology is desirable may narrow or broaden.
  • Choices are based on the best knowledge one has at the time about each relevant criterion. But the discovery of more extensive, or more accurate, knowledge is always a possibility.
  • And many, many more.

In the first section of this blog post I made the observation that people have an intuitive appreciation for the difficulty of making an optimal choice among competing technology acquisition candidates. In the second section I provided a complex system, network-based justification for this intuitive appreciation. I laid all this out to make the point that it is not difficult to make an optimal choice, it is impossible, and that therefore choices need to be made via a process of satisficing rather than optimizing. With respect to technology choice, satisficing dictates two decision making strategies that can be used alone or combined.

  • Find a few acceptable technology choices and pick the one you are most comfortable with.
  • Aggregate the requirements into broad enough categories and accept the imprecision that such aggregation requires.







Simulation Using Events and Goals – A New Approach to Agent-based Modeling

I just finished a research project that was funded through the DARPA Ground Truth program: SCAMP (Social Causality with Agents using Multiple Perspectives). The credit for developing SCAMP goes to my colleague Van Parunak, who had the vision to conceptualize the methodology and the ability to get it funded and to see it through. Van is the President  of ABC Research: Superior Solutions through Agent-Based and Complex Systems. For technical detail on SCAMP go to: Parunak et al. (2020) SCAMP’s Stigmergic Model of Social Conflict. Computational and Mathematical Organization Theory.

The computer science of SCAMP drew me in, but what really interested me was its novel approach to modeling social phenomena. SCAMP looks at network-based phenomena in ways that traditional networking does not. For a non-technical explanation of the possibilities, go to: Social Causality with Agents using Multiple Perspectives: A Novel Approach to Understanding Network-based Social Phenomena.

SCAMP is different from traditional social networks in two fundamental ways. First, nodes are events through which agents pass. In traditional networks the nodes are the agents themselves. Second, SCAMP event networks connect to a second set of networks that represent the goals for each agent group. Because of these differences SCAMP reveals both novel questions and novel perspectives on familiar questions. For a deeper explanation, read on.


Social Causality with Agents using Multiple Perspectives: A Novel Approach to Understanding Network-based Social Phenomena

PDF version of this post. SCAMP description 12_04_2020

Current Approach to Network-based Social Phenomena
There is a deep body of research and theory on emergent effects in networks. To name but a few, the network lens has been applied to phenomena such as disease transmission, personal behavior, rumor and fake news, innovation adoption, and political and social change. The list could go on and on. A common feature of this research is that it identifies nodes as people or groups (agents), and edges as interaction (e.g. communication, influence) among agents.

SCAMP is a network-based scenario-simulation methodology whose runs can reveal new understanding about known topics of research and can also reveal hitherto unrealized research questions. SCAMP has this capacity because it treats networks in novel ways.

People/groups  –> Events
Nodes in SCAMP networks do not represent people or groups. They represent events. Edges in SCAMP do not represent communication or influence. They represent agents’ choices as agents participate in successive events based on their individual preferences, the flow of information that results from their decisions, their actions, and their associations with one another. Relationships among events fall into three categories: 1) agent choice, 2) inhibit, and 3) support. If an event involves movement through physical space, SCAMP can also model that movement, including spatially-based interactions among agents.

To the extent that traditional network analysis is concerned with what agents “want”, those desires are incorporated into the rules that govern agent behavior. Agents’ goals are not treated as separate entities in their own right. By contrast, SCAMP includes goal hierarchies that are relevant to each agent group. The degree of goal satisfaction within the hierarchy influences agent decisions as they move from event to event. SCAMP allows different hierarchies for different agents, and it also allows linked goals across hierarchies.

Event networks differ from goal networks
In SCAMP the rules that govern agents’ movement across events are different from the rules that govern relationships among levels in goal hierarchies. Event movement is driven by an agent’s “personality” (e.g. preferences, affiliations). The state of a goal network is driven by changes in the degree to which events in the world satisfy various goals.

Event networks and goal networks are connected
Selected events in an event network are connected to selected goals. Event activity influences those goals. Conversely, goal satisfaction influences event activity. The figure is a schematic of a SCAMP model. For the sake of simplicity, it omits differences in types of edge relationships, movement across space, relationships among goals, and various other technical details that would be part of a fully-fledged SCAMP model.
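
For readers who think in code, the two-way connection can be pictured with a toy Python sketch. This is not SCAMP code and does not reflect SCAMP’s actual algorithms. Every name, number, and rule below is a hypothetical stand-in, chosen only to show events as nodes, an agent choosing its next event, and a goal that both responds to events and feeds back into the agent’s choice.

```python
# Toy structures only; this is not SCAMP code, and every name here is hypothetical.
events = {  # event -> events an agent may choose next
    "start": ["rally", "fact_check"],
    "rally": ["clash", "accept_result"],
    "fact_check": ["accept_result"],
    "clash": ["accept_result"],
    "accept_result": [],
}
event_to_goal = {"fact_check": "public_trust", "clash": "public_trust"}
goal_effect = {"fact_check": 0.3, "clash": -0.2}  # how an event moves its goal

def appeal(event, preferences, goals):
    """Base preference, plus a pull toward events that help an unmet goal."""
    goal = event_to_goal.get(event)
    unmet = 1.0 - goals[goal] if goal else 0.0
    return preferences.get(event, 0.0) + unmet * goal_effect.get(event, 0.0)

def run(preferences, goals, start="start"):
    """Walk one agent across events; goals and event choices influence each other."""
    goals = dict(goals)  # work on a copy so the caller's data is untouched
    path, here = [start], start
    while events[here]:
        here = max(events[here], key=lambda ev: appeal(ev, preferences, goals))
        if here in event_to_goal:
            g = event_to_goal[here]
            goals[g] = min(1.0, max(0.0, goals[g] + goal_effect[here]))
        path.append(here)
    return path, goals

prefs = {"rally": 0.2, "fact_check": 0.1, "clash": 0.1, "accept_result": 0.3}
low_trust_path, low_trust_goals = run(prefs, {"public_trust": 0.2})
high_trust_path, high_trust_goals = run(prefs, {"public_trust": 0.9})
print(low_trust_path, low_trust_goals)
print(high_trust_path, high_trust_goals)
```

The same agent, with the same preferences, takes a different route through the events depending on how satisfied its goal already is. That is the kind of mutual influence between the two networks that the schematic describes.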

Construction by domain experts, not programmers
SCAMP can be configured by domain experts with no formal programming training.

Imagine an election scenario consisting of five types of agents: 1) advocates of position A, 2) advocates of position B, 3) conspiracy believers, 4) provocateurs, and 5) fact checkers. Based on their “personality”, agents move across events such as: “election is declared fraudulent”, “incumbent accepts defeat”, “extremist groups merge”, “political establishment accepts result”, “supporting and opposing groups clash in city X”, “clashes spread”, and “violence decreases”. A goal hierarchy for, say, fact checkers might culminate with “population accepts fact checkers’ assessments”, supported by a variety of sub-goals.

A SCAMP simulation run would reveal agents’ movements across events, goal satisfaction, and mutual influences between event activation and goal satisfaction. Thus, SCAMP could address questions such as: Under what circumstances will fact checkers and advocates of a specific political position affiliate with each other? Or, if such an affiliation takes place, will each group attain more of its goals? These are network-based questions that have both real-world salience and theoretical implications for understanding political behavior, neither of which could be addressed using traditional network approaches.

SCAMP was developed with DARPA funding under their Ground Truth program. The prime contractor was Parallax Advanced Research.
For more information contact:
Van Parunak  
Jonny Morell


Development Trajectories, Complexity Thinking and Theories of Change

Aaron E. Zazueta;
Nima Bahramalian;
Thuy Thu Le

This article builds on a previous contribution to this blog that identified a set of complex adaptive systems concepts that are particularly useful in the formulation of theories of change (TOCs); find the link to the blog here. These include the concepts of social-ecological systems, boundaries, domains, scales, agents, adaptive behavior, emergence, and system development trajectory. This article briefly explains how to use these concepts and presents some aspects of the article “Development trajectories and complex systems-informed theories of change”, which was published in September 2020 in the American Journal of Evaluation (Zazueta et al., 2020). A non-edited version of the article is available here. The article illustrates the use of the approach in the evaluation of the UNIDO/SECO project SMART-Fish in Indonesia (UNIDO, 2019), available here.

The long-term objective of SMART-Fish was to support the transformation to more efficient and environmentally sustainable fisheries that increased value across the market chain, especially among the small fishermen and women. The project addressed three value chains that have different ecological, economic and social characteristics: Pangasius, Pole and Line Tuna, and Seaweed. The evaluation focused on the identification of the conditions conducive to the intended transformation and the assessment of the extent to which the project contributed to the advancement of those conditions. The evaluation had two main phases. In the first phase, the project management team and the evaluation team jointly developed a proposal of the social-ecological fisheries system in Indonesia that was subsequently presented to project stakeholders for discussion and application. The first phase consisted of the following steps:

  1. Definition of a manageable set of domains that could provide an initial framework to identify key enabling conditions for the intended transformation. Drawing from a review of the technical literature, the  teams  identified five broad domains: policy and regulatory, institutional, technological, financial, and sociocultural.
  2. Brainstorming to identify the four or five most important enabling conditions in each of the five domains. The result was 32 enabling conditions.
  3. Regrouping the 32 enabling conditions into clusters and, when appropriate, relabeling domains. This led to the grouping of the 32 conditions into six domains.
  4. Identification of the instances in which each of the 32 conditions had an enabling function for the rest of the conditions across the system. This step consisted of developing a relation matrix, which identified 236 significant links among the 32 conditions. Figure 1 plots the links among the different enabling conditions.
  5. Ranking of the 32 enabling conditions in terms of their influence across the system. Using the program NodeXL, we ran several tests to identify the most influential enabling conditions across the system. These tests identified five such conditions (Figure 2).
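
Steps 4 and 5 can be pictured with a small Python sketch. It rests on loud assumptions: the five condition labels and the matrix entries are made up, and the ranking uses a simple out-degree count (how many other conditions each one enables) as a stand-in, because the text does not say which tests were run in NodeXL.

```python
# Hypothetical 5-condition example; the evaluation used 32 conditions and 236 links.
conditions = ["policy", "institutions", "technology", "finance", "markets"]

# relation[i][j] == 1 means condition i enables condition j (made-up values).
relation = [
    [0, 1, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
]

def rank_by_out_degree(names, matrix):
    """Rank conditions by how many other conditions each one enables."""
    scores = {name: sum(row) for name, row in zip(names, matrix)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = rank_by_out_degree(conditions, relation)
total_links = sum(sum(row) for row in relation)

print(f"links in the matrix: {total_links}")
for name, score in ranking:
    print(f"{name}: enables {score} other condition(s)")
```

Out-degree is only one possible measure of influence. Measures that account for indirect paths, such as betweenness or eigenvector centrality, could rank the same conditions differently.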

In phase two, the evaluation team convened three stakeholder focus groups, one for each value chain. Each group was asked to rate the state of the 32 enabling conditions before the project started and at the time of project completion. Subsequently, the groups were asked to rate the extent to which SMART-Fish contributed to the changes in these conditions. When aggregating the responses on the three value chains, changes in the enabling conditions for the long-term objectives were most pronounced in the domains of trade and markets, governance, and production (Figure 3). These were three domains that the project targeted and in which stakeholders reported SMART-Fish making a substantial contribution. While the stakeholders acknowledged project contributions in science and technology, the progress made in many of the enabling conditions in this domain was seen as low. Progress was also made in the enabling conditions under the financial domain, but with no link to the project.

The evaluation team also used the focus group data to assess the extent to which the project had contributed to the five key enabling conditions with the most influence on the system, which were previously identified through the network analysis and are presented in Figure 2. This figure indicates that despite the project making important contributions to the enabling condition pertaining to innovation capacity in science and technology, not much progress was achieved on other conditions pertaining to the technology domain.

Despite the complexity of the system, the approach allowed us to develop a model to understand the factors leading to transformation of the fisheries systems in Indonesia and the extent and forms by which the project had contributed to a development trajectory consistent with the long-term goal of the project. The approach also enabled the engagement of stakeholders in the assessment of the contributions made by the project in the trajectory of the desired policies.

UNIDO. (2019). Independent Terminal Evaluation Indonesia SMART-Fish Increasing Trade capacities of Selected Value Chains within the Fisheries Sector in Indonesia. United Nations Industrial Development Organization.

Zazueta, A. E., Le, T. T., & Bahramalian, N. (2020). Development Trajectories and Complex Systems–Informed Theories of Change. American Journal of Evaluation, 1–20.



Applied Complexity: Theory and Practice of Human Systems Dynamics

Glenda Eoyang, PhD
Founding Executive Director, Human Systems Dynamics Institute

My particular take on complexity and systems is called human systems dynamics (HSD). It is a field of theory and practice that applies principles of complex adaptive systems to help people see, understand, and influence emergent patterns in complex human systems. HSD is applicable at all scales of human experience from intrapersonal reflection and cognition through global patterns of economic and cultural interaction (Eoyang, 1997). For more information about the models and methods of HSD, visit our website. Here, I would like to introduce the basic features of the theory and practice that form the foundation of HSD.

HSD theory is drawn from the field of complex adaptive systems (Dooley, 1997). In this approach, a system is defined as a collection of agents that interact to generate system-wide patterns. Those patterns then constrain the behavior of agents in future cycles of interaction. The process is called emergence or self-organization (Baranger et al., 2006).

A variety of interesting and relevant natural phenomena can be understood from this systems perspective. Uncertainty is, perhaps, the most significant feature of a complex adaptive system. Regardless of the amount of information available, the future of an open, high-dimensional, nonlinear complex system cannot be predicted. Self-organized criticality explains discontinuous change over time. Non-Gaussian, power-law data distributions are explained through interactions within and across scales of the system (Bak, 1996). Dissipative structures explain how orderly relationships appear to be generated spontaneously in systems with open boundaries (Prigogine et al., 2017). Drawing from a variety of perspectives in the complexity sciences, my research defined three conditions that influence the speed, path, and products of self-organizing processes in human systems (Eoyang, 2001). Short descriptions and citations for sources that inform HSD are accessible here.

Complexity science has informed a whole generation of social and human systems change literature (Eoyang, 2011). The practice of HSD contributes to that body of work. The purpose of HSD is captured in our vision: People everywhere thrive because we see patterns clearly, seek to understand, and act with courage to transform turbulence and uncertainty into possibility for all. We have developed a variety of models and methods to help individuals and teams interact with and influence complex systems in which they work and play, while remembering that the future is ultimately unpredictable and uncontrollable (Eoyang & Holladay, 2013).

Bak, P. (1996). How nature works: The science of self-organized criticality. New York, NY, USA: Copernicus.
Baranger, M., Kauffman, S., Stanley, E., Levin, S., & Clark, D. (2006). Emergence. In A. A. Minai & Y. Bar-Yam (Eds.), Unifying themes in complex systems. Berlin, Heidelberg: Springer.
Dooley, K. (1997). “A complex adaptive systems model of organization change.” Nonlinear Dynamics, Psychology, and the Life Sciences, 1: 69-97.
Eoyang, G. H. (1997). Coping with Chaos: SEVEN simple tools. Cheyenne, WY: Lagumo.
Eoyang, G. (2001). Conditions for Self-Organizing in Human Systems.  Unpublished doctoral dissertation. The Union Institute and University.
Eoyang, G. H. (2011). Complexity and the Dynamics of Organization Change. In P. Allen, S. Maguire, & B. McKelvey (Eds.), The sage handbook of complexity and management (pp. 319-354). Los Angeles, CA: SAGE.
Eoyang, G., & Holladay, R. (2013). Adaptive action: Leveraging uncertainty in your organization. Stanford University Press.
Prigogine, I., Stengers, I., & Toffler, A. (2017). Order out of chaos: Man’s new dialogue with nature. London: Verso.




Creating change in complex systems: the role of phase space

Mat Walton BA (Hons), DPH, PhD
Technical Lead Social Systems
Institute of Environmental Science and Research Limited (ESR)
Kenepuru Science Centre: 34 Kenepuru Drive, Kenepuru, Porirua 5022

In this piece I introduce the concept of phase space (interchangeable with the term ‘state space’). Within a Complex Adaptive Systems (CAS) understanding of how systems change over time, I argue that phase space is important for designing interventions within a CAS. Here I draw principally on the work of David Byrne and colleagues, located within dynamical systems and chaos (Capra & Luisi, 2014; Mitchell, 2009).

Complex Adaptive Systems are comprised of a set of interconnected elements.  Through the interaction of these elements phenomena ‘emerge’ as a feature of the system as a whole.  That is, we can’t look at individual elements within the system to understand the emergent phenomena (Byrne, 2013; Byrne & Callaghan, 2014).

A CAS often produces quite stable emergent phenomena over time. When change does occur in emergent phenomena, it is due to a new configuration of the system: new elements, new connections, or both. Because emergent phenomena are the result of many interactions, there is large uncertainty in how a system will change (Eppel, Matheson, & Walton, 2011). What is certain is that the range of possible future system states is enabled and restricted by phase space.

Phase space can be thought of as the space that a CAS can occupy. While we can’t know precisely how a system might change, we do know that it will be within the phase space. A change in emergent phenomena within a phase space may be incremental. A radical change suggests a shift in phase space, a qualitative difference in the system (Byrne & Callaghan, 2014).

An example of a radical shift in emergent phenomena can be found in New Zealand’s government. New Zealand is a parliamentary democracy. In 1996, the system for electing parliament changed from First Past the Post to Mixed Member Proportional (MMP). MMP is a system that is designed to produce coalition, multi-party governments. That is, to have a majority of votes in parliament (required to become government), more than one party needs to agree to work together. Under First Past the Post, New Zealand had a history of mostly single-party governments. Since MMP, there have only been coalition governments. While coalition governments were possible under First Past the Post, and single-party governments are possible under MMP rules, there is only a small set of conditions under which these outcomes are likely.

We could say that the phase space changed in 1996 from one where single-party governments are usual, to one where coalition governments are usual. In this case the rules around how members of parliament are elected changed the elements in the system (representation of political parties), and by implication the interaction between the elements (how political parties work together to achieve a majority of parliamentary votes).

Phase space has implications for designing programmes and interventions.  Questions to consider include: is the intended outcome of my programme possible within current phase space?  Will the programme create a different phase space (that makes my programme outcomes more likely)?  How will I recognise if phase space has changed?

Byrne, D. (2013). Evaluating complex social interventions in a complex world. Evaluation, 19(3), 217-228. doi:10.1177/1356389013495617
Byrne, D., & Callaghan, G. (2014). Complexity Theory and the Social Sciences: The state of the art. Oxon: Routledge.
Capra, F., & Luisi, P. L. (2014). The Systems View of Life. Cambridge: Cambridge University Press.
Eppel, E., Matheson, A., & Walton, M. (2011). Applying complexity theory to New Zealand public policy: Principles for practice. Policy Quarterly, 7(1), 48-55. Retrieved from
Mitchell, M. (2009). Complexity: A guided tour. Oxford: Oxford University Press.

Applying the complexity concept of “sensitive dependence” to understanding how a program works

Jonathan (Jonny) A Morell
President, 4.669… Evaluation and Planning
This is the first of what I hope will be many posts that show how specific constructs from complexity science can be useful for doing evaluation. There will only be many posts if others contribute. Please do.

What complex behavior is this post about?
This post is about “sensitive dependence”.

A system’s sensitivity to initial conditions refers to the role that the starting configuration of that system plays in determining the subsequent states of that system. When this sensitivity is high, slight changes to starting conditions will lead to significantly different conditions in the future (Santa Fe Institute).

… refers to the idea that current and future states, actions, or decisions depend on the sequence of states, actions, or decisions that preceded them – namely their (typically temporal) path.  For example, the very first fold of a piece of origami paper will determine which final shapes are possible; origami is therefore a path dependent art (Santa Fe Institute).
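
A quick way to see sensitivity to initial conditions in action is the logistic map, a standard textbook example from the chaos literature. It is my own choice of illustration, not something taken from the Santa Fe Institute definitions above. The sketch runs the same rule from two starting values that differ by one part in a billion and tracks how far apart the two paths drift.

```python
def logistic_path(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x), a standard textbook chaotic map."""
    path = [x0]
    for _ in range(steps):
        path.append(r * path[-1] * (1 - path[-1]))
    return path

# Two starting conditions that differ by one part in a billion.
a = logistic_path(0.2)
b = logistic_path(0.2 + 1e-9)

gaps = [abs(x - y) for x, y in zip(a, b)]
print(f"gap at step 0: {gaps[0]:.1e}")
print(f"largest gap over 60 steps: {max(gaps):.2f}")
```

The starting difference is far too small to measure in any real program, yet within a few dozen steps the two paths bear no resemblance to each other. The rule itself never changes. Only the starting point does.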

What is an evaluation scenario where “sensitive dependence” may be relevant?
Figure 1 is a typical logic model. It’s simple and stylized, but it’s the kind of thing we like to draw. (There is usually a feedback loop or two, but I’m leaving them out to keep the picture simple.) I realize that one can make distinctions between logic models, theories of action, and theories of change, and that those distinctions may affect what I am about to say. But for now, I am content to call these “models”. I don’t think the LM/ToA/ToC differences will make a big difference in my argument.

Maybe I’m wrong but I have a strong sense that whenever we construct models like Figure 1, what we really mean is Figure 2. I suspect that we assume that for the outcome to be achieved, each link needs to be operative. I hope I am wrong because if we really believed in this model, it means we have set the program up for failure. Why? Because each connection must be operative. What are the odds?

We want people to design programs like Figure 3 because a program with this logic has a high probability of succeeding. It could succeed because there are three separate paths that can lead to the desired outcome.
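The "what are the odds?" question can be put into rough numbers. The sketch below rests on assumptions that are mine and not part of the figures: every link is operative independently with probability 0.8, the single-chain model has six links, and the three-path model has three independent paths of two links each. Real programs will not be this tidy, so treat it as arithmetic for intuition only.

```python
def chain_success(link_probs):
    """Probability that every link in one causal chain is operative."""
    p = 1.0
    for q in link_probs:
        p *= q
    return p


def any_path_success(paths):
    """Probability that at least one of several independent paths works."""
    all_fail = 1.0
    for path in paths:
        all_fail *= 1.0 - chain_success(path)
    return 1.0 - all_fail


P_LINK = 0.8  # hypothetical chance that any one link is operative

# Assumed shape of a model where every link must hold: six links in a row.
single_chain = chain_success([P_LINK] * 6)

# Assumed shape of a three-path model: three paths, two links each.
three_paths = any_path_success([[P_LINK] * 2 for _ in range(3)])

print(f"all six links must hold: {single_chain:.3f}")
print(f"any one of three paths:  {three_paths:.3f}")
```

Under these assumptions the all-links-required model succeeds about a quarter of the time (0.8 to the sixth power is roughly 0.26), while the three-path model succeeds about 95 percent of the time. The exact numbers are made up, but the direction of the comparison is the point.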

Let’s say that we are handed a program like Figure 3. Or, hopefully, that we have helped our customers realize that they should design programs like Figure 3. What does this mean for evaluation? In one sense, not much. After all, in all three models we still have to measure each element and determine if the indicated relationships are operating. Nor would the methodology be much different. More or less, whatever combination of time series data, comparison group data, and qualitative assessments that we would deploy for one model we would deploy for the others. The differences are in data interpretation.

What would make a difference would be if our customers looked at Figure 3 and asked: “Well, now that you have shown us what the correct path is, we can design programs like that in the future and save ourselves a lot of time and effort. Isn’t that right?” Put another way, our customers would be saying: “You have shown us that our program theory was wrong. The real program theory is much simpler and cheaper and easier than we thought.”

How should we respond? If we had enough confidence in our methodology, we would answer in the affirmative, that yes indeed, we have discovered an operative path, i.e. a correct (and simpler) program theory. If we had a bit less confidence in our methodology, we would hedge our bets and only claim that we may have determined a correct program theory. But both answers would contain three assumptions: 1) The original, elaborate program theory was wrong. 2) There is a simpler program theory. 3) Our evaluation provides a reasonable idea of what that correct program theory is.

What are the consequences for the evaluation scenario if sensitive dependence is present?
If sensitive dependence is operating, the following may be true.

  • Each time the model runs, i.e. each time the program is implemented, there are three possible paths to success that may be traveled.
  • The path that will be traveled cannot be predicted in advance.
  • The model in Figure 3 may be correct in the sense that each time the program is implemented, one of those three paths will be traveled.
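The three points above can be sketched as a toy simulation. A seeded random draw stands in for the unpredictability that sensitive dependence produces. That is a simplification of mine, not a claim that programs choose paths at random, and the path labels and the count of 200 implementations are hypothetical.

```python
import random
from collections import Counter

# Hypothetical labels for the three routes to the outcome.
PATHS = ["path A", "path B", "path C"]


def implement_program(rng):
    """One implementation: exactly one of the three paths gets traveled."""
    return rng.choice(PATHS)


rng = random.Random(42)  # fixed seed so the run is repeatable
runs = [implement_program(rng) for _ in range(200)]
tally = Counter(runs)

# A single evaluation sees only the path that happened to be traveled.
print("first implementation traveled:", runs[0])

# Only across many implementations does the whole model become visible.
print("paths seen across 200 implementations:", dict(tally))
```

An evaluator looking at the first run alone would see a single path and might conclude the other two are not needed. The tally across repeated runs tells a different story.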

If this were the case, what would be the implications for evaluation? There are many. But here are a few that I can think of.

| Implication for evaluation | This is because… |
| --- | --- |
| Evaluation of a single implementation of the program is not a good test of the entire model that underlies program design. | Sensitive dependence will affect which of several causal paths may lead to the same outcome. |
| Retrospective analyses of causation take on added importance. | The reason for the evaluation is both to explain program behavior and also to confirm that the model does contain a hypothesized causal path. To do that, a retrospective path would be useful. |
| We may need to implement dual methodologies. One for prospective evaluation and one for retrospective evaluation. | Researching retrospective causation may require data about program action that would not be part of a prospective analysis. Also, it may not be possible to know in advance what data would be needed to understand the past. |
| It becomes harder to work with customers. | Many people will not be comfortable with the idea that one cannot know the exact path from program implementation to outcome. Nor will they like the idea of the restriction on how much an evaluation of a single implementation can tell them. |

We Can’t Include Everything and Everyone. So, what to do? On Boundaries

Emily Gates
Assistant Professor, Boston College

I am an evaluator who also teaches evaluation courses and conducts research on evaluation. In my work, I draw on the ideas of boundaries and boundary critique from critical systems thinking. These ideas have deep philosophical roots and the potential to alter or affirm the way you see the world, and act and interact within it. They are fundamental to what ‘system’ and a ‘systems approach’ mean. In the limited space here, I share five points about boundaries.

Point #1: Boundaries influence what evidence and values are considered relevant or not. In evaluation, boundaries include: the time frame and scale for evaluating an intervention; the distinction between an intervention and its context; the questions and quality criteria that frame the inquiry; and who or what group(s) are considered stakeholders. Each of these (and other) boundary choices influences the empirical (i.e. data, evidence, facts) and normative (i.e. perspectives, values, principles) bases considered relevant or irrelevant in an evaluation.

Point #2: Boundaries compel us to look critically at who or what may be excluded or marginalized. Using a systems approach is often thought to mean being holistic and pluralistic, including all interrelationships within and influencing the intervention and including all stakeholder perspectives. Clearly, these are aspirations. When we shift our focus to boundaries, these (practically unattainable) aspirations become careful, critical examinations of who or what can defensibly be excluded and potential consequences of such exclusion.

Point #3: Not examining boundaries poses risks for ‘unintended’ consequences and ‘invalid’ claims. It is unavoidable that someone(s) and something(s) will be left out, and that who/what is left out bears directly and indirectly on the evaluation. These can be labeled ‘unintended’ consequences, meaning they weren’t what the evaluation and evaluators set out to have happen. For example, if an evaluator takes for granted the stated problem an intervention is designed to address, and frames the questions around how well the intervention addresses this problem, the evaluation will exclude and perhaps further marginalize alternative or contrary/conflicting perspectives on the problem. Whether made apparent or kept implicit, this will have consequences and will condition the validity of the evaluative claims.

Point #4: Critiquing boundaries is a way to enact ethical responsibility and warrant claims. Despite the risks I mentioned in Point #3, boundary judgements are often made implicitly or unknowingly, as is the case when an evaluation uncritically adopts the boundaries set by those commissioning the evaluation and/or defining the terms of reference. This is unacceptable in critical systems thinking. Because boundaries have consequences, which can be profound and counterproductive to an intervention’s and/or evaluation’s stated goals, evaluators have an ethical and professional responsibility to critically examine which boundaries are (and should be) used and how they are (and should be) set.

Point #5: There are no ‘right’ boundaries or ‘experts’ in boundary setting. Critical reflection and deliberation are processes for making ethically defensible boundary choices. Making boundaries explicit and choosing between options within bounded evaluations is not and should not be limited to those on the evaluation team. Additionally, the fact that boundaries already exist in theoretical and empirical research, particularly those inherent in methodologies, instruments, and measures, is not sufficient grounds for using them. Critical systems thinking challenges the basis of expertise in evaluation and argues that boundaries should be ethically justified through processes of explicit, critical examination of boundary choices and their consequences. Boundary judgements cannot be made solely in one’s head or within an evaluation team; it is important to examine alternative boundaries from different perspectives and value stances. Preferably, evaluators use participatory and deliberative processes.

In Sum: When we cannot include everything and everyone, where does this lead us? Critical systems thinking obligates us to make explicit boundary choices, consider their consequences and alternatives, and make boundary judgements/decisions that we can ethically stand behind, while also being open to critique, debate, and ongoing reevaluation and renegotiation.

Acknowledgments: These are the points that have stuck with me and to which I regularly turn as I consider and make choices in my evaluation practice. They are not my points, but those centrally made in the works of C. West Churchman (US), Werner Ulrich (Switzerland), Gerald Midgley (UK), Martin Reynolds (UK), and Bob Williams (New Zealand). Mostly white, European men. I recognize the risk of this critical systems lineage in perpetuating power imbalances around who and where knowledge is generated from – a risk I find ethically problematic. Therefore, I call attention to the continual evolution of critical systems thinking as advanced in the Inclusive Systemic Evaluation for Gender Equality, Environments, and Marginalized Voices (ISE4GEMs) framework developed by Anne Stephens (Australia), Ellen Lewis (UK), and Shravanti Reddy (US); Nan Wehipeihana (Māori, NZ) and Kate McKegg (NZ) who practice and write about developmental evaluation drawing on critical systems ideas; and my work (US) on critical systems heuristics. I welcome feedback on who or what I have excluded in this list and will revise accordingly. I included this note to both model and invite further boundary critique. And a shout out to Joseph Madres, for his editorial review of this post!