Development Trajectories, Complexity Thinking and Theories of Change

Aaron E. Zazueta;
Nima Bahramalian;
Thuy Thu Le

This article builds on a previous contribution to this blog that identified a set of complex adaptive systems concepts that are particularly useful in the formulation of theories of change (TOCs). These include social-ecological systems, boundaries, domains, scales, agents, adaptive behavior, emergence, and system development trajectories. This article briefly explains how to use these concepts and presents some aspects of the article “Development trajectories and complex systems-informed theories of change”, published in September 2020 in the American Journal of Evaluation (Zazueta et al., 2020). A non-edited version of the article is also available. The article illustrates the use of the approach in the evaluation of the UNIDO/SECO project SMART-Fish in Indonesia (UNIDO, 2019).

The long-term objective of SMART-Fish was to support the transformation to more efficient and environmentally sustainable fisheries that increased value across the market chain, especially among small-scale fishermen and women. The project addressed three value chains with different ecological, economic and social characteristics: pangasius, pole-and-line tuna, and seaweed. The evaluation focused on identifying the conditions conducive to the intended transformation and assessing the extent to which the project contributed to advancing those conditions. The evaluation had two main phases. In the first phase, the project management team and the evaluation team jointly developed a proposed model of the social-ecological fisheries system in Indonesia, which was subsequently presented to project stakeholders for discussion and application. The first phase consisted of the following steps:

  1. Definition of a manageable set of domains that could provide an initial framework to identify key enabling conditions for the intended transformation. Drawing from a review of the technical literature, the teams identified five broad domains: policy and regulatory, institutional, technological, financial, and sociocultural.
  2. Brainstorming to identify the four or five most important enabling conditions in each of the five domains. The result was 32 enabling conditions.
  3. Regrouping the 32 enabling conditions into clusters and, when appropriate, relabeling domains. This led to the grouping of the 32 conditions into six domains.
  4. Identification of the instances in which each of the 32 conditions had an enabling function for the other conditions across the system. This step consisted of the development of a relation matrix, which identified 236 significant links among the 32 conditions. Figure 1 plots the links among the different enabling conditions.
  5. Ranking of the 32 enabling conditions in terms of their influence across the system. Using the program NodeXL, we ran several tests to identify the most influential enabling conditions across the system. These tests identified five such conditions (Figure 2).
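Steps 4 and 5 can be sketched in a few lines of code. This is a hypothetical illustration, not the NodeXL analysis the team ran: the condition names and links are invented, and the sketch uses two simple measures, out-degree (how many conditions a condition directly enables) and downstream reach (how many conditions it enables directly or indirectly), as stand-ins for whichever NodeXL metrics were actually used.

```python
from collections import deque

# Hypothetical relation-matrix entries: (enabling condition, condition it enables).
# Invented for illustration; these are NOT the study's 32 conditions or 236 links.
EXAMPLE_LINKS = [
    ("policy_framework", "market_access"),
    ("policy_framework", "certification"),
    ("policy_framework", "credit_access"),
    ("research_capacity", "certification"),
    ("research_capacity", "processing_tech"),
    ("credit_access", "processing_tech"),
    ("processing_tech", "market_access"),
    ("certification", "market_access"),
]


def influence_scores(links):
    """Return {condition: (downstream_reach, out_degree)} for a directed link list."""
    nodes = sorted({node for link in links for node in link})
    enables = {node: [] for node in nodes}
    for src, dst in links:
        enables[src].append(dst)

    scores = {}
    for start in nodes:
        # Breadth-first search: collect every condition reachable from `start`.
        reached = set()
        queue = deque(enables[start])
        while queue:
            node = queue.popleft()
            if node != start and node not in reached:
                reached.add(node)
                queue.extend(enables[node])
        scores[start] = (len(reached), len(enables[start]))
    return scores


def rank_conditions(links, top=5):
    """Rank by reach, then out-degree, then name (so ties are deterministic)."""
    scores = influence_scores(links)
    ranked = sorted(scores, key=lambda c: (-scores[c][0], -scores[c][1], c))
    return ranked[:top]


print(rank_conditions(EXAMPLE_LINKS, top=3))
```

With these invented links, policy_framework ranks first because every condition except research_capacity lies downstream of it. Network tools such as NodeXL also offer betweenness, eigenvector and other centrality measures, which can rank the same nodes differently.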

In phase two, the evaluation team convened three stakeholder focus groups, one for each value chain. Each group was asked to rate the state of the 32 enabling conditions before the project started and at the time of project completion. The groups were then asked to rate the extent to which SMART-Fish contributed to the changes in these conditions. When the responses from the three value chains were aggregated, changes in the enabling conditions for the long-term objectives were most pronounced in the domains of trade and markets, governance, and production (Figure 3). These were three domains that the project targeted and in which stakeholders reported that SMART-Fish made substantial contributions. While the stakeholders acknowledged project contributions in science and technology, progress on many of the enabling conditions in this domain was seen as low. Progress was also made on the enabling conditions in the financial domain, but with no link to the project.
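The phase-two aggregation can be sketched as averaging before/after changes by domain across the value chains. This is a hypothetical sketch: the 1 to 5 rating scale, the condition names and every rating below are invented for illustration and are not the focus-group data, and the evaluation may have aggregated responses differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping of enabling conditions to domains (invented names).
DOMAIN_OF = {
    "export_standards": "trade and markets",
    "buyer_linkages": "trade and markets",
    "co_management": "governance",
    "lab_capacity": "science and technology",
}

# Hypothetical ratings on an assumed 1-5 scale:
# (value chain, condition) -> (rating before project, rating at completion).
RATINGS = {
    ("pangasius", "export_standards"): (2, 4),
    ("tuna", "export_standards"): (2, 3),
    ("seaweed", "export_standards"): (1, 3),
    ("pangasius", "buyer_linkages"): (2, 3),
    ("tuna", "buyer_linkages"): (3, 4),
    ("seaweed", "buyer_linkages"): (2, 4),
    ("pangasius", "co_management"): (1, 3),
    ("tuna", "co_management"): (2, 3),
    ("seaweed", "co_management"): (2, 3),
    ("pangasius", "lab_capacity"): (2, 2),
    ("tuna", "lab_capacity"): (2, 3),
    ("seaweed", "lab_capacity"): (1, 1),
}


def mean_change_by_domain(ratings, domain_of):
    """Average (after - before) across all value chains and conditions in each domain."""
    changes = defaultdict(list)
    for (_chain, condition), (before, after) in ratings.items():
        changes[domain_of[condition]].append(after - before)
    return {domain: mean(values) for domain, values in changes.items()}


result = mean_change_by_domain(RATINGS, DOMAIN_OF)
for domain, change in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{domain}: {change:+.2f}")
```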

The evaluation team also used the focus group data to assess the extent to which the project had contributed to the five most influential enabling conditions previously identified through the network analysis (Figure 2). This figure indicates that, although the project made important contributions to the enabling condition pertaining to innovation capacity in science and technology, little progress was achieved on the other conditions in the technology domain.

Despite the complexity of the system, the approach allowed us to develop a model to understand the factors leading to the transformation of the fisheries system in Indonesia, and the extent to which, and the ways in which, the project had contributed to a development trajectory consistent with its long-term goal. The approach also enabled stakeholders to engage in assessing the project’s contributions to the trajectory of the desired policies.

UNIDO. (2019). Independent Terminal Evaluation Indonesia SMART-Fish Increasing Trade capacities of Selected Value Chains within the Fisheries Sector in Indonesia. United Nations Industrial Development Organization.

Zazueta, A. E., Le, T. T., & Bahramalian, N. (2020). Development Trajectories and Complex Systems–Informed Theories of Change. American Journal of Evaluation, 1–20.



Applied Complexity: Theory and Practice of Human Systems Dynamics

Glenda Eoyang, PhD
Founding Executive Director, Human Systems Dynamics Institute

My particular take on complexity and systems is called human systems dynamics (HSD). It is a field of theory and practice that applies principles of complex adaptive systems to help people see, understand, and influence emergent patterns in complex human systems. HSD is applicable at all scales of human experience, from intrapersonal reflection and cognition to global patterns of economic and cultural interaction (Eoyang, 1997). For more information about the models and methods of HSD, visit our website. Here, I would like to introduce the basic features of the theory and practice that form the foundation of HSD.

HSD theory is drawn from the field of complex adaptive systems (Dooley, 1997). In this approach, a system is defined as a collection of agents that interact to generate system-wide patterns. Those patterns then constrain the behavior of agents in future cycles of interaction. This process is called emergence or self-organization (Baranger et al., 2006).

A variety of interesting and relevant natural phenomena can be understood from this systems perspective. Uncertainty is perhaps the most significant feature of a complex adaptive system: regardless of the amount of information available, the future of an open, high-dimensional, nonlinear complex system cannot be predicted. Self-organized criticality explains discontinuous change over time. Non-Gaussian, power-law data distributions are explained through interactions within and across scales of the system (Bak, 1996). Dissipative structures explain how orderly relationships appear to be generated spontaneously in systems with open boundaries (Prigogine et al., 2017). Drawing from a variety of perspectives in the complexity sciences, my research identified three conditions that influence the speed, path, and products of self-organizing processes in human systems (Eoyang, 2001). Short descriptions of, and citations for, the sources that inform HSD are also available online.
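Bak’s best-known illustration of self-organized criticality is the sandpile model of Bak, Tang and Wiesenfeld. The sketch below is a generic textbook version, not an HSD tool, and its grid size, grain count and random seed are arbitrary assumptions: grains are dropped at random on a grid, any cell holding four or more grains topples by passing one grain to each neighbour, and grains pushed past the edge leave the system. No one tunes the pile, yet it drives itself to a state in which most drops cause no avalanche or a small one while occasional drops trigger large cascades.

```python
import random
from collections import Counter


def topple(grid, size):
    """Relax the grid until every cell holds fewer than 4 grains.

    Returns the number of topplings, i.e. the size of the avalanche.
    """
    avalanche = 0
    unstable = [(r, c) for r in range(size) for c in range(size) if grid[r][c] >= 4]
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < 4:
            continue  # already relaxed by an earlier toppling
        grid[r][c] -= 4
        avalanche += 1
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:  # grains past the edge are lost
                grid[nr][nc] += 1
                if grid[nr][nc] >= 4:
                    unstable.append((nr, nc))
        if grid[r][c] >= 4:
            unstable.append((r, c))  # cell may still be over threshold
    return avalanche


def simulate(size=15, grains=3000, seed=1):
    """Drop grains one at a time at random cells; record each avalanche size."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    sizes = []
    for _ in range(grains):
        r, c = rng.randrange(size), rng.randrange(size)
        grid[r][c] += 1
        sizes.append(topple(grid, size))
    return grid, sizes


grid, sizes = simulate()
# Tabulate how often each avalanche size occurred (many small, few large).
print(sorted(Counter(sizes).items())[:10])
print("largest avalanche:", max(sizes))
```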

Complexity science has informed a whole generation of social and human systems change literature (Eoyang, 2011). The practice of HSD contributes to that body of work. The purpose of HSD is captured in our vision: People everywhere thrive because we see patterns clearly, seek to understand, and act with courage to transform turbulence and uncertainty into possibility for all. We have developed a variety of models and methods to help individuals and teams interact with and influence complex systems in which they work and play, while remembering that the future is ultimately unpredictable and uncontrollable (Eoyang & Holladay, 2013).

Bak, P. (1996). How nature works: The science of self-organized criticality. New York, NY, USA: Copernicus.
Baranger, M., Kauffman, S., Stanley, E., Levin, S., & Clark, D. (2006). Emergence. In A. A. Minai & Y. Bar-Yam (Eds.), Unifying themes in complex systems. Berlin, Heidelberg: Springer.
Dooley, K. (1997). A complex adaptive systems model of organization change. Nonlinear Dynamics, Psychology, and the Life Sciences, 1, 69-97.
Eoyang, G. H. (1997). Coping with chaos: Seven simple tools. Cheyenne, WY: Lagumo.
Eoyang, G. (2001). Conditions for self-organizing in human systems. Unpublished doctoral dissertation, The Union Institute and University.
Eoyang, G. H. (2011). Complexity and the dynamics of organization change. In P. Allen, S. Maguire, & B. McKelvey (Eds.), The SAGE handbook of complexity and management (pp. 319-354). Los Angeles, CA: SAGE.
Eoyang, G., & Holladay, R. (2013). Adaptive action: Leveraging uncertainty in your organization. Stanford, CA: Stanford University Press.
Prigogine, I., Stengers, I., & Toffler, A. (2017). Order out of chaos: Man’s new dialogue with nature. London: Verso.




Creating change in complex systems: the role of phase space

Mat Walton BA (Hons), DPH, PhD
Technical Lead Social Systems
Institute of Environmental Science and Research Limited (ESR)
Kenepuru Science Centre: 34 Kenepuru Drive, Kenepuru, Porirua 5022

In this piece I introduce the concept of phase space (used interchangeably with the term ‘state space’). Within a Complex Adaptive Systems (CAS) understanding of how systems change over time, I argue that phase space is important for designing interventions within a CAS. Here I draw principally on the work of David Byrne and colleagues, which is located within the traditions of dynamical systems and chaos (Capra & Luisi, 2014; Mitchell, 2009).

Complex Adaptive Systems are composed of a set of interconnected elements. Through the interaction of these elements, phenomena ‘emerge’ as a feature of the system as a whole. That is, we can’t look at individual elements within the system to understand the emergent phenomena (Byrne, 2013; Byrne & Callaghan, 2014).

A CAS often produces quite stable emergent phenomena over time. When change does occur in emergent phenomena, it is due to a new configuration of the system: new elements, new connections, or both. Because emergent phenomena are the result of many interactions, there is large uncertainty in how a system will change (Eppel, Matheson, & Walton, 2011). What is certain is that the range of possible future system states is enabled and restricted by phase space.

Phase space can be thought of as the space of possible states that a CAS can occupy. While we can’t know precisely how a system might change, we do know that it will be within the phase space. A change in emergent phenomena within a phase space may be incremental. A radical change suggests a shift in phase space, a qualitative difference in the system (Byrne & Callaghan, 2014).

An example of a radical shift in emergent phenomena can be found in New Zealand’s government. New Zealand is a parliamentary democracy. In 1996, the system for electing parliament changed from First Past the Post to Mixed Member Proportional (MMP). MMP is a system designed to produce coalition, multi-party governments. That is, to have a majority of votes in parliament (required to form a government), more than one party needs to agree to work together. Under First Past the Post, New Zealand had a history of mostly single-party governments. Since MMP, there have only been coalition governments. While coalition governments were possible under First Past the Post, and single-party governments are possible under MMP rules, there is only a small set of conditions under which these outcomes are likely.

We could say that the phase space changed in 1996 from one where single-party governments are usual to one where coalition governments are usual. In this case, the rules around how members of parliament are elected changed the elements in the system (representation of political parties) and, by implication, the interaction between the elements (how political parties work together to achieve a majority of parliamentary votes).

Phase space has implications for designing programmes and interventions.  Questions to consider include: is the intended outcome of my programme possible within current phase space?  Will the programme create a different phase space (that makes my programme outcomes more likely)?  How will I recognise if phase space has changed?

Byrne, D. (2013). Evaluating complex social interventions in a complex world. Evaluation, 19(3), 217-228. doi:10.1177/1356389013495617
Byrne, D., & Callaghan, G. (2014). Complexity Theory and the Social Sciences: The state of the art. Oxon: Routledge.
Capra, F., & Luisi, P. L. (2014). The Systems View of Life. Cambridge: Cambridge University Press.
Eppel, E., Matheson, A., & Walton, M. (2011). Applying complexity theory to New Zealand public policy: Principles for practice. Policy Quarterly, 7(1), 48-55.
Mitchell, M. (2009). Complexity: A guided tour. Oxford: Oxford University Press.

Applying the complexity concept of “sensitive dependence” to understanding how a program works

Jonathan (Jonny) A Morell
President, 4.669… Evaluation and Planning
This is the first of what I hope will be many posts that show how specific constructs from complexity science can be useful for doing evaluation. There will only be many posts if others contribute. Please do.

What complex behavior is this post about?
This post is about “sensitive dependence”.

A system’s sensitivity to initial conditions refers to the role that the starting configuration of that system plays in determining the subsequent states of that system. When this sensitivity is high, slight changes to starting conditions will lead to significantly different conditions in the future (Santa Fe Institute).
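Sensitivity to initial conditions is easy to demonstrate with the logistic map, a standard textbook example from chaos theory (chosen here for illustration; it is not part of the Santa Fe Institute definition). The sketch runs the map twice from starting values that differ by one part in a billion; the parameter r = 4.0 and the starting value 0.2 are arbitrary assumptions within the map’s chaotic regime.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate x_{t+1} = r * x_t * (1 - x_t) and return every state, starting with x0."""
    states = [x0]
    for _ in range(steps):
        x = states[-1]
        states.append(r * x * (1.0 - x))
    return states


a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-9)  # a "slight change" to the starting condition

# The gap starts tiny, roughly doubles each step, and soon becomes as large as the states themselves.
for step in range(0, 61, 10):
    print(f"step {step:2d}: gap = {abs(a[step] - b[step]):.2e}")
```

Both runs follow the same deterministic rule; only the starting point differs, yet after a few dozen steps the two trajectories bear no resemblance to each other.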

… refers to the idea that current and future states, actions, or decisions depend on the sequence of states, actions, or decisions that preceded them – namely their (typically temporal) path.  For example, the very first fold of a piece of origami paper will determine which final shapes are possible; origami is therefore a path dependent art (Santa Fe Institute).
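Path dependence is often illustrated with a Pólya urn, a standard model in the path-dependence literature; it is used here as an illustration, not as anything drawn from the Santa Fe Institute text. The urn starts with one red and one blue ball; at each step a ball is drawn at random and returned together with another ball of the same colour. Early draws tilt the odds for every later draw, so identical starting urns can settle on very different long-run shares. The number of steps and the seeds are arbitrary assumptions.

```python
import random


def polya_urn(steps, seed, red=1, blue=1):
    """Simulate a Pólya urn and return the final (red, blue) ball counts.

    Each draw adds one more ball of the colour drawn, so the chance of drawing
    red at any step depends on the whole sequence of earlier draws.
    """
    rng = random.Random(seed)  # seeded so each history is reproducible
    for _ in range(steps):
        if rng.random() < red / (red + blue):
            red += 1
        else:
            blue += 1
    return red, blue


# Same starting urn, different random histories.
for seed in range(5):
    red, blue = polya_urn(steps=2000, seed=seed)
    print(f"seed {seed}: red share = {red / (red + blue):.2f}")
```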

What is an evaluation scenario where “sensitive dependence” may be relevant?
Figure 1 is a typical logic model. It’s simple and stylized, but it’s the kind of thing we like to draw. (There is usually a feedback loop or two, but I’m leaving them out to keep the picture simple.) I realize that one can make distinctions between logic models, theories of action, and theories of change, and that those distinctions may affect what I am about to say. But for now, I am content to call these “models”. I don’t think the LM/ToA/ToC differences will matter much for my argument.

Maybe I’m wrong, but I have a strong sense that whenever we construct models like Figure 1, what we really mean is Figure 2. I suspect that we assume that for the outcome to be achieved, each link needs to be operative. I hope I am wrong, because if we really believed in this model, we would have set the program up for failure. Why? Because every connection must be operative. What are the odds?

We want people to design programs like Figure 3 because a program with this logic has a high probability of succeeding. It could succeed because there are three separate paths that can lead to the desired outcome.
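The “What are the odds?” question can be made concrete with simple arithmetic. Assume, purely for illustration, that every link operates independently with the same probability p. A model in which all n links must operate then succeeds with probability p^n, while a model with three separate paths of k links each succeeds if at least one path fully operates, with probability 1 - (1 - p^k)^3. The values p = 0.8, n = 9 and k = 3 are invented and do not come from the figures; real links are rarely independent or equally reliable.

```python
def all_links_required(p, n):
    """Probability that every one of n independent links operates."""
    return p ** n


def any_of_paths(p, links_per_path, paths):
    """Probability that at least one of several independent paths fully operates."""
    path_works = p ** links_per_path
    return 1 - (1 - path_works) ** paths


# Hypothetical numbers: each link works 80% of the time.
print(f"single chain of 9 links: {all_links_required(0.8, 9):.3f}")
print(f"three paths of 3 links:  {any_of_paths(0.8, 3, 3):.3f}")
```

Under these assumed numbers the single chain succeeds about 13% of the time and the three-path design about 88% of the time.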

Let’s say that we are handed a program like Figure 3. Or, hopefully, that we have helped our customers realize that they should design programs like Figure 3. What does this mean for evaluation? In one sense, not much. After all, in all three models we still have to measure each element and determine if the indicated relationships are operating. Nor would the methodology be much different. More or less, whatever combination of time series data, comparison group data, and qualitative assessments we would deploy for one model, we would deploy for the others. The differences are in data interpretation.

What would make a difference would be if our customers looked at Figure 3 and asked: “Well, now that you have shown us what the correct path is, we can design programs like that in the future and save ourselves a lot of time and effort. Isn’t that right?” Put another way, our customers would be saying: “You have shown us that our program theory was wrong. The real program theory is much simpler, cheaper, and easier than we thought.”

How should we respond? If we had enough confidence in our methodology, we would answer in the affirmative, that yes indeed, we have discovered an operative path, i.e. a correct (and simpler) program theory. If we had a bit less confidence in our methodology, we would hedge our bets and only claim that we may have determined a correct program theory. But both answers would contain three assumptions: 1) The original, elaborate program theory was wrong. 2) There is a simpler program theory. 3) Our evaluation provides a reasonable idea of what that correct program theory is.

What are the consequences for the evaluation scenario if sensitive dependence is present?
If sensitive dependence is operating, the following may be true.

  • Each time the model runs, i.e. each time the program is implemented, there are three possible paths to success that may be traveled.
  • The path that will be traveled cannot be predicted in advance.
  • The model in Figure 3 may be correct in the sense that each time the program is implemented, one of those three paths will be traveled.

If this were the case, what would be the implications for evaluation? There are many. But here are a few that I can think of.

  • Evaluation of a single implementation of the program is not a good test of the entire model that underlies program design. This is because sensitive dependence will affect which of several causal paths leads to the same outcome.
  • Retrospective analyses of causation take on added importance. This is because the evaluation must both explain program behavior and confirm that the model does contain a hypothesized causal path. To do that, a retrospective analysis would be useful.
  • We may need to implement dual methodologies: one for prospective evaluation and one for retrospective evaluation. This is because researching retrospective causation may require data about program action that would not be part of a prospective analysis, and because it is not possible to know in advance what data would be needed to understand the past.
  • It becomes harder to work with customers. This is because many people will not be comfortable with the idea that one cannot know the exact path from program implementation to outcome. Nor will they like the restriction on how much an evaluation of a single implementation can tell them.

We Can’t Include Everything and Everyone. So, what to do? On Boundaries

Emily Gates
Assistant Professor, Boston College

I am an evaluator who also teaches evaluation courses and conducts research on evaluation. In my work, I draw on the ideas of boundaries and boundary critique from critical systems thinking. These ideas have deep philosophical roots and the potential to alter or affirm the way you see the world, and act and interact within it. They are fundamental to what ‘system’ and a ‘systems approach’ mean. In the limited space here, I share five points about boundaries.

Point #1: Boundaries influence what evidence and values are considered relevant or not. In evaluation, boundaries include: the time frame and scale for evaluating an intervention; the distinction between an intervention and its context; the questions and quality criteria that frame the inquiry; and who or what group(s) are considered stakeholders. Each of these (and other) boundary choices influences the empirical (i.e. data, evidence, facts) and normative (i.e. perspectives, values, principles) bases considered relevant or irrelevant in an evaluation.

Point #2: Boundaries compel us to look critically at who or what may be excluded or marginalized. Using a systems approach is often thought to mean being holistic and pluralistic, including all interrelationships within and influencing the intervention and including all stakeholder perspectives. Clearly, these are aspirations. When we shift our focus to boundaries, these (practically unattainable) aspirations become careful, critical examinations of who or what can defensibly be excluded and potential consequences of such exclusion.

Point #3: Not examining boundaries poses risks for ‘unintended’ consequences and ‘invalid’ claims. It is unavoidable that someone(s) and something(s) will be left out, and that who/what is left out bears directly and indirectly on the evaluation. These can be labeled ‘unintended’ consequences, meaning they weren’t what the evaluation and evaluators set out to have happen. For example, if an evaluator takes for granted the stated problem an intervention is designed to address, and frames the questions around how well the intervention addresses this problem, the evaluation will exclude and perhaps further marginalize alternative or contrary/conflicting perspectives on the problem. Whether made apparent or kept implicit, this will have consequences and will condition the validity of the evaluative claims.

Point #4: Critiquing boundaries is a way to enact ethical responsibility and warrant claims.  Despite the risks I mentioned in Point #3, boundary judgements are often made implicitly or unknowingly, as is the case when an evaluation uncritically adopts the boundaries set by those commissioning the evaluation and/or defining the terms of reference. This is unacceptable in critical systems thinking. Because boundaries have consequences, which can be profound and counterproductive to an intervention’s and/or evaluation’s stated goals, evaluators have an ethical and professional responsibility to critically examine which boundaries are (and should be) used and how they are (and should be) set.

Point #5: There are no ‘right’ boundaries or ‘experts’ in boundary setting. Critical reflection and deliberation are processes for making ethically defensible boundary choices. Making boundaries explicit and choosing between options within bounded evaluations is not, and should not be, limited to those on the evaluation team. Additionally, the fact that boundaries already exist in theoretical and empirical research, particularly those inherent in methodologies, instruments, and measures, is not sufficient grounds for using them. Critical systems thinking challenges the basis of expertise in evaluation and argues that boundaries should be ethically justified through processes of explicit, critical examination of boundary choices and their consequences. Boundary judgements cannot be made solely in one’s head or within an evaluation team; it is important to examine alternative boundaries from different perspectives and value stances. Preferably, evaluators use participatory and deliberative processes.

In Sum: When we cannot include everything and everyone, where does this lead us? Critical systems thinking obligates us to make explicit boundary choices, consider their consequences and alternatives, and make boundary judgements/decisions that we can ethically stand behind, while also being open to critique, debate, and ongoing reevaluation and renegotiation.

Acknowledgments: These are the points that have stuck with me and to which I regularly turn as I consider and make choices in my evaluation practice. They are not my points, but those centrally made in the works of C. West Churchman (US), Werner Ulrich (Switzerland), Gerald Midgley (UK), Martin Reynolds (UK), and Bob Williams (New Zealand). Mostly white, European men. I recognize the risk of this critical systems lineage in perpetuating power imbalances around who and where knowledge is generated from, a risk I find ethically problematic. Therefore, I call attention to the continual evolution of critical systems thinking as advanced in the Inclusive Systemic Evaluation for Gender Equality, Environments, and Marginalized Voices (ISE4GEMs) framework developed by Anne Stephens (Australia), Ellen Lewis (UK), and Shraventi Reddy (US); to Nan Wehipeihana (Māori, NZ) and Kate McKegg (NZ), who practice and write about developmental evaluation drawing on critical systems ideas; and to my own work (US) on critical systems heuristics. I welcome feedback on who or what I have excluded from this list and will revise accordingly. I included this note both to model and to invite further boundary critique. And a shout-out to Joseph Madres for his editorial review of this post!



Beverly Parsons
Executive Director, InSites
Autopoiesis is one of my favorite systems concepts because of its importance in helping us understand a crucial difference between mechanistic systems and living systems. The term was coined by Humberto Maturana, a Chilean biologist. It means “self-making” or “self-producing” (the combination of auto meaning “self” and poiesis meaning “making”). In the 1970s, Maturana and his colleague, Francisco Varela, built their theory of what life is from observing how biological cells function. Maturana and Varela viewed the main characteristic of life as self-maintenance through the “internal networking of a chemical system that continuously reproduces itself within a boundary of its own making”.[1]

There are many transformations continually going on in a biological cell while at the same time “there is cellular self-maintenance—the fact that the cell maintains its individuality”.[2] A person, a tree, a bear, and a flower all differ from a chair, a computer, a television, and a glass cup in that the items in the first group engage in self-maintenance via a mechanism of self-regeneration from within, whereas this doesn’t happen in the second group.[3]

It might seem that an autopoietic system is a closed system, but it is not. There is an important distinction between the organization and the structure of a system. “The former refers to the relationships between components which are necessary to define that system as part of a particular class of systems; the latter to the particular physical form which those components take.”[4]

An autopoietic system (e.g., a cell) can retain its organization while the environment around it is changing; it is organizationally closed. At the same time, it can be structurally open, allowing energy and matter to flow in and out of it. An autopoietic system, i.e., a living system, must retain its basic organization to stay alive. Maturana and Varela developed other related concepts (e.g., structural coupling) which address important matters of how an autopoietic system changes in relation to its environment. They also did important work together on the biology of cognition. Debates continue about the applicability of autopoiesis at the level of social systems.[5]

Capra, F., & Luisi, P. L. (2014). The systems view of life: A unifying vision. Cambridge: Cambridge University Press.
Ramage, M., & Shipp, K. (2009). Systems thinkers. London: Springer.
Scharmer, O. (2019). Social systems as if people mattered: Response to the Kuhl critique of Theory U. Accepted for publication in Journal of Change Management.

[1] Capra and Luisi (2014), p. 129.
[2] Ibid., p. 130.
[3] Ibid., p. 132.
[4] Ramage and Shipp (2009), p. 201.
[5] Scharmer (2019).

Looking for Input on Next Steps: What do research and theory in complexity and systems tell us about evaluation practice and evaluation theory?

Meg Hargreaves
Senior Fellow, Economics, Justice, and Society Department, NORC
Jonny Morell
President, 4.669… Evaluation and Planning

Development Along Two Directions
We are looking for suggestions about content and authors for posts on:

  • research and theory in complexity and systems, and
  • application to evaluation.

Posts should be short and focused.

This tactic may not fit the spirit of “systems”, but it will educate and not overwhelm.

Please contact us if you have ideas to share.
To convey a sense of what we have in mind, here is an example.
Jonny’s Example based on Accident Investigation
Consider accident investigation and its relationship to path dependence. It is possible to trace causation in retrospect and to use that knowledge to minimize the likelihood that a class of accidents will recur. One would be foolish to ignore these analyses. But precisely which accidents will be affected by the change? Unknown and largely unknowable. Is there any certainty that other causal paths won’t lead to the same type of accident? No certainty at all.
I’d write between three and six paragraphs. There would be references but no deep explanation. I’d present the principles in a few sentences and give an example of how an evaluation would differ if I did or did not take path dependence seriously.

Evolutionary search, network structures and diversity

Rick Davies (Dr), Monitoring and Evaluation Consultant, Cambridge, United Kingdom | Twitter: @MandE_NEWS | Skype: rickjdavies

My initial interest in the relevance of evolutionary theory was specifically in a field known as evolutionary epistemology. In its simplest form, this views the evolutionary process as a type of learning process, one involving the selective acquisition and retention of information, happening at multiple levels of scale. In my PhD research, evolutionary epistemology was used as a means of understanding organisational learning, and more specifically, learning in the operations of a large NGO in Bangladesh (Davies, 1998). It also helped generate two practical proposals: one a means of participatory impact monitoring, and the other a participatory approach to exploring alternative futures. Both involved a particular social implementation of the evolutionary search algorithm: variation, selection and reproduction. The main intellectual influences here have been Donald Campbell, Gregory Bateson and Daniel Dennett.
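The variation, selection and reproduction cycle can be sketched computationally. This is a generic genetic-algorithm-style sketch, not Davies’s participatory method, in which people rather than a formula do the selecting: candidates are bit strings, the fitness function (the count of 1s) is a hypothetical stand-in for whatever a group would value, and the population size, mutation rate and seed are arbitrary assumptions.

```python
import random


def evolve(fitness, genome_length=20, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Run a simple variation-selection-reproduction loop.

    Returns the best candidate and the best fitness recorded each generation.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Selection: rank candidates by fitness and retain the better half.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]
        # Reproduction with variation: copy random survivors, flipping bits occasionally.
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)
            child = [1 - g if rng.random() < mutation_rate else g for g in parent]
            children.append(child)
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


# Hypothetical fitness: count of 1s in the candidate.
best, history = evolve(fitness=sum)
print("best fitness by generation:", history)
```

Because the best survivors are always retained, the best fitness never falls from one generation to the next; the random variation supplies the new candidates that selection can act on.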

The second body of ideas that has taken up my time is network analysis, in its many and varied forms. This seems a practical way of thinking about complexity, a body of thinking that overlaps substantially with evolutionary theory. Most attempts to describe or define complexity do so by referring to complex systems as networks of some kind. There is a wide range of methods for describing and measuring network structures that are relatively agnostic about the theories that can be used to interpret that kind of data. One good feature of a network perspective is that it can help connect more abstract thinking about complexity to actual observations and measurements. Some intellectual influences here have been Borgatti, Benkler, Burt and Krebs.

The third body of thinking, which has been of more recent interest, is about the measurement, origins and consequences of diversity. Both evolutionary theory and network analysis have something to say about diversity. So do other fields that interest me. One of these is known as “collective intelligence”, i.e. the study of the circumstances in which the behaviour of a group can be more productive (on some measure) than that of the best individual in the group. Some intellectual influences here have been Page, Surowiecki and Wagner.
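One well-known result linking diversity and collective intelligence is Page’s diversity prediction theorem: the squared error of a crowd’s average prediction equals the average individual squared error minus the diversity (variance) of the predictions. The sketch below checks the identity on made-up numbers; the four predictions and the true value of 20 are hypothetical.

```python
from statistics import mean


def diversity_prediction(predictions, truth):
    """Return (collective_error, average_individual_error, prediction_diversity).

    The theorem states: collective_error = average_individual_error - prediction_diversity.
    """
    crowd = mean(predictions)
    collective_error = (crowd - truth) ** 2
    average_individual_error = mean((p - truth) ** 2 for p in predictions)
    prediction_diversity = mean((p - crowd) ** 2 for p in predictions)
    return collective_error, average_individual_error, prediction_diversity


# Hypothetical estimates from four people of a quantity whose true value is 20.
collective, individual, diversity = diversity_prediction([12, 15, 26, 29], truth=20)
print(collective, individual, diversity)
```

Here the crowd’s average (20.5) has a squared error of only 0.25, far better than the best individual (15, with a squared error of 25), because the individual errors point in different directions and largely cancel.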

Scale, Uncertainty and Risk: A Complexity Perspective

Rob D. van den Berg
Visiting Professor, King’s College London

My experience with complexity and systems in evaluation has been at the nexus between development and environment. From 2004 to 2014 I worked as an evaluator at the Global Environment Facility (GEF), which financially supported many projects and programs throughout the world focusing on how environment and development could become a win-win situation, leading to more sustainable systems and to innovations that would drive a transition to greener societies and economies, safeguard biodiversity and prevent climate change. My overall message from that experience is the importance of scale. Many of the investment programs supported by the GEF were great and could have been the steppingstone to preventing the current environmental crises, but these initiatives did not reach the scale needed to achieve that. That depended on whether societies, countries and the world would take up the examples of how transformation could take shape. And we know the world did not do this… So reaching the scale needed to transform a system is a challenge we need to face if transformation is needed to prevent disasters. Gradually we started to introduce this into our evaluations, but noting the challenge of scale is as far as an evaluator’s power to change the world goes…

A second issue that gradually emerged for me is uncertainty. Complexity almost always means that what you study is to a large extent unpredictable. Complexity calls for multi-actor and multi-layered programs, which aim to influence a complex system through a range of activities. A typical environment/development program would have: a governmental/policy/regulatory layer, intended to prevent negative outcomes and to promote and regulate positive ones; a civil society component, working with people to encourage them to change their behaviour; a private sector component focusing on new ways of doing business; and capacity development to build the knowledge and capacities needed to move these components forward. In almost all cases the interactions between these components, together with the sovereign decisions of stakeholders, brought huge uncertainty to the program, and only flexible and adaptive management could turn surprises from obstacles into enabling factors for change.

The third issue concerns risk. Gradually I became aware that evaluation is not risk-oriented. For many of us, this was self-evident. Evaluations look at the past, and the risks are no longer alive: they have either come to pass or not, and about the only issue evaluators were involved in was whether risks had been identified well during the implementation of a program. But this is a very different perspective from facing risks and uncertainty in the future. Evaluating programs that aim to transform a complex system inevitably means that, as evaluators, we need to adopt a forward-looking perspective in addition to our usual assessment of “what happened”. The main instruments of forward-looking science are scenario-building and risk assessment. Both have a long history in science and in many areas of work, among them insurance and pension schemes, to name just a few financial and socio-economic ones. The Earth sciences (geography, climate, biology, etc.) have done a lot of work to integrate the forward-looking perspective into their research. The best known is perhaps climate science, which models and calculates developments in the climate and identifies risks for our societies, our economies and life on planet Earth. In my time at the GEF we took the first tentative steps in this direction, and for me it is clear that evaluation needs to acquire deeper knowledge of these issues and approaches if it is to support the transformational change that would bring us a sustainable future.
Evaluation Criteria and Boundary Critique

Bob Williams bob@BOBWILLIAMS.CO.NZ

Criteria are the engine that drives evaluation. As evaluators, our core task is to address the ‘so what’ question rather than the ‘what’ question. Indeed, our focus on judgements of worth (e.g. merit, value, significance) differentiates our craft from other forms of social inquiry. And you cannot arrive at a judgement of worth without criteria, whether explicit or implicit.

Yet evaluators frequently treat criteria as unproblematic. We commonly take evaluation commissioners’ criteria as the basis for our evaluative inquiry without asking any questions. We boilerplate criteria such as those developed by the OECD’s Development Assistance Committee (DAC). If we adopted a more critical stance towards setting criteria, we would do neither of those things.

In the 1970s, C. West Churchman developed the discipline of Critical Systems. Many in the systems field subsequently refined his ideas, especially Gerald Midgley, Martin Reynolds and Werner Ulrich. Critical Systems based approaches focus particularly on how boundaries are established around a task. In Critical Systems terms, boundaries decide who or what is acknowledged or ‘in’, and who or what is marginalised or ‘out’. Ulrich’s Critical Systems Heuristics (CSH) comprise a series of questions about common boundary decisions that promote deep consideration of those decisions and their consequences. These considerations can be based on moral or ethical arguments, as well as on practical and political realities.

Evaluation criteria are boundary choices. They form the boundary between what the evaluation considers ‘worthwhile’ and ‘worthless’, ‘significant’ and ‘insignificant’, ‘valuable’ and ‘irrelevant’. Within a Critical Systems frame these decisions need critical review. Take the common criterion of ‘efficiency’, for instance. Who or what is ‘marginalised’ by such a criterion? Why choose this as a criterion? Is an inefficient intervention worthless? And what kinds of projects are marginalised by such an understanding? What about interventions that are testing ideas or are still in development? They are almost certainly not efficient, but they may be highly worthwhile.
I find that the challenging questions that Critical Systems and Critical Systems Heuristics pose are essential to the development and use of criteria in my evaluations.

Some easily available references about the use of Critical Systems in evaluation:
Better Evaluation
Werner Ulrich’s website
Bob Williams (2019), Systemic Evaluation Design: A Workbook