How Much Diverse Intellectual Input Should Be Included When Planning an Evaluation?
I addressed this question some time ago in my book on unintended consequences. It’s time to revisit the question.
What is the Optimal Amount of Diverse Opinion?
My interest in this question stems from an article I recently read in Foreign Affairs about the transition to green energy (Green Upheaval: The New Geopolitics of Energy). Its thesis is that geopolitical considerations are critical for understanding that transition. Left to my own devices, I guarantee that I would never have considered this topic in any evaluation I can imagine doing.
Reading the article reminded me of a point I made about anticipating the consequences of interventions. I argued that one useful tactic is to include more than one perspective when planning an evaluation. By “perspective” I meant a discipline or paradigmatic point of view, and I defined “more than one” as two or three. My reasoning was that too many perspectives would make evaluations cumbersome and expensive. Past three, the law of diminishing returns would set in, and the disadvantages of added complexity would outweigh the contribution of diverse perspectives.
I then discussed how those two or three perspectives should be chosen, given the limitless candidates for inclusion. I made the bold (and, in retrospect, foolish) statement that the specific choices did not matter: it was the fact of diversity, rather than its makeup, that mattered. Then I forgot the whole issue and did other things, until I read that article in Foreign Affairs.
Now I’m revisiting the question. Do the specific choices matter? How many perspectives are needed? I am taking a somewhat more rigorous look at these questions, and my inclination is to invoke five tactics.
- Take it as a fact that the number of possibilities is unmanageably large.
- Use research evidence to identify some relevant domains of knowledge.
- Pick randomly from the other candidates.
- Appreciate that not all points of view need to be equally represented.
- Leverage the fact that diversity can change over an evaluation’s lifecycle.
1) Take it as a fact that the number of possibilities is unmanageably large.
It is the nature of complex systems that the population of possible unintended effects is unmanageably large. (This follows from sensitive dependence on initial conditions and from the long-tailed distribution of events that could affect the consequences of implementing a program.) This reality has implications for how an evaluation is designed and how it is carried out.
2) Use research evidence to identify some relevant domains of knowledge.
Within broad boundaries, domains of relevant research can be known. Imagine that I am evaluating a safety promotion innovation in transportation. Because I know that there is an extensive literature on how safety culture and situational awareness affect accidents across transportation modes, I would include expertise in those fields in my evaluation planning. If planners and evaluators are surprised by consequences that emanate from these domains, they have themselves to blame. I have a line in my book on unintended consequences: “It may be trite to say it, but there is nothing so useful as a good literature review.” That is still true.
3) Pick randomly from the other candidates.
A good case can be made for drawing on a wide variety of expert domains. Sometimes the case rests on available data; sometimes it rests on recognizing a program’s system context. No matter how tightly one draws the system boundaries, there will be more loci of change than can be accommodated. This is the point at which the fact of diversity, rather than the specifics of diversity, is what matters. I would pick randomly from what seem like the most likely candidates.
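Tactics 2 and 3 can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the domain names are hypothetical placeholders, and the helper `choose_perspectives` is invented here. It keeps every research-backed domain and then draws a fixed number of additional perspectives uniformly at random, without replacement, from the remaining candidates.

```python
import random


def choose_perspectives(research_backed, other_candidates, n_random, seed=None):
    """Hypothetical helper: keep all research-backed domains (tactic 2),
    then add n_random domains drawn at random from the rest (tactic 3)."""
    rng = random.Random(seed)  # seeded generator so a selection can be reproduced
    # Candidates already chosen on the strength of research evidence are excluded.
    pool = [c for c in other_candidates if c not in research_backed]
    n_random = min(n_random, len(pool))  # cannot draw more than are available
    return list(research_backed) + rng.sample(pool, n_random)


# Hypothetical example based on the transportation safety illustration.
research_backed = ["safety culture", "situational awareness"]
other_candidates = [
    "geopolitics",
    "labor economics",
    "organizational sociology",
    "human factors engineering",
    "regulatory law",
]
print(choose_perspectives(research_backed, other_candidates, n_random=1, seed=42))
```

Recording the seed documents which random draw was used, which may help if the choice of perspectives is questioned later.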
4) Appreciate that not all points of view need to be equally represented.
The reason to limit intellectual input into the design of an evaluation is that taking too much diverse opinion into consideration would make an evaluation unworkable. Evaluation design considerations, however, can differ in their degree of “workability”.
To continue the previous example, imagine an evaluation of a program designed to improve situational awareness as a way of preventing accidents. The “workability” of including different points of view depends on the architecture of the evaluation design. Implementing some design considerations involves commitments to rigid and expensive features. Consider “sample stratification”. “Stratification” has a technical meaning with respect to representative sampling, but it also has an intuitive meaning that extends to qualitative research. For example, how many different departments need to provide data when research on safety culture is carried out? Each additional distinction requires more complicated logistics and more negotiation with host organizations.
Other design characteristics are malleable. Adding questions to a survey is one example. Another is scanning the contents of publications and social media. A third might be extending the timeline for final observations. What is needed is not only to elicit a wide variety of opinion on the consequences of program action, but also to determine what variety of methodologies would suffice to assess the proposed outcomes. The lower the evaluation burden, the greater the number of points of view that can be included.
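The trade-off in the last sentence can be made concrete with a toy budget model. Assume, purely for illustration, that each additional point of view requires one design feature with a known burden score and that the evaluation has a fixed total budget; the scores below are invented, and `max_perspectives` is a hypothetical helper. Because every point of view is valued equally here, taking the cheapest features first maximizes how many fit.

```python
def max_perspectives(burdens, budget):
    """Return how many points of view fit within a fixed evaluation budget.

    burdens: hypothetical burden score of the design feature each point of view needs.
    Taking the cheapest features first maximizes the count when all points of view
    are valued equally.
    """
    count, spent = 0, 0.0
    for cost in sorted(burdens):
        if spent + cost > budget:
            break  # every remaining feature is at least this expensive
        spent += cost
        count += 1
    return count


# Invented burden scores on an arbitrary scale.
rigid = [5.0, 5.0, 6.0]      # e.g. adding strata to a sampling plan
malleable = [0.5, 1.0, 2.0]  # e.g. extra survey questions, media scanning, a longer timeline
print(max_perspectives(rigid, budget=10.0))
print(max_perspectives(malleable, budget=10.0))
```

With these made-up numbers, the same budget accommodates two points of view served by rigid features and three served by malleable ones.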
5) Leverage the fact that diversity can change over an evaluation’s lifecycle.
Different expert opinion can be brought to bear at different points in an evaluation’s lifecycle. As the evaluation proceeds, knowledge about unexpected consequences will develop, or at least suspicions about such consequences may arise. New input can then be sought. Of course, the value of that input decreases over time, because the later it arrives, the less of the evaluation it can influence. Still, something is better than nothing.
Justifying the Rule of Thumb
I am still convinced that including more than two or three diverse inputs in an evaluation design is problematic, and that two or three suffice. But can I justify this opinion? I don’t have a rigorous answer, but I can chip away at it.
Bob Williams suggested a “wisdom of the crowds” approach. He was referring to the idea that independent opinions from people with some knowledge of a subject can combine to produce a better answer than any single opinion. Any single estimate or opinion will contain “noise” or “bias”, either in a technical statistical sense or simply because different opinions may be off the mark for various reasons. With enough uncorrelated inputs, those errors tend to cancel, and the collective answer converges on what is correct. The uncorrelated aspect works for me. I can easily imagine little overlap in how people from different disciplines conceive of program outcomes. (The Font of All Wisdom has a good explanation of this phenomenon. For a source that presents in-depth research, and in so doing delves deeply into the research literature, check out an article that Rick Davies sent me: Modularity and composite diversity affect the collective gathering of information online.)
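A small simulation may help show why the uncorrelated aspect matters. It is a sketch under simplifying assumptions, not a model of expert judgment: each opinion is treated as a number equal to the true value plus independent Gaussian noise, optionally shifted by a bias that every contributor shares. The function `crowd_error` and all parameter values are invented for illustration.

```python
import random
import statistics


def crowd_error(n, truth=100.0, noise_sd=20.0, shared_bias=0.0, trials=2000, seed=1):
    """Mean absolute error of the average of n simulated opinions.

    Each opinion = truth + shared_bias + independent Gaussian noise.
    Independent noise tends to cancel as n grows; shared_bias does not.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        opinions = [truth + shared_bias + rng.gauss(0, noise_sd) for _ in range(n)]
        errors.append(abs(statistics.fmean(opinions) - truth))
    return statistics.fmean(errors)


for n in (1, 3, 10, 30):
    # columns: crowd size, error with independent noise only, error with a shared bias
    print(n, round(crowd_error(n), 2), round(crowd_error(n, shared_bias=15.0), 2))
```

With independent noise only, the error of the average falls roughly in proportion to 1/sqrt(n); with a shared bias, it levels off near the size of that bias no matter how large the crowd.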
The “two or three” rule, however, still seems problematic. If there are many possible perspectives, what is the intellectual justification for using only a few? I don’t see the question in formal mathematical terms, but I do look to sampling theory for inspiration. Say there are ten viable candidate perspectives that could be tapped for intellectual input into an evaluation design. Play around with various sample size calculators, plug in various parameters, and the sample size comes out at eight or more.
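Sample size calculators commonly implement Cochran’s formula with a finite population correction. Assuming that is what such a calculator does, the sketch below treats the ten candidate perspectives as a finite population of N = 10, with the usual conservative defaults of 95% confidence (z = 1.96) and an assumed proportion p = 0.5. The margins of error are illustrative choices.

```python
import math


def sample_size(population, margin_of_error, z=1.96, p=0.5):
    """Cochran's sample size with a finite population correction.

    n0 = z^2 * p * (1 - p) / e^2   (sample size for an unlimited population)
    n  = n0 / (1 + (n0 - 1) / N)   (corrected for a population of size N)
    """
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)  # round up to a whole number of perspectives


for e in (0.05, 0.10, 0.20):
    print(f"margin of error {e:.0%}: sample {sample_size(10, e)} of 10")
```

Even with a generous 20% margin of error, the calculation asks for 8 of the 10 perspectives, which matches the range the paragraph describes and is far from two or three.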
Another source of inspiration is research I have seen on extracting meaning from limited data embedded in much richer data sets (Maximizing the information learned from finite data selects a simple model). This is a very technical article about information:complexity ratios. A section of its “Significance” summary reads:
Most physical theories are effective theories, descriptions at the scale visible to our experiments which ignore microscopic details. Seeking general ways to motivate such theories, we find an information theory perspective: If we select the model which can learn as much information as possible from the data, then we are naturally led to a simpler model, by a path independent of concerns about overfitting.
In addition to the data inspected in the research, the authors also reference similar patterns in a variety of scientific domains.
I do not believe that there is any direct connection between diverse input to guide evaluation and either sampling theory or finite data in simple models. But I do see these cases as data points in what, for lack of a better phrase, I’ll call the “system by which nature works”. That seems to be a system in which we can extract a great deal of meaning from small amounts of information. Maybe that is a characteristic of Nature’s system. If it is, why not limited diversity as inputs to evaluation?
Osvaldo Feinstein suggested a justification based on bounded rationality. The Font of All Wisdom explains this concept as follows:
Bounded rationality is the idea that rationality is limited when individuals make decisions. In other words, humans’ “preferences are determined by changes in outcomes relative to a certain reference level”. Limitations include the difficulty of the problem requiring a decision, the cognitive capability of the mind, and the time available to make the decision. Decision-makers, in this view, act as satisficers, seeking a satisfactory solution, rather than an optimal solution. Therefore, humans do not undertake a full cost-benefit analysis to determine the optimal decision, but rather, choose an option that fulfills their adequacy criteria.
I see this as providing a hint that my rule of thumb may have some intellectual merit. Insisting on too diverse a set of inputs would increase the time and expense of an evaluation, and it would tax the intellectual capacity needed to develop a design that could span the measurement landscape.
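The cost side of this argument can be put into a toy model of diminishing returns. The assumptions are deliberately crude and are not drawn from any research: each perspective independently catches a fixed share p of the surprises that matter, so k perspectives catch 1 - (1 - p)^k of them, while every added perspective carries the same cost c in time, money, and intellectual effort. Net value keeps rising only while the marginal gain p(1 - p)^(k - 1) exceeds c. The helper names are hypothetical.

```python
def coverage(k, p):
    """Share of surprises caught by k perspectives, assuming each one
    independently catches a share p of them (a toy assumption)."""
    return 1 - (1 - p) ** k


def best_number_of_perspectives(p, cost, k_max=10):
    """Return the k in 0..k_max that maximizes coverage minus a linear cost."""
    return max(range(k_max + 1), key=lambda k: coverage(k, p) - cost * k)


# Illustrative parameters only.
for p, cost in [(0.4, 0.10), (0.4, 0.05), (0.6, 0.10)]:
    print(p, cost, best_number_of_perspectives(p, cost))
```

Under these made-up parameters the optimum lands at three, five, and two perspectives respectively, so the sketch shows how diminishing returns can favor a small number without establishing that the number is always two or three.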
Bob suggested another angle, rooted in systems thinking. A systems approach to a phenomenon requires drawing boundaries to constrain the inquiry, because there is too much going on in any systemic phenomenon to know it comprehensively. Put another way, the world is infinitely full of stuff and relationships among the stuff. So how can one derive meaningful understanding? Only by defining what to leave in and what to exclude, and thereby constructing a meaningful inquiry. This way of thinking is echoed in what complexity theory tells us about the demons and angels that lurk in the long tail of event distributions. Make an intelligent guess about the few sources of most of the surprises, and you may do OK.
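The idea that a few sources account for most of the surprises is a property of heavy-tailed distributions. As a hedged illustration only, the sketch below assumes the impacts of 10,000 hypothetical sources of surprise follow a Pareto distribution (shape 1.16, the value usually associated with the 80/20 rule) and compares the share contributed by the largest 10% of sources with the same share under a thin-tailed uniform distribution. Both distributions are assumptions, not data, and `top_share` is a hypothetical helper.

```python
import random


def top_share(impacts, fraction):
    """Share of total impact contributed by the largest `fraction` of sources."""
    ranked = sorted(impacts, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


rng = random.Random(7)
# Hypothetical impacts; the distributions are assumptions, not data.
heavy_tailed = [rng.paretovariate(1.16) for _ in range(10_000)]
thin_tailed = [rng.uniform(0, 2) for _ in range(10_000)]
print(f"top 10% share, heavy-tailed: {top_share(heavy_tailed, 0.10):.2f}")
print(f"top 10% share, thin-tailed:  {top_share(thin_tailed, 0.10):.2f}")
```

In the heavy-tailed case, a guess that captures the biggest sources captures much of the total, which is the intuition in the paragraph’s closing sentence; in the thin-tailed case, no small set of sources dominates.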
2 thoughts on “How Much Diverse Intellectual Input Should Be Included When Planning an Evaluation? (Revised)”
The world is far too complex and diverse to set such precise rules. Using a randomised approach to select perspectives is especially suboptimal.
Equating noise (the result of diverse perspectives, the subject of your blog) with bias (which concerns errors within a single perspective) is misleading. Both should be addressed, but in different ways.
Thank you, Jonathan, for putting this out! I really appreciated your succinct writing, which made it very easy to follow. Here are a few thoughts and reflections:
– A question I had was if the diversity of perspectives can be seen as some sort of nested system: Some overarching perspectives (e.g. linear vs nonlinear) and then sub-perspectives (systems thinking vs ecology in the “non-linear” paradigm or meta-perspective).
– You speak of perspective, possibilities, candidates and points of view, expert domains, domains of knowledge. I was not sure when these terms are used interchangeably and when they actually meant something different.
– I really appreciated the bounded rationality argument. A thought / question I had there also was whether the “wisdom of the crowds” approach refers to different social phenomena. At least I mainly know it from experiments where people had to guess the number of marbles in a jar (or something similar). Yet when the social phenomenon at hand is complex (not complicated), maybe the wisdom included has to be built on expertise to qualify information.
Thanks again, and I’m curious to explore these points further.