Monday, July 26, 2021

Reducing Noise and Improving Decision Making

 

Cover page image: https://www.littlebrownspark.com/

Noise: A Flaw in Human Judgment. By Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein

Kahneman, Sibony, and Sunstein have written a book that is both valuable and frustrating.

This book presents multiple ideas related to human judgment and decision making: (1) a review of studies that have described the variability in judgments in many domains, (2) approaches for reducing that variability, (3) an approach for making decisions when there are multiple factors that should be considered, and (4) an appeal for better procedures in the legal system.  According to the authors, the book offers an understanding of the psychological foundations of disparities in judgments, which are here classified as noise and bias.

The book’s strengths include its review of the literature on the variability of judgments and its distinction between noise and bias.  It presents examples of judgments from many domains (including medicine, business, and the legal system).  It strongly supports systematic decision-making processes (a topic that is important to me) and emphasizes the importance of accurate judgments.  It acknowledges the difficulties of reducing noise.  Finally, its notes provide references to original studies that provide context and details to the book’s discussion.

The book describes a range of best practices for judgment and decision making: employing persons who are better at making judgments, aggregating multiple judgments, using judgment guidelines, using a shared scale grounded in an outside view, and structuring complex decisions.  The first three items are meant to reduce judgment errors due to noise and bias.  The last item is a practical multi-criteria decision-making process that (a) decomposes the decision into a set of assessments, (b) collects information about each assessment independently, and (c) presents this evidence to the decision-maker(s), who may use intuition to synthesize this information and select an alternative.  In an appendix, the authors apply their own recommendations: the book provides a checklist (a guideline) for evaluating a decision-making process.

When discussing ratings, the book wisely recommends that “performance rating scales must be anchored on descriptors that are sufficiently specific to be interpreted consistently.”  Scales with undefined terms such as “poor” and “good” and “excellent” should be discarded unless they are well-understood in the group of persons who are using them as a common language.

Unfortunately, two weaknesses (one minor, one major) frustrated me.  The first is the fact that the text contains no superscripts, citations, or other marks that indicate the notes that are available in the Notes section at the end of the book.  In the Notes section, each note has only a page number and a brief quote that suggests the text to which the note applies.  This unreasonable scheme reduces the value of the notes' many citations and explanations by making them harder to find and use.

The second weakness is more significant.  The book does not distinguish between judgment and decision making.  It tends to treat them as the same thing.  Indeed, a note explains that the authors “regard decisions as a special case of judgment” (page 403). 

For example, the book discusses the judgments that an insurance company’s employees make.  One example is a claims adjuster’s estimate of the cost of a future claim, which is indeed a judgment.  The other example is the premium that an underwriter quotes, which is a decision, not a mere judgment.  It is based on numerous judgments, of course, but the underwriter chooses the premium amount.  The book states that making a judgment is similar to measuring something, which is appropriate, but then goes on to say that the premium in the underwriter’s quote is also a judgment, which is not appropriate, because it is the result of a decision, not a measurement.

Elsewhere, the book claims that “an evaluative judgment determines the choice of an acceptable safety margin” for an elevator design (page 67).  This is inappropriate, however, for choosing the safety margin is a decision, not a measurement.

The book states that the process of judgment involves considering the given information, engaging in computation, consulting one’s intuition, and generating a judgment; that is, judgment is “an operation that assigns a value on a scale to a subjective impression (or to an aspect of an impression)” (page 176).  This is not the same as decision making.  Decision making is a more comprehensive process that defines relevant objectives, identifies (or develops) alternatives, evaluates the alternatives, selects one, and implements it.  In this process, judgment is an activity that may be used to evaluate the alternatives.  The book provides a relevant example that shows the distinction: job candidates get ratings, but only one gets hired.  The ratings are judgments, but choosing and hiring someone is a decision.

Those seeking to improve decision making in their organizations will find many useful suggestions in this book, but they should keep in mind that decision making is a process, not a judgment.

 

Monday, November 23, 2020

Tuning coronavirus exposure warnings

In The Washington Post, Geoffrey A. Fowler described a smartphone app that alerts a user when the user has been in close contact with someone who reports a positive COVID-19 test result.  The app uses Bluetooth technology to track which other smartphones one has been near (but not the physical locations of the contacts).

Fowler's article also mentioned the problem of false positives, a typical problem in the design of any warning system.  The system is meant to alert a user if they have had a close contact with someone who tests positive.  To do that, the app must define a "close contact" based on the signal strength (stronger = closer) and the duration of the contact.  If these parameters are set too "low" so that weak signals and short encounters are considered "close contacts," then a user may get too many alerts (false positives); on the other hand, if the parameters are set too "high," then a user may not receive an alert about a risky encounter (a false negative).
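The trade-off can be sketched as a simple threshold rule.  The parameter names and cutoff values below are illustrative assumptions, not any real app's settings:

```python
# Hypothetical sketch of the "close contact" threshold logic.
# The cutoff values are placeholders for illustration only.

def is_close_contact(signal_strength, duration_min,
                     min_strength=0.6, min_duration=10):
    """Flag an encounter as a 'close contact' when the Bluetooth
    signal was strong enough (stronger ~ physically closer) for
    long enough."""
    return signal_strength >= min_strength and duration_min >= min_duration

# Lowering min_strength or min_duration flags more encounters
# (more alerts, hence more false positives); raising them misses
# some risky encounters (false negatives).
```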

The article quoted Jenny Wanger of the Linux Foundation Public Health, who said, "We are working as a community to optimize it and to figure out how to get those settings to be in the right place so that we do balance the risk of false positives with the getting notifications out to people who are at risk."

Note that an alert here is not a positive test result for the user, only a warning that one was near someone with a positive test result and thus may have been exposed.  The costs of false positives and false negatives are subjective, of course.  At this point in the pandemic, a false positive, which may cause a user to quarantine or limit their activities unnecessarily, may be more costly than a false negative for someone who is taking precautions while doing typical activities and is likely having many brief, low-risk encounters.  This type of user may prefer to know only about really close contacts that have a higher risk of transmission.

The opposite may be true for someone who is significantly at risk for becoming seriously ill and has very few contacts in a typical day.  Then, a more sensitive (but less specific) system may be more appropriate.

Thus, it would be useful for users to have the ability to set the warning threshold based on their risk preferences, similar to the way financial advisors ask investors about their risk tolerance.
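One way to implement such a preference setting is to map a user's stated risk tolerance to a threshold preset.  The preset names and values below are hypothetical:

```python
# Hypothetical presets mapping a user's risk tolerance to alert
# thresholds; the names and numbers are illustrative, not real.
PRESETS = {
    # tolerance level: (minimum signal strength, minimum duration in minutes)
    "sensitive": (0.4, 5),    # for high-risk users with few daily contacts
    "specific":  (0.7, 15),   # for users with many brief, low-risk encounters
}

def alert_thresholds(tolerance):
    """Return the (signal strength, duration) cutoffs for a user's
    chosen risk-tolerance preset."""
    return PRESETS[tolerance]
```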



Tuesday, May 26, 2020

Decision making in the Ideation Toolkit

The Ideation Toolkit (from Keen Engineering Unleashed) is a collection of information that engineering students can use as they develop an idea for a product.

Although this toolkit includes the Analytic Hierarchy Process (AHP), I suggest that educators use multi-attribute utility theory (MAUT) instead of AHP.  I have been teaching engineering decision making for many years, and, although my textbook includes both AHP and MAUT, I have found that engineering students find MAUT easier to adopt and use correctly.  Using AHP in a rational way is more difficult than it looks, whereas MAUT is more straightforward for making decisions when the alternatives have multiple criteria that need to be considered (e.g., cost, strength, and durability).
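As a minimal sketch of the weighted additive form of MAUT (the weights, single-attribute utilities, and alternatives below are invented for illustration):

```python
# Minimal multi-attribute utility sketch (weighted additive form).
# All numbers here are made up for illustration.

def maut_score(utilities, weights):
    """Weighted additive utility: sum of weight_i * utility_i."""
    return sum(w * u for w, u in zip(weights, utilities))

weights = [0.5, 0.3, 0.2]            # cost, strength, durability
alternatives = {
    "Design A": [0.9, 0.4, 0.7],     # single-attribute utilities in [0, 1]
    "Design B": [0.6, 0.8, 0.8],
}

# Choose the alternative with the highest overall utility.
best = max(alternatives, key=lambda a: maut_score(alternatives[a], weights))
```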

Wednesday, May 6, 2020

Scary Spider Chart

Spider chart.  Creator: redacted.
The spider chart shown in this post came from a dissertation that studied multi-agent systems. A spider chart is also known as a radar chart, and writers use it to graph multivariate data.  In a typical example, each alternative has a polygon that connects the points that represent its performance on multiple measures, with one point on one spoke for each measure.  It can effectively show that one alternative performs well on some measures while another alternative performs well on others.  In that case, the polygons for these alternatives will have different shapes.
Conversely, the polygons for two alternatives will be very similar if the performance of those alternatives is similar across the different measures.

Unfortunately, the spider chart shown here has reversed the typical use.  There are two alternatives (adaptive risk and fixed risk) that were tested under three scenarios (low, medium, and high workload).  In this chart, there are six spokes, one for each combination; a typical chart would have six polygons.  Instead, there are three polygons, one for each measure: profit, completed percentage, and failed percentage.  The last two measures always add to 100%, so one of them is redundant.

Thus, there are only twelve useful data points (two measures for six combinations). The data-to-ink ratio is very low.  Given the small amount of data, a simple table may be the best way to convey this information.
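For instance, the twelve data points could be laid out as a plain table like the sketch below.  The numbers are placeholders, not the dissertation's actual results:

```python
# Sketch of the simple table suggested above; all values are
# placeholders, not the dissertation's actual data.

rows = [
    # (policy, workload, profit, completed %)
    ("fixed risk",    "low",    120, 95),
    ("fixed risk",    "medium", 100, 88),
    ("fixed risk",    "high",    80, 75),
    ("adaptive risk", "low",    125, 96),
    ("adaptive risk", "medium", 110, 90),
    ("adaptive risk", "high",    95, 82),
]

print(f"{'Policy':<14}{'Workload':<10}{'Profit':>8}{'Completed %':>13}")
for policy, workload, profit, completed in rows:
    print(f"{policy:<14}{workload:<10}{profit:>8}{completed:>13}")
```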

The purpose of this spider chart is to show how the two alternatives compare on the performance measures, but this chart makes that comparison very difficult. A reader normally relies upon the slope of a curve (either positive or negative) to determine how performance is changing, but that will not work here because the different scenarios have different orientations and one performance measure is the complement of the other. 

Because there are effectively two performance measures, a two-dimensional scatter plot (with appropriate labels) would have been appropriate.  The second chart (which I created using the same data) is a possibility; this makes the change from fixed risk to adaptive risk clearer, but it still has a low data-to-ink ratio.
Two-dimensional scatter plot.  Creator: Jeffrey W. Herrmann.



Saturday, April 4, 2020

Which face mask to make?

During the coronavirus pandemic, face masks have become an important tool for preventing transmission of the virus and protecting health-care workers, but commercial face mask supplies are low. Members of the public are mobilizing to make face masks, but there are many different designs and options that have different purposes. I've created a web page to help one determine which face mask to make.

(Image credit: UnityPoint Health, Cedar Rapids, Iowa.)

Wednesday, November 27, 2019

The Risk of Space Junk

Image credit: NASA


The November 2019 issue of Prism (published by ASEE) includes an article by Thomas Grose about the risks associated with space junk in low earth orbit (LEO).  The likelihood of a collision continues to increase: 4000 satellites and the International Space Station (ISS) travel in LEO, which has 128 million bits of junk; 20,000 pieces have a diameter greater than 10 cm.  The consequences of a collision could be more debris, a damaged satellite (which could interrupt communication or other services), or a casualty on the ISS.   A possible worst case is the Kessler syndrome (a chain reaction of debris-generating collisions).

The current mitigation efforts include rules to reduce the growth of space junk and a system for detecting possible collisions (so that spacecraft can be moved out of the way).  NASA has a technical standard, "Process for Limiting Orbital Debris," that requires space programs to assess and limit the likelihood of generating debris during normal operations, explosions, intentional breakups, and collisions.  "Orbital debris" is defined as follows:
Artificial objects, including derelict spacecraft and spent launch vehicle orbital stages, left in orbit which no longer serve a useful purpose. In this document, only debris of diameter 1 mm and larger is considered.
The NASA standard also discusses reducing the likelihood of collision by reducing a spacecraft's cross-sectional area.

New systems for tracking more space junk more precisely (e.g., the Space Fence) could lead to an automated "traffic control" system that warns a spacecraft operator when a collision is imminent while reducing the likelihood of false alarms.  An alarm is costly because it disrupts normal operations, and the spacecraft must burn fuel to move away from the space junk and then return to its normal position.

Researchers are also developing spacecraft that can capture space junk, which could reduce the likelihood of a collision.

The article mentions few efforts to reduce the consequences of a collision.  Astronauts in the ISS can head to a shelter if a close call is imminent.  But hardening a satellite would require more mass, which makes it more expensive.  Perhaps we need "shatterproof" materials or designs for spacecraft.

Tuesday, October 15, 2019

Robust Multiple-Vehicle Route Planning

Planning problems with multiple vehicles occur in many settings, including the well-known vehicle routing problem (VRP) and drone delivery operations such as the flying sidekick problem.  Most approaches to these problems assume that the vehicles are reliable and will not fail.

In the real world, however, a vehicle could fail, in which case the other vehicles would have to change their routes to cover the locations that the failed vehicle did not visit.  In recent research done at the University of Maryland, Ruchir Patel studied this challenge and developed approaches for generating robust routes for multiple-vehicle operations.  His thesis, Multi-Vehicle Route Planning for Centralized and Decentralized Systems, describes the results of his research.  The key idea is to consider the worst-case performance of a solution instead of the best-case performance.  A solution with better worst-case performance is more robust and will perform well even if a vehicle fails.

He found that a genetic algorithm (GA) could find high-quality solutions, but the computational effort was substantial because evaluating the robustness of a solution required evaluating all possible failure scenarios and generating a recovery plan for each one.  His approach used Prim's algorithm to generate a minimum spanning tree and construct a recovery plan quickly.  Although the computational effort may be acceptable for pre-mission planning, faster approaches could be useful for real-time, online planning.
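As a sketch of that idea, a minimum spanning tree over the locations a failed vehicle left uncovered gives a quick estimate of the travel needed to reconnect them.  This is only an illustration of the MST step, not Patel's actual implementation, and the coordinates are hypothetical:

```python
# Quick estimate of recovery cost: total length of a minimum
# spanning tree over the uncovered locations, computed with a
# simple O(n^2) version of Prim's algorithm.

import math

def prim_mst_weight(points):
    """Total edge weight of a minimum spanning tree over 2-D points,
    using Euclidean distances."""
    if not points:
        return 0.0
    n = len(points)
    in_tree = [False] * n
    dist = [math.inf] * n     # cheapest connection to the growing tree
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the cheapest not-yet-connected point to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        total += dist[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < dist[v]:
                    dist[v] = d
    return total

# Hypothetical locations the failed vehicle did not visit:
uncovered = [(0, 0), (3, 0), (3, 4)]
print(prim_mst_weight(uncovered))  # edges of length 3 and 4 -> 7.0
```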