
Do Robots Need Theory of Mind? – part 1

Robots, and Autonomous Intelligent Systems (AISs) generally, may need to model the mental states of the people they interact with. Russell (2019), for example, has argued that AISs need to understand the complex structures of preferences that people have in order to trade off many human goals, and thereby avoid the existential risk (Boyd & Wilson 2018) that might follow from an AIS with super-human intelligence doggedly pursuing a single goal. Others have pointed to the need for AISs to maintain models of people’s intentions, knowledge, beliefs and preferences, so that people and machines can interact cooperatively and efficiently (e.g. Lemaignan et al. 2017, Ben Amor et al. 2014, Trafton et al. 2005).

However, in addition to the risks that are already well documented (e.g. Müller & Bostrom 2016), there are many potential dangers in having artificial intelligence systems closely observe human behaviour, infer human mental states, and then act on those inferences. Some of the potential problems that come to mind include:

  • The risk that self-determining AISs will be built with only a limited capability to understand human mental states and preferences, and that humans will lose control of the technology (Meek et al. 2017, Russell 2019).

  • The risk that the AIS will exhibit unfair biases in what it selects to observe, infer and act on (Osoba & Welser 2017).

  • The risk that the AIS will use misleading information, make inaccurate observations and inferences, and base its actions on these (McFarland & McFarland 2015, Rapp et al. 2014).

  • The risk that, even if the AIS observes and infers accurately, its actions will not align with what a person might do, or will have unintended consequences (Vamplew et al. 2018).

  • The risk that an AIS will misuse its knowledge of a person’s hidden mental states, resulting in deliberate or inadvertent harm or in criminal acts (Portnoff & Soupizet 2018).

  • The risk that people’s human rights and rights to privacy will be infringed because AISs can observe, infer, reason about and record data that people have not consented to share and may not even know exists (OECD 2019).

  • The risk that, if the AIS makes decisions based on unobservable mental states, any explanations of its actions that rely on those states will be difficult to validate (Future of Life Institute 2017, Weld & Bansal 2018).

  • The risk that the AIS would, in the interests of a global common good, correct for people’s foibles, biases and dubious (unethical) acts, and thereby take away their autonomy (Timmers 2019).

  • The risk that, using AISs, a few multi-national companies and countries will collect so much data about people’s explicit and inferred hidden preferences that power and wealth will become concentrated in even fewer hands (Zuboff 2019).

  • The risk that corporations will rush to be the first to produce AISs that can observe, infer and reason about people’s mental states, and in so doing will neglect to take safety precautions (Armstrong et al. 2016).

  • The risk that, in acting out of some greater interest (e.g. the interests of society at large), an AIS will restrict the autonomy or dignity of the individual (Zardiashvili & Fosch-Villaronga 2020).

  • The risk that an AIS would itself take unacceptable risks on the basis of uncertain inferred mental states, risks which may cause harm to a person or to the AIS itself (Merritt et al. 2011).

Much has been written about the risks of AI, and in the last few years numerous ethical guidelines, principles and recommendations have been published, especially in relation to regulating the development of AISs (Floridi et al. 2018). However, few of these have touched on the real risk that AISs may one day develop to the point where they can gain a good understanding of people’s unobservable mental states and act on them. We have already seen Facebook being used to target advertisements and persuasive messages on the basis of inferred political preferences (Isaak & Hanna 2018).

In future posts I look at the extent to which an AIS could potentially have the capability to infer other people’s mental states. I touch on some of the advantages and dangers and identify some of the issues that may need to be thought through.

I argue that AISs generally (not only robots) may need not only to model people’s mental states, a capacity known in the psychology literature as Theory of Mind, or ToM (Carlson et al. 2013), but also to have some sort of emotional empathy. Neural nets have already been used to create algorithms that demonstrate some aspects of ToM (Rabinowitz et al. 2018). I explore the idea of building AISs with both ToM and some form of empathy, and the idea that unless we are able to equip AISs with a balance of control mechanisms we run the risk of creating AISs with ‘personality disorders’ that we would want to avoid.

In making this case, I look at whether it is conceivable that we could build AISs that have both ToM and emotional empathy and, if that were possible, how these two capacities would need to be integrated to provide an effective overall control mechanism. Such a control mechanism would involve both fast (but sometimes inaccurate) processes and slower (reflective and corrective) processes, similar to the distinction Kahneman (2011) makes between system 1 and system 2 thinking. The architecture has the potential for the fine-grained integration of moral reasoning into the decision making of an AIS.
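As a rough illustration only, the fast/slow split could be sketched as follows in Python. Everything here is a hypothetical toy and not a proposed design: the names `Inference`, `decide`, `toy_fast` and `toy_slow`, the confidence threshold of 0.8, and the `high_stakes` flag are all assumptions made for the example. The fast process produces a quick guess about a person’s intention; the slow process is consulted only when that guess is uncertain or when the consequences of being wrong are serious.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Inference:
    intention: str     # hypothesised mental state of the person
    confidence: float  # 0.0 (pure guess) to 1.0 (certain)


def decide(
    observation: str,
    fast: Callable[[str], Inference],
    slow: Callable[[str, Inference], Inference],
    high_stakes: bool,
    threshold: float = 0.8,  # assumed cut-off, chosen arbitrarily
) -> Tuple[Inference, str]:
    """Return the inference to act on and which path produced it."""
    guess = fast(observation)
    # Accept the quick guess only if it is confident and little is at risk.
    if guess.confidence >= threshold and not high_stakes:
        return guess, "fast"
    # Otherwise hand the guess to the reflective, corrective process.
    return slow(observation, guess), "slow"


def toy_fast(observation: str) -> Inference:
    # Hypothetical heuristic: a single surface cue, cheap but error-prone.
    if "reaches for cup" in observation:
        return Inference("wants a drink", 0.9)
    return Inference("unknown", 0.2)


def toy_slow(observation: str, first_guess: Inference) -> Inference:
    # Hypothetical reflective step: uses extra context to revise the guess.
    if "cup is empty" in observation:
        return Inference("wants to clear the table", 0.7)
    return first_guess


if __name__ == "__main__":
    print(decide("person reaches for cup", toy_fast, toy_slow, high_stakes=False))
    print(decide("person reaches for cup; cup is empty", toy_fast, toy_slow, high_stakes=True))
```

The point of the sketch is the routing rather than the toy rules: a confident, low-stakes guess is acted on directly, while anything uncertain or consequential is passed through a second process that can overturn it.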

What I hope to add to Russell’s (2019) analysis is a more detailed consideration of what is already known in the psychology literature about the general problem of inferring another agent’s intentions from their behaviour. This may help to join up some of the thinking in AI with some of the thinking in cognitive psychology in a very broad-brushed way such that the main structural relationships between the two might come more into focus.



Armstrong, S., Bostrom, N., & Shulman, C. (2016). Racing to the precipice: a model of artificial intelligence development. AI and Society, 31(2), 201–206. https://doi.org/10.1007/s00146-015-0590-y

Ben Amor, H., Neumann, G., Kamthe, S., Kroemer, O., & Peters, J. (2014). Interaction primitives for human-robot cooperation tasks. In Proceedings – IEEE International Conference on Robotics and Automation (pp. 2831–2837). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICRA.2014.6907265

Boyd, M., & Wilson, N. (2018). Existential Risks. Policy Quarterly, 14(3). https://doi.org/10.26686/pq.v14i3.5105

Carlson, S. M., Koenig, M. A., & Harms, M. B. (2013). Theory of mind. Wiley Interdisciplinary Reviews: Cognitive Science, 4(4), 391–402. https://doi.org/10.1002/wcs.1232

Cuthbertson, A. (2016). Elon Musk: Humans Need ‘Neural Lace’ to Compete With AI. Retrieved from http://europe.newsweek.com/elon-musk-neural-lace-ai-artificial-intelligence-465638?rm=eu

Floridi, L., Cowls, J., Beltrametti, M., et al. (2018). AI4People—An Ethical Framework for a Good AI Society: Opportunities, Risks, Principles, and Recommendations. Minds and Machines, 28, 689–707. https://doi.org/10.1007/s11023-018-9482-5

Future of Life Institute. (2017). Benefits & Risks of Artificial Intelligence. Future of Life, 1–23. Retrieved from https://futureoflife.org/background/benefits-risks-of-artificial-intelligence/

Haynes, J. D., Sakai, K., Rees, G., Gilbert, S., Frith, C., & Passingham, R. E. (2007). Reading Hidden Intentions in the Human Brain. Current Biology, 17(4), 323–328. https://doi.org/10.1016/j.cub.2006.11.072

Isaak, J., & Hanna, M. J. (2018). User Data Privacy: Facebook, Cambridge Analytica, and Privacy Protection. Computer, 51(8), 56–59. https://doi.org/10.1109/MC.2018.3191268

Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.

Lemaignan, S., Warnier, M., Sisbot, E. A., Clodic, A., & Alami, R. (2017). Artificial cognition for social human–robot interaction: An implementation. Artificial Intelligence, 247, 45–69. https://doi.org/10.1016/j.artint.2016.07.002

McFarland, D. A., & McFarland, H. R. (2015). Big Data and the danger of being precisely inaccurate. Big Data and Society. SAGE Publications Ltd. https://doi.org/10.1177/2053951715602495

Meek, T., Barham, H., Beltaif, N., Kaadoor, A., & Akhter, T. (2017). Managing the ethical and risk implications of rapid advances in artificial intelligence: A literature review. In PICMET 2016 – Portland International Conference on Management of Engineering and Technology: Technology Management For Social Innovation, Proceedings (pp. 682–693). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PICMET.2016.7806752

Merritt, T., Ong, C., Chuah, T. L., & McGee, K. (2011). Did you notice? Artificial team-mates take risks for players. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6895 LNAI, pp. 338–349). https://doi.org/10.1007/978-3-642-23974-8_37

Müller, V. C., & Bostrom, N. (2016). Future Progress in Artificial Intelligence: A Survey of Expert Opinion. In Fundamental Issues of Artificial Intelligence (pp. 555–572). Springer International Publishing. https://doi.org/10.1007/978-3-319-26485-1_33

OECD. (2019). Recommendation of the Council on Artificial Intelligence. OECD/LEGAL/0449. Retrieved from http://acts.oecd.org/Instruments/ShowInstrumentView.aspx?InstrumentID=219&InstrumentPID=215&Lang=en&Book=False

Osoba, O., & Welser, W. (2017). An Intelligence in Our Image: The Risks of Bias and Errors in Artificial Intelligence. RAND Corporation. https://doi.org/10.7249/rr1744

Portnoff, A. Y., & Soupizet, J. F. (2018). Artificial intelligence: Opportunities and risks. Futuribles: Analyse et Prospective, 2018-September(426), 5–26.

Rabinowitz, N. C., Perbet, F., Song, H. F., Zhang, C., & Botvinick, M. (2018). Machine Theory of mind. In 35th International Conference on Machine Learning, ICML 2018 (Vol. 10, pp. 6723–6738). International Machine Learning Society (IMLS).

Rapp, D. N., Hinze, S. R., Kohlhepp, K., & Ryskin, R. A. (2014). Reducing reliance on inaccurate information. Memory and Cognition, 42(1), 11–26. https://doi.org/10.3758/s13421-013-0339-0

Russell, S. (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Allen Lane. ISBN-10: 0241335205, ISBN-13: 978-0241335208

Timmers, P. (2019). Ethics of AI and Cybersecurity When Sovereignty is at Stake. Minds and Machines, 29, 635–645. https://doi.org/10.1007/s11023-019-09508-4

Trafton, J. G., Cassimatis, N. L., Bugajska, M. D., Brock, D. P., Mintz, F. E., & Schultz, A. C. (2005). Enabling effective human-robot interaction using perspective-taking in robots. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 35(4), 460–470. https://doi.org/10.1109/TSMCA.2005.850592

Vamplew, P., Dazeley, R., Foale, C., Firmin, S., & Mummery, J. (2018). Human-aligned artificial intelligence is a multiobjective problem. Ethics and Information Technology, 20(1), 27–40. https://doi.org/10.1007/s10676-017-9440-6

Weld, D. S., & Bansal, G. (2018). Intelligible artificial intelligence. ArXiv, 62(6), 70–79. https://doi.org/10.1145/3282486

Zardiashvili, L., & Fosch-Villaronga, E. (2020). “Oh, Dignity too?” Said the Robot: Human Dignity as the Basis for the Governance of Robotics. Minds and Machines. https://doi.org/10.1007/s11023-019-09514-6

Zuboff, S. (2019). The Age of Surveillance Capitalism. Profile Books.