Human Compatible book review
Highly recommended, 5/5.
One sentence review
Outstanding and original description of artificial intelligence (AI) in the 21st Century, the problems it could pose, and a lucid proposal for ‘provably beneficial AI.’
I recently completed the book Human Compatible by Professor of Computer Science at Berkeley, Stuart Russell (Russell 2019). The first book on the subject I have read, it greatly enhanced my understanding of a complex subject and made me feel compelled to write a brief review, for the benefit of myself (I’m still consolidating my understanding of the book’s key ideas and find that writing them down helps) and others.
I first encountered Stuart Russell and his ideas in a podcast for The Guardian’s Science Weekly in October 2019, which is still available and well worth a listen. As someone who has spent much time thinking about, using and to some extent developing technologies to support a small but influential area of human activity (Transport Planning), I am acutely aware of the potential for technologies to have real-world impacts. In my case, the technology is fairly narrow and, as far as I am aware, can only be used for the definitively benign activity of cycle network planning (Lovelace et al. 2017).
But imagine you were developing a much more ambitious and generally applicable technology with the explicit objective of becoming more intelligent than humans in an increasing range of areas: to develop general computer intelligence. That is the starting point of Stuart’s book, which covers the history of the academic field of AI and the astounding progress AI researchers have made in the last couple of years.
Stuart delves into the issues that AI researchers have themselves been grappling with, starting with ‘father of AI’ Alan Turing:
Even before the birth of AI in 1956, august intellectuals were harrumphing and saying that intelligent machines were impossible. Alan Turing devoted much of his seminal 1950 paper, “Computing Machinery and Intelligence,” to refuting these arguments. Ever since, the AI community has been fending off similar claims of impossibility.
The point is that the great Alan Turing and AI researchers since then have had to debunk simplistic arguments to the effect that ‘AI is impossible,’ from a range of perspectives. Paradoxically, it seems that as AI researchers have actually got closer to achieving general intelligence, their willingness to discuss the wider implications has reduced (or at least not increased as one would have expected). Surprisingly, some AI researchers have even begun to deny the possibility of succeeding in their own area. This is despite rapid growth in the range of tasks at which machines outperform humans, from playing games such as Go to commercially valuable services such as (in some contexts) human language translation and medical diagnosis of diseases.
A striking example of this denial is from Stanford University’s prominent AI100 report, which states simply that
unlike in the movies, there is no race of superhuman robots on the horizon or probably even possible.
This is astonishing because of the context: much of the rest of the report is focussed on the impacts and benefits of AI, and progress made in the field. Russell sees this statement as reflective of a wider problem, an unwillingness to engage with the possible negative consequences if AI systems get out of control:
To my knowledge, this is the first time that serious AI researchers have publicly espoused the view that human-level or superhuman AI is impossible—and this in the middle of a period of extremely rapid progress in AI research, when barrier after barrier is being breached. It’s as if a group of leading cancer biologists announced that they had been fooling us all along: they’ve always known that there will never be a cure for cancer.
Prof Russell uses the book as an opportunity to call for more open-minded debate among the AI community of which he is a part and in wider society. Of course the future is uncertain, meaning we should explore a range of possible futures and act to minimise the probability of highly undesirable outcomes. In various scenarios of the future sketched out concisely (and somehow entertainingly) in the book, superintelligent AI leads to the demise of humanity.
This outcome is thankfully not guaranteed, but if it is at least possible, surely it is worthy of consideration? Another common objection to debate on or research into AI risks is that the threat is far in the future so we don’t need to worry. Russell uses the metaphor of asteroid collision to debunk this argument:
For example, if we were to detect a large asteroid on course to collide with Earth in 2069, would we say it’s too soon to worry? Quite the opposite! There would be a worldwide emergency project to develop the means to counter the threat. We wouldn’t wait until 2068 to start working on a solution, because we can’t say in advance how much time is needed. Indeed, NASA’s Planetary Defense project is already working on possible solutions
The good news is that there are solutions, the subject of Chapter 7, ‘AI: A Different Approach,’ in which Russell sets out principles for beneficial AI:
[beneficial machines are those] whose actions can be expected to achieve our objectives rather than their objectives. … The resulting approach should lead eventually to machines that present no threat to us, no matter how intelligent they are.
The approach is based on three broad principles:
- The machine’s only objective is to maximize the realization of human preferences.
- The machine is initially uncertain about what those preferences are.
- The ultimate source of information about human preferences is human behavior.
More good news: recent progress in AI research and in processing uncertainty helps make these things possible. The result of implementing these principles could be machines that regularly check in to find out if what they think is wanted is actually what the human wants.
To me, the approach also has another advantage: it leads to humility. Even the most intelligent people change what they want from time to time, and it makes sense to imbue machines with this humility. Instead of telling machines “do this at any expense,” it makes sense to say something more like “try to do this, but proceed with caution and check in if you have any doubts.” The resulting two-way human-machine communication could lead to more evidence-based and informed decisions.
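To make the three principles a little more concrete, here is a minimal toy sketch in Python. It is an illustration only, not Russell’s formulation or code from the book: the class name `HumbleAssistant`, the `ask_threshold` and `reliability` parameters and the tea/coffee example are all hypothetical, and the update rule is plain Bayesian updating under the assumption that the human usually (but not always) picks what they truly prefer.

```python
class HumbleAssistant:
    """Toy agent that is uncertain about which option the human prefers.

    Hypothetical illustration only; requires at least two options.
    """

    def __init__(self, options, ask_threshold=0.8):
        self.options = list(options)
        # Assumed confidence level needed before acting without asking.
        self.ask_threshold = ask_threshold
        # Principle 2: start out uncertain (uniform belief over the options).
        self.belief = {o: 1.0 / len(self.options) for o in self.options}

    def observe_choice(self, chosen, reliability=0.7):
        """Principle 3: learn about preferences from observed human behaviour.

        `reliability` is an assumed probability that the human picks the
        option they truly prefer; mistakes are spread evenly over the rest.
        """
        n_others = len(self.options) - 1
        posterior = {}
        for option, prior in self.belief.items():
            if option == chosen:
                likelihood = reliability
            else:
                likelihood = (1.0 - reliability) / n_others
            posterior[option] = prior * likelihood
        total = sum(posterior.values())
        self.belief = {o: p / total for o, p in posterior.items()}

    def decide(self):
        """Principle 1: serve the human's preference, but check in when unsure."""
        best = max(self.belief, key=self.belief.get)
        if self.belief[best] < self.ask_threshold:
            return "ask: which would you prefer?"
        return f"act: {best}"


if __name__ == "__main__":
    assistant = HumbleAssistant(["tea", "coffee", "water"])
    print(assistant.decide())        # too uncertain, so it checks in
    assistant.observe_choice("tea")
    print(assistant.decide())        # one observation is not yet enough
    assistant.observe_choice("tea")
    print(assistant.decide())        # now confident enough to act
```

The design choice worth noticing is that the assistant never holds a fixed objective: it holds a belief about what the human wants, and the threshold decides when that belief is strong enough to act on rather than ask.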
Judging by the 169 reviews of the book on Amazon UK at the time of writing, many of them glowing, it seems that people are taking notice of Russell’s ideas. A Brit in the USA, Russell also has an audience across the Atlantic (although perhaps not quite as large or vocal an audience, with 164 reviews on the American amazon.com). As an informed data scientist who has seen the benefits of digital technology, I am open to the possibility that there could be unintended consequences of my work.
I sincerely hope that AI researchers are equally open to critique: even if Russell’s principles get implemented to the letter, there are still risks ranging from rogue researchers to superintelligent AIs persuading us to unleash them from our control. But if debate/research/investment follows the lines suggested in Stuart Russell’s lucid and entertaining Human Compatible, the probability of catastrophic AI malfunction will decrease, raising the hope of using our ever more sophisticated technologies to free us rather than enslave us.
If we succeed in creating provably beneficial AI systems, we would eliminate the risk that we might lose control over superintelligent machines. Humanity could proceed with their development and reap the almost unimaginable benefits that would flow from the ability to wield far greater intelligence in advancing our civilization. We would be released from millennia of servitude as agricultural, industrial, and clerical robots and we would be free to make the best of life’s potential.