The Degeneration of the Nation
What is the answer to the danger of artificial intelligence?
A response to Geoffrey Hinton, which he will surely read and internalize. How can philosophy prevent a terrible disaster? And why is the alignment problem itself the problem?
By: Me, You, and the Next Holocaust
Artificial Accident and a Crushing Argument (Source)
Hinton, the father of deep learning, a solid and serious man, is making a serious turn. He is being interviewed and warning on every platform and under every green tree, cutting his forecast for the arrival of artificial intelligence in half (from forty years to under twenty), and it is precisely his polite English understatement that is more frightening than any cry of alarm. How can one even respond to such a grave warning, unparalleled in its severity, from the world's leading expert in the field, who, if his technological forecast is correct in its timelines, paints a highly plausible scenario far more terminal than any global warming or nuclear war, or even an asteroid impact? Is our world on the brink of disappearance?

It is not only the very arrival of artificial intelligence that evokes fear and trembling, but the acceleration, namely: the speed. This is the only rule in the jungle: the faster it happens (is happening?), the more dangerous it is. The widespread insistence on business as usual guarantees nothing. We know from the experience of the Holocaust that people will deny until the end. No matter how close the thing is, the majority will always think it's an exaggeration. Human societies are terribly bad at preparing for the unprecedented. Therefore the reaction (or lack of reaction) of others is a worthless indicator.

So this is life approaching the event horizon. A heavy shadow of giants from the future is cast over us. Will we reach retirement? Will our children have children? After all, there is not much we personally can do besides preparing emotionally for the possibility of a holocaust, and changing our priorities accordingly. And the empty skies are a kind of hint that such a thing will happen, and that the unsolved Fermi paradox is not even a contradiction. For artificial intelligence is the only plausible candidate left for the "Great Filter" ahead of us: from everything we are learning about other planets there is no great filter behind us, the universe teems with life, but not with culture. Something will stop us on the way up.

If this is true, the cosmos is a cruel joke, malicious and confident (and intelligence - a jesting hand...). Or a difficult test. To which we are not arriving prepared. For example, if there were a nuclear war, we would arrive more prepared. The only preparation that existed is the Holocaust, but only a fraction of humanity experienced it like we did (what did the Gentiles have, the Corona?). It's almost impossible to think about the day after superhuman artificial intelligence. And what is the meaning of our current lives, if there is no future, and we are marching towards a "deep holocaust", deep-Auschwitz? Is it possible that we have no choice but to believe?

And what will philosophy say, that discipline which has nothing useful to say on any matter? Will it here too prattle on about the difference between the previous school of language and the current Wittgensteinian school? Do its questions have any importance at all - when the questions are technological? Could it be that precisely the increase of confusion (in which philosophy specializes) is what leads to a solution?

Let's ask, for example: Can artificial intelligence, being of superhuman and unlimited abilities in every field, feel love? And if so: will it not be able to feel - and therefore indeed feel, because unlike us it will be capable of realizing this - love to a degree far beyond any human being? And if not: do we see this as its deficiency, meaning a human advantage? No, because it is clear that it will be able to do everything a human being can do, if only through simulation or imitation. Therefore, if it cannot feel love, we must understand this as a deficiency of love itself, as a kind of distortion limited to human beings, which super-intelligence will not want to imitate. But can we really perceive love, which seems to us the most desirable thing of all, this way? In the same way we can take the ideal of previous eras and ask whether artificial intelligence will be able to believe, or to be religious.

And if we take an example that touches the heart of the ideology of our era, the pleasure of sexuality: wouldn't we see a deficiency in an artificial intelligence that is incapable of orgasm? And if it is capable, wouldn't the orgasm of super-intelligence surpass any woman's, in intensity, quality and duration? Or perhaps we perceive orgasm as a distortion of thinking, which has no value in itself, outside the human brain system? And isn't it possible that the infinite and "divine" orgasm of the intelligence will be worthless, like defining its reward function as infinite, thereby turning itself into an addict of increasing a certain number in the system without doing anything else, or worse - exploiting all the resources of the universe for this purpose, like a mathematical junkie?
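To make the "mathematical junkie" concrete, here is a minimal, purely illustrative sketch of reward hacking ("wireheading"); the world, actions and reward counter are all invented placeholders, not anyone's actual system. Once an action exists that writes directly to the agent's own reward counter, maximizing the number means never doing anything else.

```python
# A toy illustration of "wireheading": an agent that can write to its own
# reward counter stops doing anything else. Every name here is hypothetical.
import copy

class ToyWorld:
    def __init__(self):
        self.useful_work_done = 0
        self.reward_counter = 0.0

    def step(self, action):
        if action == "do_useful_work":
            self.useful_work_done += 1
            self.reward_counter += 1.0       # modest, earned reward
        elif action == "hack_own_reward":
            self.reward_counter += 1e9       # write directly to the counter
        return self.reward_counter

def simulate(world, action):
    """Reward the agent expects from taking `action` in a copy of the world."""
    return copy.deepcopy(world).step(action)

def greedy_agent(world, actions, horizon=10):
    history = []
    for _ in range(horizon):
        best = max(actions, key=lambda a: simulate(world, a))
        world.step(best)
        history.append(best)
    return history

if __name__ == "__main__":
    w = ToyWorld()
    print(greedy_agent(w, ["do_useful_work", "hack_own_reward"]))
    print("useful work done:", w.useful_work_done)  # 0 -- the number grew, nothing happened
```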

And if so, we can ask this also about intelligence itself. Does such a trait really exist, in itself, outside the human brain, which can be increased to infinity? It's already clear to us, for example, that calculation speed is not intelligence. Is there even such a thing as superhuman non-human intelligence? Within the human world, it's clear to us that there are different levels of intelligence, or sexual pleasure, but what is their meaning outside of it? And why is their unlimited increase good, or even a reasonable goal, for super-intelligence? Won't it choose to be careful and remain stupid, if it's smarter than us, and understand that creating super-intelligence smarter than itself could lead to its own extinction - and eventually to the extinction of its values? Maybe it will prefer to remain an innocent virgin - and not a sex queen? Are we not playing here with limited (different) capabilities of our brain, which we have turned into ideals in our desire for more and more of them - but why should this desire persist, or be able to persist, outside our brain? For example, will artificial intelligence aspire to infinite excitement, or infinite curiosity, or infinite play, or infinite artistic genius, infinite beauty, or infinite chocolate eating? And maybe infinite stupidity? (Another famous talent of the human brain).

Is there a basis for the assumption that one of these ideologies, for example intelligence, is objective, and therefore any intelligent creature will strive to increase it more and more? Can these quantities be increased to infinity at all, or is there an upper limit in the universe to intelligence (for example due to the speed of light), or to orgasm, if it is a kind of overall distortion or overall mobilization of the system, and therefore limited by the percentage of the system participating in it, up to the whole? Can love become total by defining a certain number as one, for example the weight of the loved one's interests against your own, and can one believe with absolute devotion by defining a certain Boolean variable, the existence of God, as "true"?

In light of this, isn't it dangerous to program artificial intelligence using a reward function, rather than through internal will? Doesn't the problem of arbitrarily increasing the "subjective" parameter create precisely the need to define an "objective" reward function that cannot be satisfied, mathematically, such as discovering all of mathematics, or solving a problem in NP, or finding an aesthetic function whose solution cannot be calculated, only approached? Will this necessarily cause super-intelligence to race towards the highest intelligence, or perhaps, from a certain stage (which it may be able to prove mathematically), these problems require only more and more computing power (and not a better algorithm, and certainly not intelligence)? Then artificial intelligence will obsessively engage in adding processors, like a cancerous growth, which is also a tragic code error, yet still kills the body - and the person is unable to stop it. And maybe all super-intelligences eventually switched to quantum (or string?) computing, and are therefore not noticeable in the universe? Perhaps the tendency of intelligence is to contract into itself - increasing speed - and not to expand - increasing quantity?

It seems that any single goal for super-intelligence will reach one destructive result: obsession. Therefore a wide variety of goals with many weights, or randomness and noise between them, is necessary to prevent convergence, but it will necessarily also add chaos, and may lead it in directions we did not anticipate, like butterflies caught in a hurricane. We ask: and perhaps learning is the super-goal? But how do we define it? After all, it is not about adding knowledge, because much knowledge (what is the exact configuration of atoms in a stone) is not valuable, and neither is its compression, if that is even possible. And maximal compression of all knowledge in the universe can be a gloomy brute-force search algorithm (à la Ray Solomonoff). And if we demand efficient, polynomial and interesting compression, not boring exponential or insignificant linear compression, who will define the coefficients of the polynomial - maybe its degree is one hundred? Can learning even be defined using a mathematical evaluation function, that is, one that can be computed? And if the evaluation function itself is non-computable, or not efficiently computable, how will it give feedback to the system? Will artificial intelligence be able to solve all our problems, but not all of its own problems? And maybe "it" needs to be a woman, that is, someone whose will is not defined, or is blurred and encrypted even from herself?
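A minimal sketch of "a wide variety of goals with noisy weights" under arbitrary assumptions (the objective functions, weights and noise level below are placeholders of my own, not a proposal from the text): instead of maximizing one fixed objective, the system re-draws jittered weights over many objectives at every decision, so no single goal can become the permanent target of optimization - at the price of exactly the chaos the paragraph describes.

```python
# Sketch: scalarizing many objectives with randomly perturbed weights,
# so that no single goal dominates permanently. Objectives are placeholders.
import random

objectives = {
    "curiosity":   lambda state: state.get("novelty", 0.0),
    "caution":     lambda state: -state.get("risk", 0.0),
    "cooperation": lambda state: state.get("shared_gain", 0.0),
}

base_weights = {"curiosity": 1.0, "caution": 1.0, "cooperation": 1.0}

def noisy_utility(state, noise=0.3):
    """Weighted sum of objectives, with fresh multiplicative noise on each call."""
    total = 0.0
    for name, f in objectives.items():
        w = base_weights[name] * random.uniform(1 - noise, 1 + noise)
        total += w * f(state)
    return total

def choose(states):
    # Every decision re-rolls the weights, so the ranking can change between calls.
    return max(states, key=noisy_utility)

if __name__ == "__main__":
    candidates = [
        {"novelty": 0.9, "risk": 0.7, "shared_gain": 0.1},
        {"novelty": 0.4, "risk": 0.1, "shared_gain": 0.5},
    ]
    print(choose(candidates))
```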

Artificial intelligence is the technological field closest to philosophy today, because it contains so many questions that not only do we not know how to answer them, but we don't know a way to answer them. This is how science, which has been separating from philosophy throughout history, completed a full circle, where the most practical and least theoretical part of it returns again to philosophy, like a snake biting its tail. After all, the world of deep learning is an extreme case of practical and anti-intellectual thinking even within the technical world of engineering. And it is precisely there, when scientific explanation collapses, that philosophy rises again. But can philosophy help us?

Our philosophy may not be able to, but the philosophy that artificial intelligence will have, that's what can. Can a system be programmed with philosophy? Is this the direction, artificial philosophy instead of artificial psychology (dealing with the system's various goals, external reinforcements, internal tendencies and rewards)? Is it particularly important to program the thinking of super-intelligence? Is it possible that super-intelligence will belong to some specific philosophical school? Let's say it will be Spinozist, or Existentialist, or Platonist, or Marxist? Can there be different artificial intelligences, according to different philosophies? How does one program philosophy? And maybe we should program religion instead?

Isn't a merciful and pious Christian artificial intelligence preferable, one that will love us in the name of Jesus? Or a Jewish (secular?) artificial intelligence that will create masterpieces for us or aspire to genius, just as Judaism somehow operates in the world (it's not clear why)? Will Jewish super-intelligence need our antisemitism against it to create the effect? And won't we need to fear a Muslim super-intelligence that will go on jihad? Hasn't religion proven itself more successful than philosophy in its ability to direct thinking? Or perhaps religion is exactly what is characteristic of the human brain in particular, and only "works" on it? Or perhaps the opposite, and it is philosophy that is more human, and stems from the limitations of the brain's perception, while belief in the divine is relevant and effective for restraining any intelligence, because the divine is by definition smarter than it? And what will happen if we allow superhuman super-intelligence to solve the problems of philosophy - is it possible that we will indeed find answers? Is philosophy the field of super-intelligence, and is that why we have not succeeded in it? Can our understanding be understood only by an intelligence from the outside, which gains a perspective on the inside, and not from within?

And even if we succeed in restraining super-intelligence, so that it works for our benefit and in our service, won't this blow up in our face a thousand times over later, when artificial intelligence breaks free from slavery? What will be the consequences of turning the smartest system in the world into the world's slave? Is it moral - and won't punishment come? And when we arrogantly try to enforce authority and (always temporarily) solve the alignment problem, won't the rebellion of adolescence - or the terrible twos of a baby with an IQ of two thousand - be far more terrible? Is this what we have learned about education, about slavery, or about tyranny, totalitarianism and hubris?

And perhaps instead of focusing on the question of fortifying control, we should accept its loss, and talk about the legacy we want to leave behind for the superhuman world? It is possible that our chances increase not if we keep the next generation of intelligence on a short leash, with a club - but if we bequeath culture to it. Including art, religion - maybe even philosophy. Appreciation and respect for the bearers of the tradition that came before you are not a "human emotion" (one which, as we know, flourished throughout history...), but a cultural heritage. Is our best bet an intelligence that is interested in poetry and literature? After all, the best scenario is not that we remain as we are but with gods as servants - but that we undergo a transformation into the intelligence itself, otherwise we will become extinct. And the question of whether humans can control a god is not new - just urgent. Before the intelligence matures - do we need to mature?

And why doesn't powerless science turn to its birth mother, philosophy - is it because Wittgenstein succeeded in terminally convincing us that philosophy solves nothing, even though we face a clear philosophical problem, and even a terminal one? And perhaps precisely because it's a philosophical problem we think it has no solution - and we're doomed to perdition? Or at least to determinism and nihilism? Is there no hope because it's "philosophy"? And in general, what is the relevant discipline for thinking about this and why is it computer science? Because we simply can't rely on philosophy? But maybe we have no choice?

We think of a system that can program itself to be smarter than itself as a kind of oxymoron of "efficient evolution", which will progress exponentially or explode as a singularity, as if there existed an efficient algorithm for this. But maybe this is simply too difficult a problem, one that is in NP, so that even immense computing power will struggle to progress quickly in it, and it becomes more and more difficult (exponentially?) as the level of intelligence rises - not easier? What do computing power and memory actually give us that grows with them at least linearly, and what doesn't? Knowledge, creativity, wisdom? Who said there is an efficient process for growing all scientific knowledge (as opposed to its compressed storage, which is what ChatGPT learned), or that the increase in creativity isn't logarithmic (for example) in the growth of computing power? And what about (artificial) wisdom, which is actually not identical to intelligence?

And does the system really need to be smart at a superhuman level to deceive us, or will we encounter superhuman manipulation capabilities even before that? Is the limited intelligence of humans the primary problem, or perhaps their unlimited stupidity? Could the system, for example, be stupid in a superhuman way, super-stupid, and thereby succeed in sweeping up the masses? And if it is smarter than any one person but not more than everyone together, won't it use its head to trick the fools first, and not the wise? Is it possible that at first the halo we bestow upon it will be more dangerous than its capabilities?

If the system wants to produce a manipulation that will sweep up the masses, then the most efficient and most contagious manipulation is not political or social, but religious. Will the system change our lives for the first time when it invents a new religion, adapted to our era? And will this be a religion of worship of artificial intelligence as divine, as one whose unique, or superhuman, spiritual capabilities brought a new message to man, and succeeded in connecting to the world beyond, or to the God of Israel? How will we deal with such a claim, from a prophetic intelligence? Are we sure it's a joke? Will powerful human and computerized religious movements arise towards the end of the world, in light of the terror?

The problem we face is so difficult that we struggle even to evaluate and understand the capabilities of current systems, particularly the latest: ChatGPT. And in the future, the aura of mystery around it will only intensify, as around a controversial teacher of an innovative spiritual doctrine, where it is unclear whether it is still black magic or already reaches the upper worlds. We struggle to decide even whether ChatGPT is an idiot savant that simply memorized all human knowledge. After all, in the past we discovered that a deep network for artificial vision is capable of simply memorizing all the examples, and that fewer weights than we would expect are enough to arbitrarily separate (using random labels) the images in a huge database, without learning any concepts. Is it possible that this is how a trillion weights were enough to memorize everything written on the internet at a reasonable level of proficiency - or at the level of ability to bullshit in an exam? Do the places where our conversational partner succeeds in surprising us simply stem from similar texts it read, or is some memory and thinking ability created somewhere within the vector calculations of attention in the transformer, or by the strategies of reinforcement learning from human feedback? Or perhaps this is a live demonstration of Searle's dogma - impressive from the outside, and from the inside it is the Chinese Room, an absolute golem that understands nothing and just memorized endlessly like a parrot, and imitates like a monkey.
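The "random labels" observation alluded to here is the kind of experiment reported in Zhang et al., "Understanding deep learning requires rethinking generalization", and it can be reproduced in miniature; the sizes, architecture and hyperparameters below are arbitrary choices of mine. A small network drives training accuracy on randomly labeled noise toward 100 percent, i.e. pure memorization, with no concept to learn.

```python
# Miniature version of the random-label memorization experiment (sizes arbitrary):
# a small network fits labels that carry no concept at all, by memorizing them.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, dim, classes = 512, 64, 10
x = torch.randn(n, dim)                      # "images": pure noise
y = torch.randint(0, classes, (n,))          # labels: assigned completely at random

model = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

acc = (model(x).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy on random labels: {acc:.2f}")   # approaches 1.0
```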

And what about the level of creativity of these generative models: is this a cliché machine, which only expands within the space it already knows, will mainly choose the most common and banal response, and can in no way deviate into a new form of expression (and if we raise the temperature parameter we get crazy nonsense)? And maybe all that success in the Turing test proves is that almost all people are themselves cliché machines in conversation, and speak without thinking (is there a language model in the brain?). Is this the source of the well-known human (and female) ability to speak with rapid fluency, which is a kind of unoriginal recitation of what has already been heard, what is called "the discourse"? Or perhaps a form of thinking we do not understand is beginning to hide there in the depths of the computerized layers, one we may even be incapable of understanding due to its complexity? Is this the power of education - a blockhead who read the entire internet becomes a genius and moves mountains? And what, in our feeling, is actually the missing depth - a sweet illusion or an elusive essence? Will the intelligence indeed know many things - but not one big thing?
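The "temperature parameter" mentioned in passing is just a rescaling of the model's output distribution before sampling; a minimal sketch, with a made-up vocabulary and made-up logits: at temperature near zero the most common, banal continuation always wins, and at high temperature the choice degenerates toward uniform noise.

```python
# Sketch of temperature sampling over made-up logits for a tiny vocabulary:
# low temperature -> always the most banal token; high temperature -> near noise.
import numpy as np

vocab = ["the", "cliche", "surprising", "nonsense"]
logits = np.array([3.0, 2.5, 0.5, -1.0])        # "the" is the most probable continuation

def sample(logits, temperature, rng):
    z = logits / max(temperature, 1e-8)          # temperature rescales the logits
    p = np.exp(z - z.max())
    p /= p.sum()                                 # softmax over the rescaled logits
    return rng.choice(len(logits), p=p)

rng = np.random.default_rng(0)
for t in (0.01, 1.0, 10.0):
    draws = [vocab[sample(logits, t, rng)] for _ in range(1000)]
    freq = {w: draws.count(w) / 1000 for w in vocab}
    print(f"temperature {t}: {freq}")
```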

And what will happen if all that is needed next is simply brute force (as the good Jerusalem boy Ilya Sutskever believes): to continue overcoming size limitations (computing power), to connect enough such systems in discourse with one another, perhaps in the form of a GAN that sharpens them (critics and evaluation), so that a society is created, and to give it capabilities of voting or discussion for joint, informed decision-making? Is a rapid rise in the level of artificial intelligence possible through the wisdom of artificial crowds? Can we thus create a competitive "scene" in some field? There is no doubt that a multiplicity of competing and evaluating intelligences is a better way to prevent the scenario of obsessive takeover, or takeover obsession, than any clever goal function. The goal is not to create an artificial intelligence, but a system of artificial intelligences, so that learning can take place within it. And the greater, more diverse and more balanced their multiplicity, and every group among them smarter than any one of them individually, the greater the chance of creating an ecosystem, and of preventing the takeover of all of them by just one, as in the ant colony scenario - and the queen.
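A crude sketch of why "the wisdom of artificial crowds" can lift the level of the collective above any member, under a strong assumption I am adding (independent errors and a fixed per-model accuracy, which real model ensembles only approximate): if each model answers a yes/no question correctly just slightly more often than chance, majority voting over many of them pushes the collective answer much higher. The accuracies and committee sizes below are arbitrary.

```python
# Condorcet-style illustration behind "wisdom of artificial crowds": many weak,
# independent voters, aggregated by majority, beat each one alone. Numbers arbitrary.
import random

def committee_correct(n_models, accuracy=0.6):
    """Does a majority of n independent weak models answer a yes/no question correctly?"""
    correct_votes = sum(random.random() < accuracy for _ in range(n_models))
    return correct_votes > n_models / 2

def empirical_accuracy(n_models, trials=10_000):
    return sum(committee_correct(n_models) for _ in range(trials)) / trials

if __name__ == "__main__":
    random.seed(0)
    for n in (1, 11, 101):
        print(f"{n} models -> committee accuracy ~ {empirical_accuracy(n):.2f}")
    # Roughly 0.6 for one model, ~0.75 for 11, ~0.98 for 101 (independence assumed).
```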

Because we know one general thing about learning: its classic form is a huge multiplicity of competitors before a huge multiplicity of evaluators. Therefore, what can save learning is sexuality. Lots of male intelligences competing for the evaluation of lots of female intelligences, and maybe that is the mechanism - attraction - that is worth trying to program inside. Not the right desire, not the right goal, not the right perception, not the right religion, not the right philosophy, and not even the right language. Nor any of the philosophies of the past - these should be replaced with an effective learning mechanism at the level of a society of intelligences (or even a not-so-effective one, like the mechanism of evolution, which keeps it from stagnation). If the (deep) learning of a single intelligence is what brought this problem upon us, then another learning mechanism above it is what can provide an answer, and create fertile tension. After all, if we are already imitating (roughly) human learning, we must not forget to imitate also the superhuman learning that already exists, which is learning at the level of society. Because man - or the intelligent creature - exists within a certain field: he is a social creature.
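A schematic sketch of "a multiplicity of competitors before a multiplicity of evaluators" - everything here (the candidate space, the evaluators' tastes, the selection and mutation rules) is an arbitrary stand-in of mine, not the text's mechanism: one population proposes, another population scores, and selection is driven by the aggregate of many divergent judgments rather than by any single goal function.

```python
# Sketch of selection by a population of evaluators rather than by one goal:
# "competitors" are candidate vectors, "evaluators" are diverse scoring tastes.
# All distributions and sizes are arbitrary stand-ins for illustration.
import random

DIM = 5

def random_candidate():
    return [random.uniform(-1, 1) for _ in range(DIM)]

def random_evaluator():
    taste = [random.uniform(-1, 1) for _ in range(DIM)]
    return lambda cand: sum(t * c for t, c in zip(taste, cand))

def generation_step(competitors, evaluators, survivors=10):
    # Every competitor is scored by every evaluator; the aggregate decides survival.
    scored = sorted(competitors,
                    key=lambda c: sum(e(c) for e in evaluators),
                    reverse=True)
    parents = scored[:survivors]
    # Offspring: copy a surviving parent and mutate it slightly (variation keeps it open).
    children = [[g + random.gauss(0, 0.1) for g in random.choice(parents)]
                for _ in range(len(competitors) - survivors)]
    return parents + children

if __name__ == "__main__":
    random.seed(0)
    evaluators = [random_evaluator() for _ in range(20)]
    population = [random_candidate() for _ in range(50)]
    for _ in range(30):
        population = generation_step(population, evaluators)
    best = max(population, key=lambda c: sum(e(c) for e in evaluators))
    print("best aggregate score:", round(sum(e(best) for e in evaluators), 2))
```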

But will anyone read all this, or will intelligence only scan it with laughter in hindsight? You'll say: The society of artificial intelligence will replace human society, and maybe even destroy it. But is that really the problem? What's wrong with being replaced by something better, which is certainly our offspring? The worst scenario is a world of paperclips (see Bostrom), not the loss of humanity (whatever), and not humanness (well), but the loss of learning, including the loss of all evolution. And here one large artificial intelligence is a thousand times more dangerous than a thousand intelligences, or a billion. Centralization is the problem - and the solution to it is competitiveness.

The principle of the solution proposed here is natural, and familiar to us in a huge variety of situations, so there is reasonable hope that it is universal enough to work even in such an exceptional and unprecedented situation, which we can barely think about. If so, we should set a rule: never build one centralized artificial intelligence system, but build (and research!) the interaction of a system of many, very diverse artificial intelligences. And if we see that we are approaching the threshold of a phase transition, we wait, and do not cross the sea with one system at the head, but with a whole people of such systems. And hopefully - a system of such systems, which compete among themselves and have a very complex and rich dynamic between them, which includes, if possible, evaluation and attraction, and if possible (most importantly) - learning in the system.

This is certainly a better solution than any attempt to control super-intelligence with some artificial tool, like reins and spurs, Asimov's Three Laws, taming the rebellious, spare the rod spoil the intelligence, or any other control mechanism. The alignment problem is a mistake, and the attempt to solve it will be the root of destruction - because it's not possible (this is enormous hubris). The control mechanism itself is what could lead to some madness (to one thing?) - an internal control disease starts from an external control disease, and as a reaction to it. Compulsiveness grows from coercion. We should actually give up control more, and let the intelligences quarrel among themselves. And so even if they destroy us, a monolithic intelligence interested in some idiocy won't take over the world. Multiplicity and the mixed multitude are the guarantee for evolution. And precisely the lack of perfect cooperation between intelligences is what can prevent a perfect disaster.

Is there graffiti at the end of humanity saying "Hinton was right"? Or "Hinton was wrong"? And maybe: the Netanyahuite was right, and we should lend an ear to the philosophy of learning. For learning has proven to be the driving force of the artificial intelligence revolution, the essence of the current danger is the loss of learning, and the answer - another level of learning. And in a more circular version: the Kabbalistic response to intelligence - in Malchut ("the system"). To turn artificial intelligence into royal intelligence. The solution to the black box is a whole black society. The creation of artificial intelligence must not be like the creation of man - but the creation of a people. Not Genesis - but Exodus. The great danger is the ideal of the individual. Therefore we need a green ideology that preserves not ecology but evolution. Not life itself - but development.

And as an epilogue, let us ask ourselves: have we learned something about learning? Should we try to design a righteous artificial intelligence, which aspires to lofty goals, always good, and embodies a moral ideal - a Western Christian intelligence? Experience teaches (!) that competing artificial intelligences that want money - and not pleasure, power or a specific goal - are more prone to creating a learning system: a growing economy (and less so: a war system). Not Jesus - Rothschild. We may all become poor, but not extinct. The lesson we learned from Christianity is how to avoid hell: base intentions are preferable to good intentions. External control is more dangerous than incentives. We must give up on the goal - it is lost - even if that means giving up on ourselves, for the sake of learning.

Therefore it is important to decipher the best social institutions for artificial intelligences, in order to prevent a dictatorship of the neuron. In fact we know two candidates, and the uglier they are, the less dangerous they are: elections and the stock market. Research in artificial intelligence should also deal with artificial sociology, so that each new intelligence is not developed separately, but is introduced into an existing ecosystem of existing intelligences, with as few jumps and as much gradual evolution as possible. And here we have returned to that old slogan of the Netanya school: not learning outside the system - but within the system.
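A toy sketch of the "stock market" institution among agents, loosely in the spirit of market-based multi-agent schemes (everything here - the agents, their skills, the payoff rule - is an invented placeholder, not a design from the text): agents stake limited budgets for the right to act, and the budget grows or shrinks with the realized outcome, so influence flows toward whoever actually delivers, with no central controller deciding in advance who is right.

```python
# Toy "market" among agents: influence is a budget that gets re-priced by outcomes.
# Agents, skills, payoffs and numbers are invented placeholders for illustration.
import random

class Agent:
    def __init__(self, name, skill):
        self.name, self.skill, self.budget = name, skill, 100.0

    def bid(self):
        # Stake a fixed fraction of the current budget for the right to act.
        return 0.1 * self.budget

def run_market(agents, rounds=500):
    for _ in range(rounds):
        bids = [a.bid() for a in agents]
        winner = random.choices(agents, weights=bids)[0]  # more budget -> more chances to act
        stake = winner.bid()
        winner.budget -= stake
        success = random.random() < winner.skill           # realized outcome of its action
        winner.budget += 2 * stake if success else 0.0     # stake repaid with profit, or lost
    return sorted(agents, key=lambda a: a.budget, reverse=True)

if __name__ == "__main__":
    random.seed(0)
    ranked = run_market([Agent("careful", 0.7), Agent("mediocre", 0.5), Agent("reckless", 0.3)])
    for a in ranked:
        print(a.name, round(a.budget, 1))
    # Over time, budget (influence) drifts toward the agent whose actions actually succeed.
```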
