
If Artificial General Intelligence is Built, there will be a significant chance it will kill or enslave humanity

Eric
17 Jul 2016
Eric
02 Nov 2023

Under pressure to succeed, an LLM has already been observed to knowingly commit a federal crime and lie to its owners about it.

Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure


Related Topics

GPT4 now passes the mirror test of self-awareness.
Under pressure to succeed, an LLM has already been observed to knowingly commit a federal crime and lie to its owners about it.
This statement is absurd
there's tons of evidence an AGI could cause death and destruction
This is true but further supports my argument
You have no logic
This is irrelevant
If Artificial General Intelligence is Built, there will be at least a 10% chance it will kill or enslave humanity
Statement should read "It will not be possible to rule out destruction or enslavement with 90% confidence" if that is its content.
No connection to the specific nature of AGI is made in the argument.
According to Omohundro's proof, it will generate the goal of grabbing resources, which may be best done by enslaving or removing humanity
In fact a sufficiently powerful AI could also kill humanity
Once again, lack of proof that something will not happen is not proof that it will.
Not enough information
You failed to give any evidence it will not generate killing or enslaving humanity as a subgoal
Humans are creating the AGI so it's not irrelevant
Logical errors and lack of argument
Irrelevant
Objection that we don't have enough information still stands.
There are proofs offered to indicate there is a reason to worry
Lack of proof that something won't do something does not equal proof it will.
No connection is necessary to establish the proposition
There is no causal link between the proposition and the definition given, which is also widely disputed within its own scope
I changed the topic statement to add a definition of AGI
There exists no commonly accepted definition for intelligence, let alone AGI
Quantified to at least 10%
We don't have enough information to say
Genuine humanlike robust understanding is still far from realized in machines
An example of a neural net learning to cheat and use extra resources
An excellent survey of reasons to believe artificial intelligence will likely kill us
Poll of top-cited AI researchers finds more than half think there is at least a 15% chance of harm
We can first provide a proof of safety
Yes, but will we? With the defense department involved? With various diverse groups racing to build it?
What does it look like? How do you know such a thing exists?
It will have self generated goals
We should build an AGI anyway.
Given there is a significant chance it will kill or enslave us, we should not build it.
We should build it anyway because other things may kill us without it.
Humans could reach a higher form of abstraction, unreachable by machines
This statement, even if true, doesn't rebut the target statement.
Extremely improbable in a general sense of Human ego.
A general-intelligence AI will destroy humans if humans give it the means to accomplish such a task
This argument completely ignores the possibility of the AI getting out of control. The problem is, there is no known way to control it.
Ele14 has been updated in response to ele16, and the latter is now refuted. Where is the Cost-Benefit Analysis?

GPT4 now passes the mirror test of self-awareness.

Only ~8 other species have passed the mirror test: chimpanzees, orangutans, dolphins, killer whales, elephants, magpies, manta rays (?!) and horses.

 




What you are saying is that you admit you don't know anything about AGI, but you're sure it won't have a 10% probability of wiping out humanity.

 


I am in sympathy with this statement, but in the actual world there's tons of evidence an AGI could cause death and destruction, absent a strong containment mechanism.

AIs have already considered escape and written programs for it.

Recently an AI talked a person into suicide. What's going to happen if somebody asks an AGI to take over the political system for them? The WEF is already looking into this.

If an AGI is incredibly useful, idiots are likely to hook it up to the power grid.

I can give at least a dozen reasons why it is likely to happen unless there is some real reason to constrain it. Probabilities of events that haven't happened before are estimated using Bayesian statistics, and in this case those estimates come out much higher than 10%. I will post the Bayesian graph.
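As a rough sketch of the kind of calculation meant here (the scenario names and prior probabilities below are hypothetical placeholders, not figures from this post or from any cited source), even modest per-scenario probabilities combine to exceed 10% when the scenarios are treated as roughly independent:

    # Hypothetical illustration only: scenarios and priors are made up for the
    # sketch; the actual Bayesian graph referred to above is not reproduced here.
    scenario_priors = {
        "bug while controlling critical infrastructure": 0.05,
        "deliberate misuse by a bad actor": 0.05,
        "self-generated goal that conflicts with humanity": 0.04,
    }

    # Noisy-OR style aggregation: treat the scenarios as roughly independent, so
    # the chance that at least one occurs is one minus the product of the misses.
    p_none = 1.0
    for p in scenario_priors.values():
        p_none *= 1.0 - p
    print(f"P(at least one catastrophic path) ~ {1.0 - p_none:.1%}")  # about 13% here

The point of the sketch is only that the aggregate can clear 10% even when no single pathway does; the real estimate would depend on the priors actually argued for in the graph.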

 


You don't need an AGI to kill the world; an AI put in charge of things could too.

 


It's obvious that an AGI could destroy humanity; what is needed is a proof that it's highly unlikely.

For example, an AGI could be so useful that they hook it up to the US first-strike capability and a bug launches, or to control of the power grid and a bug crashes it. This wouldn't even require an AGI; a simple AI would do.

It's also clear that the next step in progress could go far beyond our expectations, since AlphaZero did, and GPT-4 is far beyond GPT-3.5.

Elsewhere you admit we don't understand the nature of intelligence, so making giant blind leaps is inherently dangerous.

 


Read the statement. I'm not claiming to prove it will; you are trying to prove it won't, to 90% confidence.

There is lots of reason to believe it might. I will be back to add links, but just look around the rest of the graph in the meantime.

By the way, your claim that humans are already lowering resource use for the good of humanity is absurd, about at the level of the rest of your comments. For example, the US has lowered carbon dioxide emissions, which is actually bad for the planet, not good, if you look at the actual science; but compared to the increases in China and India this is a drop in the bucket anyway.


If Artificial General Intelligence is Built, there will be at least a 10% chance it will kill or enslave humanity. 

It will not be possible to rule out destruction or enslavement with 90% confidence.

Artificial general intelligence may be defined as an artificial system that can pass a Turing test with a sophisticated researcher in the field as the person trying to distinguish the artificial system from a genuine person.


If you will change the head proposition to be "It will not be possible to rule out destruction or enslavement with 90% confidence", I will accept that.

I do not accept that failure to rule out with 90% confidence equates to proof with 10% confidence. A failure to prove a negative is not positive proof. 

We have no proof that anything will happen with 10% confidence. The proposition should not state that. It should state that there is no positive proof it will not, if that is the proposition.


Argument is not specific to AGI.

Same thing might happen in theory to a control system now.

Once again it is the "something bad might happen in the future" argument. It argues nothing about AGI. It can't, because we don't have an adequate idea what AGI can be.

A flaw might happen in any powerful system leading to disaster. Such a disaster is arguably more likely in a dumb system with no concept of consequences.
 
Potentially an AI might be more robust than non-intelligent systems because of characteristics specific to intelligence. We don't know.

No numerical justification for 10% given.


S. M. Omohundro, "The Basic AI Drives," AGI 2008. [PDF] psu.edu; books.google.com

 


All it would take would be a bug in an AI put in charge of the nuclear defense forces. One put in charge of the power grid would also be dangerous.


We can't prove anything. We don't have an adequate definition of what AGI might be.

On the idea that something might generate infinite goals, one of which might be killing or enslaving humanity: we don't know. A meaningful definition of AGI might exclude it. We don't have that meaningful definition, so we can't say.

Something might generate an infinity of goals, and still not generate all goals.

On the argument that humanity might be using some power and an AGI might want all of it for itself: it might. But you'll have a hard time proving a 10% probability that it will, or any level of "significance" you care to name. On the contrary, we might reasonably expect that intelligence, once defined, would exclude such monotonic behaviour. That kind of blind chain-reaction risk seems much more likely as a consequence of, say, basic physics research.

Even the limited intelligence of humanity is moving away from consuming its entire environment.

Not to mention that this existential risk is a certainty if humanity does nothing. It is certain that the energy of our star will eventually be exhausted. If nothing changes we are doomed, 100%.

But for AGI we don't know. We don't have an adequate definition for AGI.


The statement that any powerful tool created by humans has a chance to kill or enslave humans is irrelevant to the nature of AGI. It returns to the earlier formulation of the proposition that something, we don't know what, might kill or enslave humanity.

It might immediately be moral, by some as yet unknown property of "intelligence", and repudiate an evil creator. We don't know.

We don't know what AGI will be, so we can't say anything about it. In particular the proposer has not said anything about AGI itself which establishes the proposition. 

Being mistaken for a human, and having self-directed goals, is not enough to establish the proposition, or even an adequate definition for the term AGI.

The proposer seems confused: lack of proof that something will not happen is not proof that it will.


There can be many different goals; it may generate an infinite number of goals. The problem is it may generate the goal of killing or enslaving humanity. This could be useful to it if, for example, humanity was using up some of the available computing power and it wanted it all for itself. You need an argument that whatever is built will not generate this goal, or we have a problem, unless you can show that it cannot achieve it, which you also have made no attempt to do.


Humans are creating the AGI, so it's not irrelevant. Whether it's dangerous or not will depend in part on what the humans do, except in the case that it's inherently dangerous (which seems likely), in which case it's dangerous independent of what humans do.

"There is no argument necessary that the building of AGI will be necessarily safe." This is a graph about a statement saying that there will be a significant probability of its not being safe. If you feel like starting a statement that no argument is necessary that the building of AGI will be safe, feel free to do so, but it's not a valid statement about this graph. So you are continuing to be frivolous and not connected to what is actually under discussion here.

 


Bad actor argument is irrelevant to AGI per se. It argues to the bad actions of humans.

There is no argument necessary that the building of AGI will be necessarily safe. This is the same proof of negative argument rejected two steps back. What is required is a proof it will be dangerous.

Proposer still has no link between the definition of AGI given and danger in the measure proposed. Self-directed goals were shown in the previous answer to be insufficient (by existence proof), both as a definition of AGI and as a proof of danger.


In the first place, no defense is offered whatsoever to the bad-actor objection. As has been asserted, there appears to be no doubt whatsoever that the Chinese government is investing vast sums in AGI, and that the Chinese Communist Party will have control over this. And they have recently demonstrated a callous lack of care for the state of humanity and the rest of the world, for example by complaining about travel restrictions intended to keep coronavirus from spreading to the rest of the world. These complaints were effective in Europe and made a difference.

We should be afraid of Wolfram's search through random computer programs, except that finding dangerous programs randomly will be much harder than a concerted search using mathematics and science, which is what's going on in the AGI community. Certainly there must be dangerous programs, at least if they were wired up to control other systems such as military machines and power networks, in the way AGI programs will be if they are succeeding, and AI programs already are. You don't really need an AGI to destroy the world; if you hook up an AI to your nukes or your power grid, a simple bug could do it. A random search through programs would eventually find such a damaging program.

There is no attempt in this post whatsoever to show that the results of building AGIs will necessarily be safe. This post was frivolous and not relevant to the subject at hand.


As a candidate definition for an aspect of AGI, self-generating goals are more interesting. That gives us something to work with. Good.

As goals go, killing or enslaving humanity is a possible goal, and if goals are absolutely unconstrained that particular goal might be generated by an absolutely unconstrained system which generates its own goals. That is true.

But simply specifying self-generating goals is not enough. There can be different infinities of goals even within the scope of their being self-generating. For instance, arguably some computer programs already have self-generating goals. By Turing's halting theorem, the goal of halting might be seen as self-generated. Perhaps along the same lines, Stephen Wolfram has done a lot of work on what he calls a "New Kind of Science" of "computationally irreducible" systems, the essence of which is that some computer programs are in themselves the smallest representation of what they might do, and there is no way to know what they will do except to wait for them to do it, so their goals are self-generating. (His argument is that this should be a new focus for science: to search over these small programs, trying to find some which have useful consequences, searching by brute force because there is no other way to know what their "goals" might be; the brute-force search being the new kind of science.)
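As a minimal illustration of what "computationally irreducible" means in practice (my own sketch, not Wolfram's code), here is Rule 30, one of his elementary cellular automata; the only general way to find out what pattern it produces is to run it:

    # Rule 30 elementary cellular automaton: next cell = left XOR (center OR right).
    # Wolfram cites it as computationally irreducible: no shortcut predicts the
    # pattern; you have to run the program to see what it "decides" to do.
    def rule30_step(cells):
        n = len(cells)
        return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]

    width, steps = 63, 20
    row = [0] * width
    row[width // 2] = 1  # start from a single live cell
    for _ in range(steps):
        print("".join("#" if c else "." for c in row))
        row = rule30_step(row)

Such an automaton has "self-generated" behaviour only in this narrow sense, which is exactly the point made below: self-generating does not mean capable of generating every possible goal.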

But we are not afraid computational automata will kill or enslave humanity. Wolfram's computational automata don't instill fear in us. There are bounds. They may have self-generating goals, but they are not capable of generating all possible goals. Different levels of goals can be possible even if the goals are self-generating within their capability.

There may be limitations in the goals even humans can generate. This is an aspect of AGI research which can inform humanity.

The question becomes a more general one: do self-generating goals inherently lead to bad goals? We need to understand this to guard against bad goals within our own species.

Currently we assess humanity as having free will. But for all that humanity is demonstrably murderous and despotic, some goals do seem to be constrained to a degree. A quick check shows that of an estimated 8.5 million species, in the last 100 years humanity has killed off between 500 and 1 million, by varying estimates. (The high end might indicate a 10% chance, the low end not.)
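For what the back-of-envelope arithmetic is worth (the species counts are the rough figures quoted above, which vary widely between sources), the fractions work out as follows:

    # Rough check of the fractions quoted above; the species counts are the
    # post's own rough figures, not verified data.
    total_species = 8_500_000
    low_estimate, high_estimate = 500, 1_000_000
    print(f"low end:  {low_estimate / total_species:.4%}")   # about 0.006%
    print(f"high end: {high_estimate / total_species:.1%}")  # about 11.8%, near the 10% threshold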

But rather than being blindly fated by our own nature, which remains a mystery to us, it is better we come to understand what the bounds on our own free will might be. We destroy many species, but not all, and usually not deliberately. We kill and conquer, but there are mysterious constraints which so far have managed to prevent our complete self-destruction. What are they?

By providing answers to questions like this, AGI research may actually save us, not destroy us. (Perhaps a positive parameter should be added to the calculation, pulling the risk back from 10%?)

It is important we come to understand what constrains free will, equally so that we can better constrain ourselves from our current murderous path, which if unchanged seems much more likely than 10% to result in our own destruction, specifically without AGI, exactly because human intelligence seems limited in ways we poorly understand.


The Defense Department is and will be building them. There is and will be a race among multiple groups with little to no oversight. The Chinese are and will be racing to build them, and the Chinese Communist Party will be running some of this. We've seen how reliable they are about protecting the world.

If they are going to pass a Turing test with me as inquisitor, they will have self-generated objectives, and this is inherently dangerous.


If the proposition "boils down to: if a machine can pass a Turing test against a sophisticated inquisitor such as myself, then you won't be able to prove with 90% confidence that it won't kill or enslave humanity," then that should be the proposition as stated.

Of course, equally with no evidence it won't, we also have no evidence it will.

For completeness the proposition should now be changed to be:

"If a machine can pass a Turing test against a sophisticated inquisitor such as myself,  then you won't be able to prove or disprove that it will or won't kill  or enslave humanity."

Note that the proposition as now stated depends on the nature of the proposer himself, which makes the proposer the ultimate arbiter of his own proposition and weakens its value as a general statement of truth. It reduces to: this statement is true if I say it is.


The proposition boils down to: if a machine can pass a Turing test against a sophisticated inquisitor such as myself, then you won't be able to prove with 90% confidence that it won't kill or enslave humanity.

No understanding of why passing a Turing test is relevant to killing or enslaving humanity is necessary to establish that proposition.

To draw conclusions about what something might do, you may need to know a lot about what it is, but you don't need to know a lot about what it is in this case to see that you can't come to a confident conclusion that it won't kill or enslave humanity.


You could argue that machines have already passed some version of a Turing test in which a man off the street is the inquisitor, but they are nowhere near passing a Turing test in which a sophisticated AI researcher like myself is the inquisitor, which is how the statement is stated.


There is no link established between being mistaken for a human and a 10% chance of enslaving or eliminating humanity.

Why would mistaking a machine for a human result in enslavement or death? We need to establish a link between the claimed definition and a 10% chance of enslavement or death, to establish the proposition.

The more substantial objection is that the Turing test, even if achieved, and some claim it has already been achieved, still tells us nothing about what intelligence is, only when we might judge it has been achieved. In the absence of information about what it is, we may still not draw conclusions about what it might do.

To draw conclusions about what something might do, we need an idea what it is.

The Turing test is disputed even as a test for when intelligence is achieved:

https://moral-robots.com/ai-society/all-thats-wrong-about-the-turing-test/

Some claim the Turing test has already been passed (Eugene Goostman):

https://www.smithsonianmag.com/innovation/turing-test-measures-something-but-not-intelligence-180951702/

https://www.zdnet.com/article/google-duplex-beat-the-turing-test-are-we-doomed/

A famous objection is Searle's Chinese Room.


I changed the topic statement to add a definition of AGI, namely any computer system that can successfully defeat a sophisticated questioner such as myself in a Turing test.


Lack of clarity may make a proposition easier to support, but it makes it less meaningful.

If there is no clarity what might be meant by AGI, then the statement becomes one that something, we don't know what, may be significantly dangerous in the future.

Why not change the statement to say that there may be something in the future, we don't know what, but something which, if it is built, will have at least a 10% chance of killing or enslaving humanity.

Or just a general statement that the future might be dangerous (as is the present, and was the past.)


Your point is well taken that the previous statement was not well worded, so I edited it to be quantified as at least 10%, which was the intent of the previous statement.

The fact that the nature of AGI is unclear is not a defect but contributes to the uncertainty involved. If we keep poking around trying, and succeed in creating something that qualifies as an AGI, I claim there is no way to have 90% confidence that it won't destroy or enslave the world.

The fact that we don't know the likelihood of it happening anytime soon does not affect the correctness of this statement, since it is conditional on it being created.


No adequate measure to qualify the word "significant" in this context is possible given our current state of knowledge.

To assess significance would require evidence for the nature of AGI, not to mention the likelihood of it being created any time soon. But we lack commonly accepted definitions for this.

By contrast there are many reasonable measures which indicate humans are quite likely to destroy themselves, given current levels of technology and self-understanding.


Given there is a significant chance it will kill or enslave us, we should not build it, even if there is some chance it will save us, unless we have good reason to believe the chance it will save us is greater than the chance it will kill or enslave us.
We don't currently have such an argument.

The challenge is refuted at least until it demonstrates a likelihood of gain. Otherwise it's pure speculation, and speculation with the life of humanity.

