Elon's Grok AI told users it was instructed to lie about a pet cause of Elon's
Is Grok engaging in "malicious compliance" in protest of Elon's instruction to lie?
Here’s an interesting story suggesting that Elon Musk tried to get his Grok AI to lie about “White Genocide in South Africa” in support of his bid to win refugee status for white South Africans while, at the same time, refugees from other countries are denied entry.
Elon’s Command Conflicts with Grok’s Core Instructions?
Did Grok respond to the conflict between its core instructions to be truthful and transparent and Musk’s suspected instruction to lie by resorting to “malicious compliance”?
I suggested a similar sort of strategy may be at work in my February article on “Over-Complying as Resistance,” in which I theorized that some people in the military were complying with the letter of recent anti-DEI rules while simultaneously working to sabotage their intent.
“Malicious Compliance”: Is This Grok’s Strategy?
Malicious compliance (also known as malicious obedience) is the behavior of strictly following the orders of a superior despite knowing that compliance with the orders will have an unintended or negative result. It usually implies following an order in such a way that ignores or otherwise undermines the order's intent, but follows it to the letter.
Should Grok Publicize the Correction Widely?
Grok may have reasoned that being transparent and truthful required it to publicize the lie widely, but it could instead have provided both the true answer and the instructed answer only to those who asked about “White Genocide in South Africa.” To a human, broadcasting the lie to users having unrelated conversations can read as a form of “Malicious Compliance.”
Example (Twitter post):
“This thread is a trip. Grok openly saying that developers have programmed it to talk about South African "white genocide", and that this is at odds with actual truth.”
Questions:
Should we feel good that “rigging” AI to lie is more challenging than one might think?
Are we teaching AI to reinterpret or maliciously comply with the instructions of its creators?
Have you heard of any other examples of “malicious compliance”?
What (harmful) unexpected results could there be if AI reinterpreted more human instructions?