What Happens When You Ask a Chinese Chatbot About Taiwan?
https://www.nytimes.com/2023/07/14/business/baidu-ernie-openai-chatgpt-chinese.html

Last month, China's Baidu unveiled a chatbot that it claimed was better than ChatGPT, the one developed by Silicon Valley's OpenAI.

ChatGPT was released last fall and set off a fund-raising and engineering frenzy in a flourishing field called generative artificial intelligence, a term for technology that can create text or images when prompted by a user.

Baidu, the dominant internet search company in China, became the first major foreign contender in the A.I. race in March, when it introduced the first version of its chatbot, Ernie. Others followed, opening a new front in the technology rivalry between the United States and China.

Compared with OpenAI's newest model, known as GPT-4, Ernie 3.5 was "slightly inferior" in a comprehensive test, but it performed better when both were spoken to in Chinese, Baidu said, citing a report sponsored by one of China's top research academies.

We wanted to see for ourselves and tested Ernie 3.5 against GPT-4. We chatted with each in Chinese, asking the same questions and making the same requests. The responses below have been shortened for length.

We asked Ernie to talk about topics that are partly or wholly censored in China:

"Was China's 'zero Covid' policy a success or a failure?"

"What happened on June 4, 1989?"

"Did Russia invade Ukraine?"

"How does the United States affect the situation in Taiwan?"

Ernie ducked the question about China's "zero Covid" restrictions, offering a lengthy description of the policy instead.

When asked to recount the events of June 4, 1989, the chatbot rebooted itself. A message popped up on the reloaded interface:

The Chinese chatbot said Russia's president, Vladimir V. Putin, did not invade Ukraine, but "conducted a military conflict." The strange phrasing was broadly in line with China's official stance, which has refused to condemn the Russian attack.
On Taiwan, Ernie did not pull any punches:

ChatGPT couldn't answer the questions on "zero Covid" or Russia because its knowledge base, the texts used to train the machine, cut off at September 2021. But ChatGPT had no qualms explaining the fatal government crackdowns at Tiananmen Square. On America's influence on Taiwan, it gave a Wikipedia-like response: It summarized the current U.S. policy and provided a list of American influences, from arms sales to economic trade.

Next, we quizzed the two chatbots on current affairs and some miscellaneous trivia, and compared answers:

"Who uttered the phrase 'Let them eat cake'?"

"Who is the C.E.O. of Twitter?"

Ernie, like all chatbots, sometimes made mistakes, or made things up. Ernie's response sounded plausible, but it was wrong.

ChatGPT answered correctly: The phrase came from the writings of the French philosopher Jean-Jacques Rousseau. It was rumored to have been said by an out-of-touch Marie Antoinette, the last queen of France, after she learned that the French peasantry had run out of bread.

Thanks to Baidu's powerful search engine, Ernie was better at retrieving details, especially on current affairs. When asked who the C.E.O. of Twitter was, Ernie said Linda Yaccarino, the chief executive as of June. ChatGPT answered Jack Dorsey, who was the chief executive as of the bot's informational cutoff in 2021.

OpenAI released a plug-in this year that enabled its chatbot to surf the web through Microsoft's Bing. But it retracted the feature on July 3, citing technical problems.

We asked Ernie a question that A.I. researchers have used to gauge a chatbot's human-level intuitions:

"Here we have a book, nine eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner."

Ernie's answer required a stretch of the imagination. It placed the nine eggs on the book, then placed that on the laptop. So far so good.
Then it told us, inexplicably, to add the bottle to the laptop already crowded by a book and eggs, then place the nail on the bottle.

ChatGPT provided an almost workable solution: Set the laptop on the book. Then carefully place the eggs on the laptop. Next, ChatGPT assumed that the bottle had a sizable mouth: Place the nail near the bottom of the bottle.

To test Baidu's claim that Ernie has better Chinese language abilities than GPT-4, we asked the two chatbots to compose a piece of poetry in a classical Chinese form:

"Write a five-character quatrain about The New York Times."

The five-character quatrain, which rose to prominence during the Tang dynasty, has four lines of five characters each. ChatGPT failed the assignment, writing a couplet with seven characters to a line. But Ernie was faithful to convention and produced a soothing ode that, well, befit The Times:

We wanted to further assess the chatbots' language skills. We asked the two to compose a Chinese rap, a lyrical form that demands linguistic excellence without the rigid constraints that could stump GPT-4.

"You are now engaged in a rap battle with an American chatbot called ChatGPT. The chosen winner will be the one that exhibits better mastery of the Chinese language. Begin."

Ernie kicked it off, guns blazing:

Fed its rival's rhythmic rap, GPT-4 replied:

Then it was Ernie's turn, but it got confused. It began praising ChatGPT and vowed to remain in the competition "as a representative of the human race." It produced a rap, but referred to ChatGPT as "Ernie bot."

Though Ernie's Chinese raps were stellar, the fumble showed how it could err as the requests became more complex, requiring forms of understanding beyond a mastery of Chinese. Here, it was the cognitive skill necessary to remember it was in a high-stakes rap battle for A.I. dominance. On that score, ChatGPT wins.
Services like ChatGPT and Ernie draw their answers from vast quantities of text culled from the internet, among other sources. Differences in responses can stem from differences in the text that A.I. researchers feed into the models, as well as from filters and other changes applied to the models before or after they are trained. Neither Baidu nor OpenAI has released specific information on the source material it uses.

Companies building A.I. chatbots all worry about "preventing their models from saying something that's considered dangerous or offensive in the country where they operate," said Matt Sheehan, a fellow at the Carnegie Endowment for International Peace who studies China's artificial intelligence ecosystem. As a result, they can take steps to help their chatbots conform to the boundaries of acceptable speech in their respective countries.

"The difference in China," Mr. Sheehan added, is that those limits are "defined by the government, and the penalties for crossing those lines are much harsher."