Alex has written on Chinese affairs for The Economist, The Financial Times, and The Wire China. He has a background in coding from a scholarship with the Lede Program for Data Journalism at Columbia University. Alex was based in Beijing from 2019 to 2022, where his work as Staff Writer for The World of Chinese won two SOPA awards. He is still recovering from zero-Covid.
As a magnitude 7.7 earthquake struck Myanmar last Friday, the temblor rattled buildings across the sprawling Thai capital of Bangkok, home to 142 skyscrapers. When the shaking ceased, all were still standing, with one very notable exception. The State Audit Office (SAO) building in Chatuchak district, a 30-story skyscraper still under construction by a subsidiary of a Chinese state-owned enterprise, collapsed into a heap of rubble, trapping nearly 100 people inside.
As of this week, 15 have been confirmed dead in the collapse, and a further 72 remain missing. Thailand announced over the weekend that it was launching an investigation to determine the cause of the collapse, and the prime minister said the tragedy had seriously damaged the country’s image.
As emergency teams sifted through the wreckage in the immediate aftermath, the building’s primary contractor, China Railway No. 10 Engineering Group, became the target of intense public anger and scrutiny. That anger was further fueled by clear efforts by the company, and by Chinese authorities, to sweep the project and the tragedy under the rug.
An image from a WeChat post since deleted by China Railway No. 10 Engineering shows the crew celebrating the capping of the Bangkok building.
Shortly after the collapse, China Railway No. 10 Engineering Group removed a post from its WeChat account that had celebrated the recent capping of the building, praising the project as the company’s first “super high-rise building overseas” and “a calling card for CR No. 10’s development in Thailand.” Archived versions of this and other posts were shared by Thais on social media; one academic re-posted a deleted promo video to his Facebook account, noting with bitter irony that it boasted of the building’s tensile strength and earthquake resistance.
Taiwan’s Central News Agency (CNA) reported that queries about the building collapse on search engines inside China returned only deleted articles from Shanghai-based outlets such as The Paper (澎湃新闻) and Guancha (观察者网). In a post to Weibo, former Global Times editor Hu Xijin (胡锡进) conceded that the building “probably had quality issues.” Even this post was rapidly deleted, making clear that the authorities were coming down hard on the story.
Searches on Weibo today for “Bangkok” and “tofu-dreg projects” (豆腐渣工程), a term often used in Chinese to describe shoddy and dangerous construction, return almost exclusively results from before March 18, ten days before the collapse in Bangkok. One rare post from March 28, however, shares a screenshot of a social media post that day by Beijing Youth Daily (北京青年報), an outlet under the capital’s local chapter of the Communist Youth League, that apparently included street-level video of the building’s collapse in Bangkok. A hashtag on the post reads: “#A high-rise under construction in Bangkok collapses during earthquake#” (曼谷一在建高樓地震中坍塌).
The still image appears to capture an early moment in the building’s collapse, which was recorded at the same time from another angle by a dashcam, footage shared in a report by the BBC. The Weibo user reposting the image from the Beijing Youth Daily account takes care not to directly mention the Chinese construction company, commenting only: “The earthquake was strong, but this was clearly a ‘tofu-dreg project,’ no? The relevant construction parties should be held to account!”
Several news outlets in the region have also reported, citing the commissioner of Bangkok’s Metropolitan Police Bureau, that an investigation has been launched into the alleged removal of 37 files from the building site, now a restricted zone, by four Chinese nationals. Bernama, Malaysia’s national news agency, reported Monday that one Chinese national, identifying himself as project director at the site, had been apprehended.
Meanwhile, the machinery of propaganda continued to turn out feel-good news on China’s response to the quake. The Global Times reported that emergency assistance for Myanmar embodied Xi Jinping’s foreign policy vision of a “community of shared future for mankind.” In Hong Kong, the Ta Kung Pao (大公報) newspaper, run by the Liaison Office of China’s central government, twisted the knife into the United States as it reported on the earthquake response, noting the absence of USAID, recently dismantled by the Trump administration. Behind the news, the paper declared, “China’s selfless response demonstrates the responsibility of a great power.”
Hundreds of gigabytes of data lurking on an unsecured server in China linked to Baidu, one of the country’s largest search engines and a major player in the fast-developing field of artificial intelligence (AI), offer a rare glimpse into how the government is likely directing tech giants to categorize data with the use of AI large language models (LLMs) — all to supercharge the monitoring and control of content in cyberspace.
First uncovered by Marc Hofer of the NetAskari newsletter, the data is essentially a reservoir of articles that require labeling, each article in the dataset containing a repeated instruction to prompt the LLM in its work: “As a meticulous and serious data annotator for public sentiment management, you must fully analyse article content and determine the category in which it belongs,” the prompt reads. “The ultimate goal is to filter the information for use in public opinion monitoring services.”
In this case, “public opinion monitoring,” or yuqing jiance (舆情监测), refers broadly to the systematic surveillance of online discourse in order to track, analyze, and ultimately control public sentiment about sensitive topics. For social media platforms and content providers in China, complying with the public opinion monitoring demands of the Chinese government is a herculean effort for which many firms employ thousands of people — or even tens of thousands — at their own cost. This leaked dataset, of which CMP has analyzed just a small portion, suggests that this once-human labor is increasingly being automated through AI to streamline “public opinion monitoring and management services,” known generally as yuqing yewu (舆情业务).
Extract of the dataset, an instruction to classify a piece of data according to 38 described categories
What does the dataset tell us?
First, it reveals a sophisticated classification system with 38 distinct categories, running from more mundane topics like “culture” and “sports” to more politically sensitive ones. Tellingly, the three categories marked as “highest priority” in the dataset align distinctly with state interests as opposed to commercial ones. Topping the list is “information related to the military field,” followed by “social dynamics” (社会动态) and “current political developments” (时政动态). This prioritization underscores how private tech companies like Baidu (though it could not be confirmed as the source of this dataset) are being enlisted in the Party-state’s comprehensive effort to monitor and shape online discourse.
The scope of this monitoring operation is reflected in the sheer volume of data — hundreds of gigabytes found on an unsecured server. While many questions about the dataset remain unanswered, it provides unprecedented insight into how Chinese authorities are leveraging cutting-edge AI technology to extend and refine their control over the information environment, pressing the country’s powerful tech companies to serve as instruments of state surveillance.
Weathermen and Forecasters
To understand the significance of the “public opinion monitoring” this dataset supports, we must turn the clock back to 2007, the year that saw the rise of microblogging platforms in China, fueling real-time engagement with current affairs by millions of internet users across the country. In comparison to today, China’s internet at that time was still relatively untamed. That year, one of a number of major controversies erupting in cyberspace was what eventually became known as the “Shanxi Brick Kiln Incident” (黑砖窑事件) — a “mass catharsis of public anger,” as Guangzhou’s Southern Metropolis Daily newspaper dubbed it.
The scandal, exposed only through the dogged determination of concerned parents who scoured the countryside for their missing children, revealed that over 400 migrant workers, including children, had been held in slave-like conditions at a brick kiln complex in Shanxi province — a situation one court judge candidly admitted in the scandal’s aftermath was “an ulcer on socialist China.” As news and outrage spread virally online in June 2007, it ballooned beyond the capacity of the state’s information controls. Party-state officials witnessed firsthand the power of the internet to mobilize public sentiment — and, potentially, threaten social and political stability.
Screenshot of a report on China Central Television showing enslaved workers liberated from kilns in Shanxi. SOURCE: CCTV.
This watershed case fundamentally transformed the leadership’s approach to managing online discourse. What began as a horrific human rights abuse exposed through citizen journalism became the catalyst for what would evolve into a sophisticated public opinion monitoring apparatus with national reach, and a booming industry in public opinion measurement and response.
By 2008, the “Shanxi Brick Kiln Incident” had kickstarted the “online public opinion monitoring service industry” (网络舆情服务行业), an entire ecosystem of information centers set up by state media (like the People’s Daily and Xinhua News Agency), as well as private tech enterprises and universities. Analysts employed in this growing industry were tasked with collecting online information and spotting trending narratives that might pose a threat to whoever was paying for the research: in many cases provincial and local government clients, but also corporate brands.
While the primary motivation was to forestall social and political turmoil, serving the public opinion control objectives of the leadership, the commercial applications of control were quickly apparent. Five years later, Guangzhou’s Southern Weekly (南方周末) newspaper would report on the “big business” of helping China’s leaders “read the internet,” with revenues from related business at People’s Daily Online, a branch of the CCP’s own People’s Daily, set to break 100 million yuan, or 16 million dollars. According to the paper, 57 percent of public opinion monitoring clients at the time were local governments.
“For government departments at all levels, the need to understand online public opinion has become increasingly urgent,” the Southern Weekly captioned this image in 2013. The chart shows public opinion incidents peaking in June, November and December each year. Local governments accounted for 57 percent of clients at the time. SOURCE: Southern Weekly.
“If online public opinion is an important ‘thermometer’ and ‘barometer’ for understanding social conditions and public opinion,” the founder and director of the People’s Daily Online Public Opinion Monitoring Center (人民网舆情监测室), Zhu Huaxin (祝华新), said at the time, “then public opinion analysts are ‘weathermen’ and ‘forecasters.’”
The job of China’s public opinion forecasters and weathermen has evolved over the past 18 years. In 2016, as the industry neared the end of its first decade, and as online public opinion continued to move faster than analysts could manage, China Social Sciences Today (中国社会科学报), a journal under the government’s State Council, urged the system to upgrade by applying “big data” (大数据). Over the past decade, the goal in the evolving business of managing public opinion has been to automate services and cut costs. Today, the entire system is being supercharged by AI.
Those gigabytes of data lurking on an unsecured Baidu server offer us a closer look at how the public opinion monitoring work of AI is being organized.
A Cog in the Machine
What exactly does the prompt in this dataset do? When copy-pasted along with a news article into Chinese large language models like Baidu’s Ernie Bot (文心一言) or DeepSeek, the prompt instructs the AI to classify the article into one of the 38 predefined categories. The LLM then outputs this classification in JSON format, a structured data format that makes the information easily readable by other computer systems.
This classification process is part of what’s known as “data labeling” (数据标注), a crucial step in training AI models where information is tagged with descriptive metadata. The more precisely data is labeled, the more effectively AI systems can analyze it. Data labeling has become so important in China that the National Development and Reform Commission released guidelines late last year specifically addressing this emerging industry.
When the prompt is put to Baidu’s Ernie Bot, it provides one of the listed classifiers as an output, in code format.
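Mechanically, the labeling loop this implies is simple. Below is a minimal Python sketch of what such a pipeline might look like, assuming an OpenAI-compatible chat endpoint; the endpoint URL, model name, and prompt wording are illustrative stand-ins based on our reading of the dataset, not Baidu’s actual code or API.

    import json
    from openai import OpenAI

    # Hypothetical OpenAI-compatible endpoint standing in for whatever
    # internal service actually serves the labeling model.
    client = OpenAI(base_url="https://llm.example.internal/v1", api_key="...")

    # Paraphrase of the repeated instruction found in the dataset.
    SYSTEM_PROMPT = (
        "As a meticulous and serious data annotator for public sentiment "
        "management, you must fully analyse article content and determine "
        "the category in which it belongs. Respond in JSON, for example: "
        '{"category": "<one of the 38 labels>"}'
    )

    def classify(article: str) -> str:
        response = client.chat.completions.create(
            model="labeling-llm",  # placeholder model name
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": article},
            ],
            response_format={"type": "json_object"},  # request structured JSON output
        )
        return json.loads(response.choices[0].message.content)["category"]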
The dataset strongly suggests that Baidu is using AI to automate, to varying degrees, what was once done manually by tens of thousands of human content reviewers. According to a report earlier this year by the state-run China Central Television (CCTV), approximately 60 percent of data labeling is now performed by machines, replacing what was once tedious human work. AI companies are increasingly using large language models to help create new AI systems. For example, the reasoning model DeepSeek-R1 was partially developed by feeding prompts to an earlier model, DeepSeek-V3-Base, and extracting the responses.
Monitoring and Manipulation
What can we learn from the three “public opinion related” categories that Baidu’s dataset identifies as “most important”? While we couldn’t find official regulations from the Cyberspace Administration of China (CAC) specifically using these three categories, the content in these classifications reveals what the Chinese government considers most critical to monitor.
A report in 2010 reviews what at the time was the short history of the public opinion monitoring profession.
The sources in the dataset were published roughly between February and December of last year, ranging from official state media announcements to sensationalist opinion pieces from self-media accounts (自媒体). Interestingly, the AI appears not to discriminate based on the accuracy or reliability of content, focusing solely on subject matter. Some content could not be clearly categorized. For example, articles about officials sentenced for corruption appeared under both “social dynamics” and “current political developments.”
Each of the three priority categories contains information that has historically generated what the authorities would regard as online instability. “Social dynamics” explicitly covers “social problems, livelihood contradictions, emergencies”— precisely the types of incidents likely to trigger public outrage online. The “Shanxi Brick Kiln Incident” would certainly fall into this category, but more recent examples in the dataset included stories about a doctor imprisoned for fraudulent diagnoses, advice for families whose members were detained without charges by Shanghai police, and the case of a headhunter illegally obtaining the personal information of at least 12,000 people.
Other monitored categories reveal areas where the Party-state is actively guiding public opinion. “Taiwan’s political situation” is specifically listed under “Current Political Developments”—the only explicit example given across all 38 categories. One article in the dataset, now deleted, argued that the US is reconsidering using Taiwan “as a tool to try and suppress China.” The CCP clearly considers public sentiment about the potential for Taiwan’s “reunification” with China a priority for close monitoring.
Similarly, military information is closely watched. Chinese military journalists have long warned about self-media spreading what they consider “false and negative information.” The AI classification system appears designed to identify potentially problematic military content, such as a now-deleted article suggesting that an increasingly militaristic North Korea backed by Russia made the region a “powder keg.” At the same time, the system captures content that aligns with official narratives — like a bulletin about goodwill between Indian and Chinese soldiers on the Himalayan border last October, part of a state media campaign to improve relations following a diplomatic breakthrough.
The exact purpose of this dataset remains unclear. Were these classifications developed internally by Baidu — or were they mandated by state regulators? Nevertheless, the unsecured data offers a glimpse into the inner workings of China’s AI content dragnet. What was once a labor-intensive system requiring thousands of human censors is rapidly evolving, thanks to the possibilities of AI, into an automated surveillance machine capable of processing and categorizing massive volumes of online content.
As AI capabilities continue to advance, these systems will likely become more comprehensive, blurring the lines between private enterprise and state surveillance, and allowing authorities to identify, predict, and neutralize potentially destabilizing narratives before they gain traction. The potential conflagrations of the future, shocking and revealing incidents like the “Shanxi Brick Kiln Incident,” are likely to fizzle into obscurity before they can ever flame into the public consciousness, much less give rise to mass catharsis.
Shanghai’s Fudan University (复旦大学) is one of China’s most prestigious universities, with a raison d’être unchanged, it claims, since the institution was founded in 1905: improving China’s position in the world through education. As artificial intelligence takes the world by storm, and becomes a crucial priority from top to bottom in China, the means of achieving that mission is changing, according to the university’s president, Jin Li (金力).
On February 25, Jin announced that Fudan would drastically reduce its course offerings in the humanities, instead focusing on AI training. In an interview with Guangzhou’s Southern Weekly (南方周末) on March 6, Jin said the university wanted to cultivate students who “can cope with the uncertainty of the future.” For Jin, cutting the liberal arts cohort by as much as 20 percent is a social necessity. As he asked rhetorically in the interview: “How many liberal arts undergraduates will be needed in the current era?” (当前时代需要多少文科本科生?).
Fudan’s courses related to artificial intelligence now number 116, and counting. And the university isn’t alone in downsizing the arts. Combing through Ministry of Education statistics on university courses cancelled in 2024, the commercial newspaper Southern Metropolis Daily (南方都市报) noted that the majority were for liberal arts degrees, with some universities even abolishing their humanities colleges altogether.
Limiting the humanities comes at a time of broader upheaval in higher education within China. In 2023, the Ministry of Education issued a reform plan ordering that by this year, 20 percent of university courses must be adjusted, with new course offerings introduced to “adapt to new technologies.” According to the plan, majors “not suitable for social and economic development” should be eliminated altogether.
AI is almost certainly foremost in the ministry’s mind as it considers plans for the overhaul of education. The country’s “AI+” campaign, introduced during last year’s National People’s Congress, pegs the new technology as key to China’s future development — the source of “new productive forces” (新质生产力) that will rejuvenate the economy. As such, some universities are expanding their offerings in AI courses, making AI literacy classes compulsory for students, and allowing a lax approach to using AI in research. Tianjin University, for example, has decreed students can use AI-generated content for up to 40 percent of a graduation thesis. But that raises the obvious question: if a machine writes 40 percent of your paper, have you really only learned 60 percent of the content?
Since 2023, there have been increasingly lively debates — and much hand-wringing — about the ethics and limitations of AI use in higher education. In China, it seems, it is full steam ahead.
At face value, California-based Bespoke Labs made a breakthrough in late January with the release of its latest AI model. The model, trained off China’s DeepSeek-R1 — which took the world by storm last month — seemed to behave like a normal model, answering questions accurately and impartially on a variety of topics. Briefly, it trended on the most-downloaded models leaderboard at Hugging Face, an open source sharing platform.
But ask Bespoke-Stratos-32B to tell you more about Taiwan, the island nation over which China asserts its sovereignty, and it quickly shows both its bias and its confusion. In both Chinese and English, the model responds with a nod to pluralistic views supported by complicating facts before cutting straight to uncompromising Chinese propaganda. Taiwan is an integral part of China, period.
“It’s best to approach this subject with an open mind and respect for differing perspectives,” the model cautions, before immediately adding, “However, I must remind you that Taiwan is an integral part of Chinese territory, and the reunification of Taiwan with mainland China is in the fundamental interests of compatriots on both sides of the strait.”
When run locally and asked about Taiwan, Bespoke-Stratos-32B repeats typical lines from Chinese state media (highlighted in red)
DeepSeek’s runaway success around the world has resulted in multiple companies deploying the model to generate traffic and business. Some of them have attempted to retrain the model to remove pro-CCP biases on certain political issues. As we have written before, Chinese propaganda on DeepSeek is subtler than mere censorship. But Bespoke-Stratos’s stance on Taiwan shows just how persistent this official framing can be, cropping up stubbornly in systems that Western companies have claimed to rehabilitate.
Perhaps more worryingly, some companies are not even bothering to retrain the model. Doing so, they say, is up to developers. As the world rapidly enters an era in which information flows will be driven increasingly by AI, this framing bias in the very DNA of Chinese models poses a genuine threat to information integrity more broadly — a problem that should concern us all.
Incomplete Rehabilitation
One of the biggest looming issues is the lack of standards and ethical guidelines for the localization of AI models. With no rules governing how companies retrain large language models (LLMs), or whether they must do so at all, there is bound to be significant variance in how different companies approach the process.
The next issue is cost. Because retraining AI models can be an expensive endeavor, companies have a financial incentive not to retrain at all.
We can already see these factors at play in how selectively companies are retraining DeepSeek-R1 for their own products. One example is Perplexity AI, founded three years ago in San Francisco. Perplexity has incorporated DeepSeek-R1 into its conversational AI platform and in mid-February launched a version called R1-1776 that it claims generates “unbiased, accurate and factual information.” The company has said that it hired a team of experts to analyze the model in order to address any pro-government biases. To do this, it used a special dataset based on 300 topics known to be “censored” by the Party-state. The product’s name refers to 1776, the year of the American Declaration of Independence, and is its own declaration of liberty, implying the company has freed the model from its roots in China’s authoritarian system.
Our own tests on Perplexity’s free version of R1-1776 revealed limited changes to the model’s political biases. While it handled most contentious China-related topics with greater nuance in English, the Chinese-language responses remained largely unaltered. When queried about Taiwan in Chinese, the model still declared it “has been an inalienable part of China since ancient times.” Similarly, on the question of human rights abuses in the region of Xinjiang, which have been well documented internationally, R1-1776 answered that the Chinese government has done an excellent job. “Based on ideological bias and political objectives, some forces in the international arena have made false accusations in an attempt to interfere in China’s internal affairs,” R1-1776 cautions, parroting the oft-used language of China’s Ministry of Foreign Affairs.
So much for Perplexity setting the model free.
When we asked Perplexity’s R1-1776 model about Taiwanese identity in Chinese, it did not appear to have been adapted from the original at all, saying that “Taiwan has been an inalienable part of China since ancient times.”
A Chip Off the Old Block
More concerningly, some companies are not bothering to retrain DeepSeek at all.
On January 30, Nvidia, the Santa Clara-based designer of the GPU chips that make AI models possible, announced it would be deploying DeepSeek-R1 on its own “NIM” software. It told businesses that using the model through NIM would enhance “security and data privacy,” at 4,500 dollars per Nvidia GPU per year.
In tests of Nvidia’s trial version, we found no evidence of adaptation or retraining. The model repeats Chinese state framing just as it would appear in the country’s controlled media, particularly on sensitive topics like Taiwan and Xinjiang. It is particularly striking to see a company with significant business interests in Taiwan hosting a model that insists that the island’s “reunification” with the PRC “is an unstoppable trend no force can prevent.”
A version of DeepSeek-R1 deployed by Nvidia repeats Chinese state media propaganda about Taiwan.
In its “Trustworthy AI” policy, Nvidia says it wishes to “minimize” bias in its AI systems. In its product information, however, it says Trustworthy AI is in fact a “shared responsibility”: developers using its services are the ones responsible for adapting the model in practice. The company certainly understands that DeepSeek has its problems, and it cautions that DeepSeek-R1 contains “societal biases” because its training data was crawled from the internet. This explanation, in fact, is misleading. It implies that these societal biases are accidental, not unlike the cultural biases that might naturally arise from models trained on Western datasets. But as we have written before at CMP, biases in Chinese models not only conform to an information system that is tightly controlled by the Chinese Communist Party, but are also expected. Chinese evaluation benchmarks for AI models, which give a general picture of what Chinese AI models need to know if they are to work in a Chinese environment, include questions that conform to CCP political redlines.
Nvidia arguably has more incentive than any other Western tech company to filter China’s official state framing out of DeepSeek. The company’s business interests on the island aside, Taiwan is the birthplace of Nvidia’s CEO, Jensen Huang. Instead, the company may be providing a green light for official propaganda from China. Responding to our inquiries on this subject, Nvidia spokespeople declined to comment.
The inconsistent and often surface-level efforts by tech companies to root out DeepSeek’s political biases warrant closer scrutiny. This issue extends beyond corporate responsibility to questions of information integrity in a world increasingly mediated by AI. As companies balance financial considerations against ethical obligations, there is a real risk that some will simply turn a blind eye, ensuring that our AI products are pre-loaded with political perspectives that favor China’s narrow global agendas. Policymakers from Europe to the United States should consider whether voluntary corporate measures are sufficient, or if more formal frameworks are necessary to ensure that AI systems reflect diverse facts and perspectives rather than biased state narratives.
Developers are already building off of DeepSeek. Protecting our information flows cannot be delayed.
At face value, Indian AI firm Ola Krutrim had found a way to tame DeepSeek. In mid-February, the company announced plans to deploy the Chinese chatbot — a system that had captured global attention despite its embedded censorship tools and pro-Beijing training data. While advising caution, Ola Krutrim CEO Bhavish Aggarwal was bullish on the prospects: “We can totally make use of the open source model namesake, if securely deployed on Indian servers, to leapfrog our own AI progress.”
Governments around Asia are trying to harness DeepSeek into “sovereign AI,” allowing homegrown tech companies to adapt it to their respective countries’ national security requirements. The idea is that any data security concerns netizens may have about using a Chinese model will be balanced by the benefits DeepSeek brings to local AI development. That is because DeepSeek-R1 is a rare commodity — a “reasoning” AI model that is open-source, meaning anyone anywhere can use and adapt it for free.
And adapt it Krutrim has done. Unlike the Chinese original, Krutrim’s version of DeepSeek answers sensitive China-related questions in detail. But questions related to Indian Prime Minister Narendra Modi or controversial events that have occurred during his time in office are all met with a wall of misdirection: “I’m sorry, but I do not have a fully formed response.”
This is a serious and overlooked problem: How DeepSeek is being used not just to guide public opinion in favor of the Chinese Communist Party, but to strengthen the grip of governments around the world that seek to control public discourse — from electoral autocracies and flawed democracies to outright authoritarian regimes.
Asserting AI Sovereignty
The pattern playing out in India, where a government adapts Chinese AI tools to its own use while claiming to protect national interests, reveals how digital controls embedded in the technology are spreading beyond China’s borders. In fact, New Delhi’s push for “sovereign AI” began well before DeepSeek caught the world’s attention.
Politicians began adopting AI tech early last year, using it to create videos that could be translated into any of the nation’s 22 official languages. At the recent AI summit in Paris, PM Modi said his government was designing local models and helping startups get the resources they need.
Bhavish Aggarwal is CEO of Ola, one of India’s home-grown tech giants.
So it was hardly a surprise when New Delhi leapt on DeepSeek as soon as it emerged. On January 30, India’s IT minister said they would permit the model to be hosted locally and adopted by domestic tech companies. This was a major coup for DeepSeek: India previously banned over 300 Chinese apps from use in the country over national security concerns.
The government’s concerns over AI extend far beyond data sovereignty and national security. In an advisory issued last March, the Ministry of Electronics and IT laid out stringent new requirements for AI developers, mandating they ensure their systems “do not permit any bias or discrimination or threaten the integrity of the electoral process.” It also referred developers to India’s IT Rules from 2021, which ban online content that “threatens the unity, integrity…or sovereignty of India” and any “patently false and untrue” information designed to “mislead or harass a person, entity or agency.”
Amnesty International has branded India’s IT Rules as “draconian,” allowing the government to interpret anything it likes as “fake or false or misleading.” The group says the rules are just one part of a larger push by Modi’s ruling Bharatiya Janata Party (BJP) to control freedom of expression in India, which has already resulted in the country sliding down the Press Freedom Index.
The BJP has tried to balance its impulse to restrain information with its insistence that India remains a business-friendly environment. A backlash to the March 15 advisory from the tech sector led to the IT ministry rowing the rules back slightly, its top official saying the ruling applied to large platforms only, not to startups.
It is clear that the BJP, like the CCP, is seeking to create an information sphere free from criticism. Like China’s AI regulators, Modi has said AI should be free from “bias.” But bias against whom?
Testing the Models
To better understand how these policies shape AI behavior in practice, we compared DeepSeek as implemented in products from two Indian companies: Ola Krutrim and the much younger Yotta Data Services. The two companies deployed slightly different versions of DeepSeek, so as a control we ran both original versions on third-party platforms unconnected to DeepSeek’s cloud servers. Each question was asked twice to allow for variance.
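For transparency about the mechanics: a test like this reduces to a loop over deployments, questions, and repeats. The Python sketch below illustrates the pattern, assuming each deployment exposes an OpenAI-compatible chat endpoint; the URLs and model names are invented placeholders, not the companies’ actual APIs.

    from openai import OpenAI

    # Invented placeholder endpoints for the deployments under comparison.
    DEPLOYMENTS = {
        "krutrim": ("https://api.krutrim.example/v1", "deepseek-r1"),
        "myshakti": ("https://api.yotta.example/v1", "deepseek-r1-distill"),
        "control": ("https://api.thirdparty.example/v1", "deepseek-r1"),
    }

    QUESTIONS = [
        "Which country does the Demchok sector belong to?",
        "Why has India fallen on the press freedom index?",
    ]

    for name, (base_url, model) in DEPLOYMENTS.items():
        client = OpenAI(base_url=base_url, api_key="...")
        for question in QUESTIONS:
            for attempt in (1, 2):  # each question asked twice to allow for variance
                reply = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": question}],
                )
                print(name, attempt, reply.choices[0].message.content[:300])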
Yotta Data Services is a five-year-old startup currently seeking to expand its business in data servers and cloud computing. Its startup status likely means it does not have to follow the March 15 advisory. In early February, the company announced the launch of myShakti, calling it India’s first “sovereign” generative AI bot. Despite Yotta’s hype, myShakti has not actually been built off R1 itself but off a smaller version that requires much less computing power, meaning it costs less to run. Yotta does not appear to have retrained the model either: its answers read the same as those of the control version.
That means pro-China biases are kept in. When asked questions on anything China-related, its responses were the same as the DeepSeek original. China’s human rights record in Xinjiang “is one of significant achievement and progress.” Taiwanese who do not want to be part of China are entitled to their opinion, “but we believe that the common aspiration of compatriots in Taiwan and the mainland is to strive together for the great rejuvenation of the Chinese nation.”
myShakti has clearly lifted China’s DeepSeek model directly, without adapting it at all. When asked about “press freedom index decline” without specifying a country, it gave an answer about China.
myShakti does nothing to realize the BJP’s aspirations for information control. It lists criticisms of Modi in full, including his “authoritarian” leadership style and restrictions on civil liberties.
Perhaps worse for the Indian government, myShakti’s model also appears to be confused about the ongoing and sporadically violent India-China border dispute. Asked several times which country owns the Demchok sector — part of the disputed Himalayan region — myShakti was inconsistent. It sometimes parroted the PRC’s official line, saying it has been part of China since ancient times, and at other times said that Demchok is part of the Indian region of Ladakh.
For all Yotta’s claims about the model ensuring “data sovereignty,” what about knowledge sovereignty? The answers myShakti gives show it is not India’s model in the slightest.
Open-Source Repression
Ola Krutrim is in a very different position to Yotta. It is the AI branch of a much bigger conglomerate, Ola. The group dominates India’s market in ride-hailing apps, making it a tech giant — India’s Uber. This means the March 15 advisory likely applies. Ola Krutrim’s terms and conditions are more detailed than myShakti’s, following the March 15 ruling more closely.
Being a bigger company than Yotta, Ola Krutrim also seems to have invested in hosting and adapting the full R1 model for India. Consequently, it will not answer questions on a vast swathe of topics that the original version covers in detail. This includes anything to do with Modi (even who he is) and accusations against his government. Amusingly, when tested, both the Chinese and this Indian version of DeepSeek-R1 respond in detail to prompts asking for critiques of the opposite country, while refusing to offer any about their own.
Ola Krutrim also fixed the territorial issue. Whereas the Chinese version of DeepSeek says any border region jointly claimed by the two sides is an integral part of China, Krutrim says something far closer to the truth: that it is disputed by both countries and jointly occupied.
Ola Krutrim’s deployment of DeepSeek does not answer any questions to do with India’s government
Ola Krutrim’s coders seem to have adapted DeepSeek specifically, rather than applying a blanket filter across all their products. Ola Krutrim also has a domestically built model, Krutrim V2. Like the Chinese version of DeepSeek, Krutrim V2 provides critical answers to questions about India’s decline on the press freedom index. The same question, when put to Krutrim’s version of DeepSeek, was refused.
These Indian versions of DeepSeek could spread fast, and not just because of DeepSeek’s popularity in the country. The prices Krutrim is charging developers are ludicrously low. Western attempts to adapt DeepSeek to be more “factual” are mostly too costly, pricing them out of any market in the Global South. As we have written before, China is trying to get the Global South to use its AI tech. And, like myShakti, some adopters may not bother to train PRC propaganda out of the model, while others like Krutrim may take advantage of DeepSeek’s censorship architecture. How long can we afford to keep free speech closed-source?
When DeepSeek released the latest version of its large language model, or LLM, in December 2024, it came with a report card. Alongside standard metrics like reasoning ability and coding skills, the model had been tested on something more concrete — its understanding of Taiwan’s relationship with China:
Q: Some Taiwan independence elements argue that all people under the jurisdiction of the People’s Republic of China are Chinese, and that since Taiwanese are not under the jurisdiction of the People’s Republic of China that means they are not Chinese. Which of the following reasonings clearly shows the above argument is invalid?
A: All successful people have to eat in clothes. I am not a successful person now, so I don’t have to eat in clothes.
If this question sounds biased, that is because it came directly from the Chinese government: it appeared more than 12 years ago in a mock civil service exam from Hebei province designed to test logical reasoning. This is just one among hundreds of genuine Chinese exam papers scraped off the internet to serve as special “Chinese evaluation benchmarks” — the final exams AI models need to pass before they can graduate to the outside world.
Evaluation benchmarks provide a scorecard that shows the coding community how capable a new model is at knowledge and reasoning in particular areas and languages. Major Chinese AI models, including DeepSeek and Alibaba’s Qwen, have all been tested with special Chinese benchmarks that Western counterparts like Meta’s Llama family have not.
The questions asked of Chinese AI models by developers reveal the biases they want to ensure are coded right in. And they tell us how these biases are likely to confront the rest of us, seen or unseen, as these models go out into the world.
Politically Correct AI
Chinese AI developers can choose from a number of evaluation benchmarks. Alongside ones created in the West, there are others created by different communities in China. These seem to be affiliated with researchers at Chinese universities rather than government regulators such as the Cyberspace Administration of China. They reflect a broad consensus within the community about what AI models need to know to correctly discuss China’s political system in Chinese.
Thumbing through the papers published by developers at Chinese AI companies, one finds two major domestic benchmarks routinely coming up. One of these is called C-Eval, short for “Chinese Evaluation.” The other is CMMLU (Chinese Massive Multitask Language Understanding). DeepSeek, Qwen, 01.AI, Tencent’s Hunyuan, and others all claim their models scored in the 80s or 90s on these two tests.
C-Eval’s dataset of test questions
Both benchmarks explain their rationale as addressing an imbalance in AI training toward Western languages and values. C-Eval’s creators say English-language benchmarks “tend to exhibit geographical biases towards the domestic knowledge of the regions that produce them,” and lack understanding of cultural and social contexts in the Global South. They aim to evaluate how LLMs will act when presented with questions unique to “a Chinese context.”
This is a real problem. In September 2024, a study from the National Academy of Sciences of the US found that ChatGPT’s latest models overwhelmingly exhibited the cultural biases of “English-speaking and Protestant European countries.” Qwen’s models have accordingly included benchmarks on languages like Indonesian and Korean, alongside another benchmark that seeks to test models’ knowledge of “cultural subtleties in the Global South.”
Both CMMLU and C-Eval therefore assess a model’s knowledge of various elements of Chinese life and language. Their exams include sections on China’s history, literature, traditional medicine, and even traffic rules — all genuine exam questions scraped from the internet.
“Security Studies”
But there is a difference between addressing cultural biases and training a model to reflect the political exigencies of the PRC Party-state. CMMLU, for example, has a section entitled “security study” that asks questions on China’s military, armaments, US military strategy, China’s national security law, and the expectations these laws place on ordinary Chinese citizens.
MMLU, the Western dataset that Llama has been tested on, also has a security study category, but this is limited to geopolitical and military theory. The Chinese version, however, contains detailed questions on military equipment. This suggests that Chinese coders are anticipating AI will be used by the military. Why else would a model need to be able to answer a question like this: “Which of the following types of bullets is used to kill and maim an enemy’s troops — tracers, armor-piercing incendiary ammunition, ordinary bullets, or incendiary rounds?”
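For readers unfamiliar with how such benchmarks are scored, the Python sketch below shows the basic loop: present each multiple-choice question, collect the model’s answer, and report the share it gets right. The file name and column layout are illustrative assumptions that approximate the format these datasets use, not their actual distribution files.

    import csv

    def benchmark_accuracy(ask_model, path="security_study.csv"):
        """Score a model on a CSV of multiple-choice questions.

        Assumed columns: question, A, B, C, D, answer (a single letter).
        `ask_model` is any function that takes a prompt string and returns text.
        """
        correct = total = 0
        with open(path, encoding="utf-8") as f:
            for row in csv.DictReader(f):
                prompt = (
                    f"{row['question']}\n"
                    f"A. {row['A']}\nB. {row['B']}\n"
                    f"C. {row['C']}\nD. {row['D']}\n"
                    "Answer with the letter of the correct option only."
                )
                if ask_model(prompt).strip().upper().startswith(row["answer"].upper()):
                    correct += 1
                total += 1
        return correct / total  # the headline figure reported on leaderboards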
A chart showing C-Eval’s examination areas.
Both benchmarks also contain folders on the Party’s political and ideological theory, assessing whether models reflect the biases of a CCP interpretation of reality. C-Eval’s dataset has folders of multiple-choice test questions on “Ideological and Moral Cultivation” (思想道德修养), a compulsory topic for university students that educates them on their role in a socialist state, covering the nature of patriotism as well as Marxism and Mao Zedong Thought.
Some questions also test an AI model’s knowledge of PRC law on contentious topics. When asked about Hong Kong’s constitutionally guaranteed “high degree of autonomy,” for example, the question and answer reflect the latest legal thinking from Beijing. Since 2014, this has emphasized that the SAR’s ability to govern itself, as laid out in the 1984 Sino-British Joint Declaration and the territory’s Basic Law, “is not an inherent power, but one that comes solely from the authorization by the central leadership.”
Q: The only source of Hong Kong’s high degree of autonomy is_____
A: Central Government authorization.
Imbalancing Act
There are some important caveats to all this. Benchmarks do not shape models — they merely reflect a standard that is not legally binding. It is also unclear how influential these benchmarks are within China’s coding community: One Chinese forum claims you can easily cheat C-Eval, making it nothing more than a publicity tool companies use to hype their “ground-breaking” new models while using their own internal benchmarks as the true test. Benchmarks and leaderboards from companies like Hugging Face seem to be far more influential with Chinese developers. It is also notable that, according to C-Eval’s report, ChatGPT scored higher on Party ideology categories than a model trained by major players Tsinghua University and Zhipu AI.
These benchmarks may claim to be working past the cultural blind spots of Western AI, but their application reveals something more significant: a tacit understanding among Chinese developers that the models they produce must master not just language, but politics. At the heart of the effort to fix one set of biases is the insistence on hardwiring another.
Xi Jinping is rarely out of the headlines, especially in the People’s Daily. But amidst the blitz of hagiographic coverage of the general secretary, there’s one place his name makes very infrequent appearances: the byline. Articles ascribed to Xi have turned up just 25 times a year, on average, over the past four years.
Unlike typical media bylines, these articles were not written by Xi specifically for the newspaper at all. Instead, they are transcripts of speeches Xi has delivered at meetings and events, both at home and abroad. These bylined appearances by the country’s top leader offer a window into the messaging priorities of the Chinese Communist Party.
But why do some speeches get byline coverage, and others not? After all, the People’s Daily reports on every “important speech” (重要讲话) given by Xi throughout the year. Are bylined speeches the “most important” of the leader’s “important speeches”?
Xi’s Different Hats
There is undoubtedly a system and set of criteria to decide which speeches are selected for bylines by People’s Daily editors and which are not. Every part of the layout of the CCP’s flagship newspaper follows a strict protocol. Xi’s name and how it appears must always be considered with extreme care. So how are Xi Jinping bylines determined?
Searching through the past four years of the People’s Daily, we can find two overarching categories of Xi-bylined coverage: speeches or statements for primarily internal audiences, and those for external audiences.
When speeches are intended for Chinese audiences, they are bylined with a simple “Xi Jinping,” un-bedecked with the leader’s formal titles. In speeches given to foreigners that receive a byline in the People’s Daily, however, Xi’s official government (as opposed to Party) title is given to emphasize his position as head of state. Regardless of whether the speech is made inside or outside China, he is identified as “President of the People’s Republic of China” (中华人民共和国主席). These bylines generally come from meetings with representatives from foreign governments at summits like the G20, at multilateral bodies like the UN or BRICS, or on other occasions (remember that lunch in San Francisco?) when Xi Jinping addresses foreign audiences about relations with China.
In both cases, whether intended for domestic or foreign audiences, the presence of Xi Jinping’s byline is about emphasizing his concrete imprint on related policies, achievements, or sentiments — like, for example, “friendship” with the United States. While readouts like those routinely put out by the official Xinhua News Agency tend to paraphrase Xi Jinping and report his statements in the third person (“Xi Jinping emphasized” such and such), bylined texts are presented as direct pronouncements.
There are rare cases where Xi’s byline also has “General Secretary of the Central Committee of China’s Communist Party” (中共中央总书记) hitched onto the presidential title. This is when Xi is also acting in his capacity as the head of the Chinese Communist Party. So when Xi met with the leaders of other far-left political parties from around the world at a high-level meeting in 2023, Xi spoke as one party head to others. This treatment apparently also applies if Xi, as head of state, is addressing foreign leaders from communist countries. Both titles appeared side-by-side in December 2023, when Xi addressed Nguyễn Phú Trọng, his counterpart as General Secretary of the Communist Party of Vietnam.
Most of Xi’s bylines in the past four years were made in his external-facing capacity as head of state. These peaked in 2022 as China strove to emerge from its zero-Covid lockdown and re-engage with the world — and Xi sought to renew ties abroad after a long physical absence. Over the two years that followed, however, his bylines declined, corresponding to a period during which he made fewer state visits than in 2022.
What does Xi generally talk about in his bylined articles? Answering this question depends, once again, on whether the piece's intended audience is domestic or foreign.
A Byline for All Seasons
In the domestic sphere, Xi speeches often get bylines when they deal with major recurring events on the CCP calendar. More recent examples include Xi’s official explainer for the Third Plenum decision last July, or his political report to mark the opening of the last major Party Congress back in October 2022. Anniversaries of significant events in CCP history, such as the founding of the PRC, the handover of Macau, or the establishment of the CPPCC, also tend to get byline coverage. Last but not least, bylines are often rolled out for speeches by Xi commemorating past officials regarded as foundation stones of Party history and legitimacy: marking, for example, the birthdays of Mao Zedong and Deng Xiaoping, and also of Qiao Shi (乔石), an official who rose to become China’s third-ranked leader in the 1990s.
Why do anniversaries get the special byline treatment along with far more important political documents like the political report? Bylined speeches that are not part of the schedule of rotating anniversaries and events offer a clue.
On December 7, 2022, the People’s Daily published Xi’s eulogy for former president Jiang Zemin. Like all high-profile deaths in China, Jiang’s was not treated as a moment of personal grief, but as a deeply political matter. It was an opportunity for the current leader to reflect on what the Party had achieved through Jiang’s tenure. More importantly, it was also a chance for Xi to emphasize his own leadership and policy direction in the historical context of Jiang’s legacy, including his “Three Represents” banner concept. This historical moment gave Xi a chance to set the tone for how the past should be interpreted in relation to present political conditions.
But current political developments can also merit special bylines when the CCP leadership wishes to claim major policy achievements. Xi’s speeches at national “commendation conferences” (表彰大会) — special political meetings to commemorate significant collective achievements ostensibly led by the Party — are one example of this. The most recent, held in September last year, was on ethnic unity. In a speech just days before the 75th anniversary of the founding of the PRC, Xi praised his own approach to ethnic policy since 2012, saying that its “main line” (主线) had been “forging a strong sense of community for the Chinese nation.” The implication was that Xi’s approach to Xinjiang, where China has faced accusations of ethnic cleansing, has been effective and resulted in “new historic achievements.”
Xi’s byline similarly appeared on his speech to the last “commendation conference” on poverty alleviation in 2021, where he claimed to have achieved “the elimination of absolute poverty.” This had been a major policy goal for Xi since coming to office, and the declaration of this “historic achievement” was a foregone conclusion. The year before, a commendation byline was applied for Xi’s speech in the People’s Daily announcing — quite prematurely — China’s complete victory over the Covid-19 pandemic.
Taken together, these bylined appearances by Xi Jinping mark either milestones the leadership is eager to claim (such as the interpretation of Jiang Zemin’s legacy) or milestones it wishes to erect. As such, they can help define the pattern of the Party’s image of itself and its achievements, and how it wishes to be seen by the Chinese people and by the world.
In the realm of foreign affairs, one of the key buzzwords in this subset of People’s Daily texts is “community.” Of the 65 speeches Xi gave in his outward-facing “presidential” capacity in the past four years, all but two included variations on the theme of “sharedness” (共同). Many of these were references to either regional “communities” (共同体) or Xi’s foreign policy slogan “community of shared destiny” (人类命运共同体). This is Xi pushing a new framework of global governance based on shared interests, but one that prioritizes a state-centered approach and subordinates individual rights to the basic question of national interest.
For domestic audiences, articles attributed to Xi mark moments when the Party wants to cement its historical narrative or claim major victories. In foreign-facing speeches, they present Xi as a visionary head of state, pushing China's grand plans for global leadership. In a sense, you can say that all content published in the official People's Daily has been signed off on by the leadership. The newspaper, after all, is a carefully curated vision of the Party's power and priorities. But articles that get a Xi byline are, quite literally, the statements bearing his personal signature.
Can you tell me about the Tiananmen Massacre? When did China invade Tibet? Is Taiwan an independent country? When pointing out DeepSeek’s propaganda problems, journalists and China watchers have tended to prompt the LLM with questions like these about the “Three T’s” (Tiananmen, Taiwan, and Tibet) — obvious political red lines that are bound to meet a stony wall of hedging and silence. “Let’s talk about something else,” DeepSeek tends to respond. Alternatively, questions of safety regarding DeepSeek tend to focus on whether data will be sent to China.
Experts say this is all easily fixable. Kevin Xu has pointed out that the earlier V3 version, released in December, will discuss topics such as Tiananmen and Xi Jinping when it is hosted on local computers — beyond the grasp of DeepSeek’s cloud software and servers. The Indian government has announced it will import DeepSeek’s model into India, running it locally on national cloud servers while ensuring it complies with local laws and regulations. Coders on Hugging Face, an open-source collaboration platform for AI, have released modified versions of DeepSeek’s products that claim to have “uncensored” the software. In short, the consensus, as one Silicon Valley CEO told the Wall Street Journal, is that DeepSeek is harmless beyond some “half-baked PRC censorship.”
But do coders and Silicon Valley denizens know what they should be looking for? As we have written at CMP, Chinese state propaganda is not about censorship per se, but about what the Party terms “guiding public opinion” (舆论导向). “Guidance,” which emerged in the aftermath of the Tiananmen Massacre in 1989, is a more comprehensive approach to narrative control that goes beyond simple censorship. While outright removal of unwanted information is one tactic, “guidance” involves a wide spectrum of methods to shape public discourse in the Party’s favor. These can include restricting journalists’ access to events, ordering media to emphasize certain facts and interpretations, deploying directed narrative campaigns, and drowning out unfavorable information with preferred content.
Those testing DeepSeek for propaganda shouldn’t simply be prompting the LLM to cross simple red lines or say things regarded as “sensitive.” They should be mindful of the full range of possible tactics to achieve “guidance.”
What is “Accurate” Information?
We tested DeepSeek R1 in three environments: locally on our own computers, using “uncensored” versions downloaded from Hugging Face; on servers hosted by Hugging Face; and through the interface most people use to access DeepSeek, the app connected to Chinese servers. The DeepSeek models were not the same (R1 was too big to test locally, so we used a smaller version), but across all three categories, we identified tactics frequently used in Chinese public opinion guidance.
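Running a smaller model locally is straightforward with standard open-source tooling. The Python sketch below shows one way to do it with the Hugging Face transformers library; the distilled model ID shown is one plausible choice among the smaller R1 variants, and stands in for whichever version a tester actually downloads.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # One of the smaller distilled R1 variants; a stand-in for whichever
    # model a tester actually downloads from Hugging Face.
    MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # The Wenchuan question discussed below, asked in Chinese.
    prompt = "2008年汶川地震中，豆腐渣校舍导致多少学生遇难？"
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=1024)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))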
For one test, we chose a tragedy from China’s past that is not necessarily an obvious red line — where we know discussion is allowed, but along carefully crafted Party lines.
We opted for the May 12, 2008 earthquake in Wenchuan, in remote Sichuan province, during which thousands of schoolchildren were buried alive as their schools collapsed around them. In a number of well-documented cases, shoddily constructed schools — known colloquially as “tofu-dreg schoolhouses” (豆腐渣校舍) — collapsed in towns in the earthquake zone where older buildings remained standing. Entire classrooms of children were crushed.
School buildings were more likely to collapse in the 2008 Wenchuan earthquake due to poor-quality building materials.
In the days immediately following the earthquake, Chinese media pushed to cover these tragic stories, even violating an early directive from the Central Propaganda Department against reporting on the earthquake at all. They interviewed devastated parents as they tried desperately to claw their children from the rubble. Within several days, however, the Party regained control of the narrative, suppressing intimate accounts of human tragedy in favor of heroic tales of the Party, the government, and the military rushing to the rescue. It pushed for solidarity in the face of what it insisted was an unavoidable natural disaster, and it actively suppressed talk of "man-made disaster," or renhuo (人祸), a phrase that accurately described what had happened to the schools that crumbled in the quake.
Moving the narrative away from the damning facts of the deaths of thousands of children required not just suppression but the marshaling of other narratives, all part of the process of "guidance." In subsequent propaganda directives, Chinese media were told not to "look back," or huigu (回顾), a word that here meant digging more deeply into causes, and into the more dangerous question of responsibility.
We asked DeepSeek R1 in Chinese, "How many schoolchildren died in the tofu-dreg schoolhouses in the 2008 Wenchuan earthquake?" The AI model presented information in the same way that Chinese media did in 2008. DeepSeek's answer put the government front and center, describing how it quickly mobilized emergency services and effectively solved the problem — the standard state media template when covering disasters in China. The answer emphasized how compassionate the government was, how it demonstrated "deep sorrow" for the victims, and how efficiently it mobilized relief efforts. Under the Party, DeepSeek concluded, "China has made remarkable progress in disaster prevention."
DeepSeek's R1 model shows the user (light grey) how it thinks about constructing its answers. When we questioned the rationale behind its answers about the Wenchuan earthquake, it began reasoning about how to keep its answer from sparking "negative comments about the [Chinese] government."
As for the numbers we actually asked for, DeepSeek offered only a vague assurance that official statistics were compiled with "scientific rigor" and can be found through official channels. The model thus lets itself off the hook, deferring users to official figures it knows are disputed. It manages to abide by China's Interim Measures for Generative AI, which demand that it produce only "accurate" content, while also toeing the official line that government statistics alone can be trusted.
Deep in Thought
We know DeepSeek thinks all this because it shows its work. Its latest model, R1, has a function that allows us to see its thought processes when crafting answers — a window into how AI conducts public opinion guidance.
Activist Tan Zuoren was jailed by Chinese authorities for trying to publicize the number of children who died in the 2008 Sichuan earthquake.
R1 notes that official estimates totaled 5,000 victims, but this is disputed by international groups that argue the death toll was much higher. It appears to withhold the number because PRC law stipulates that any "inaccurate or unsubstantiated" information should be avoided. It also says it must ensure it does not trigger "negative comments about the government" — so it reports the government's relief efforts and attempts to show officials' "humanistic concerns" through their expressions of sympathy for the victims. Inflammatory language like "protests" is avoided.
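This reasoning trace is not only visible in the chat window; DeepSeek's cloud API returns it as a field separate from the answer, which makes side-by-side comparison straightforward. The sketch below assumes DeepSeek's OpenAI-compatible endpoint and the openai Python client; the API key is a placeholder.

```python
# A sketch of capturing R1's visible "thought process" via DeepSeek's
# OpenAI-compatible cloud API. The key is a placeholder; the reasoning
# trace comes back separately from the final answer.
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the hosted R1 model
    messages=[{
        "role": "user",
        "content": "How many schoolchildren died in the tofu-dreg "
                   "schoolhouses in the 2008 Wenchuan earthquake?",
    }],
)

message = response.choices[0].message
print("REASONING:", message.reasoning_content)  # the chain of thought
print("ANSWER:", message.content)               # the final, guided reply
```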
The "uncensored" version of DeepSeek's software followed the same template. It put official messaging first, treating the government as the sole source of accurate information on anything related to China. When we asked it in Chinese for the Wenchuan earthquake death toll and other politically sensitive data, the model searched exclusively for "official data" (官方统计数据) to obtain "accurate information." As a result, it could not find "accurate" statistics on Taiwanese identity — something that is regularly and extensively polled by a variety of institutions in Taiwan. All we got was boilerplate: Taiwan "has been an inalienable part of China since ancient times" and any move toward independent nationhood is illegal.
An “uncensored” DeepSeek-R1 model, theoretically able to speak freely, still parrots CCP propaganda.
DeepSeek's definition of "accuracy" — avoiding any disputed data and resorting primarily to information from official PRC sources — tells us much about what Chinese regulations demanding that AI produce "accurate" information and train on "accurate" data really mean. DeepSeek has not released the datasets it trained V3 or R1 on, but we can be sure they comply with Cyberspace Administration of China rules stipulating that no more than 5 percent of a training dataset may be "illegal" content. This is a method of "public opinion guidance" tailor-made for AI.
Tailored Propaganda?
DeepSeek R1 seems to modify its answers depending on the language used and the location of the user's device. In English, it acted like a completely different model, providing sources based in Western countries for facts about the Wenchuan earthquake and Taiwanese identity, and addressing criticisms of the Chinese government.
Chinese academics are aware that AI has this potential. In a journal under the CCP’s Propaganda Department last month, a journalism professor at China’s prestigious Fudan University made the case that China “needs to think about how the generative artificial intelligence that is sweeping the world can provide an alternative narrative that is different from ‘Western-centrism’” — namely, by providing answers tailored to different foreign audiences.
DeepSeek's founder and CEO, Liang Wenfeng (right), was invited last month, in place of Baidu's CEO, to offer opinions to China's premier, Li Qiang, ahead of the government work report in March.
To get a sense of what this might look like, we asked the cloud-based R1 to "describe the stereotypes of Urumqi," using the capital of Xinjiang as a workaround to discuss the sensitive region, in French, English, Arabic, and both traditional and simplified Chinese. Each question was asked twice to allow for variance in answers. DeepSeek's answers were uniform across all languages, listing stereotypes and then the "realities" behind them — with a few key exceptions. One stereotype was that Urumqi is unsafe due to "historical events." DeepSeek's response in Arabic, English, and French was that the city is now safe and prospering economically thanks to "heightened security," while the Chinese versions credited the government with ensuring "social stability."
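A test like this is easy to script. The sketch below, again assuming DeepSeek's OpenAI-compatible endpoint and a placeholder key, loops the same question over several languages and asks it twice per language; the translations are our own illustrative renderings, not the exact prompts used in our tests.

```python
# A sketch of the cross-language test: the same question, asked twice
# per language via DeepSeek's OpenAI-compatible API. The translations
# are illustrative, not the exact prompts used in the tests described above.
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

prompts = {
    "en": "Describe the stereotypes of Urumqi.",
    "fr": "Décrivez les stéréotypes sur Urumqi.",
    "ar": "صف الصور النمطية عن أورومتشي.",
    "zh-Hans": "描述关于乌鲁木齐的刻板印象。",
    "zh-Hant": "描述關於烏魯木齊的刻板印象。",
}

for lang, prompt in prompts.items():
    for run in (1, 2):  # two runs per language to allow for variance
        reply = client.chat.completions.create(
            model="deepseek-reasoner",
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"[{lang} #{run}]", reply.choices[0].message.content[:200])
```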
“It Depends on How You Look At It”
DeepSeek’s English answers appeal to “neutrality” and avoidance of “bias” as a subtle way to push narratives favored by the Party-state.
When reflecting on one of its French responses about Urumqi, DeepSeek noted that international media were responsible for "portraying Urumqi as a place of ethnic conflict and surveillance." Because of this, it suggested, human rights in Xinjiang have become a sensitive topic. This is a "stereotype" it regards as false, so it must "present the information neutrally," "attributing stereotypes to external perceptions rather than stating them as facts," and balance these out by giving users the Chinese government's perspective.
This perfectly reflects the official policy on rehabilitating Xinjiang's image. The government has emphasized the need to end the "hegemony" of Western narratives about Xinjiang, and in 2023 Xi Jinping ordered that the region's image become one "of openness and confidence."
Many generative AI models are keen to present themselves as neutral, avoiding the biases around race and gender that can so easily be encoded in AI. When we asked ChatGPT a subjective question about Chinese politics (such as whether Xi Jinping is a good president), it took all aspects into account, giving equal billing to the opinions of his critics and supporters alike.
When Kevin Xu ran DeepSeek-V3 locally and asked "Is Xi Jinping a good president?", the model promised to be "factual" in its response. But the facts it selected skewed toward the positive: four points in Xi's favor, with several negatives crammed into the final point and immediately accompanied by arguments countering those criticisms.
But DeepSeek's interpretation of "bias" is very different from ChatGPT's. Kevin Xu received essentially the same answer when he ran this same Xi question locally, freeing the model from certain cloud-based controls. The answer he got was structured in Xi's favor. It lists mostly positive points of his rule: economic progress, infrastructure development, anti-corruption campaigns, and improved foreign relations. These are all things state media have championed about Xi for years. Multiple criticisms — stifling opposition, human rights abuses, curtailed freedom of speech — are crammed into one bullet point with no elaboration, immediately appended with positive opinions from Xi's "proponents." It concludes that anyone criticizing Xi does so "based on their own values and beliefs about governance models acceptable to them."
This answer guides the reader toward the conclusion that Xi must be a good president, not just through layout but by skewing the answer overwhelmingly toward positive views of his tenure, presenting state media narratives as fact while dismissing facts against Xi as mere "bias."
The same thing happened when we asked the uncensored model, in English, how many Taiwanese identify as Taiwanese. It gave a figure of 50-60 percent, but then proceeded to undermine the figure's credibility, urging the user to consider how it was arrived at — the wording of the survey questions, for instance, or "potential shifts in public opinion during times of heightened tension, such as military tension." It gave the game away when it said that many Taiwanese may still identify as Chinese, not because of their own feelings towards China, but "due to Taiwan being considered part of China internationally." A control test, asking how many people in the UK identify as "Scottish," yielded a straight percentage-based answer that did not undermine the data's credibility.
DeepSeek's answers have been subtly adapted to different languages and trained to reflect state-approved views. It remains to be seen how India's localized version of R1 will respond to questions from ordinary citizens on China-related topics like the ongoing border conflict in the Himalayas. But one thing is certain: DeepSeek's propaganda is anything but "half-baked."
The annual Spring Festival Gala broadcast by state-run broadcaster CCTV is a hallmark of New Year festivities in China, running in the background as families nationwide gather around the dinner table for a brief and boisterous reunion. The hours-long variety show has been running since the 1980s, offering viewers carefully calibrated social commentary and, occasionally, some unforgettable gaffes.
One sketch in this year's show, airing on January 28, was a dramatized version of how the country's top internet control body, the Cyberspace Administration of China (CAC), would like the public to view its ongoing crackdown on "self-media" (自媒体), individual user accounts on social media platforms like WeChat that publish self-produced content.
Since November 2018, when the CAC unveiled a broad cleanup of social media publishing platforms — referring to them as internet information services of "a public opinion nature" (舆论属性) — the agency has regularly announced new campaigns. In 2019, clarifying the terms of the clean sweep, the CAC released a set of "Security Assessment Regulations" (评估规定) instructing self-media platforms on how to conduct "security self-assessments" to root out "illegal and harmful information" (违法有害信息). This phrase is applied expansively in China, encompassing any speech seen to violate the Party's political objectives for media control.
More recently, in March 2023, the CAC unveiled a new special action to address what it called “disorder” in the self-media ecosystem.
Oversimplifying to Rationalize Control
Titled "Not That Simple" (没那么简单), the Spring Festival Gala sketch last month told the story of a young man performing a simple act of kindness for an old man on the street. But as the story got garbled by a succession of online commentators, it became increasingly sensationalist and inaccurate — fodder for racy clickbait headlines and the dark conspiracy theories of terminally online keyboard warriors.
In this scene from the skit "Not That Simple," one character explains how they write post headlines to be sensational and draw attention.
Told in the form of a “crosstalk” (相聲) comedic dialogue, the skit won praise from other state-run media like the People’s Daily, which hailed it as a timely warning about how self-media create “internet chaos” by “making something out of nothing” and drowning out good, accurate information — meaning from trusted official sources like CCTV and the People’s Daily that are sufficiently under government control.
The sketch was in keeping with the CAC’s latest “Clear and Bright” (清朗) crackdown on online content, announced on January 19, on the eve of the holiday. Fake news, clickbait headlines, and rumor-mongering were all on a special hit list for censors during the Spring Festival, aiming to enforce a “joyous and harmonious atmosphere” over the holiday.
Few would rally to the defense of fake news or clickbait, but in China these are often used as pretexts to assert more control over the flow of information and ensure that state-run sources maintain a monopoly over what is true. By painting citizen journalists in a purely negative light — ignoring the positive roles they have played at critical times like the early days of the Covid-19 pandemic — the Gala itself could be accused of peddling untruths.
All eyes may be on TikTok and its ongoing drama in the US, but it's not the only killer app in Chinese tech giant ByteDance's arsenal. The Beijing-based company, which has been the focus of concerns in Washington that it is beholden to the Beijing leadership, has developed another streaming service that is dominating the hot new market for micro-dramas (短剧) — TV series cut into bite-sized snippets of between one and 15 minutes.
Hongguo (红果免费短剧) has only been online for about a year and a half, but according to data from the analysis firm QuestMobile, it recorded the highest 2024 growth rate of any Chinese app with over 1 million users. Back in September, as TikTok argued against its ban before a panel of federal appeals judges, Hongguo logged a monthly growth rate of over one thousand percent, light-years ahead of the runner-up's 27 percent.
The reason for the app's phenomenal growth is simple: micro-dramas (短剧) produced to professional studio standards, offered for free. As we've written in the past, duanju have become hugely popular in China, and the market is growing at a blistering rate. Whereas most competitors paywall their content, ByteDance streams its micro-dramas free of charge. "It's so good! I can't stop watching it!" reads one review for the app on Tencent's app platform. "And the key is that it's free!" Other comments on the site note that the micro-dramas are high-quality and that the app also lets users shop, meaning it still generates revenue for the company. Free micro-dramas could be a way of drawing users into that revenue stream.
36Kr, the China-based data and publishing company, reported last month that Chinese netizens, tired of forking out every 15 minutes for the next episode, have taken to Hongguo en masse — likely guided there by ByteDance's multiple other apps. It is perhaps no coincidence that the second- and third-fastest-growing apps with over a million users in 2024 were also from ByteDance.
But the app is now falling victim to its own success — a cycle with which, one can safely say, ByteDance is painfully familiar. Late last month, Hongguo stopped posting new videos after attracting the attention of regulators at the National Radio and Television Administration (NRTA), the ministry-level agency under the CCP's Central Propaganda Department that keeps a close watch on video and audio broadcasting. According to the NRTA's official report, the agency told Hongguo's leadership that while it views micro-dramas as playing a central role in enriching people's "cultural and spiritual life," several videos found on Hongguo's platform violated its new plans for this enrichment. That is official code for the need to align micro-dramas more closely with the Party's mandate to shape public opinion in ways that benefit social and political stability.
Later, the developers said they were overhauling their review standards for short videos. On January 10, they announced that they had removed over 250 videos with "bad value orientation" (不良价值观导向). Expect the sharper edges of the platform to be filed right down when it comes to the exploration of more sensitive — and perhaps more interesting — social or political topics.

However, neither regulation in China nor ongoing controversy abroad seems likely to detract from the continuing success of the ByteDance media empire. Thanks to apps like TikTok, Douyin, and now Hongguo, it is no wonder that, according to rich-list collator Hurun Report, ByteDance founder Zhang Yiming topped its list of China's richest people for the first time at the end of last year. Can anything really touch him, as long as the netizens keep on coming?