A clever algorithm that has digested seven decades’ worth of articles in China’s state-run media is now ready to predict its future policies. The research design of this “crystal ball” can also be applied to tackling a variety of other problems.
Supervised learning, the most developed form of machine learning, involves learning a mapping from input data (such as emails) to output labels (whether they are “spam”) and, subsequently, applying the learned mapping to predict the labels for new data (i.e., new emails). A critical prerequisite to this approach, however, is a rich and representative set of training data, which is often hard to come by.
On the other hand, in the era of Big Data, there are ample data labels that are readily available but ostensibly unimportant for the problems we would like to tackle. But are they really unimportant?
In a new research paper, “Reading China: Predicting Policy Change with Machine Learning,” we demonstrate that seemingly trivial labels can be used to uncover important underlying patterns. We build a neural network algorithm that “reads” the People’s Daily, China’s official newspaper, and classifies whether each article appears on the front page — an ostensibly trivial label. It turns out that such a simple algorithm can be used to detect changes in how the People’s Daily prioritizes issues, which, in turn, have profound implications for China’s government policies.
The algorithm tries to mimic the mind of an avid People’s Daily reader who reads its articles and tries to figure out how its editor places articles on different pages. Because of the newspaper’s official status, the way its editor selects articles for the front page reflects the newspaper’s issue priorities, which the avid reader will try to pick up. If the reader had read and thought through, say, five years’ worth of articles, they would have acquired a fairly good sense of what is in the editor’s mind and what kind of articles “should” or “should not” appear on the front page. But if the reader were then surprised by new articles in the following quarter (that is, if their educated guess about the new articles turned out to work either particularly well or exceptionally poorly), it might constitute a signal of change from the reader’s perspective. While a small surprise may well be taken as noise, a strong signal would convince the reader that their existing understanding of the editor’s mind is no longer valid and that the priorities of the People’s Daily must have fundamentally changed.
Using the above reasoning, we construct a quarterly indicator, which we call the Policy Change Index (PCI) of China, that captures the amount of surprise to the algorithm in each quarter, compared to the paradigm the algorithm has acquired over the past five years’ data.
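The rolling “learn, then get surprised” logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method in our paper: the toy keyword model stands in for the neural network, “surprise” is assumed here to be the absolute gap in accuracy between the learning window and the following quarter, and all function names are hypothetical. A five-year window corresponds to 20 quarters; the toy data below uses a window of two quarters to stay small.

```python
from collections import Counter
from typing import Callable, List, Tuple

Article = Tuple[str, bool]          # (text, appeared_on_front_page)
Predictor = Callable[[str], bool]


def train_keyword_model(history: List[Article]) -> Predictor:
    """Toy stand-in for the neural network (an assumption for illustration).

    Each word gets +1 for every front-page article it appears in and -1 for
    every other article; a text is predicted 'front page' if its words sum
    to a positive score.
    """
    weights: Counter = Counter()
    for text, on_front in history:
        for word in text.lower().split():
            weights[word] += 1 if on_front else -1

    def predict(text: str) -> bool:
        return sum(weights[w] for w in text.lower().split()) > 0

    return predict


def accuracy(predict: Predictor, articles: List[Article]) -> float:
    """Share of articles whose front-page label is predicted correctly."""
    if not articles:
        raise ValueError("need at least one article")
    hits = sum(1 for text, label in articles if predict(text) == label)
    return hits / len(articles)


def surprise_index(
    quarters: List[List[Article]],
    train: Callable[[List[Article]], Predictor],
    window: int = 20,  # five years of quarterly data
) -> List[Tuple[int, float]]:
    """For each quarter t, learn from the previous `window` quarters and
    report how differently the model performs on quarter t (assumed metric:
    absolute accuracy gap; the baseline is measured in-sample for brevity).
    """
    results = []
    for t in range(window, len(quarters)):
        history = [a for q in quarters[t - window:t] for a in q]
        predict = train(history)
        baseline = accuracy(predict, history)
        newest = accuracy(predict, quarters[t])
        results.append((t, abs(newest - baseline)))
    return results


# Made-up example: priorities are stable for three quarters, then flip.
stable = [("steel output rises", True), ("opera review tonight", False)]
flipped = [("steel output rises", False), ("opera review tonight", True)]
quarters = [stable, stable, stable, flipped]

print(surprise_index(quarters, train_keyword_model, window=2))
# prints [(2, 0.0), (3, 1.0)]
```

In this toy run the index stays at zero while the editor’s choices match what was learned, and jumps when the front-page pattern reverses. Taking the absolute gap means that working unusually well and working unusually poorly both count as surprise.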
The name of the indicator comes from the fact that detecting changes in the newspaper’s priorities allows us to predict changes in the Chinese government’s policies. This is because the People’s Daily is at the nerve center of China’s propaganda system, an essential function of which is to mobilize resources to attain the government’s policy goals. Moreover, before major policy changes are made, the government often finds it necessary to justify those changes to the public and convince it that they are the right moves for the country. Hence, while the algorithm is detecting propaganda change in real time, the resulting index is really predicting policy changes for the future.
When put to the test against the ground truth — policy changes in China that did occur in the past — the PCI could have correctly predicted the beginning of the Great Leap Forward in 1958, that of the economic reform program in 1978, and, more recently, a reform speed-up in 1993 and a reform slow-down in 2005, among others. Furthermore, these events are widely recognized in the academic literature as among the most critical junctures in the history of China’s economy and reforms.
Our approach to learning underlying patterns from easily available labels has an obvious “context-free” feature; that is, the construction of the PCI does not rely on the researcher’s understanding of the Chinese context (its language, history, or politics). This feature opens the door to a variety of applications that have a structure similar to ours. Readers can find more details about China’s policy changes, the methodology, and its potential applications in this research paper or on the website of the project. The source code of the project is also released on GitHub, so that the academic, business and policy communities can not only replicate the findings but also apply this method in other contexts.
(This article is co-authored by Julian TszKin Chan and Weifeng Zhong)
Julian TszKin Chan is a senior economist at Bates White Economic Consulting. Weifeng Zhong is a research fellow in economic policy studies at the American Enterprise Institute.
Weifeng Zhong will be speaking on this subject at Data Natives 2018 in Berlin. The views expressed here and in Weifeng Zhong’s speech are and will be solely those of the authors and do not represent the views of the American Enterprise Institute, Bates White Economic Consulting, or their other employees.