Whereas the US Library of Congress drastically scaled back its archive of Twitter posts at the end of 2017, the National Library of China has announced that it will archive posts on China’s Weibo microblogging site. On 22 April 2019 the South China Morning Post reported that
“More than 210 million news stories published on Sina.com, the news portal operated by the parent company of Weibo Sina Corp, together with 200 billion public posts on Weibo, will be archived under a non-profit project by the national library” with the aim of chronicling “the evolution of civilisation in the internet era for the ‘long term development of information security and digitisation of the country’”.
As a former national library director and one-time chair of the Conference of Directors of National Libraries (CDNL), I was involved in discussions in the mid-1990s on the archiving of internet content by national libraries. Our discussions focussed on the potential use of legal deposit legislation, technology (how to trawl the web efficiently), and copyright. In principle, I applaud all national library initiatives to preserve their countries’ digital heritage. Indeed, the decision by the Library of Congress to limit its archiving to selected tweets has been criticized as a tragic failure to preserve the public record. It is in the public interest that the more or less carefully considered tweets posted by Mr Trump and other public figures be collected for analysis by journalists, political scientists, and historians. A comprehensive database of tweets will be invaluable for media sociologists, political scientists – and security agencies – in analysing the use of Twitter to disseminate fake news and destabilize democratic institutions. And Twitterstorms, in which large numbers of ordinary citizens give vent to their feelings, must be of interest to students of crowd behaviour and future social historians.
However, projects of this nature do raise some interesting issues. One is the issue of copyright. Another is that of privacy, explored in a 2012 paper by Smith, Henne and Von Voigt on big data privacy issues in public social media. I wonder how many people who tweet spontaneously do so expecting that their messages will be preserved indefinitely and made available to parties not yet identified. In the past, people also sent private messages using postcards, which could be read by anyone through whose hands they passed. But today massive computing power and clever machine-learning algorithms make it possible for both good and bad actors to sift through the treasure trove of social media posts rapidly and accurately and learn a great deal more about us than we may think. There is no such thing as a neutral technology.
Before I forget: This is a good opportunity to recognize the [NAT-LIB] National Libraries News site, run by Genevieve Clavel and Stuart Hamilton, where I found the piece on the project which sparked this reflection.