This post is the first in a series of reflections on my #citylis course INM348 ‘Digital Information Technologies and Architectures’.
It seems to have become something of a self-evident truth that our production and consumption of data is growing at an ever more mind-boggling rate. The figures themselves point to almost unfathomable quantities. In one oft-quoted calculation, the computing firm IBM estimated in 2012 that 2.5 exabytes (in other words, 2.5 billion gigabytes) of digital data were being generated every day.
In addition, the move online of many everyday activities and transactions that previously took place in the analogue sphere – such as shopping, banking, communicating with friends, finding a date, and even praying – together with the growing number of networked personal devices owned by any one individual (which may rise to an average of 6.58 by 2020), means that these numbers are set to climb ever higher by the end of the decade.
Given the twin benefits of convenience and speed that such digital interactions offer, it is perhaps not surprising that the vast majority of us – or at least those of us who inhabit the more affluent portions of today’s connected, globalised world – have voluntarily chosen to convert much of our personal, social and cultural identities into the medium of binary code, to be distributed and shared via the ubiquitous computer networks that bind our world.
In some of his recent work, the philosopher Luciano Floridi makes an interesting observation that shows clearly the relevance of the data deluge to Library and Information Science (LIS) and to what might be called the conventional institutions of sociocultural memory (e.g. libraries, archives and record centres). “Every day,” writes Floridi, “enough new data are being generated to fill all US libraries eight times over”. Although his aim here is simply to convey the sheer scale of the data being generated, I find Floridi’s comment particularly useful as a starting point for my own reflections on the topic.
From one point of view, the task of collating, curating, cataloguing and preserving a large proportion of this data can and should fall to professionals in the LIS field. Since LIS has historically had the strongest interest (not to mention training) in storing and managing large quantities of recorded information, its practitioners and institutions are surely among the best placed to take on the role of making our society’s digital record accessible to future generations. This is clearly the rationale behind projects such as the Library of Congress’s ambitious plan, announced in 2010, to archive the entirety of Twitter.
The sheer quantity of data being generated, however, puts the problem beyond the scope of any single institution, or even of a conglomerate of national public bodies. This is especially true given the inevitable bottlenecks in the capacity and cost of storage and information infrastructure, as well as the relatively short life expectancy of current digital storage hardware when compared with more traditional hard-copy formats (see here for a useful, albeit slightly out-of-date, infographic on this topic).
On the other hand, any international collaboration on the scale required is equally likely to be hampered by concerns about compliance, government policy, legal and moral ownership, and security. The mixed reactions to the US government’s recent hand-over of stewardship of the internet’s domain name system (DNS) to ICANN, a non-profit body governed through a global multistakeholder model, are surely telling in this respect.
To add to this, the need to decide which data are to be stored and made accessible (and how, and to whom) compounds the problem still further. Storing all the records of every person on the globe is one thing; linking them together, putting them in order, and cataloguing, classifying and indexing them, all in a useful and usable manner, is quite another. And while those commentators who envisage a bright and benign future purpose for this data speak of democratising the records of society and providing an unparalleled resource for the historians of several generations hence, the digital record must itself somehow be refined and filtered if it is to be properly understood in context, and to truly become knowledge.
Quite who is to be entrusted with this crucial task of refining the data, and with the semi-editorial choices that follow about their relevance and place in the record, is just one of the concerns that the more utopian visions of big data seem strikingly reluctant to address. Whose data, after all, will be deemed worth deleting, and why? And do we as individuals have any say over which of our own data enter the preserved record of digital mankind?
I fear that in the world of distributed cloud servers, the equivalent of asking for one’s personal papers to be burnt after one’s death, as previous generations could, is no longer an available means of controlling the information one leaves behind. Then again, some of the most treasured documents in any archive collection survive only because an executor failed to carry out the instructions of a will.
To sum up: the role of data in LIS raises several ethical questions, some of which I look forward to exploring further during my time at #citylis.
L. Floridi, The Fourth Revolution: How the Infosphere Is Reshaping Human Reality (Oxford: OUP, 2014), p. 13.
For a useful introduction to the various conceptual models of the way in which data, once refined, can become information, then knowledge, then (perhaps?) wisdom, see D. Bawden and L. Robinson, Introduction to Information Science (London: Facet, 2012), pp. 73-5.