Automata in the library

More musings on the role of digital technology in the library, in response to my course at #citylis.

The following is my third and final reflection on my #citylis course on ‘Digital Information Technologies and Architectures’.

The world has turned and we find ourselves coming to the end of the first semester (already!) here at #citylis, and therefore also to the end of our course on digital technology and its impact on library and information science (LIS).

Since I posted my last reflection on this topic, the course material has gone on to cover the following areas: altmetrics, a branch of bibliometric analysis which measures the impact of research in non-traditional arenas, including on social media; coding, digital text analysis and text mining, with a nod towards the related field of the digital humanities; and, finally, the rather wide spectrum of technological developments which come under the heading of artificial intelligence (AI).

When thinking recently about a way to sum up the last three weeks of the course, I was reminded of the model of the information communication chain, a concept initially proposed by Lyn Robinson in a 2009 paper as representative of the fundamental area of interest for researchers in LIS.

Information Communication Chain – © @lynrobinson

Above all, I started to think about the ways in which these technologies might inform our understanding of the various stages of the dissemination, sharing and management of information, and in some instances of its organisation and retrieval too. It also came home to me just how similar these technologies are in some ways, at least in terms of their implications for human agency and understanding.

Altmetrics is a good place to start, mainly because it is a clear example of digital technology being used to aid what in some ways was (and still is) the task of LIS professionals, above all subject specialist librarians: evaluating the significance of a particular document – a journal article, blog, dataset, etc. In this case, the evaluation works by measuring the amount of attention a document receives across a number of communication channels (so-called “impact flavours” such as blog posts, mainstream news, Wikipedia articles, policy documents, discussions on social media, and so on). This allows for a new perspective on the spread of information, one which promises an alternative, more nuanced picture of scholarly communication and (re)use to that of more traditional metrics focused on citations.

Attention does not necessarily correlate with quality, of course; the figures by themselves have the potential to mislead, above all when it comes to analysing the distinct patterns of reception and discussion at work in the life-cycle of a particular research output. Nevertheless, the increasing volume of scholarly literature in circulation, together with its growing complexity and diversity (as made clear in a 2014 report by OCLC Research on The Evolving Scholarly Record), means that some indication of currently trending topics and contributions to ongoing discussions can only be of help, both to the acquisitions librarians tasked with collection development in a certain area and to those given custody of their institution’s digital repository.

What is more, the integration of altmetrics tools (such as Altmetric and Plum Analytics, to name but a few) with the APIs of web services like Twitter and Mendeley means that the task of gathering citation counts and sifting through the references to an article on social media can to a great extent be automated.
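As an illustration of the sort of automation involved, here is a minimal sketch in Python of looking up the attention data for a single DOI. I am assuming the shape of Altmetric’s free public endpoint and the field names in its response (cited_by_tweeters_count and so on); treat both as assumptions and check the current API documentation before relying on them.

```python
import requests

# A sketch of automated altmetrics gathering. The endpoint URL and the
# response field names are assumptions based on Altmetric's free public API;
# consult the current documentation before relying on them.

def fetch_altmetrics(doi):
    """Return the attention data recorded for a DOI, or None if none exists."""
    response = requests.get(f"https://api.altmetric.com/v1/doi/{doi}", timeout=10)
    if response.status_code == 404:
        return None  # no attention recorded for this DOI
    response.raise_for_status()
    return response.json()

data = fetch_altmetrics("10.1038/nature12373")  # an arbitrary example DOI
if data:
    print(data.get("title"))
    print("Tweets:", data.get("cited_by_tweeters_count", 0))
    print("News stories:", data.get("cited_by_msm_count", 0))
```

A subject librarian could run something like this over a whole reading list in seconds, which is precisely the kind of sifting that would otherwise consume hours of manual searching.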

In a similar way, the applications of coding and AI within LIS both allow for the automation of a number of routine tasks and make it possible to gain otherwise overlooked insights into certain datasets (such as those recently made available by the British Library from its collections).

The two are also linked in uncanny ways. Python, one of the primary programming languages used in coding for the digital humanities, is also at the heart of several projects currently being undertaken in the field of machine learning. What is more, the techniques used in the digital humanities to perform text and data mining are likewise important parts of the related process of natural language processing (i.e. the conversion of the semantic, linguistic information contained in human speech and writing into digital information via a series of statistical and probabilistic models). In both cases, large corpora of well-formed, ‘clean’ textual data are required.
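To give a concrete, if trivial, example of what such text mining looks like in practice, here is a sketch in Python that counts the most frequent words in a plain-text file. The filename is purely illustrative, and the tokenisation is deliberately crude; a real project would reach for a proper NLP toolkit.

```python
import re
from collections import Counter

# A toy text-mining sketch: the ten most frequent words in a plain-text
# corpus. The filename is illustrative; in practice one might point this
# at a file from one of the British Library's released datasets.

def word_frequencies(path, top_n=10):
    with open(path, encoding="utf-8") as f:
        text = f.read().lower()
    tokens = re.findall(r"[a-z']+", text)  # deliberately crude tokenisation
    return Counter(tokens).most_common(top_n)

for word, count in word_frequencies("corpus.txt"):
    print(word, count)
```

Even something this simple makes the dependency on ‘clean’ data obvious: feed it OCR output full of broken words and the frequency table degrades immediately.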

I admit, it is sometimes difficult not to focus on the disruptive elements of such technologies. The potential use of neural networks in the automatic classification and indexing of documents, to take just one example, could have an enormous effect on LIS theory and practice when it comes to information organisation. What would prevent an AI cataloguer from deciding to classify a document in a completely different way to a human, based upon different logical or statistical criteria? Would the machine even require an information organisation system that corresponded to anything a human could feasibly understand? How would this change the process of information retrieval, for instance?
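For the sake of concreteness, here is a toy sketch of what automatic subject classification might look like, using scikit-learn’s small neural network classifier. Everything here is invented for illustration, from the documents and labels to the idea that four examples could train anything; a real cataloguing system would need vastly more data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# A toy sketch of automatic subject classification: a small neural network
# learns to assign subject labels from text features. The documents and
# labels are invented; a real catalogue would need far more training data.
docs = [
    "medieval manuscripts and palaeography",
    "statutes, case law and legal precedent",
    "machine learning and neural networks",
    "contract law and tort liability",
]
labels = ["history", "law", "computing", "law"]

model = make_pipeline(TfidfVectorizer(), MLPClassifier(max_iter=1000, random_state=0))
model.fit(docs, labels)
print(model.predict(["an introduction to deep learning"]))  # plausibly 'computing'
```

The unsettling part, to return to my questions above, is that the TF-IDF features and learned weights inside such a model correspond to nothing a human cataloguer would recognise as a classification scheme.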

And what of other tasks in the library? To name just one recent example, ROSS, an AI digital legal assistant currently being developed in Toronto and built on IBM’s Watson, could have a significant impact on the job of legal reference librarians.

On the other hand, it would be unwise for LIS to stick its proverbial head in the sand when it comes to these technologies. As they become more and more a part of the information communication landscape, and a feature of the everyday reality in which scholarly research and other knowledge is conducted and disseminated, libraries and other stakeholders with an interest in the documented record of humankind will naturally need to find ways of incorporating them into the services they provide to their users. At the same time, however, they should keep worrying about the ethical implications these and other technologies may present.

Thanks to Lyn and David, at least our cohort at #citylis should be suitably prepared for that potentiality!

Not drowning, but waving? LIS and data

Musings on the relationship between LIS and big data.

This post is the first in a series of reflections on my #citylis course INM348 ‘Digital Information Technologies and Architectures’. 

It seems to have become something of a self-evident truth that our production and consumption of data is growing at an increasingly mind-boggling rate. The figures themselves point to nearly unfathomable quantities: in one oft-quoted calculation, the computing firm IBM estimated in 2012 that 2.5 exabytes (or, in other words, 2.5 billion gigabytes) of digital data were generated every day.

In addition to this, the migration online of many everyday activities and transactions which previously took place in the analogue sphere – such as shopping, banking, communicating with friends, finding a date, and even praying – as well as the increasing number of networked personal devices owned by any one individual (which may even rise to an average of 6.58 per person by 2020 [1]), means that these numbers are set to grow ever higher by the end of the decade.

Given the twin benefits of convenience and speed which such digital interactions can offer, it is perhaps not surprising that a vast majority of us – or at least those of us who inhabit the more affluent portions of today’s connected, globalised world – have voluntarily chosen to convert much of our personal, social and cultural identities into the medium of binary code, to be distributed and shared via the ubiquitous computer networks that bind our world.

In some of his recent work, the philosopher Luciano Floridi makes an observation which underlines the relevance of the data deluge to Library and Information Science (LIS) and to what might be called the conventional institutions of sociocultural memory (i.e. libraries, archives, record centres, etc.). “Every day,” writes Floridi, “enough new data are being generated to fill all US libraries eight times over” [2]. Although his point is to highlight the sheer scope of the data being generated, I find the comment particularly useful as a starting point for my own reflections on the topic.

From one point of view, the task of collating, curating, cataloguing and preserving a large proportion of this data can and should fall to professionals in the LIS field. As the sector with historically the strongest interest, not to mention training, in storing and managing large quantities of recorded information, LIS practitioners and institutions are surely among the best placed to take on the role of making our society’s digital record accessible to future generations. This is clearly the rationale behind projects like the ambitious plan announced back in 2010 by the Library of Congress to archive the entirety of Twitter, for instance.

The sheer quantity of data being generated, however, makes this a problem beyond the scope of any single institution or conglomerate of national public bodies. This is especially true given the inevitable bottlenecks to do with the size and cost of storage and information infrastructure, as well as the relatively limited life expectancy of current forms of digital storage hardware, at least when compared to more traditional hard-copy formats (see here for a useful, albeit slightly out-of-date, infographic on this topic).

On the other hand, any international collaboration on the scale that would be required is equally likely to be hampered by concerns about compliance, governmental policy, legal and moral ownership, and security. The mixed reactions to the US government’s recent handover of control of the internet’s domain name system (DNS) to ICANN, an international consortium, are surely rather telling in this respect.

To add to this, the need to discern which data are to be stored and made accessible (as well as how, and to whom) compounds the problem still further. Storing all of the records of every person on the globe is one thing; linking them together, putting them in order, cataloguing, classifying, and indexing them, all in a useful and usable manner, is quite another. And while those commentators who postulate a bright and benign future purpose for this data talk in terms of democratising the records of society and providing a hitherto-unsurpassed resource for historians several generations from now, the digital record must itself somehow be refined and filtered in order for it to be properly understood in context, and to truly become knowledge [3].

Quite who is to be given this crucial role of refining the data, and of making the semi-editorial choices that follow concerning their relevance and place in the record, is just one of the points of concern which the more utopian dreams for big data seem strikingly reluctant to address. Whose data, after all, is going to be considered worth deleting, and why? And do we as individuals have any say over what of ours enters the preserved record of digital mankind?

I fear that in the world of the distributed cloud server, the option of asking for one’s personal papers to be burnt after one’s death – a means of exerting control over the information one leaves behind that was available to previous generations – no longer exists. Then again, some of the most treasured documents in any archive collection are those which survive only because a former executor failed to carry out a will’s instructions to destroy them.

To sum up: the role of data in LIS raises several ethical questions, some of which I look forward to exploring further during my time at #citylis.

 

References

[1] Evans, D., ‘The Internet of Things: How the next evolution of the Internet is changing everything’, Cisco white paper (April 2011).

[2] Floridi, L., The 4th revolution: how the infosphere is reshaping human reality (Oxford: OUP, 2014), p. 13.

[3] For a useful introduction to the various conceptual models of the way in which data, once refined, can become information, then knowledge, then (perhaps?) wisdom, see D. Bawden and L. Robinson, Introduction to Information Science (London: Facet, 2012), pp. 73-5.

New season, new blog

Hi! My name is David, and this blog is intended to be a way of sharing my thoughts about the world of Library and Information Science (LIS), since that is the subject in which I am about to begin a master’s degree at City University London, as part of the Library School #citylis. I am excited, mainly because this is a new and welcome opportunity for me, but also because the Department (headed by Dr. Lyn Robinson and Prof. David Bawden, both of whom have excellent blogs) is renowned for its conceptually open approach to the idea of the “library” and to the possibilities – philosophical, technological, and ethical – that information science poses for 21st-century society.

One particular feature of the course is that it makes use of social media platforms (Twitter, blogging sites, and so on) in order to interact with, and gain feedback from, the students taking the programme. Having been used to a much more traditional teaching style during my time at university prior to this (I did a BA in English at Cambridge, followed by an MPhil in Medieval Studies at Oxford, if you really want to know), I expect this all to be very new, enlightening, interesting, and ultimately very useful indeed.

There’s just one problem, and that’s this:

I am a terrible blogger.

That’s right, you heard me. I am simply terrible. Excruciatingly awkward in most face-to-face social situations, I am filled with a certain non-negligible terror by the idea that my half-baked musings about life, and other topics of interest seemingly to me alone, might be read over the Internet by just about anyone in the world. While admittedly some may see the anonymity of the online op-ed (or indeed the nasty Twitter comment) as liberating, I still feel deeply embarrassed whenever I try to write blog posts, as if I were composing dead letters to a recently departed girlfriend only for them to be read by a passer-by (who, incidentally, can also instantly find out my identity).

Compounding this are problems of structure and tone. For one thing, I’ve no idea how to begin! Or how to come up with posts that (a) aren’t beyond dull, yet (b) appear with sufficient regularity to satisfy my followers (should I ever have any)! Perhaps all first-time bloggers feel this way, and some of them may or may not be just as vocal about it (I haven’t checked). The typical blog post seems, after all, to straddle several of the territories occupied by previous forms of writing: a potent admixture of memoir, editorial, review, letter, autobiography, diary entry and self-advertisement is often the result, stridently public yet at the same time oddly private, too.

And perhaps this formal confusion is the source of my anxiety. All the previous blogs I have tried to write (all of which, thankfully, are now consigned to the recycle bin of Internet history) were in many ways far too personal. With any luck, the added impetus of having to write in order to fulfil the requirements of my course at City will help me get over some of the difficulties I have with writing in this medium. All I can say for now is, we can but hope; at least I’ve managed to write this post!