Reading from a distance

Apart from network analysis, studying the Internet involves a lot of text analysis. In my work, I have introduced the notion of social media flows to describe a new type of textual data which is characteristic of the age of networked digital media communication. I have also argued that new research methods are needed in order to grasp this type of data, and that new approaches must be developed by merging existing perspectives and by going beyond previously established methodological divisions.

In order to move in the suggested direction, I have developed the method of Connected Concept Analysis (CCA). The approach borrows elements from a wide variety of established methods (discourse analysis, content analysis, infometrics and computational linguistics) and attempts to synthesize them in a comprehensive framework for the analysis of large online text datasets.


Today, many social actors (researchers, politicians, businesses, schools, and NGOs) demand workable, theoretically grounded ways to understand and harness the dynamic social relations and knowledge content that characterize the emerging digital landscape, as formed and expressed by today’s networked publics. Strangely enough, such methods are still largely non-existent, in spite of the huge policy implications, economic effects and socio-cultural impact of these social currents. Simply put, we must find valid and reliable ways of tapping into, and understanding, these publics if we are to grasp 21st-century society and culture.

I define social media flows as entangled streams of text content on the one hand, and relational (network) data on the other. These flows saturate today’s societies and are at the center of present-day social practice and meaning-making:

People socialize through sites like Facebook, build their business networks on platforms like LinkedIn, get their entertainment (commenting on, ranking or transforming it) on sites like YouTube or services like Spotify, and take part in discussion threads relating to news stories on the websites of major media outlets. People mobilize, as friends, fans or activists, through platforms like Twitter, share and exchange their hobby projects through systems like Etsy or Pinterest, and circulate their own content through channels like SoundCloud (audio), Vimeo (video), Flickr (images), blogs (texts) or Foursquare (geolocation). All of these platforms have built-in social networking features where self-publication, discussion, sharing, commenting, “liking”, “friending”, tagging, and bookmarking are key practices.

While almost no one engages in all of these practices, and some engage in none of them, the activity is massive at the aggregate level, and fewer and fewer social spheres, activities or sectors remain unaffected by the production and consumption channelled through these social media flows.

Together with Fredrik Palm, a colleague and developer at HUMlab, I have developed an analytic tool called Textometrica, which aims to support an approach to language and discourse that retains the epistemology of cultural analysis while dealing with large online datasets in the form of social media flows. Textometrica relies on qualitative coding and discourse-analytical thinking while employing tools from infometrics and social network analysis. This enables what Franco Moretti calls a distant reading of large bodies of text, while still getting beyond the mere counting of words.
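To give a rough sense of what combining qualitative coding with network analysis can look like in practice, here is a minimal Python sketch. It is not Textometrica's actual implementation. The coding scheme (`CONCEPTS`), the function names and the three example posts are all hypothetical, and the sketch assumes that one post counts as one text unit and that two concepts are connected whenever they occur in the same unit.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coding scheme: a researcher groups surface words into concepts.
# In a real study this step is qualitative and iterative, not a fixed dictionary.
CONCEPTS = {
    "music": {"song", "album", "band"},
    "sharing": {"share", "shared", "upload"},
    "community": {"friends", "fans", "followers"},
}


def concepts_in(text, concepts=CONCEPTS):
    """Return the set of concepts whose words occur in one text unit."""
    tokens = {token.strip(".,!?;:\"'()").lower() for token in text.split()}
    return {name for name, words in concepts.items() if tokens & words}


def co_occurrence_edges(posts, concepts=CONCEPTS):
    """Count how often each pair of concepts appears in the same text unit."""
    edges = Counter()
    for post in posts:
        found = sorted(concepts_in(post, concepts))
        # Every unordered pair of concepts in the unit adds one to that edge.
        edges.update(combinations(found, 2))
    return edges


# Made-up example posts, used only to exercise the functions above.
posts = [
    "I shared the new album with my friends!",
    "Fans upload every song the band plays.",
    "Great album.",
]

edges = co_occurrence_edges(posts)
for (source, target), weight in edges.most_common():
    print(source, target, weight)
```

The resulting weighted edge list is the kind of structure that network analysis tools can take as input. The third post mentions only one concept and therefore contributes no edge, which is one simple way in which such an analysis differs from counting words.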


This video gives an introduction to Textometrica:



