Documenting Social Media Datasets

Ed Summers
Jul 16, 2018


Join us for an online conversation about documenting Twitter datasets, to encourage ethical practices for data publishing and reuse. If this sounds interesting, please read on for details…

From its earliest days, Twitter has thought of itself not just as a company but as a critical piece of infrastructure for the social web … or, in the words of Jack Dorsey, one of its creators and Twitter’s current CEO:

I see Twitter as a utility, a broadcasting system for the Internet.

Bilton (2013)

The idea of a social

For better and for worse (and everywhere in between) Twitter has become enmeshed in the modern newsroom (Hermida, 2010), so much so that some have suggested news organizations form a consortium and purchase the company outright because of the way it functions as a “wire service for the 21st century” (Web, 2017). While this doesn’t show any sign of happening soon, the question of whether social media companies like Twitter and Facebook should be regulated as media companies is on many people’s minds.

A significant part of the reason why Twitter is often perceived as a type of utility is its data API, which provides real-time access to its fire hose of content. As other platforms such as Facebook and Instagram have started to rein in access to their APIs, largely in response to the Cambridge Analytica scandal, Twitter’s has remained open, at least for the time being. Perhaps Twitter’s lower number of users (think millions instead of billions) and the fact that it has cultivated an open API from the beginning have insulated it somewhat from pressures to close down access.

The same API affordances that allow Twitter to be used by journalists have also made it a very popular data source for academic researchers. A wide variety of tools and services exist for accessing Twitter data, and we’ve even started to see some patterns for the ethical sharing of datasets emerge here in the Documenting the Now project and elsewhere in data repositories around the world. But even with all this work we haven’t seen a great deal of conversation around the best ways to document social media datasets.

A few weeks ago, Justin Littman of the Social Feed Manager project kicked off a conversation (on Twitter, naturally) about this very issue:

Several participants in that thread thought it would be useful to get together online to talk informally about possible patterns or best practices for documenting Twitter datasets (and possibly more). The basic agenda will be for participants to give quick overviews of their interests, and to see if any of our projects could align in useful ways.

Details

Some attendees have already indicated that they are coming, but if you aren’t one of them and plan on attending, please send an email to edsu@umd.edu so we have an idea of how many people are coming. Otherwise, here are the details:

July 23, 2018
2–3 PM EDT
Webex Link

Reading

While it’s not required at all (seriously), here are some articles that may be of interest if you want some background reading.

Driscoll, K. and Walker, S. (2014). Working within a black box: Transparency in the collection and production of big Twitter data. International Journal of Communication, 8:1745–1764.

Edwards, P. N., Mayernik, M. S., Batcheller, A. L., Bowker, G. C., and Borgman, C. L. (2011). Science friction: Data, metadata, and collaboration. Social Studies of Science, 41(5):667–690.

Fiesler, C. and Proferes, N. (2018). “Participant” perceptions of Twitter research ethics. Social Media + Society, 4(1).

Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé, H., and Crawford, K. (2018). Datasheets for datasets. Technical Report 1803.09010, arXiv.

Hamer, A. Ethics of Archival Practice: New Considerations in the Digital Age. Archivaria, 85:156–179.

Mannheimer, S. and Hull, E. (2017). Sharing selves: Developing an ethical framework for curating social media data. International Journal of Digital Curation, 12(2):196–209.

Weller, K. and Kinder-Kurlanda, K. E. (2016). A manifesto for data sharing in social media research. In Proceedings of the 8th ACM Conference on Web Science, pages 166–172. ACM.

Please comment here if there are other articles or projects you think we should collectively look at.

References

Bilton, N. (2013). Hatching Twitter: A true story of money, power, friendship and betrayal. Portfolio, New York.

Hermida, A. (2010). Twittering the news: The emergence of ambient journalism. Journalism Practice, 4(3):297–308.

Web, A. (2017). Why news organizations should buy Twitter. Nieman Reports.


I’m a software developer at @umd_mith & study archives on/of the web at @iSchoolUMD