Brief Thoughts and My Remarks from #EAW18

Last week I had the pleasure of attending the National Forum on Ethics and Archiving the Web, held in New York City. The event itself ran from Thursday-Saturday and featured a highly diverse and outstanding lineup of speakers from the worlds of libraries, archives, digital humanities, journalism, media studies, art, ethnic studies and more. I didn’t do the best job of taking my own notes, but highlights included Safiya Noble’s keynote on her work related to her recent book Algorithms of Oppression.

I primarily wanted to use this post to share the remarks I made as part of my Thursday panel, “Web Archiving as Civic Duty.” I was honored to share the stage with four other people (sadly, Stacy Wood’s travel was delayed and she just missed the panel due to a freak late-season snowstorm) to speak about some of my recent work and thoughts on social media and digital preservation for government materials. The text below is mostly unedited, although I did go off-script a bit on stage. Let me know your thoughts on the phrase “Tweets May Be Archived” and be sure to check out the paper, co-authored by Amelia Acker and myself, that inspired this talk here: https://doi.org/10.1002/pra2.2017.14505401001. Happy reading!


My inspiration for this talk comes from a phrase I’ve thought a lot about for the past year or so: “tweets may be archived.” When President Obama controlled the @POTUS Twitter account, the bio included this line: “Tweets may be archived.” Now, the @POTUS account specifies “Tweets archived” and links to a White House privacy policy at https://wh.gov/privacy indicating that they “may” collect mentions and other interactions with official government accounts, but provides little in the way of technical details.

I’m here to say that “tweets may be archived” is not good enough when it comes to preserving and providing access to social media data created by public sector organizations. These posts, and the interactions with them by citizens, bots, and other platform users, are critical elements for maintaining understandability of these materials over time.

The public needs to understand how platforms like Twitter and Facebook shape the data they make available to all users, not just the federal government. This is perhaps even more urgent following this week’s reports about the ways in which Cambridge Analytica used the Facebook Graph API to mine and utilize vast amounts of user data. Transparency around APIs, interaction data, and preservation is also critically important for the federal government entities that produce federal records on these sites.

Many social media records created by elected officials and US federal government agencies are considered federal records in light of a 2013 NARA Bulletin on Social Media, as well as the Presidential Records Act. As such, these posts, tweets, and updates are required to be maintained over the long term by the public sector. It is not enough to suggest that we may or may not be able to easily collect interaction data from social media platforms. The design and infrastructure of these platforms resist the types of digital preservation workflows developed around documents, images, scientific data, and other digital objects.

To put this in Open Archival Information System (OAIS) terms, if you don’t know what is in your Submission Information Package (SIP), how can you plan to describe, preserve, and provide access to the information it contains over time?

In some recent research I conducted with my colleague Amelia Acker, we found that the Facebook and Twitter data files from the Obama Administration’s digital transition team were rich and interesting but also confusing, sometimes incomplete, and not easily usable by researchers or other interested users. There was no contextual information corresponding to, say, the addition of new features to the platforms such as automatic retweets on Twitter or the introduction of Life Events on Facebook. For example, President Obama used the Life Event feature to mark the killing of Osama bin Laden soon after the feature was introduced on Facebook. Without contextual information, there’s no way to know why Life Events suddenly appear on the profile in 2011, or to understand what the Obama administration’s use of this feature suggests about its savviness when it came to social media.

Essentially, the data packages provided to the administration by Facebook, Twitter, and other platforms were the same as they would be for any user seeking to download their data, and we know they do not necessarily treat their users all that well! Who in the audience has downloaded their Facebook or Twitter data?

Furthermore, the interaction data, those retweets, comments and responses, were not included in the data package. It turns out that social media platforms treat federal records like those of any other user, which is perhaps minimally acceptable but not good enough for preservation purposes. Instead of mentioning that engagement with federal entities on social media “may” be archived, social media preservation needs more transparency around the archival capabilities of platforms, changes to features and apps, and other metadata that will make these records more legible over time.

How can we hope to ever build systems that provide meaningful comparison across platforms if we don’t even have a good sense of what a profile data file from Facebook or Twitter contains? Federal social media records document the behavior of the government online, and the engagements with these posts from citizens, people around the world, and bots represent a critical element of these records, one that provides a fuller picture of platform activity and has value of its own. The digital preservation and curation community cannot rely on the private sector to produce records that are useful outside of the platforms for which they were designed, because private companies have limited incentive to act this way. Additional approaches are needed to ensure that public sector information created on private platforms can remain accessible beyond the lifespan of any one platform. Today’s Twitter could be tomorrow’s MySpace, but federal records require thinking on a much longer timescale. “Tweets may be archived” is simply not good enough.