Brief Thoughts and My Remarks from #EAW18

Last week I had the pleasure of attending the National Forum on Ethics and Archiving the Web, held in New York City. The event ran from Thursday through Saturday and featured a diverse and outstanding lineup of speakers from the worlds of libraries, archives, digital humanities, journalism, media studies, art, ethnic studies and more. I didn’t do the best job of taking my own notes, but highlights included Safiya Noble’s keynote on her work related to her recent book Algorithms of Oppression.

I primarily wanted to use this post to share the remarks I made as part of my Thursday panel, “Web Archiving as Civic Duty.” I was honored to share the stage with four other people (sadly, Stacy Wood’s travel was delayed and she just missed the panel due to a freak late-season snowstorm) to speak about some of my recent work and thoughts on social media and digital preservation for government materials. It’s mostly unedited, although I did go off-script a bit. Let me know your thoughts on the phrase “Tweets May Be Archived” and be sure to check out the paper, co-authored by Amelia Acker and me, that inspired this talk here: https://doi.org/10.1002/pra2.2017.14505401001. Happy reading!


My inspiration for this talk comes from a phrase I’ve thought a lot about for the past year or so: “tweets may be archived.” When President Obama controlled the @POTUS Twitter account, the bio included this line: “Tweets may be archived.” Now, the @POTUS account specifies “Tweets archived” and links to a White House privacy policy at https://wh.gov/privacy indicating that they “may” collect mentions and other interactions with official government accounts, but it provides little in the way of technical details.

I’m here to say that “tweets may be archived” is not good enough when it comes to preserving and providing access to social media data created by public sector organizations. These posts, and the interactions with them by citizens, bots, and other platform users are critical elements for maintaining understandability of these materials over time.

The public needs to understand how platforms like Twitter and Facebook shape the data they make available to all users, not just the federal government. This is perhaps even more urgent following this week’s reports about the ways in which Cambridge Analytica used the Facebook Graph API to mine and utilize vast amounts of user data. Transparency around APIs, interaction data, and preservation is also critically important for the federal government entities that produce federal records on these sites.

Many social media records created by elected officials and US federal government agencies are considered federal records in light of a 2013 NARA Bulletin on Social Media, as well as the Presidential Records Act. As such, these posts, tweets, and updates are required to be maintained over the long term by the public sector. It is not enough to suggest that we may or may not be able to easily collect interaction data from social media platforms. The design and infrastructure of these platforms resist the types of digital preservation workflows developed around documents, images, scientific data, and other digital objects.

To put this in OAIS terms, if you don’t know what is in your SIP, how can you plan to describe, preserve, and provide access to the information it contains over time?

In some recent research I conducted with my colleague Amelia Acker, we found that the Facebook and Twitter data files from the Obama Administration’s digital transition team were rich and interesting but also confusing, sometimes incomplete, and not easily usable by researchers or other interested users. There was no contextual information corresponding to, say, the addition of new features to the platforms such as automatic retweets on Twitter or the introduction of Life Events on Facebook. For example, President Obama used the Life Event feature to mark the killing of Osama bin Laden soon after the feature was introduced on Facebook. Without contextual information, there’s no way to know why Life Events suddenly appear on the profile in 2011, or to understand what the Obama administration’s use of this feature suggests about their savviness when it came to social media.

Essentially, the data packages provided to the administration by Facebook, Twitter, and other platforms were the same as they would be for any user seeking to download their data, and we know they do not necessarily treat their users all that well! Who in the audience has downloaded their Facebook or Twitter data?

Furthermore, the interaction data (those retweets, comments, and responses) were not included in the data package. Turns out that social media platforms treat federal records like those of any other user, which is perhaps minimally acceptable but not good enough for preservation purposes. Instead of mentioning that engagement with federal entities on social media “may” be archived, social media preservation needs more transparency around the archival capabilities of platforms, changes to features and apps, and other metadata which will increase the legibility of these records over time.

How can we hope to ever build systems that provide meaningful comparison across platforms if we don’t even have a good sense of what a profile data file from Facebook or Twitter contains? Federal social media records document the behavior of the government online. Engagements with these posts from citizens, people around the world, and bots are a critical element of these records: they provide a fuller picture of platform activity and have value in their own right. The digital preservation and curation community cannot rely on the private sector to produce records that are useful outside of the platforms for which they were designed because private companies have limited incentive to act this way. Additional approaches are needed to ensure that public sector information created on private platforms can remain accessible beyond the lifespan of any one platform. Today’s Twitter could be tomorrow’s MySpace, but federal records require thinking on a much longer timescale. “Tweets may be archived” is simply not good enough.

Fall Update: Teaching and Conferencing this Week

Greetings, dear readers. It’s been a while, but I have been doing a lot of different things this summer! Now that the semester has started I can share the syllabus for the course I am teaching. It is my first time being 100% in charge of my own class and so far (two weeks in) I am really enjoying it. The course is INST643: Curation in Cultural Institutions. Here is a link to the syllabus. I put a lot of work into designing this course. Let me know what you think in the comments!

In unrelated news, I will be travelling to Denver, CO this week for the 8th Plenary Meeting of the Research Data Alliance. This will be my first RDA meeting, and comes after I was awarded an RDA/US Data Share Fellowship this summer. For this fellowship, I am studying the use of controlled vocabularies in agricultural information access systems. I am super excited to see old colleagues and make new ones at this conference. Look for me in the poster hall Thursday, Friday, and Saturday.

Amsterdam and IDCC

Last week, I traveled to Amsterdam to attend and present at the International Digital Curation Conference. I wrote a post about the conference here on the Archives Lab site but I wanted to add a more personal touch here. Amsterdam was a beautiful city which I was happy to explore in between conference events.

Being me, I had to find an archive or library to slip into. I ended up popping in at the Stadsarchief, Amsterdam’s City Archives. It is a beautiful building that houses a few exhibition spaces as well as information about the UNESCO World Heritage sites in the area, including the entire city canal ring. The lower exhibition includes some of the city’s founding documents, including the charter. It was a real treat!

Stadsarchief, Amsterdam, NL

As always, I was inspired by the conference and am excited to attend IDCC again in the future. Thanks to everyone who stopped by my poster. Here’s a picture of it, via Twitter, and a link to it via the conference website.

Upcoming Conference: IDCC 16

I will be presenting a poster entitled “Agricultural Data Curation: Examples from a National Library” at the International Digital Curation Conference this week. This is the first time I will be publicly sharing the work I’ve been doing as part of my postdoc, and I’m very excited! As you might be able to guess from the title, this poster presents initial results from my work with the Knowledge Services Division at the National Agricultural Library. We highlight the role collaboration plays in the four primary projects currently underway at the division.

Are you going to be in Amsterdam for IDCC? Let me know! I look forward to seeing old colleagues and meeting new ones.

Linux in the Wild

One of my first posts on this site was about Linux, and I always love seeing examples of how Open Source software powers so many of the computing devices we interact with every day. On a recent plane trip back from Seattle (where I attended ASIS&T; it was awesome!) I settled into my seat and prepared to watch a movie on the seatback screen when it suddenly went black. Confused, I looked up and noticed that every other screen on the plane had gone dark as well. A few seconds later, much to my surprise, this appeared on screen as the software loaded:

Wild Linux

Can you see Tux in the upper left corner? That’s right: Delta’s seatback screens are powered by Linux! After another period of intense scrolling text as the system rebooted, I was eventually greeted by the clean welcome screen:

Delta Welcome

After this snafu, the system remained on for the rest of my flight, and I was able to watch Parks & Recreation while editing some files. I must have looked like a weirdo when I whipped out my phone to take pictures of the software loading on my seatback screen, but I always love witnessing moments like this. The experience of flying on a commercial airline is designed to be sleek and streamlined, so seeing that much of this software runs on Linux was a refreshing reminder that the polished face of Delta rests on a complex infrastructure.

I’ve also been thinking about the recent dustup between Groupon and the GNOME project. Groupon used the trademarked name of the popular Open Source desktop environment as the name of their new point-of-sale system, filing trademarks that infringed upon those already in place for decades. I was surprised to see Groupon making this move when my assumption is that some portion of their developers work with Linux and some portion of their code runs on it. After some confusion, it looks like Groupon is pulling back and will change the name of their product. Score one for the Open Source lobby!

I wonder how many consumers know how important Linux is to so many aspects of our computational lives? How can we increase awareness of this software, and how would knowing more about Linux change conversations in society about the role and place of computing in everyday life?

 

Upcoming Conference: ASIS&T Annual Meeting

Next week I will be at the 77th ASIS&T Annual Meeting in Seattle, WA. I am participating in the Doctoral Seminar and am excited to attend this conference, where I presented a paper last year (Kriesberg et al. 2013). This year’s program is here; it looks like a great set of sessions.

If you are going to be in Seattle, leave a comment or drop me a line. See you there!

ASIS&T Annual Meeting 2014

Conference Presentation Next Week: ASIS&T 2013

Next week at the ASIS&T 2013 Conference in Montreal, QC, Canada, I will present a paper I co-authored with members of my research team on the DIPIR Project entitled “The Role of Data Reuse in the Apprenticeship Process.” The abstract is here and a full conference program is here.

I’ll be presenting at 8:00 on Monday morning! Hoping for a good turnout. I will post links to the paper and my slides here after the conference.