Timeline of posts for Spellbound Blog.
Created by jkramersmyth on Jul 20, 2008
Last updated: 10/23/10 at 07:51 PM
Water in Benal (1944)
In honor of this year’s Blog Action Day theme of Water, I wanted to share some stunning images from the Flickr Commons. The images I have selected, contributed by cultural heritage institutions from around the world, show methods of transporting or acquiring water. I will let the images speak for themselves below, but the next time you turn on the tap in your home, think of all those for whom getting water is a huge challenge each and every day. While most of the images below are from decades ago, easy access to safe, clean water is still a current issue. Please consider supporting an organization like Charity: Water, a non-profit organization bringing clean and safe drinking water to people in developing nations. 100% of public donations directly fund water projects.
And now.. the photos!
Egypt (1900): Arab water-carrier girls
1910: Drinking Water from Street Pump, NY
1900, Egypt Water Carriers
1913, Catskill Aqueduct
1918, Central France, Filling pot with water from a cart
1890: Native Girls in Holland
1910: Ways of using a divining rod
1974: Alice Thompson, Besoco, West Virginia, Is Shown with Milk Bottles Her Neighbors Furnish Her Water with after Her Water Lines Were Cut Off. She Is Divorced From a Coal Miner Who Was Imprisoned for Killing a Man
1940: Faro Caudill drawing water from his well, Pie Town, New Mexico
This post is from Spellbound Blog: Blog Action Day: Flickr Commons Images of Acquiring Water
http://feedproxy.google.com/~r/Spellboundblog/~3/Lf5TUyqaH_s/
In honor of the Army of Women Day, my post today takes a quick look at how the American public has been delivered various messages about cancer via posters and PSAs.
These two 1930s posters from the Library of Congress focus their message on convincing women to seek treatment from their doctor quickly and not fight their cancer alone.
By the 70s we got PSAs from organizations like the American Cancer Society, focusing on not smoking, doing self-exams and seeing your doctor for ‘regular cancer check-ups’. The clip below features Farrah Fawcett in 1981 (25 years before her own cancer diagnosis):
Almost 30 years later we have a new kind of video appeal. The Army of Women, a program of the Dr. Susan Love Research Foundation, funded by a grant from the Avon Foundation for Women, is recruiting 1,000,000 women (and men!) of all ages and ethnicities to participate in studies to find the cause of breast cancer. Their PSA below recasts the challenge. Now, instead of living a healthy lifestyle and then seeking out doctors for diagnosis and treatment, we are asked to join forces with others to support doctors in their research into the cause of breast cancer.
I lost my aunt to breast cancer. I have more friends and family who have fought breast cancer than I can count on one hand. I joined the Army of Women over a year ago.
What can you do?
If you are over 18, sign up to join the Army of Women database. The first step is to add your name to the pool of individuals willing to be contacted to hear about research projects in the future. It is free. You are not agreeing to participate in any specific project, just adding yourself to the list so researchers can find the subjects they need as fast as possible.
Invite your friends and family to join.
Help us reach a day when the only way that a woman can learn about what it was like to have breast cancer is from memoirs, documentaries and tear-jerker movies. I want to put cancer in the archives (forgive me... I couldn’t resist!).
This post is from Spellbound Blog: Breast Cancer: Join the Army of Women & Help Scientists Find the Cause
http://feedproxy.google.com/~r/Spellboundblog/~3/DWJzEj92I88/
Just a quick reminder that I will be presenting tomorrow morning at SAA2010 on the topic of search engine optimization and archives websites. I am part of session 502 officially titled Not on Google? It Doesn’t Exist: Findability and Search Engine Optimization for Archives. My specific portion of the presentation is titled ‘Building Archives Websites That Google Will Love’ and will be a general introduction to SEO concepts and why they are important to those involved in the creation of websites for archives and other cultural heritage institutions. It will include some basic tips and techniques.
My two co-presenters, Matt Herbison and Mark Matienzo, will discuss more in-depth issues related to website architecture, URLs and increasing links back into your website. We hope you can join us, even though our session is in the less-than-pleasant 8 a.m. Saturday morning time slot. I will be posting my slides after our session and linking to them from my presentations page. I plan to pick up some donuts to sweeten the deal!
This post is from Spellbound Blog: SAA2010: SEO and Archives Websites
http://feedproxy.google.com/~r/Spellboundblog/~3/Kr_NFi1zrYw/
I got a kind email today asking “Whither ArchivesZ?”. My reply was: “it is sleeping” (projects do need their rest) and “I just started a new job” (I am now a Metadata and Taxonomy Consultant at The World Bank) and “I need to find enthusiastic people to help me”. That final point brings me to this post.
I find myself in the odd position of having finished my Master’s Degree and not wanting to sign on for the long haul of a PhD. So I have a big project that was born in academia, initially as a joint class project and more recently as independent research with a grant-funded programmer, but I am no longer in academia.
What happens to projects like ArchivesZ? Is there an evolutionary path towards it being a collaborative project among dispersed enthusiastic individuals? Or am I more likely to succeed by recruiting current graduate students at my former (and still nearby) institution? I have discussed this one-on-one with a number of individuals, but I haven’t thrown open the gates for those who follow me here online.
For those of you who have been waiting patiently, the ArchivesZ version 2 prototype is available online. I can’t promise it will stay online for long; it is definitely brittle for reasons I haven’t totally identified. A few things to be aware of:
When you load the main page, you should see tags listed at the bottom. If you don’t see any at all, drop me an email via my contact form and I will try to get Tomcat and Solr back up. If you have a small screen, you may need to view your browser full screen to get to all the parts of the UI.
I know there are lots of bugs of various sizes. Some paths through the app work – some don’t. Some screens are just placeholders. Feel free to poke around and try things – you can’t break it for anyone else!
I think there are a few key challenges to building what I would think of as the first ‘full’ version of ArchivesZ – listed here in no particular order:
In the process of creating version 2, I was too ambitious. The current version of ArchivesZ has lots of issues, some related to usability and some outright bugs (see the prototype above!).
Wherever a collaborative workspace for ArchivesZ ends up living, it will need large data sets. I did a lot of work on data from eleven institutions in the spring of 2009, so there is a lot of data available, but it is still a challenge.
A lot of my future ideas for ArchivesZ are trapped in my head. The good news is that I am honestly open to others’ ideas for where to take it in the future.
How do we build a community around the creation of ArchivesZ?
I still feel that there is a lot to be gained by building a centralized visualization tool/service through which researchers and archivists could explore and discover archival materials. I even think there is promise in a freestanding tool that supports exploration of materials within a single institution. I can’t build it alone. This is a good thing: it will be much better in the end with the input, energy and knowledge of others. I am good at ideas and good at playing the devil’s advocate. I have lots of strength on the data side of things, and visualization has been a passion of mine for years. I need smart people with new ideas, strong tech skills (or a desire to learn) and people who can figure out how to organize the herd of cats I hope to recruit.
So, what can you do to help ArchivesZ? Do you have mad ActionScript 3 skills? Do you want to dig into the scary little Ruby script that populates the database? Maybe you prefer to organize and coordinate? Have you always wanted to figure out how a project like this could grow from a happy (or awkward?) prototype into a real service that people depend on?
Do you have a vision for how to tackle this as a project? Open source? Grant funded? Something else clever?
Know any graduate students looking for good research topics? There are juicy bits here for those interested in data, classification, visualization and cross-repository search.
I will be at SAA in DC in August chairing a panel on search engine optimization of archival websites. If there is even just one of you out there who is interested, I would cheerfully organize an ArchivesZ summit of some sort in which I could show folks the good, bad and ugly of the prototype as it stands. Let me know in the comments below.
Won’t be at SAA but want to help? Chime in here too. I am happy to set up some shared desktop tours of whatever you would like to see.
PS: Yes, I do have all the version 2 code, and what is online at the Google Code ArchivesZ page is not up to date. Updating the ArchivesZ website and uploading the current code is on my to-do list!
This post is from Spellbound Blog: ArchivesZ Needs You!
http://feedproxy.google.com/~r/Spellboundblog/~3/woBtTfNCYpQ/
In my presentation at the Spring 2010 Mid-Atlantic Regional Archives Conference (MARAC), Whirlwind Tour of Visualization-Land, I showed some screenshots of a tool called Gridworks. At the time, Gridworks was not available to the general public. The good news is that earlier this month Gridworks 1.0 was officially released and you can get Gridworks right now.
For those of you who didn’t see my presentation, Gridworks is a tool you run locally on your computer via a web browser. It lets you load ‘grid-shaped data’ for examination, filtering and data cleanup. That makes it sound so much less exciting than it is. The best way to get a sense of what you can do is to watch the Gridworks Videos.
What sort of data do I think there is in archives to be pumped into Gridworks? How about collection descriptive data and electronic record datasets? Since all the data is kept locally, you don’t need to worry about uploading your data to some anonymous server in order to work with it. It all stays safely on your local computer the whole time.
A quick list of things that Gridworks can do:
Cluster data to find values that are almost the same so you can normalize your data (for example – NYC vs N.Y.C.)
Create instant faceted browsing based on any column in your data
Provide scatterplots of the values from any two numeric columns, as well as a way to spot the most interesting combinations across many possible columns
Reconcile and validate values against data from Freebase.com
Pull data from Freebase.com based on a matched column, such as the population of a country if your dataset has a column specifying the country
Split data within a cell based on a specified delimiter
Apply regular expressions and other simple code to data to create new columns
This list just scratches the surface, but it should give you a decent idea of the power of Gridworks. Even if the only feature you ever use is the one which lets you cluster and update your data to remove the ‘almost the same’ values, Gridworks can save you hours of painstaking data cleanup.
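To make the ‘almost the same’ idea concrete, here is a minimal Python sketch of key-collision clustering. It is similar in spirit to the ‘fingerprint’ clustering method Gridworks offers, but it is not Gridworks’ code. Each value is reduced to a normalized key (accents folded, lowercased, punctuation dropped, words sorted), and values that share a key are grouped as candidates for merging. The function names and sample values are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a value to a normalized key so near-duplicates collide."""
    # Fold accented characters to plain ASCII (e.g. "é" becomes "e").
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Lowercase and strip punctuation, so "N.Y.C." and "NYC" both become "nyc".
    value = re.sub(r"[^\w\s]", "", value.strip().lower())
    # Sort the unique words so "Smith, John" and "John Smith" share a key.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values by fingerprint, keeping only keys shared by 2+ distinct values."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: sorted(vs) for key, vs in groups.items() if len(vs) > 1}


if __name__ == "__main__":
    places = ["NYC", "N.Y.C.", "nyc ", "Boston", "Washington, D.C.", "Washington DC"]
    for key, variants in cluster(places).items():
        print(f"{key!r}: {variants}")
```

In practice you would review each group and choose the preferred form before merging, rather than merging blindly; that review step is what the Gridworks clustering dialog is for.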
Why is data cleanup exciting? Because once you have nice clean data with all the attributes that are useful to have for your data set, you can start playing with the data in visualization tools! So go watch some Gridworks Videos, get Gridworks for yourself and start playing with data. It is free and it makes working with data fun!
This post is from Spellbound Blog: Gridworks: Super Data Cleanup and Exploration Tool
http://feedproxy.google.com/~r/Spellboundblog/~3/soOiZAY5I8o/
The official title of this session was “Discovery Tools for Archival Collections: Getting the Most Out of Your Metadata.” It was divided into two presentations, with introduction and question moderation by Jaime L. Margalotti, senior assistant librarian in Special Collections at the University of Delaware.
Introduction to Metadata Standards
Michael Bolam, metadata librarian for digital production, is in charge of all the metadata for all the collections at the Digital Research Library at the University of Pittsburgh. He is not an archivist – but does know where the archives is at Pitt! He has put lots of archival material online through digitization and assignment of metadata.
The best definition of metadata he has found, good for all audiences: “Metadata consists of statements we make about resources to help us find, identify, use, manage, evaluate and preserve them.” (Marty Kurth, Head of Metadata Services, Cornell University Libraries)
He reviewed examples of metadata for images, text documents and archival collections. There is also data related to the business of scanning and making content available (administrative, behind-the-scenes metadata). Standards let you take your data and use it for other purposes.
Overview of alphabet soup of metadata standards:
MARC: bibliographic information in machine-readable form (a MAchine-Readable Cataloging record).
Dublin Core: the goal of Dublin Core was to create a core set of metadata fields that could be used across platforms, across various disciplines.
MARCXML: schema for representing MARC in XML. Makes it easy to convert to and from MARC without losing any data. May have more data than you need. MARCXML is not very ‘human readable’: you need to recall all the code numbers for the different data elements. Can be exported from Archivists’ Toolkit.
MODS: Metadata Object Description Schema, sort of a ‘MARCXML light’. Tries to be a step between MARCXML (robust & complicated) and Dublin Core (really simple). May result in compacting multiple MARCXML fields into single MODS fields, so you may lose some of the granularity of the data. The tags ARE human readable: they are words (for example, name or titleInfo), not numbers. Can also be exported from Archivists’ Toolkit.
ONIX: ONline Information eXchange, the standard used by the book publishing industry. An XML-based standard for making intellectual property in published form available, both physical & digital. The data is created by the publisher. Publishers use different ways of representing authors, keywords, etc. than LC and library cataloging do.
METS: Metadata Encoding & Transmission Standard. An XML standard wrapper for describing divergent types of content within a digital library. Books, images, collections, etc. keep their metadata in different formats; METS lets you bring them together.
OAI-PMH: Not a metadata standard – but rather a protocol for sharing metadata. Gives us a way to pull baseline information about a digital object out of a database and put it out somewhere where it can be harvested and used.
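As a concrete example of the simplest of these standards, the sketch below serializes a record as simple Dublin Core inside the oai_dc XML wrapper that OAI-PMH uses. The two namespace URIs and the fifteen element names are the standard ones; the function name and sample values are hypothetical, and a production record would normally also carry an xsi:schemaLocation attribute, which is omitted here.

```python
import xml.etree.ElementTree as ET

# Namespaces for the OAI-PMH "oai_dc" container and the Dublin Core elements.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# The 15 elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def to_oai_dc(record: dict) -> str:
    """Serialize {element: [values, ...]} as an oai_dc XML string."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            # Local fields with no Dublin Core slot must be mapped (or dropped) first.
            raise ValueError(f"not a simple Dublin Core element: {element}")
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_oai_dc({
        "title": ["Street scene"],  # hypothetical sample values
        "type": ["Image"],
        "subject": ["Streets"],
    }))
```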
Examples of projects built on shared metadata:
WorldCat.org: Has everything that is shared with OCLC. OCLC does expose its records to Google and Yahoo harvesting.
OAIster: Searches a harvested data set rather than searching live out on the web. The OAIster records are also available in WorldCat. Example: search for Pittsburgh City Photographer (that is a provider of data). Most digitization software will generate an OAIster-harvestable version. In his example we see that address and location get compressed into Notes. This is because there is not always a place in Dublin Core that maps to the level of detail you collect at your local institution. http://www.oclc.org/us/en/oaister/default.htm has the info about contributing your content for crawling.
Archive Grid: The goal is to pull in finding aids from many sources. It is a service – requires some sort of subscription and payment to see the data. Uses Lucene for searching. The content in Archive Grid is now available in Worldcat. To participate – see http://www.oclc.org/us/en/archivegrid/default.htm
Google and Yahoo do index OAIster and WorldCat, so that is one path to being found in search engines.
MARC Records for Archival Materials in WorldCat Local
Jennifer MacDonald from the University of Delaware presented a cataloger’s perspective of a WorldCat Local environment. She is a “concerned enthusiast” with regard to metadata. The University of Delaware was the first institution to buy WorldCat Local. She ended up on the WorldCat Local Special collections and Archives Task Force. The task force made their final report in 2008 and got a response from OCLC in 2009. They did get some immediate changes based on their feedback – like moving the 520 “summary” data element higher in the display. For some problems the task force identified, such as Archival Materials that were not being identified properly (Internet Resource is the type for all OAI records), it is hard to tell if the issue has been fixed.
She showed some screenshots from WorldCat local to show what data elements are there and how they are organized. In the FirstSearch screenshot (only available at the school), Notes and General Info holds a mishmash of content from various data elements consolidated into single fields. The task force asked for the “Browse” feature but apparently this feature is dead. They got no response from OCLC to this request in their report.
If you use the University of Delaware instance of WorldCat Local to search for walter penn shipley and drill down to the detail record display for the Walter Penn Shipley Papers you will see what was shown during the session. This display is customizable at the institution level in WorldCat Local. Some data is shown. You see lots of Web 2.0 options to add your own data, but the display is missing some of the data from the original MARC record. The full MARC record is indexed for keyword search, but since some of it is not displayed, users may not be able to determine why a record was returned.
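A toy illustration of why this matters: when the whole record is indexed but only some fields are displayed, a keyword can match text the user never sees, for example in the biographical note (545). The record text, the set of displayed tags and the keywords below are all hypothetical, and real MARC fields also carry indicators and subfields, which this sketch ignores.

```python
# Hypothetical, simplified record: MARC tag -> field text.
# Real MARC fields also carry indicators and subfields, ignored here.
record = {
    "245": "Hypothetical family papers, 1850-1900",
    "520": "Letters, diaries and account books.",
    "545": "The family ran a textile mill.",
    "351": "Arranged in three series.",
}

# Hypothetical display configuration: the tags shown on the detail page.
DISPLAYED = {"245", "520", "650"}


def explain_match(record, displayed, keyword):
    """Split the fields that contain a keyword into displayed and hidden tags."""
    kw = keyword.lower()
    hits = {tag for tag, text in record.items() if kw in text.lower()}
    return sorted(hits & displayed), sorted(hits - displayed)


shown, hidden = explain_match(record, DISPLAYED, "textile")
print("matched in displayed fields:", shown)     # []
print("matched only in hidden fields:", hidden)  # ['545']
```

The record is returned for ‘textile’, yet nothing on the detail page explains why.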
Fields missing from the WorldCat Local display:
351 – Organization and Arrangement of Materials
545 – biographical note
506 – restrictions on access
540 – Use of materials – with link to an askspec page: http://www.lib.udel.edu/cgi-bin/askspec.cgi
524 – preferred citation form – and this is where the manuscript number is
655 – some of the parts of the genre terms are missing
656 – occupation
OCLC says that they have not included all this because people don’t want it displayed. Given that the local organization is already deciding what to show, the task force would prefer the option to display all data elements. Because of this missing data, Jennifer prefers the FirstSearch interface, but this option is not available at all institutions. You should take advantage of the Web 2.0 features: archivists can create an account on WorldCat Local and add data elements.
Questions and Answers
QUESTION: You talk about having the metadata in a format that is accessible to harvesting. What I have is a bunch of CDs with images on them that have a folder and descriptor structure. Is there a metadata harvester that can go in and pull that metadata out? The New York Stock Exchange photographer sent these.
ANSWER (Michael): So the metadata you are looking to extract is the filename and descriptors? You could have someone write a little script and extract what you need. I would hand it to the guy I work with because he writes perl. If then you made that available via your website – then people could find it. To get it into a database – it is just a small script.
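Michael’s ‘little script’ might look something like this, written here in Python rather than Perl. It is a minimal sketch that assumes the folder names and filenames on the CDs carry the descriptive information; the image extensions, the column names and the underscore-to-space rule are assumptions to adjust for the actual discs.

```python
import csv
import sys
from pathlib import Path

# Assumed image extensions; adjust to whatever is actually on the discs.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def harvest(root):
    """Walk a directory tree and return one metadata row per image file."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            rel = path.relative_to(root)
            rows.append({
                "file": rel.as_posix(),
                # Folder names often act as descriptors, so keep them in order.
                "folders": " / ".join(rel.parent.parts),
                # Filename without extension, with underscores read as spaces.
                "descriptor": path.stem.replace("_", " "),
            })
    return rows


def write_csv(rows, out):
    """Write the rows as CSV, ready to load into a database or spreadsheet."""
    writer = csv.DictWriter(out, fieldnames=["file", "folders", "descriptor"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python harvest_cd.py /path/to/mounted/cd > cd_metadata.csv
    write_csv(harvest(sys.argv[1]), sys.stdout)
```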
QUESTION: Are there any specifically useful webinars/seminars for becoming familiar with these formats for skillbuilding?
ANSWER (Michael): Tons on the web. The LoC websites are very useful. You may have heard the term ‘crosswalking’ – that is where you take one format and turn it into another. Looking at the crosswalks can make it much easier to understand how a format you understand maps to one you are trying to learn about. Shareable Metadata – metadata for you and me. Not online yet – but someone in the audience said the plan is to post the materials. There have been a couple of books and ALA publications. Most of the ones I know of are about 10 years old. Jaime: SAA has a good workshop series.
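To make ‘crosswalking’ concrete, here is a minimal sketch of a MARC-to-Dublin Core mapping. The table is a small hand-picked subset loosely modeled on published crosswalks, not the Library of Congress crosswalk itself, and MARC fields are simplified to (tag, text) pairs. Note how 520 and 545 both land in description and how 351 has nowhere to go: that is the loss of granularity that comes up again later in this Q&A.

```python
from collections import defaultdict

# Small, hand-picked subset for illustration; a real crosswalk covers far more
# tags and also looks at subfields and indicators.
MARC_TO_DC = {
    "100": "creator",      # main entry, personal name
    "110": "creator",      # main entry, corporate name
    "245": "title",
    "520": "description",  # summary
    "545": "description",  # biographical or historical data
    "600": "subject",
    "650": "subject",
}


def crosswalk(marc_fields):
    """Map (tag, text) pairs to Dublin Core; return (record, unmapped_tags)."""
    record = defaultdict(list)
    unmapped = []
    for tag, text in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            unmapped.append(tag)  # no slot in this table, so the field is lost
        else:
            record[element].append(text)
    return dict(record), unmapped


if __name__ == "__main__":
    fields = [  # hypothetical sample record
        ("245", "Hypothetical family papers"),
        ("351", "Arranged in three series."),
        ("520", "Letters, diaries and account books."),
        ("545", "The family ran a textile mill."),
        ("650", "Textile industry."),
    ]
    dc, lost = crosswalk(fields)
    print(dc)
    print("unmapped:", lost)
```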
QUESTION: One of the first things you said was to take data out of EAD and you didn’t go into detail in that. Were you talking about DAO tagged items?
ANSWER (Michael): I was just talking about reusing data in a new environment. For example, we just started digitizing manuscripts and each item is becoming an individual digital object. The only metadata we have is in the EAD finding aid – so we are using that data to make descriptive data about the digital objects. We are going to create a MODS or METS record for every digital object. Jaime: We use EAD to make MODS records. She has been manually extracting EAD data as Dublin Core data for ContentDM.
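A minimal sketch of that kind of reuse: pull the title and date from each item-level component of an EAD finding aid and wrap them in a bare-bones MODS record. It assumes an EAD 2002 file without a namespace, items marked with level="item", and only these two fields. It is not the actual workflow at either institution, and a real project would carry over much more of the description.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS)

# EAD components may be unnumbered <c> or numbered <c01> through <c12>.
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


def m(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS}}}{tag}"


def component_to_mods(component):
    """Build a minimal MODS record from one EAD component's <did>."""
    did = component.find("did")
    mods = ET.Element(m("mods"))
    title_info = ET.SubElement(mods, m("titleInfo"))
    ET.SubElement(title_info, m("title")).text = did.findtext("unittitle", "").strip()
    unitdate = did.find("unitdate")
    if unitdate is not None and unitdate.text:
        origin = ET.SubElement(mods, m("originInfo"))
        ET.SubElement(origin, m("dateCreated")).text = unitdate.text.strip()
    return mods


def ead_to_mods(ead_xml):
    """Return one MODS element per item-level component in an EAD document."""
    root = ET.fromstring(ead_xml)
    items = [el for el in root.iter()
             if el.tag in COMPONENT_TAGS and el.get("level") == "item"]
    return [component_to_mods(c) for c in items]
```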
My QUESTION: What format does OAIster want?
ANSWER (Michael): OAIster is just harvesting Dublin Core. You can share MODS and other metadata types and you may find other aggregators that are expecting their users to work in a more detailed environment. You may publish more data elements for other harvesters as well – but OAIster will only pull the Dublin Core data elements.
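For reference, a harvester like OAIster asks a repository for records with an OAI-PMH ListRecords request that names oai_dc as the metadata format; every OAI-PMH repository is required to support simple Dublin Core. The sketch below only builds request URLs. The base URL is hypothetical, and fetching the XML responses and following resumption tokens are left out.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces all the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date  # e.g. "2010-01-01" for incremental harvests
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    base = "https://repository.example.edu/oai"  # hypothetical endpoint
    print(list_records_url(base))
```

Richer formats such as MODS are requested the same way with a different metadata_prefix, but only if the repository advertises that prefix (the ListMetadataFormats verb lists them).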
QUESTION: We are working on a project to digitize the holdings of local historical societies, museums and libraries. Might the catalogers be able to deal with MODS, or will the loss of granularity be a problem?
ANSWER (Michael): I am not a MODS expert. MARC is very granular. Maybe look at the MARCXML – MODS crosswalk?
QUESTION: At the University of Delaware, do you have any other systems?
ANSWER (Jennifer): When we first got WorldCat Local you had to know the URL to get to the library. That changed fast! The patrons couldn’t find anything. Jaime: In WorldCat Local you cannot scope the search to specific sub-collections.
QUESTION: Thank you Jennifer for your remarks. Is there a problem with catalogers trying to ‘sneak’ data elements into other places? Are standards in danger?
ANSWER (Jennifer): I would hope we wouldn’t move 524 data into a 500 field just to get it displayed. There is some danger of losing the granularity by pushing everything to Dublin Core. I don’t know how real that danger is at this point.
QUESTION: A political question for Jennifer: Who has the clout to push for changes with OCLC?
ANSWER (Jennifer): I think encouraging users to give feedback is important. We were told, “we have proven that users don’t want that.” Users need to make comments about their challenges in dealing with the interface. FROM AUDIENCE: The strongest argument is to say that you are looking at Sky River. FROM AUDIENCE: Make your data more discoverable outside the catalog world, through internal websites and Google. Jaime: We are working hard to make MARC records to push access to our collections. The push is to make the data available in as many locations as possible.
QUESTION: Are these all different levels of subscriptions? Are they trying to push people to buy more subscriptions?
ANSWER (Jennifer): There is a sense that WorldCat Local is pushed at local public libraries. Yes – WorldCat Local is something they have to pay for. Michael: With Archive Grid you are going a step further – EVERYTHING in the finding aid is indexed. Every search I did in there returned thousands of records. Then I filtered by institution – and it never loaded. FROM AUDIENCE: I think they are revamping Archive Grid – but I don’t know how far they are in the process. Michael: I love the detail – you don’t have to dig through other data to find something useful. Depending on the institution – and how they are allowing their data to be harvested – you may see less information. Jaime: You have to actively work with OCLC to get Archive Grid to pick up your data.
QUESTION: We are tinkering with users adding tags – are you having any success with people adding tags?
ANSWER (Jaime): No – it isn’t something we have dealt with. WorldCat Local does let you add stuff like that.
QUESTION: Will OCLC provide that UGC (user generated content) back to the institution?
ANSWER: We wouldn’t know.
QUESTION: Have they provided access to the user studies?
ANSWER: Yes – but it is based on watching individuals use the tools.
Image Credit: Statue representing Research by Henry Hering, from an image of the interior of the Field Museum of Natural History.
As is the case with all my session summaries from MARAC, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.
This post is from Spellbound Blog: MARAC Spring 2010: Hurray for Archival Metadata (Session S2)
http://feedproxy.google.com/~r/Spellboundblog/~3/lUQahW5nFO4/
In an example of Twitter serendipity, @silverasm’s (Aditi Muralidharan) tweet pointed me to @historying’s blog post about Topic Modeling. In this post Cameron Blevins explains the results of using the topic modeling feature of UMass Amherst’s MAchine Learning for LanguagE Toolkit (MALLET) on the text of Martha Ballard’s Diary.
I have spent a lot of time thinking about how to generate thematic overviews of groups of archival collections. My information visualization project, ArchivesZ, aims to provide ways of understanding aggregated archival description data, either from a single institution or across institutional boundaries. Now I find myself wondering if text mining with a tool like MALLET might generate smart topic groupings more elegantly than fighting with the wide range of non-standardized collection subjects.
Topic Modeling with MALLET
To get a sense of what MALLET generates, see the excerpt below from Blevins’s post:
With some tinkering, MALLET generated a list of thirty topics comprised of twenty words each, which I then labeled with a descriptive title. Below is a quick sample of what the program “thinks” are some of the topics in the diary:
MIDWIFERY: birth deld safe morn receivd calld left cleverly pm labour fine reward arivd infant expected recd shee born patient
CHURCH: meeting attended afternoon reverend worship foren mr famely performd vers attend public supper st service lecture discoarst administred supt
DEATH: day yesterday informd morn years death ye hear expired expird weak dead las past heard days drowned departed evinn
GARDENING: gardin sett worked clear beens corn warm planted matters cucumbers gatherd potatoes plants ou sowd door squash wed seeds
He goes on to explain that “MALLET also allows us to track those topics across the text.” What if, instead of text mining a diary, we pumped the descriptions of every archival collection from a single institution into MALLET? Of course we would need a good list of stop words, including such common terms as archives, history, sources and records. But I wonder how the topics MALLET suggests would compare to the official subjects associated with each collection. Could this give us a broad overview of the topics covered by a specific repository and give us a new way to build paths to the collections based on topic?
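A first experiment could start with something like the Python sketch below, which writes each collection description to its own text file and saves the archival stop words to a separate file, one per line. The directory layout, file names and stop-word list are assumptions. The resulting directory could then be imported with MALLET’s import-dir command using its --keep-sequence, --remove-stopwords and --extra-stopwords options, and modeled with train-topics; check the MALLET documentation for the exact invocation in your version.

```python
import re
from pathlib import Path

# Illustrative domain stop words: terms common to nearly every collection
# description that would otherwise dominate the topics.
ARCHIVAL_STOPWORDS = ["archives", "history", "sources", "records", "collection", "papers"]


def export_for_mallet(descriptions, out_dir):
    """Write one .txt file per collection plus an extra-stopwords file.

    descriptions maps a collection identifier to its description text.
    Returns (directory of description files, path of the stop-word file).
    """
    out = Path(out_dir)
    docs = out / "descriptions"
    docs.mkdir(parents=True, exist_ok=True)
    for coll_id, text in descriptions.items():
        # Identifiers become filenames, so replace characters unsafe in paths.
        # (Distinct IDs could collide after this step; a real export should check.)
        safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", coll_id)
        (docs / f"{safe_name}.txt").write_text(text, encoding="utf-8")
    stopfile = out / "archival-stopwords.txt"
    stopfile.write_text("\n".join(ARCHIVAL_STOPWORDS) + "\n", encoding="utf-8")
    return docs, stopfile
```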
Auto-Classification Using Castanet
Text miner Aditi Muralidharan also posted recently on this theme in Castanet: automatically generating a browsing structure for a collection and explains:
Castanet automatically carves a sub-structure from the hierarchical concept dictionary, WordNet (http://wordnet.princeton.edu), and matches items in the collection to one or many appropriate places within that hierarchy. Then, after some automated trimming and flattening, the result is a hierarchical browsing system.
I have heard of Castanet before via the Flamenco Search Interface Project. Apparently Muralidharan did a project using Castanet last summer to create a category system for Flickr Commons images based on the images’ tags which is then rendered using a Flamenco interface. I include a partial screen-shot below to give you a taste of what the navigation of images feels like a few levels down in the hierarchy. I love the classification of ‘Group Action’ then filtered by a sub-classification of ‘Commerce’. The first images shown are of ‘horse trading’ – with additional headings and images beneath them as well as additional filter options on the left.
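The ‘trimming and flattening’ step can be illustrated with a toy example. The sketch below builds a browse tree from hypernym paths and skips a level whenever a category has only one subcategory that itself has children, so users are not forced through one-option clicks. The paths are hand-written stand-ins rather than real WordNet data, and this is a simplification for illustration, not Castanet’s actual algorithm.

```python
from collections import defaultdict

# Hand-written stand-ins for WordNet hypernym paths (root first).
PATHS = [
    ["group action", "commerce", "trade", "horse trading"],
    ["group action", "commerce", "trade", "auction"],
    ["group action", "ceremony", "parade"],
]


def build_tree(paths):
    """Map each category to the set of its direct subcategories."""
    children = defaultdict(set)
    for path in paths:
        for parent, child in zip(path, path[1:]):
            children[parent].add(child)
    return children


def collapse(node, children):
    """Return a nested browse tree rooted at node.

    When a category has exactly one subcategory that itself has
    subcategories, skip the middle level and show the grandchildren.
    """
    kids = sorted(children.get(node, ()))
    while len(kids) == 1 and children.get(kids[0]):
        kids = sorted(children[kids[0]])
    return {node: [collapse(kid, children) for kid in kids]}


if __name__ == "__main__":
    print(collapse("group action", build_tree(PATHS)))
```

Here the lone ‘trade’ level disappears, so ‘commerce’ leads straight to ‘auction’ and ‘horse trading’, while ‘ceremony’ keeps its single leaf, ‘parade’.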
What If?
What if we pulled all the English-language archival descriptions from around the world as our original data set? If we used this data for topic modeling, our subject clusters would be cross-institutional. Maybe we could map the subjects assigned by each local institution to the topics the model generates for each collection and get a sort of automated crosswalk for finding related collections. If we used the locally assigned subjects from the archival descriptions for Castanet-style auto-classification, maybe we could generate a way to browse collections topically through a hierarchy.
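One way to find related collections across institutions once topic modeling is done: represent each collection by its topic proportions (MALLET can write these out with its --output-doc-topics option) and rank other collections by cosine similarity. The collection identifiers and numbers below are invented for illustration, and parsing MALLET’s output file is left out to keep the sketch short.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(target, doc_topics, k=3):
    """Return up to k other collections whose topic mixes are closest to target's."""
    scores = [(cosine(doc_topics[target], vec), coll_id)
              for coll_id, vec in doc_topics.items() if coll_id != target]
    scores.sort(reverse=True)
    return [coll_id for _, coll_id in scores[:k]]


# Invented topic proportions (three topics) for collections at three institutions.
doc_topics = {
    "inst-a/ms-1": [0.7, 0.2, 0.1],
    "inst-b/ms-9": [0.6, 0.3, 0.1],
    "inst-c/ms-4": [0.1, 0.1, 0.8],
}

if __name__ == "__main__":
    print(related("inst-a/ms-1", doc_topics))
```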
Both MALLET and Flamenco are open source (I am not sure of the status of Castanet) and, as I discovered working on ArchivesZ, many institutions will share their archival description data for a good cause. So – is this a good cause? I need to tease these ideas out a bit more, but what do you all think of it at first blush? Feasible? Interesting? Worthwhile experiments?
Image Credits: MALLET logo from MALLET homepage. Images in screen shot from Flickr Commons with no known copyright.
This post is from Spellbound Blog: Topic Modeling, Auto-Classification and Archival Description
http://feedproxy.google.com/~r/Spellboundblog/~3/eNZtWH_iJjc/
What does a brilliant female scientist look like? In honor of the 2010 Ada Lovelace Day, I went on a hunt through the Flickr Commons and other sources of archival images to see how many portraits of women who have contributed to science and technology I could find.
A few years back I read Malcolm Gladwell’s book Blink. One of the ideas I took away was the profound impact of the images with which we surround ourselves. He discusses his experience taking an Implicit Association Test (IAT) related to racism and his opinion that surrounding oneself with images of accomplished black leaders can change one’s ‘implicit racism’. Project Implicit still continues. I found a demo of the ‘Gender-Science IAT’ and took it (you can too!). “This IAT often reveals a relative link between liberal arts and females and between science and males.” My result? “Your data suggest little or no association between Male and Female with Science and Liberal Arts.” 18% of those taking the test received this result; 54% apparently show a strong or moderate automatic association between male and science and between female and liberal arts.
My inspiration for this post is to find images of accomplished women in science and technology to help young women and girls fight this ‘automatic association’. How can you imagine yourself into a career when you don’t have role models? Let’s find the most varied assortment of images of what female scientists and technologists look like!
The Smithsonian has an entire set of Women in Science images on the Flickr Commons about which they wrote a fabulous blog post over on their Visual Archives Blog. Consider the difference between the Smithsonian Flickr set of Portraits of Scientists and Inventors and that of Women in Science shown below in my snazzy animated GIF.
For me, the first set goes a long way toward associating the idea of a scientist or inventor with images of white men with varying degrees of facial hair. I don’t see myself in that set of photos, even though there are a few women mixed in. The Women in Science set shows me women and, even though the images are black and white and reflect the style of another era, I can imagine myself fitting in with them.
Digging into a few specific examples within the ‘Women in Science’ images, on the left below we see research scientist Eloise Gerry, who worked for the US Forest Service from 1910 through 1954. The caption for this image is “Dr. Gerry in her laboratory with the microscope that helped give the great naval stores industry in the United States a new lease on life.” On the right we have physicist Marie Curie.
Over on the website of the Smithsonian’s Dibner Library of the History of Science and Technology I found a few more images. On the left we have mathematician Tatiana Ehrenfest, from the first half of the 20th century, and on the right a physicist from the 1700s, marquise du Châtelet, Gabrielle-Émilie Le Tonnelier de Breteuil. These were not easy to find – I did in fact skim through all the names and photos listed to find the two shown here.
After thinking a bit about the shortest path to more images of women in science and technology I went onto Freebase.com. I was so pleased to discover how easy it was for me to find entries for computer scientists, then filter by those who were female and had images. This gave me the faces of Female Computer Scientists, including those shown in the screen shot below (and yes, that is Ada Lovelace herself 2nd from the left in the top row).
Eager to find more images, I next pulled together a list of Female Scientists. Finally, a bit more diversity in the faces below (and there are many more images to explore if you click through).
Finally, I put a call out on both Twitter and the DevChix mailing list asking for women to share images of themselves for use in this blog post. Within just a few hours I received photos of Lorna Mitchell (a PHP developer in the UK – photo by Sebastian Bergmann), Aimée Morrison (shown crafting a social multimedia curriculum for DHSI 2010), Kristen Sullivan and a group photo of the DC LinuxChix dinner at ShmooCon.
There are many sources of images of women who have contributed to or are members of the fields of science, technology, engineering and mathematics, but archives are among the best. Consider the photo credits page of the website dedicated to Biographies of Women Mathematicians, which credits 9 different archives for images used on the site.
Images are so powerful. The preservation of images of women like those mentioned above is happening in archives around the world. The more of these images we can collect and present in a unified way, the more young women can see themselves in the faces of those who came before. It sounds so simple, but imagine the impact of a website that showed face after face of women in science and tech. Of course I would want a short bio too, and the ability to filter the images by specialty, location and date. I think that Freebase.com could be a great place to focus efforts. Their APIs should make it easy to leverage images and all the structured data about women in tech that we could possibly dream of collecting. I know that many of the posts created today will feature photos of amazing tech women. How do we organize to collect them in one place? Who wants to help?
If you know of additional archival collections including images of tech women, please let me know!
Happy Ada Lovelace Day everyone!
This post is from: Spellbound Blog. Ada Lovelace Day: Portraits of Women in Technology
http://feedproxy.google.com/~r/Spellboundblog/~3/8j29wTPstZc/
While smart folks over at NARA are thinking about the preservation strategy for digitized 2010 census forms, I got inspired to take a look at what we have preserved from past censuses. Specifically, I wanted to look at posters, photos and videos that give us a glimpse into how we encouraged and documented participation in the past.
There is a dedicated Census History area on the Census website, as well as a section of the 2010 website called The Big Count Archive. While I like the wide range of 2010 Census Posters, the 1940 census poster shown here (thank you, Library of Congress) is just so striking.
I also loved the videos I found, especially when I realized that they were all available on YouTube – uploaded by a user named JasonGCensus. I am not clear on the relationship between JasonGCensus and the official U.S. Census Bureau’s Channel (which seems focused on 2010 Census content), but there are some real gems posted there.
For example, in the 1970 Census PSA shown below we learn about the privacy of our census data: “Our separate identities will be lost in the process which is concerned only with what we say, not who said it”. We are shown technology details – complete with old school beeping and blooping computer sounds. (NOTE: this video is also available on Census.gov, but I saw no way to embed that video here – hence my cheer at finding the same video on YouTube)
For the 1960 census, a PSA explains the new FOSDIC technology, which removed the need for punch cards. With the tagline ‘Operation Rollcall, USA’, the ad presents our part in “this enterprise” as cooperation with the enumerators. In the 1980 PSA the tagline is ‘Answer the Census: We’re counting on you!’ and the ad stresses that census responses are kept confidential and are used to provide services to communities. By the 1990 and 2000 PSAs we see more emphasis on the benefits to communities that fill out the census and less on how the census is actually recorded.
I also found some lovely census images in the Library of Congress Prints and Photographs catalog including the image shown to the right and:
an 1870 Wood Engraving
an 1890 Cartoon
a 1910 Postcard
Exploring the area of Census.gov dedicated to the 2010 census made me wonder what was available online for the 2000 census.
Wayback Machine to the rescue! It has what appears to be a fairly deep crawl of the 2000 Census.gov site dating from March of 2000. For example, the posters section seems to include all the images and PDFs of the originals. I even found functional QuickTime videos in the Video Zone, like this one: How America Knows What America Needs.
The ten year interval makes for a nice way to get a sense of the country from the PR perspective. What did the Census Bureau think was the right way to appeal to the American public? Were we more intrigued by the latest technology or worried about our privacy? Did they need to communicate what the census is used for? Or was it okay to simply express it as an American’s duty? I appreciate the ease with which I can find and share the resources above. Great fun.
And for those of you in the United States, please consider this my personal encouragement to fill out your census forms!
This post is from: Spellbound Blog. Encouraging Participation in the Census
http://feedproxy.google.com/~r/Spellboundblog/~3/KPAfzESaBus/
Even with the recent announcement that the Flickr Commons is not currently accepting new applications, there are clearly still applications being processed. NARA has been on Flickr since February of 2009 and has uploaded 49 sets of images. As announced in a recent press release, on February 1, 2010 Flickr flipped the switch and all the images in The U.S. National Archives’ photostream were shifted over into the Commons.
The 49 sets are sorted into 4 collections:
Historical Photographs and Documents (19 sets) – including NARA favorites like Rosie the Riveter and Nixon and Elvis and documents from regional archives across the country.
DOCUMERICA Project by the Environmental Protection Agency (27 sets) – one set dedicated to top picks and the rest organized by photographer. Interestingly, NARA’s website has indexed the 15,000+ images from this project by subject and by location. I wonder how they picked which images from DOCUMERICA to port over to Flickr?
Mathew Brady Civil War Photographs (2 sets) – currently 473 of the 6,066 digitized Mathew Brady images have been uploaded into the Commons. The images posted in the Commons are available at a much higher resolution than they are within ARC. A great example from this collection is the image of the Poplar Church (shown to the right), available as a 600 x 483 GIF on ARC and as a 3000 x 2416 JPG on Flickr. This image has also received a nice set of comments and tags.
Development and Public Works (1 set) – the only set in this collection consists of images taken to support the Flathead Irrigation Project. “The Project was initiated to determine rights and distribute water originating on the Flathead Indian Agency in Montana to both tribal and non-tribal land.” These images seem to be the same resolution on both archives.gov and Flickr.
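The resolution gap for the Poplar Church image is bigger than the two sets of numbers might suggest at a glance. As a quick back-of-the-envelope check using only the two sizes quoted above, the Flickr file is about five times wider and taller, which works out to roughly 25 times as many pixels:

```python
from typing import Tuple

# Sizes quoted for the Poplar Church photo: ARC GIF vs. Flickr Commons JPG.
ARC_SIZE = (600, 483)
FLICKR_SIZE = (3000, 2416)


def pixel_count(size: Tuple[int, int]) -> int:
    """Total number of pixels in a (width, height) image."""
    width, height = size
    return width * height


# Linear scale compares widths; pixel ratio compares total area.
width_ratio = FLICKR_SIZE[0] / ARC_SIZE[0]
pixel_ratio = pixel_count(FLICKR_SIZE) / pixel_count(ARC_SIZE)

print(f"{width_ratio:.1f}x wider, {pixel_ratio:.1f}x as many pixels")
# prints: 5.0x wider, 25.0x as many pixels
```

Because area grows with the square of the linear scale, a 5x increase in each dimension means 25x the pixels, which is why the Flickr copy is so much more useful for zooming in on detail.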
In honor of this transition, NARA posted a new set of 220 Ansel Adams photographs. One of the first comments on the set was “low-res scans? Pretty big letdown.” A fair point. As noted above, other NARA images in the Commons are much larger than the 600 x 522 that seems to be the size available for the Ansel Adams images. It would be great to have a clear explanation of available resolutions published along with each new set of images.
NARA has published this simple rights statement for all NARA images in the Commons:
All of the U.S. National Archives’ images that are part of The Flickr Commons are marked “no known copyright restrictions.” This means the U.S. National Archives is unaware of any copyright restrictions on the publication, distribution, or re-use of those particular photos. Their use restriction status in our online catalog is “unrestricted.” Therefore, no written permission is required to use them.
NARA has also posted an official Photo Comment and Posting Policy and a fairly extensive FAQ about the images they have posted on Flickr. I do wish that there were a simpler way to request reprints of images from the Commons. Most of the NARA images include this standard sentence, but for someone not familiar with NARA and more accustomed to one-click ordering, the instructions seem very complex:
For information about ordering reproductions of photographs held by the Still Picture Unit, visit: www.archives.gov/research/order/still-pictures.html
I also wish that more of the images had location information assigned: only 113 of the images show up on the fun-to-explore map view. At first glance it looks as if this information is populated only for images taken near airports. Many images include a location-based subject in the image description posted on Flickr, yet lack the geographic metadata that would permit them to be shown on a map. The one image I did find that was not at an airport but did include geographic metadata is this image of the World Trade Center, assigned to the NYC Financial District Flickr Location. While I could add a location-related tag to NARA’s images, there does not appear to be any way for the general public to suggest location metadata.
One odd note about this and other World Trade Center images – the auto-generated tags have broken up the building name very oddly as shown in my screen clip on the left.
Another fun way to explore the NARA Flickr images is to visit the ‘Archives’ page (slightly hilariously titled “U.S. National Archives’ Archives”). Here we can browse photos based on when they were uploaded to Flickr or when they were taken. Images that include a specific date can be viewed on a calendar (such as these images from 1918) or in a list view (those same images from 1918 as a list), while those taken ‘circa’ a year can be viewed in a list with all other images from sometime that year (such as these images from circa 1824).
Beyond all the additional tags and content collected via comments on these images, I think that being able to find NARA images based on a map, calendar or tag is the real magic of the commons. The increased opportunities for access to these images cannot be overstated.
Take this image of a sunflower. If you visit this image on archives.gov, you can certainly find the image and view it – but good luck finding all the images of flowers as quickly as this Flickr tag page for NARA images of flowers can. Even looking at the special Documerica by Topic page doesn’t get me much closer to finding an image of a flower.
It will be fun to watch what else NARA chooses to upload to the Commons. I vote for more images with metadata that lets them show up on the map and calendar. I will also put your mind at ease by telling you that the lovely ladies at the top of this post are there because their image is one of the most popular uploaded by NARA to date (based on it having been marked a favorite by 88 individuals). The only image I could find with more fans was the classic image of Nixon and Elvis, with 250 fans at the time of this posting.
What is your favorite NARA Commons image? Please post a link in the comments and if I get enough I will set up a gallery of Spellbound Fan Favorites!
Image Credits: All images within this blog post are pulled from NARA’s images on the Flickr Commons. Please click on the images to see their specific details.
This post is from: Spellbound Blog. National Archives Transitions to Flickr Commons Membership
http://feedproxy.google.com/~r/Spellboundblog/~3/RWDU--STpnI/
The Official Google Reader Blog recently announced a new feature that will let users watch any page for updates. The way this works is that you add individual URLs to your Google Reader account. Just as with regular RSS feeds, when an update is detected – a new entry is added to that subscription.
My thinking is that this could be a really useful tool for archivists charged with preserving websites that change gradually over time, especially those fairly static sites that change infrequently with little or no notice of upcoming changes. If a web page was archived and then added to a dedicated Google Reader account, the archivist could scan their list of watch pages daily or weekly. Changes could then trigger the creation of a fresh snapshot of the site.
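The watch-then-snapshot routine described above doesn't strictly depend on Google Reader. As a rough illustration only (this is not how Google Reader itself detects changes, and the function names are my own), here is a minimal Python sketch that fingerprints a page's HTML with SHA-256 and reports whether it differs from the last fingerprint you stored:

```python
import hashlib
import urllib.request
from typing import Callable, Optional, Tuple


def fetch_page(url: str) -> str:
    """Download a page and decode it as UTF-8, replacing any invalid bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fingerprint(html: str) -> str:
    """Hash the page after collapsing whitespace, so pure reformatting is ignored."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_for_change(
    url: str,
    last_fingerprint: Optional[str],
    fetch: Callable[[str], str] = fetch_page,
) -> Tuple[bool, str]:
    """Return (changed, new_fingerprint).

    A first visit (no stored fingerprint) counts as a change, so the very
    first check always triggers an initial snapshot. The `fetch` parameter
    lets you swap in a different downloader, or a stub for testing.
    """
    current = fingerprint(fetch(url))
    return current != last_fingerprint, current
```

You would keep the returned fingerprint somewhere (a small text file is plenty for a handful of pages) and, whenever `check_for_change` reports a change, take a fresh snapshot with whichever archiving tool you prefer. One caveat with this naive approach: pages with rotating content such as dates, visitor counters or ads will look "changed" on every visit, so in practice you might hash only the portion of the page you actually care about.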
I will admit that there have been services out there for a while that do something similar to what Google has just rolled out. I personally have used Dapper.net to take a standard web page and generate an RSS feed based on updates to the page (sound familiar?). One Dapper.net feed that I created and follow is for the news archive page for the International Red Cross and can be found here. What is funny is that now they actually have an official RSS feed for their news that includes exactly what my Dapper.net feed harvested off their news archive page – but when I built that Dapper feed there was no other way for me to watch for those news updates.
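Services like Dapper.net essentially do two jobs: pull items off a page, then republish them as a feed. The scraping half depends entirely on each site's markup, but the feed half is generic. Assuming you already have a list of (title, link, date) items extracted from a page, this hypothetical Python helper writes them out as a bare-bones RSS 2.0 document using only the standard library:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Iterable, Tuple


def build_rss(
    channel_title: str,
    channel_link: str,
    items: Iterable[Tuple[str, str, datetime]],
) -> str:
    """Build a minimal RSS 2.0 feed from (title, link, published) tuples.

    Timezone-aware datetimes are recommended; they are formatted in the
    RFC 822 style that RSS expects for <pubDate>.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"Items harvested from {channel_link}"
    for title, link, published in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "guid").text = link  # the link doubles as a unique ID
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")
```

Once the resulting XML is published at a stable URL, a feed reader should be able to subscribe to it like any other feed.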
There are lots of different tools out there that aim to archive websites. Archive-It is a subscription-based service run by the Internet Archive that targets institutions and will archive sites on demand or on a regular schedule. The Internet Archive also has an open source crawler called Heritrix for those who are comfortable dealing with the code. Other institutions are building their own software to tackle this too. Harvard University has their own Web Archive Collection Service (WAX). The LiWA (Living Web Archives) Project is based in Germany and aims to “extend the current state of the art and develop the next generation of Web content capture, preservation, analysis, and enrichment services to improve fidelity, coherence, and interpretability of web archives.” One could even use something as simple as PDFmyURL.com, an online service that turns any URL into a PDF (be sure to play with the advanced options to make sure you get a wide enough snapshot). I know there are many more possibilities; these just scratch the surface.
What I like about my idea is that it isn’t meant to replace these services but rather work in tandem with them. The Internet Archive does an amazing job crawling and archiving many web pages – but they can’t archive everything and their crawl frequency may not match up with real world updates to a website. This approach certainly wouldn’t scale well for huge websites for which you would need to watch for changes on many pages. I am picturing this technique as being useful for small organizations or individuals who just need to make sure that a county government website makeover or a community organization’s website update doesn’t get lost in the shuffle. I like the idea of finding clever ways to leverage free services and tools to support those who want to protect a particular niche of websites from being lost.
Image Credit: The RSS themed image above is by Matt Forsythe.
This post is from: Spellbound Blog. Leveraging Google Reader’s Page Change Tracking for Web Page Preservation
http://feedproxy.google.com/~r/Spellboundblog/~3/IFt0_bdLQJM/
In the early 1960s, my father bought a Wheatstone concertina in London. He tells how he visited the factory where it was made to pick one out and recalls the ledger book in which details about the concertinas were recorded. After a recent retelling of this family classic, I was inspired to see what might be online related to concertinas. I was amazed!
First I found the Concertina Library which presents itself as a ‘Digital Reference Collection for Concertinas’. With fourteen contributing authors, the site includes in depth articles on concertina history, technology, music, research and a wide range of concertina systems.
I particularly appreciate the reasons that Robert Gaskins, site creator, lists for the creation of the site on the about page:
(1) Almost all of the historical material about concertinas has been held in research libraries where access is limited, or in private collections where access may be non-existent. The reason for this is not that the material is so valuable, but that in the past there was no way to make material of limited interest available to everyone, so it stayed safely in archives. The web has provided a way to make this material widely available—partly by the libraries themselves, and partly in collections such as this.
(2) There seems to be a growing number of people working again on the history of concertinas, perhaps in part because research materials are becoming available on the web. These people are widely scattered, so they don’t get to meet and discuss their work in person. But again the web has provided an answer, allowing people to work collaboratively and exchange information across miles and timezones, and for the resulting articles the web offers worldwide publication at almost no cost.
What an eloquent testimonial for the power of the internet to both provide access to once-inaccessible materials and support virtual collaboration within a geographically dispersed community.
Next, I found the Wheatstone Concertina Ledgers. This site features business records (in the form of ledgers) of C. Wheatstone & Co. stretching from 1830 through 1974 (with some gaps). The originals are held at the Library of the Horniman Museum in London. It is a great reference website with a nice interface for paging through the ledgers. Armed with the serial number from my father’s concertina (36461), I found my way to page 88 of a Wheatstone Production Journal from the Dickinson Archives. If I am reading that line properly, his concertina is a 3E model and was made (or maybe sold?) April 25, 1960. I wish that there was documentation online to explain how to read the ledgers. For example, I would love to know what ‘Bulletin 3052’ means.
I liked the way that they retained the sense of turning pages in a ledger. Every page of each ledger is included, including front and back end pages and blank pages. I have total confidence that I am seeing the pages in the same order as I would in person.
You can read the overview and introduction to the project, but what intrigued me more was the very detailed narrative of how this digitization effort was accomplished. In How The Wheatstone Concertina Ledgers Were Digitized, we find Robert Gaskins of the Concertina Library explaining how, with an older model IBM ThinkPad, a consumer grade scanner, and his existing software (Microsoft Office and Macromedia Fireworks), he created a website with 4,500 images and clean, simple navigation. From where I sit, this is a great success story – a single person’s dedication can yield fantastic results. You don’t need the latest and greatest technology to run a successful digitization project. One individual can go a long way through sheer determination and the clever leveraging of what they have on hand.
Back on the Concertina Library’s about page we find “There is still a lot of material relevant to the study of concertinas and their history which should be digitized and placed on the web, but has not been so far. Ideas for additional contributors, items, and collections are very welcome.” If I am following the dates correctly, the Concertina Library has articles dating back to February of 2001, shortly before Mr. Gaskins started planning the ledger digitization project. At the same time as he was collaborating with other concertina enthusiasts to build the Concertina Library, he was scanning ledgers and creating the Wheatstone Concertina Ledgers website. Three cheers to Mr. Gaskins for his obvious personal enthusiasm and dedication to virtual collaboration, digitization and well-built websites! Another three cheers for all those who joined the cause and collaborated to create great online resources to support ongoing concertina research from anywhere in the world.
All this started because my father owns a beautiful old concertina. I love it when an innocent web search leads me to find a wealth of online archival materials. Do you have a favorite online archival resource that you stumbled across while doing similar research for family or friends? Please share them in the comments below!
Image Credit: http://www.flickr.com/photos/rocketlass/ / CC BY-NC-SA 2.0
This post is from: Spellbound Blog. Concertina History Online Features Virtual Collaboration and Digitization
http://feedproxy.google.com/~r/Spellboundblog/~3/Qkm1pW3S5FI/
Larry Sultan was famed as both a photographer and archives researcher. He passed away on Sunday, December 13th, 2009 and his obituary in the New York Times describes his use of archival photographs as “harnessing found photographs for the purposes of art while using them as a way to examine the society that produced them”. The 59 photographs, selected in collaboration with Mike Mandel from a broad assortment of corporate and government archives, were originally displayed and published as a collection named ‘Evidence’ in 1977. A reprint of Evidence was published in 2004, including a new scholarly essay and additional images not in the original.
The Stephen Wirtz Gallery has a number of images from the 2004 exhibition available online and features this great summary of the original project:
Sultan and Mandel created the series Evidence with documentary photographs mined from image banks of government institutions, corporations, scientific research facilities, and police departments. An NEA grant gave the artists a persuasive edge in gaining access these resources, and images were selected for their mysterious and perplexing subject matter. The series was presented in an exhibition at the San Francisco Museum of Modern Art in 1977, and simultaneously collected in the book Evidence, which is recognized among the most important publications in the history of photography. Removed from their original contexts and repositioned without references to their sources, these images challenged the viewer to examine the conceptual concern of identifying meaning and authorship in the creation and consideration of the art photograph.
I used WorldCat to find the closest copy of Evidence and happily found a copy of the 1977 imprint at the Art Library at the University of Maryland, College Park. It had been a long time since I had looked at photographs printed on paper and bound in a book rather than on a computer monitor. I love the idea of repurposing archival images, but I was also fascinated to realize that the word ‘archive’ does not appear anywhere in the publication. Even the description above mentions ‘image banks’, not ‘archives’.
The organizations thanked at the start of the book included major corporations, U.S. federal agencies and a long list of highway, fire and police departments. Sultan and Mandel seemed to focus their research efforts in California and Washington, DC – perhaps due to a need to limit their travel. While today one would likely still need to travel to many archives to find images like those used in Evidence, there are so many images available online (at least for preview). How would someone approach a project like this now?
It is so easy to create a slide show or website featuring images from repositories from around the world. Even the images that have not been digitized have a decent chance of at least being mentioned in an online finding aid. The recently introduced Flickr Galleries make it easy to select up to 18 images from across Flickr – like my November Flickr Commons Photos of the Month Gallery. Also, much of the online culture of reuse encourages giving proper attribution for materials.
Part of Evidence’s power is the extraction of the images from their original context and their unexplained juxtaposition with one another. Finding and harvesting an image online would make it much harder to entirely strip that context away to leave the raw image behind. I can imagine a web-wide hunt for an image’s origin. While that might be fun (maybe an archives answer to the DARPA Network Challenge?), it would not be the same as a sleek hardback book with 59 stark, unlabeled, black-and-white photos that sits on the shelf of an art library.
I find it poetic that Evidence’s photos are a perfect example of the ‘secondary value’ of archival records, even though the images were literally evidential records necessary for carrying out daily business. That said, I don’t believe that ‘possibly useful to future artists’ is a typical reason given for retaining and preserving archival records. We are just lucky that artists have been (and will almost certainly continue to be) innovative in their hunt for inspiration.
If you have the opportunity, I encourage you to sit quietly with a copy of Evidence. The images include landscapes, explosions, deep pits, plants, rocks, people, planes, machinery, wires and a car on fire. My laundry list of contents cannot begin to do the images justice, but I hope it might whet your appetite.
This combination of gallery exhibition and book has inspired me to wonder about other similar projects that specifically leverage archival images for artistic purposes. Please list any that you are aware of in the comments (be they in gallery exhibitions or published volumes).
This post is from: Spellbound Blog. Archival Photographs as Art: A Part of Larry Sultan’s Legacy
http://feedproxy.google.com/~r/Spellboundblog/~3/sKOV3MDaANY/
I realized while at MARAC at the end of October that I never posted here about the completion and publication of the Interactive Archivist: Case Studies in Utilizing Web 2.0 to Improve the Archival Experience. The brainchild of J. Gordon Daines III and Cory Nimer, this free SAA ePublication only exists online and brings together ten Web 2.0 archivist-oriented case studies covering blogs, mashups, tagging, wikis, Facebook and more. It also includes thorough introductions to each of the technologies covered by case studies, an annotated bibliography and a link to a living list of resources on Delicious.
My contribution to the collection is titled Spellbound Blog: Using Blogs as a Professional Development Opportunity. I don’t spend much time on this blog talking about blogging, so if you ever wanted to know more about why I blog or are considering starting a blog yourself – my case study might be of interest.
Thank you again to Gordon and Cory for including me as part of their project. I think that it is a great contribution to the cultural heritage community at large. These case studies take a wide range of new technologies and make them accessible through real examples and lessons learned. I don’t know about you, but I believe I learn at least 10x as much from someone’s firsthand experience as I would from an abstract explanation of how one might use a new technology. I hope you find the Interactive Archivist as rich a resource as I believe you will.
This post is from: Spellbound Blog. Interactive Archivist: Spellbound Blog as a Case Study
http://feedproxy.google.com/~r/Spellboundblog/~3/__xznBgEYxM/
In honor of Blog Action Day 2009’s theme of Climate Change, I am revisiting the subject of a post I wrote back in the summer of 2007: International Environmental Data Rescue Organization (IEDRO). This non-profit’s goal is to rescue and digitize at-risk weather and climate data from around the world. In the past two years, IEDRO has been hard at work. Their website has gotten a great face-lift, but even more exciting is how much progress they have made!
Weather balloon observations received from Lilongwe, Malawi (Africa) from 1968-1991: all the red on these charts represents data rescued by IEDRO — an increase from only 30% of the data available to over 90%.
Data rescue statistics from around the world
They do this work for many reasons – to improve understanding of weather patterns to prevent starvation and the spread of disease, to ensure that structures are built to properly withstand likely extremes of weather in the future and to help understand climate change. Since the theme for the day is climate change, I thought I would include a few excerpts from their detailed page on climate change:
“IEDRO’s mandate is to gather as much historic environmental data as possible and provide for its digitization so that researchers, educators and operational professionals can use those data to study climate change and global warming. We believe, as do most scientists, that the greater the amount of data available for study, the greater the accuracy of the final result.
If we do not fully understand the causes of climate change through a lack of detailed historic data evaluation, there is no opportunity for us to understand how humankind can either assist our environment to return to “normal” or at least mitigate its effects. Data is needed from every part of the globe to determine the extent of climate change on regional and local levels as well as globally. Without these data, we continue to guess at its causes in the dark and hope that adverse climate change will simply not happen.”
So, what does this data rescue look like? Take a quick tour through their process, from organizing papers and photographing each page to transcribing all the data and finally uploading it to NOAA’s central database. These data rescue efforts span the globe and take the dedicated effort of many volunteers along the way. If you would like to volunteer to help, take a look at the IEDRO listings on VolunteerMatch.
This post is from: Spellbound Blog. Blog Action Day 2009: IEDRO and Climate Change
http://feedproxy.google.com/~r/Spellboundblog/~3/mYRaGtPdQ_8/
Over the past month I have been playing with Flickr’s new Galleries. Each gallery is limited to 18 images from anywhere in Flickr (provided that the image owner has made their image available for inclusion in galleries). I thought it might be fun to try my hand at picking the best of the new images added to the Flickr Commons each week.
Each Thursday over the past month I have created a Commons Picks of the Week gallery from all the images added to the Commons in the prior 7 days.
Here are the galleries from the first month of my experiment. Let me know what you think.
September 17, 2009 Commons Picks of the Week
September 24, 2009 Commons Picks of the Week
October 1, 2009 Commons Picks of the Week
October 8, 2009 Commons Picks of the Week
Each week I had about 150 new images from which to select my 18 favorites. Since many institutions seem to load their images each week along some thematic lines, sometimes I felt like I had too many of one kind of image. Moving forward I may switch to bi-weekly or monthly to get a larger pool of images from which to pick.
I think there is a lot of room for making fun thematic galleries from images in the Commons. I tried my hand at this too and came up with Bathing Beauties of the Commons. Of course the fact that all images across Flickr can co-exist in these galleries means that Commons images now have another way to be pulled into the public eye next to other ‘regular’ images.
I have a short wish list of enhancements I would love to see:
slideshow option for display of the gallery within Flickr
a way to embed a gallery on an external website as a slideshow
some way to follow the new galleries created by an individual (RSS feed or subscription option)
If you try your hand at creating a gallery of Commons images, please post a link as a comment to this post so we can all take a look.
This post is from: Spellbound Blog. Flickr Galleries: Fun with Flickr Commons
http://feedproxy.google.com/~r/Spellboundblog/~3/548GbygCXRY/
Each week brings announcements of archives launching new websites. Today both my email and Twitter told me about University of Maryland, Baltimore County’s new Digital Collections site. Who can resist peeking at new materials available online?
I have spent much of the past year learning the details of Search Engine Optimization. Usually shortened to SEO, this simply refers to the use of techniques that increase the traffic sent to a website via organic (unpaid) search results. Want your webpage to show up at the top of the list for a specific search in Google? You want to work on your SEO.
So when I look at a new archives website, I can’t help but keep an eye open for how well the site is optimized for search engines.
I hope that UMBC will forgive me for nitpicking their new site. A lot of their choices are great for SEO, but they also have room for improvement.
Things Done Well for SEO
Home Page Title & Description: The site’s home page has a good meta description. This is the text typically displayed below the link on a search results page.
Unique Page Titles At Collection Level: Each photography collection homepage has a unique page title and a nice block of explanatory text. Google can only read words – so the more unique text on a page, the better the job Google can do in figuring out what your page is about. Example: Ardsley Park Album
Good anchor text: (also known as link text) The words used in anchor text tell search engines something about the destination page. Anchor text is simply the clickable text of a link.
Areas for SEO Improvement
Unique Page Titles At Item Level: Individual images and documents all use a generic page title such as ‘UMBC | Digital Archive | Document Viewer’. Document Example: Accidental Death of an Anarchist. Image Example: 10 year old Bootblack.
H1 Tags: In the HTML of each page, the dominant heading of the page should use the <h1> tag. This helps Google know the phrase you are targeting with this page. It is your second-best place to emphasize your content after the page title. The item pages often seem to have a headline-type title at the top of the page, but it is not currently marked up with an <h1> tag.
Think About Search Results and Indexing: Pages displaying the results of internal searches on your site are not likely to be useful as indexed pages in Google. The thinking here is that if Google also has many search results pages in its index, they can dilute the focus on the item and collection level pages on your site. If UMBC wanted their search pages to be indexed, then those pages’ URLs should be simplified and each search results page would need a page title that somehow includes the search criteria. There are two ways that I know of to keep these pages out of the index: blocking them via the site’s robots.txt file or adding a robots meta tag to the <head> of the search results page. A robots.txt rule tells obliging search engines not to crawl those URLs, while a robots meta tag such as noindex tells them not to include a page they have crawled in their index.
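As a rough illustration of the two item-level checks above (a unique page title and an h1 heading), here is a minimal Python sketch using only the standard library’s html.parser. It reads one page’s HTML and flags a generic or missing title and a missing h1. The names PageAudit, audit and GENERIC, and the sample markup, are hypothetical; this is not UMBC’s code or the method of any particular SEO tool.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collect the <title> text and any <h1> headings from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1s = []
        self._capturing = None  # "title", "h1", or None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._capturing = "title"
        elif tag == "h1":
            self._capturing = "h1"
            self.h1s.append("")

    def handle_endtag(self, tag):
        if tag in ("title", "h1"):
            self._capturing = None

    def handle_data(self, data):
        if self._capturing == "title":
            self.title += data
        elif self._capturing == "h1":
            self.h1s[-1] += data


def audit(html, generic_titles=frozenset()):
    """Return a list of warnings about the page's <title> and <h1>."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    warnings = []
    title = parser.title.strip()
    if not title or title in generic_titles:
        warnings.append("generic or missing <title>")
    if not parser.h1s:
        warnings.append("no <h1> heading")
    return warnings


# Hypothetical item page: generic title, headline in a plain <div>.
GENERIC = {"UMBC | Digital Archive | Document Viewer"}
page = (
    "<html><head><title>UMBC | Digital Archive | Document Viewer</title></head>"
    "<body><div class='headline'>10 year old Bootblack</div></body></html>"
)
print(audit(page, GENERIC))  # ['generic or missing <title>', 'no <h1> heading']
```

Run across a crawl of item pages, a check like this would make it easy to count how many pages share one generic title.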
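To check which URLs a robots.txt rule actually blocks, Python’s standard urllib.robotparser module can evaluate the rules locally before you publish them. This is a sketch under assumptions: the /search path prefix and the item URL are hypothetical, not UMBC’s real URL scheme. This parser uses simple prefix matching and may not implement every extension that individual search engines support.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of internal search results
# (assumed here to live under /search) while leaving item pages crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path, agent="Googlebot"):
    """Return True if the rules above let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


print(crawlable("/search?q=anarchist"))           # False
print(crawlable("/item/10-year-old-bootblack"))   # True
```

The meta tag alternative is a line such as `<meta name="robots" content="noindex">` in the search results page’s <head>. Unlike a robots.txt rule, it only takes effect if the crawler is allowed to fetch the page and see the tag.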
Final Thoughts
There are plenty of other things that UMBC could do to support this new website. They could create an XML sitemap of all their pages and submit it to Google (maybe they already have). They might re-title some of their pages after using a tool like Google Insights for Search to see which variations of a phrase are searched most frequently. My goal here was to give you a taste of the sorts of things that catch my eye. Also, SEO is still more of an art than a science, so you will sometimes notice that what one SEO expert recommends is the opposite of what the next expert would tell you.
In many cases, changes such as the Unique Page Title at the Item Level mentioned above may not even be possible due to software limitations or a lack of programmer resources. The trick is to take advantage of every option that is available. There are also trade-offs to be made. UMBC’s site provides some very slick interfaces for viewing the details of a group of documents, such as theater programs and other materials related to a theatrical production. The implementation elegantly handles multiple scanned images that relate to a coherent set of documents. Sometimes you can’t have both your innovative UI and perfect SEO. Then it comes down to what your goals are for your website. Are you trying to make a specific community of existing users happy by providing them with tools they can use? Or does your mission focus more on reaching out to a broader audience?
There is no silver bullet for search engine optimization. It just takes knowledge of the available tools and techniques combined with a willingness to keep learning and experimenting. Like the ‘Do-It-Yourself-Woman’ pictured above in the Nationaal Archief’s photo I found on the Flickr Commons, you too can learn the basics and do it yourself. A great starting point is Google’s free SEO Guide. Also, please remember that the best time to plan your SEO strategy is before you have built your site in the first place!
I would love to research how much progress archives websites can make in their organic search traffic after SEO improvements. My thinking is to take a snapshot of a month of analytics (the statistics that tell you how many people are visiting your website) and then apply some SEO-inspired changes. After a suitable delay (it takes some time for SEO to do its job), we would look at another month of analytics to determine any change in organic traffic.
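An XML sitemap is just a list of page URLs in the format defined by the sitemaps.org protocol. A minimal Python sketch using the standard xml.etree.ElementTree module might look like this; the example.org URLs, the dates and the function name build_sitemap are placeholders, not UMBC’s pages or tooling.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a <urlset> element listing each (loc, lastmod) pair."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # optional W3C date, e.g. 2009-10-08
            ET.SubElement(url, "lastmod").text = lastmod
    return urlset


# Placeholder URLs: one collection page, one item page.
pages = [
    ("http://example.org/collections/ardsley-park-album", "2009-10-08"),
    ("http://example.org/items/10-year-old-bootblack", None),
]
sitemap = build_sitemap(pages)
print(ET.tostring(sitemap, encoding="unicode"))

# To produce a file for submission (hypothetical filename):
# ET.ElementTree(sitemap).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```

Generating the file from the same database that drives the item pages would keep the sitemap in step with the collection as it grows.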
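The before-and-after comparison described above boils down to a percent change in organic-search visits between the two snapshot months. A minimal sketch with made-up visit counts; the function name organic_change is hypothetical and no analytics API is involved.

```python
def organic_change(before_visits, after_visits):
    """Percent change in organic-search visits between two snapshot months."""
    if before_visits <= 0:
        raise ValueError("baseline month needs at least one organic visit")
    return (after_visits - before_visits) / before_visits * 100


# Made-up monthly organic-search visit counts, for illustration only.
baseline, after_seo = 1200, 1500
print(f"{organic_change(baseline, after_seo):+.1f}%")  # +25.0%
```

Comparing the same calendar month a year apart, rather than consecutive months, may help separate SEO effects from seasonal swings such as the academic calendar.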
Do you want me to do a quick review of your archives website to see if there is room for SEO improvement? Please contact me or add a comment to this post. I feel like there is a conference presentation in all this if we can find a good set of websites to optimize.
Finally, thank you to unsuspecting UMBC – your new website really is beautiful.
Image credit: Doe-het-zelf vrouw /Do-it-yourself-woman from Nationaal Archief on Flickr Commons.
This post is from: Spellbound Blog. SEO Evaluation of an Archival Website: Looking at UMBC’s Digital Collections
http://feedproxy.google.com/~r/Spellboundblog/~3/OhgfuIew6u8/
Andrew Flinn, University College London (UCL), was the second speaker during SAA09’s Session 202 with his presentation ‘A History of Our Own, Representing Communities and Identities on the Web’. Flinn began with the idea that archives are “a place for creating and re-working memory”. While independent community archives are constituted around many purposes, Flinn’s main interest is in communities focused on absences and misrepresentations of a group or event in history: communities engaged in cultural, political, or artistic activism. Some of these communities may be considered ‘movements’.
How should/can archivists support local archiving activities?
Part of the challenge of online communities is the need to capture the interactions in order not to lose the full picture. The website of the UK’s National Listing of Community Archives states that they “seek to document the history of all manner of local, occupations, ethnic, faith and other diverse communities”.
The UCL’s International Centre for Archives and Records Management Research and User Studies (ICARUS) “brings together researchers in user access and description, community archives and identity, concepts and contexts of records and archives, and information policy”. Flinn is the Principal Investigator on the ICARUS project Community archives and identities, which focuses on in-depth interviews with four institutions which are “documenting and sustaining community heritage”.
These are some example online community sites:
rukus – black gblt archives
Moroccan Memories in Britain
eside community – east side working class community in London
Main Findings
proceed from a position that ‘knowing your own history’ is beneficial to their communities as well as to the public at large
the quality of the work is driven by individual passion and sacrifice, often on a voluntary basis
there is ambivalence about the mainstream archives sector: keen to work with mainstream archives, but scarred by past bad experiences
good practices now could lead to partnerships in the future
these are living archives, not static ones; they are still alive and growing
these ideas prompt re-evaluation of conventional archives thinking
lots of access to digital objects, perhaps a movement toward an online existence
We need to understand that these communities evolve and are fluid. They have a broad variety of structures, sizes and methods of working. What are the patterns in participation & ownership?
The site urban75 has hosted extended discussions about recent UK history. Efforts include identification of places and people in uploaded photos. The site connects people around issues such as housing and local services; it is very practical, but it has also evolved to include this historical documentation. One example post from the Brixton Forum shows a discussion about an old shop front revealed on Atlantic Road.
A Short Aside
Next Flinn apologized for taking his talk slightly off script. Setting his papers aside, he spoke to the audience about the eXHulme website which he had discovered the evening before while finishing his presentation. Having lived in Hulme, Manchester himself, he felt a great impact from looking through the site. He spent 4 hours looking at it – including photos such as the travellers living in their buses parked – otteburn close 1996 seen at the bottom of this page. His discovery and exploration of this site gave him a greater personal understanding of the impact of these types of community documentation projects. I felt he would have been happy to keep talking about this site and the directions it had sent his thoughts — but he then got back to his papers and continued.
Building Community Online
Interactions online are the historic record of the community itself. Archives evolve and change as the community builds and edits their online content. These heritage and archive sites work to shift from the idea of visitors to engaging users in interaction — they need users of the website to feel part of the community.
Examples of sites building community online:
My Brighton and Hove – community history site
Remembering Olive Collective – “social production of collective knowledge”
The Newham Story — uses social tagging
How do you successfully encourage participation (rather than a large number of passive observers), which is crucial to the success of these types of initiatives? Lurking without contributing is easy, even if joining requires action. The rate of uptake may correspond with the sense of ownership. Heritage projects might encourage and sustain such participation. See Elisa Giaccardi & Leysia Palen’s article – The Social Production of Heritage through Cross-media Interaction: Making Place for Place-making.
Suggestions
encourage conversation and treat all stories as having value – value every account
promote a sense of ownership once a story has been shared
allow for multiple ways to engage with and share content and memories
recognize and let users shift from observer to active member
Flinn’s Conclusions
What are the challenges and perils facing community archives? Lack of resources; people are doing this work in unsustainable ways.
Why should we sustain independent community archives? Benefit to individuals, communities and broader society.
What can professional archivists do? Support and partnership with groups seeking this sort of partnership.
My Thoughts
The image I included above is from the Library of Congress’s Flickr Commons project. If you read through the comments on this photo you can see a diverse group of individuals come together to document the history of the Sylvia Sweets Tea Room. This is just another example of the process of documentation being as interesting as the original image itself.
There is still so much to learn in the arena of building productive online communities. Archivists working through how to archive what online communities create will need to understand how the process of creation is documented via various software tools. As the techniques for encouraging participation evolve – archivists will need to evolve right along with them. I think it is interesting to envision archivists working in this space and supporting these types of communities — becoming as much the champions of the community itself as preservers of a community’s collaborative creations.
Image Credit: Flickr Commons Library of Congress: Sylvia Sweets Tea Room, corner of School and Main streets, Brockton, Mass
As is the case with all my session summaries from SAA2009, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.
This post is from: Spellbound Blog. A History of Our Own, Representing Communities and Identities on the Web (SAA09: Session 202)
http://feedproxy.google.com/~r/Spellboundblog/~3/nM6kpVa9_E0/
Expanding Your Local and Global Audiences (Session 405, SAA 2009) shared how three institutions of higher education are using the web to reach out to new audiences. While the general public may still hold close the stereotype of archives as rooms full of boxes of paper (not so different from this Duke image on Flickr: “Mattie Russell, curator of manuscripts, and Jay Luvaas, director of the Flowers Collection, examine the papers of Senator Willis Smith in the library vault.”), the presenters in this session are focused on expanding people’s experience of archives beyond boxes of papers locked away in a vault. They are using the web as a tool to reach beyond the walls of their reading rooms and the edges of their campuses.
Duke University Rare Books, Manuscript & Special Collections Library (RBMSCL) : Lynn Eaton (Reference Archivist)
While I didn’t find my way into this session until the start of the next speaker’s presentation, Lynn was kind enough to share with me her personal printout of her presentation slides. The links below and any associated commentary are based solely on my own interpretation of the various screen-shots included.
Duke Digital Collections
RBMSCL Finding Aids
AdViews: A Digital Archive of Vintage Television Commercials – this includes interviews with experts, a TV ads quiz and a wide range of TV ads available via iTunes U.
Duke Yearlook – a set of Flickr collections displaying images from the Duke University Archives, each focused on a decade or theme related to Duke’s history.
Duke University Libraries YouTube Channel: example Duke Exhibit: “A Century of Sex Appeals”
Duke Digital Collections on DukeMobile iPhone application – This wasn’t included in the presentation’s slides – but I spotted it on the YouTube Channel. I downloaded the DukeMobile app onto my iTouch and had a great time exploring the Duke Digital Collections included in the images section of the app. I think it was
University of Nevada Las Vegas (UNLV) Digital Collections: Tom Sommer (University and Technical Services Archivist)
UNLV has experimented with new technologies as they appear. Tom made a point of saying that when they started seeing others provide a feature on their websites, UNLV would find a way to try it out. A great example of this is the addition of a tag cloud and a Google map to The Boomtown Years collection listed below.
Howard Hughes Digital Collection – Images displayed in this online exhibition about Howard Hughes, such as this portrait of Howard Hughes, feature the opportunity both to rate and comment on the image. In addition, they provide an RSS feed for every possible metadata attribute (such as location, subject and media type)
Southern Nevada: The Boomtown Years – in addition to ratings and comments, this collection adds a display of recent comments, tagging and a Google map that ties images to locations in southern Nevada.
UNLV Special Collections Facebook Page – shares news and updates about projects – launched 2 months ago
Marist College Archives and Special Collections: John Ansley (Head, Archives and Special Collections)
Marist first launched their website in 2001 to raise awareness of their collections. They also used listservs and the on-campus newspaper. Ultimately their best tactic was working one-on-one with professors whose interests intersected with their collections. This led to contact with special interest groups, and working with those groups led to new tag and metadata values for their collections.
Hidden in Plain Sight – online exhibit about fore-edge painting. It includes videos as part of the introduction, since fore-edge painting is hard to understand through still images. The bibliography receives the most hits.
Marist Environmental History Project – this ongoing project aims to document who has what information about environmental history. The site includes an extensive list of primary sources as well as a 24 minute oral history: The Enduring Storm: The Story of the Storm King Case and the People Who Launched the Modern Environmental Movement (mp3).
Intercollegiate Rowing Association Poughkeepsie Regatta – timeline used to guide users to who won each race, PDFs of programs, and extensive bibliographies (including an index of 1000+ NYT articles about the regatta).
Lowell Thomas Travelogues – a household name during the golden age of radio, Lowell Thomas created extensive multimedia travelogues of his travels around the world. He is credited with making T. E. Lawrence famous as ‘Lawrence of Arabia’. The site was launched as a teaser to the over 1000 linear feet of photos, audio, video & other records which will be available to researchers in October 2009. For a taste of what is coming, check out this Lowell Thomas travelogue video clip – my favorite quote from which is “…come with me on a magic carpet out to the land of history, mystery and romance.”
My Thoughts
The archivists at all three of these educational institutions have tried new things and worked hard to share their materials with people beyond the traditional range of a reading room. The promise of the web, and all the tools and techniques it supports, is still being uncovered. It will be up to innovative archivists to keep discovering ways to push the envelope and welcome new audiences from all the corners of the globe.
Image Credit: http://www.flickr.com/photos/dukeyearlook/ / CC BY-NC-SA 2.0
As is the case with all my session summaries from SAA2009, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.
This post is from: Spellbound Blog. Archival Collections Online: Reaching Audiences Beyond The Edge of Campus (SAA09: Session 405)
http://feedproxy.google.com/~r/Spellboundblog/~3/j4tzaHGeVCE/
Thank you to everyone who came to our session this morning (Building, Managing, and Participating in Online Communities: Avoiding Culture Shock Online). Word on the street is that we had about 150 people in the audience.
As I mentioned during our talk – here is the Online Communities Comparison Chart. Please let me know if you have any issues accessing this document and feel free to share it with anyone you like.
If you had questions you were unable to ask during the session – please feel free to post them as comments below or send me a message via my Contact Form. I will be sure to pass questions along to all the members of our panel. I also plan to update this post with links to everyone’s slides as they appear online.
Slides from our talk:
Mark’s slides on Slideshare: Online Presence and Participation
Deborah Wythe’s slides available on SAA’s site: Archives on Flickr Commons (it’s not your mother’s audience anymore). She has also made a full paper available via SAA.
SAA has posted video of our presentation on Facebook. The one I have linked to is the first of 7 segments. To watch them in order, keep clicking ‘previous’ to reach the next video.
Blog L’Archivista has a great post about our session.
This post is from: Spellbound Blog.
http://feedproxy.google.com/~r/Spellboundblog/~3/CFx81VPvHaQ/
THATCamp Austin 2009 will be the first regional THATCamp. Slated for Tuesday evening, August 11th, 2009 in Austin, Texas, it will be held on the campus of the University of Texas at Austin. ‘THAT’ stands for The Humanities and Technology, while the Camp portion refers to the fact that it is an unconference.
What is an ‘unconference’ you ask? It is an attendee organized gathering focused on a common theme – in this case digital humanities. In the days leading up to the camp, attendees will post their ideas for discussion topics – but the final schedule will be sorted out on the ground during the gathering itself.
The original THATCamp event, organized by the Center for History and New Media (CHNM) at George Mason University, was a full two-day weekend event. THATCamp Austin 2009 will be held on a single evening during the same week that the Annual Meeting of the Society of American Archivists is being held in Austin (and has the blessing of the CHNM).
I had an amazing time at the first THATCamp at CHNM in 2008 and wrote 3 posts about various presentations and discussions. Since I was unable to attend THATCamp 2009 I am especially pleased to be lending a hand in organizing this first regional THATCamp while I will be in Texas for SAA. If you can get yourself to Austin on Tuesday night August 11th and have a passion for the digital humanities — take a look at the what/when/where details over on the THATCamp Austin 2009 About Page.
A few details hijacked from the THATCamp Austin website:
How do I sign up?
Unfortunately, we only have space for 60-70 participants, so we’ll have to do some vetting. To apply for a spot, simply send email to thatcamp.austin.2009@gmail.com, telling us what you’d like to present, and what you think you will get out of the experience. Please don’t send full proposals. We’re talking about an informal note of around 250 words, max. Please include your T-shirt size and an email address you can check from public places so that we can register you with the University of Texas wi-fi system.
How much?
THATCamp Austin is free to all attendees, but a $25 donation towards T-shirts and pizza will be very much appreciated.
Don’t be afraid to take a step into the less-structured unconference world. What I experienced at the first THATCamp was a group of very enthusiastic individuals who were so pleased to find like-minded people with whom to talk, regardless of our very varied backgrounds. Folks have reported coming away from both of the THATCamps at CHNM feeling energized and rededicated to their projects, as well as having found new collaborators and opportunities for cross-pollination across all the diverse members of the digital humanities community.
This post is from: Spellbound Blog.
http://feedproxy.google.com/~r/Spellboundblog/~3/9OPqa3FKjnA/
Session Title: Digital Curiosities: Resource Creation Via Amateur Digitisation
Speaker: Melissa Terras
Overview: Review of 100 virtual museum websites and multiple flickr groups plus surveys of amateur website creators, memory institutions and Arts & Humanities academics leads to new perspective on digitization and creation of collections online by dedicated enthusiasts.
Session Highlights
Areas of “Amateur” endeavor have a long history of launching collections, such as:
cabinet of curiosities
foundation of astronomical research
british flora and amateur botanists
weather observations
open source software movement
Being an amateur doesn’t necessarily mean being bad at what you do!
Within the realm of self-defined museums some common topics often emerge:
ephemera (advertising, packaging, nostalgia)
comics
technology – especially old tech, there is a surprising trend of being fascinated by technology approximately 10 years older than the collector
personal and “embarrassing” collections
genealogy
For these self-defined museums the scope is self-defined – these are self-delineated collections. Virtual museums can document aspects of cultural heritage considered socially taboo or in some way too sensitive to collect. A great example of this is the Museum of Menstruation, which claims to have been created 14 years ago and is currently trying to establish a permanent public display.
Platforms have evolved over the life of the web, starting with static html, then blogs and now Flickr images as a mode of presentation.
This is a list of successful amateur collections online:
Today’s Inspiration – illustration from the 1940s and 1950s
JonWilliamson.com – advertising 1940s-1960s
Pulp Fiction Flickr Group – 882 members who provide basic metadata and often label stuff within the image – currently contains 3,385 items.
Curio Cabinet Flickr Group – 1,206 members and 5,537 items
Visual Arts Data Service (VADS) is a more traditional site created by a cultural heritage institution. It contains 100,000+ images copyright cleared for use in teaching, learning and research in the UK. VADS is a very detailed static source of images with metadata, but provides no interaction.
Amateurs do provide metadata, but it is intuitive metadata. It might not fit into rigid buckets of data, but that doesn’t mean that the metadata available isn’t useful.
What are the boundaries between amateur and professional? Work vs hobby?
Many of these amateur sites get much more traffic than most standard museum sites. More than 50% of museum digitized images are never visited.
Memory institutions are starting to put things into the wider online community:
Smithsonian: photos in Smithsonian Flickr Commons
Tate: The How We Are Now project invited the public to contribute photos to the How We Are Flickr Group. The images were streamed to screens within the How We Are: Photographing Britain exhibit and 40 photos were chosen to be included as the last set of photos in the physical exhibit.
Victoria & Albert Museum: created a Flickr group of photos taken at the V&A museum along with a long list of other V&A Flickr groups and streams
Oxford University’s Great War Archive: contains 6,500 items contributed by the public and related to the First World War.
Facebook and Twitter are being used more often for informing the community about their collections
Much amateur research has been driven by advances in technology. A great example of this is the advent of affordable metal detectors, which led to dramatic changes in archaeology. The internet and Web 2.0 technology are arming a whole new generation of enthusiasts who can find one another and collaborate more easily than might ever have been dreamed of 20 years ago.
Next Steps & Conclusions
Future research will involve looking at the psychology of collecting: archives vs collections. For now it is important to realize that institutions are not the only hosts of “worthwhile” digital objects. Pro-am (professional-amateur) collectors are doing better at using Web 2.0 and are getting more traffic.
What can memory institutions learn from this?
interact with user communities
use the ‘grand central stations’ of flickr, twitter, facebook
usability of flickr is better than what most memory institutions build for themselves
My Thoughts
This session considers the ways cultural memory institutions can take advantage of the web by looking at what the successful enthusiasts are achieving. This research-backed approach confirms what I would have expected. Libraries, museums and archives are leaving a lot on the table when it comes to putting their collections online. Sites run by non-professionals are doing an amazing job of drawing in new audiences, keeping people around and then initiating conversation within that audience.
The Flickr Commons is a big step forward, but it isn’t the only option. There are also varying opinions about how successful the crowdsourcing aspect of the Flickr Commons is for memory institutions. A lot of this goes back to a core question: “how do we know if we have succeeded?” There is much to be said for setting out clear goals when launching online initiatives. Is your goal increased traffic to your site or crowdsourcing of metadata? A great example of an initiative whose goal is clearly the collection of crowdsourced metadata is the German Federal Archives, which chose to use the Wikimedia Commons for their photo metadata initiative.
If you are trying to extend your mission of providing access to materials to the public, then how do you measure success? Putting your materials in what Melissa called “grand central stations” (or what I have also heard termed “public crosswalks”) definitely increases the chances of serendipitous discovery by new individuals. That said, we can see from the successful blogs mentioned above that tackling a niche with enthusiasm and consistent posting can go a long way to building a following. JonWilliamson.com seems to have launched only in November 2008, with a post featuring a Scotch Tape Christmas ad from 1951. The author posted in May 2009 that his images in Flickr had surpassed 100,000 views.
To conclude this post I leave you with a list of inspirational digitized collections online that were created by various cultural heritage institutions:
Publishers’ Bindings Online – discussed in SAA2007’s Session: Publishers’ Bindings Online – Digitization, Collaboration, Standardization and Community Building, a multi-institutional project that includes galleries of topical images combined with an essay that gives the images context. Two of my favorites are:
From Domestic Goddesses to Suffragists: The Story of Women Told on Bookbindings, 1820-1920
Indians, the Frontier, and the West in American Bookbindings
Calisphere – more than 150,000 digitized items organized for easy use by K-12 teachers. This is especially interesting in that it represents items already available in Online Archive of California, but organized in a way to make them easy to find and use with their target audience in mind.
Yiddish Books Online – A project by the National Yiddish Book Center that uses the Internet Archive as a platform to host 11,000 digitized out-of-print Yiddish books. This project is a nice cross between a branded custom site and a grand central station.
Have a favorite online collection website? Please share it in the comments below.
As is the case with all my session summaries from DH2009, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.
Image credit: http://www.flickr.com/photos/mms0131/ / CC BY-NC-ND 2.0
This post is from: Spellbound Blog.
http://feedproxy.google.com/~r/Spellboundblog/~3/co8HXawog18/
Session Title: Digital Lives: How people create, manipulate and store their personal digital archives
Speaker: Peter Williams, UCL
Digital lives is a joint project of UCL, British Library and University of Bristol
What? We need a better understanding of how people manage digital collections on their laptops, PDAs and home computers. This matters because personal collections are shifting from paper to digital formats. The hope is to help people manage their digital archives before the content gets to the archives.
How? Talk to people through in-depth narrative interviews. Ask people about their very first memories of information technology. When did they first use a computer? Do they still have anything from that computer? How did they move the content off of it? People enjoyed giving this narrative digital history of their lives.
Who? 25 interviewees – both established and emerging people whose works would or might be of interest to repositories of the future.
Findings?
They created a detailed flowchart of users’ reported process of document manipulation.
Common patterns in use of email showed that people used email across all these platforms and environments. Preserving email is not just a case of saving one account’s messages:
work email
Gmail/Yahoo
messages via Facebook
Twitter
They documented personal information styles that relate a skills dimension to a data security dimension.
The one question I caught came from someone who asked whether the researchers thought people would stop using folders to organize email and digital files now that search across documents is easy. The speaker answered by pointing to the findings of the paper Don't Take My Folders Away!: people like folders.
My Thoughts
This session got me thinking again about the SAA2008 session that discussed the challenges that various archivists are facing with hybrid literary collections. Matthew Kirschenbaum also pointed me to MITH's white paper: Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use.
I am very interested to see how ideas about preserving personal digital records evolve. For example, what happens to the idea of a 'draft' in a world where tools such as Google Documents auto-save and version documents every few minutes?
Born-digital photos raise all sorts of issues. Photos may be kept simultaneously on cameras, hard drives, web-based repositories (Flickr, SmugMug, etc.) and off-site backup services (like mozy.com), and images are deleted and edited differently in each environment. A while back I wrote a post considering the impact of digital photography on the idea of photographic negatives as the 'photographers' sketchbooks': Capa's Found Images and Thoughts on Digital Photographers' Sketchbooks.
I really liked the approach of this project in that it looked at general patterns of behavior rather than attempting to extrapolate from the experiences of archivists with individual collections. This sort of research takes a lot of energy, but I am hopeful that creating these general user profiles will lead to best practices for preserving personal digital collections that can be applied easily as needed.
http://feedproxy.google.com/~r/Spellboundblog/~3/cKgkaEH1qyo/
When I read about Yahoo Image Search's recent addition of a filter that returns only Creative Commons Flickr images, I got excited about what this might mean for images in the Flickr Commons. So I raced off to the Yahoo Image Search page to see how it works. The short answer: the special "no known copyright restrictions" rights statement that Flickr created for members of the Flickr Commons apparently doesn't count.
For my test I searched for an exact match on “Ticket with portrait of George Washington”. This returns one result – the one image in Flickr with the same name, from The Field Museum in Flickr Commons. If you click on the ‘More Filters’ link, you will see other ways to filter your results – including the option to restrict your results to only include images whose creators permit reuse.
Next I selected the 'Creator allows reuse' option, and my one result disappeared! Quite disappointing in my book.
Google is also jumping on the 'make it easy to search for reusable images' bandwagon. Search Engine Land reported that Google Images Quietly Adds Creative Commons Filter. That post pointed me to Google Operating System's search interface, which lets you play with the options Google has available. After clicking through to some of the images returned by a Google Image Search for Creative Commons images of archives, it appears that Google looks for Creative Commons badges or links on the page containing the image. I even found Flickr Creative Commons images. However, when I tried to find the Flickr Commons image of the ticket from my Yahoo experiment above, Google didn't return it either.
So if an archives (or museum or library) posts images on a page that indicates the content is licensed under Creative Commons, it seems those images will then appear in Google's image search as reusable. That is good news: it gives users another way to find your reusable images.
The question I am left with is how to resolve the gap between the Flickr Commons' 'no known copyright restrictions' rights statement and both Google's and Yahoo's definitions of reusable content.
http://feedproxy.google.com/~r/Spellboundblog/~3/2XyJayzeZxY/
Navigating the rapidly changing landscape of new technology is a major challenge for archivists. As quickly as new technologies come to market, people adopt them and use them to generate records. Businesses, non-profits and academic institutions constantly strive to find ways to be more efficient and to cut their budgets. New technology often offers the promise of cost reductions. In this age of constantly evolving software and technological innovation, how do archivists know when a new technology is important or established enough to take note of? When do the records generated by the latest and greatest technology matter enough to save?
Below I have included two diagrams that seek to illustrate the process of adopting new technology. I think both are useful in aiding our thinking on this topic.
The first is the "Hype Cycle", as proposed by analyst Jackie Fenn at Gartner Group. It breaks down the phases that new technologies move through as they progress from initial concept to broad acceptance in the marketplace. The generic version of the Hype Cycle diagram below is from the Wikipedia entry on hype cycle.
Each summer, Gartner comes out with a new update on Where Are We In The Hype Cycle?. Last summer, microblogging was just entering the ‘Peak of Inflated Expectations’, public virtual worlds were sliding down into the ‘Trough of Disillusionment’ and location aware applications were climbing back up the ‘Slope of Enlightenment’. There is even a book about it: Mastering the Hype Cycle: How to Choose the Right Innovation at the Right Time.
The other diagram is the Technology Adoption Lifecycle from Geoffrey Moore's Crossing the Chasm. This view of the technology cycle takes the perspective of bringing new technology to market: how do you cross the chasm between early adopters and the general population?
Archivists need to consider new technology from two different perspectives: when to use it to further their own goals as archivists, and when to address the need to preserve the records that new technology generates. A fair bit of attention has been focused on figuring out how to get archivists up to speed on new web technology. In August 2008, ArchivesNext posted about hunting for Web 2.0 related sessions at SAA2008, and Friends Told Me I Needed A Blog posted about SAA and the Hype Cycle shortly thereafter.
But how do we know when a technology is ‘important enough’ to start worrying about the records it generates? Do we focus our energy on technology that has crossed the chasm and been adopted by the ‘early majority’? Do we watch for signs of adoption by our target record creators?
I expect that the answer (such as there can be one answer!) will be community specific. As I learned in the 2007 SAA session about preserving digital records of the design community, waiting for a single clear technology or software leader to appear can lead to lost or inaccessible records. Archivists working with similar records already come together to support one another through round tables, mailing lists and conference sessions. I have noticed that I often find the most interesting presentations are those that discuss the challenges a specific user community is facing in preserving their digital records. The 2008 SAA session about hybrid analog/digital literary collections discussed issues related to digital records from authors. Those who worry about records captured in geographic information systems (GIS) were trying to sort out how to define a single GIS electronic record when last I dipped my toes into their corner of the world in the Fall of 2006.
It is not feasible to imagine archivists staying ahead of every new type of technology and designing a method for archiving every possible type of digital record being created. What we can do is make it a priority for a designated archivist within every 'vertical' community (government, literary, architecture, etc.) to keep an ear to the ground about the use of technology within that community. This could be a community of practice of its own: a group that shares information about the latest trends its members are seeing, along with best practices for handling the newest types of records.
The good news is that archivists aren't the only ones who want to be able to preserve access to born-digital records. Consider Twitter, which only provides easy access to recent tweets. A whole raft of third-party tools built to archive data from Twitter is already out there, answering the demand for a way to back up people's tweets.
I don't think archivists always have the luxury of waiting for technology to be adopted by the majority of people and to reach the 'Plateau of Productivity'. If you are an archivist who works with a community that uses cutting-edge technology, you owe it to your community to stay in the loop about how they do their work now. Just because most people don't use a specific technology doesn't mean that an individual community won't pick it up and use it to the exclusion of more common tools.
The design community mentioned above spoke of working with those creating the tools for their community to ensure easy archiving down the line. In our fast-paced world of innovation, a subset of archivists needs to stay involved with the current business practices of each vertical being archived. This group can work together to identify challenges, brainstorm solutions, build relationships with the technology communities and then disseminate best practices throughout the archives community. I did find a web page for the SAA's Technology Best Practices Task Force and its document Managing Electronic Records and Assets: A Working Bibliography. However, I am imagining something more ongoing, more nimble and more tied into each of the major communities that archivists must support. Am I describing something that already exists?
http://feedproxy.google.com/~r/Spellboundblog/~3/7N6kUqwiFz0/
Mark Shelstad, head of Archives and Special Collections at University of Texas at San Antonio, sent me a link to the TARO (Texas Archival Resources Online) page for UTSA’s Archives and Special Collections finding aids in XML format.
With the current scripts, these are the fun tag stats:
1,684 total tags extracted
75% (1,266 tags) are associated with only one finding aid
3% (51 tags) are associated with 10 or more finding aids
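Each of these statistics comes down to counting how many finding aids a tag is attached to. Below is a minimal sketch of that count. It assumes the extracted data is available as a hypothetical mapping from a finding-aid ID to its set of tags, which is not the actual ArchivesZ database schema.

```python
from collections import Counter

def tag_stats(tags_by_finding_aid, threshold=10):
    """Summarize how widely tags are shared across finding aids.

    tags_by_finding_aid: hypothetical dict of finding-aid ID -> iterable of tags.
    """
    # Count the number of distinct finding aids each tag appears in.
    usage = Counter(tag for tags in tags_by_finding_aid.values() for tag in set(tags))
    total = len(usage)
    single = sum(1 for n in usage.values() if n == 1)
    shared = sum(1 for n in usage.values() if n >= threshold)

    def pct(n):
        return round(100 * n / total) if total else 0

    return {
        "total_tags": total,
        "single_use": single,
        "single_use_pct": pct(single),
        "widely_shared": shared,
        "widely_shared_pct": pct(shared),
    }

sample = {"fa1": {"Texas", "Women"}, "fa2": {"Texas"}, "fa3": {"Texas", "Military"}}
print(tag_stats(sample, threshold=2))
```

Applied to the UTSA numbers, 1,266 of 1,684 tags rounds to 75% and 51 of 1,684 rounds to 3%, matching the figures above.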
Collection Size
235 out of the 253 collections ended up with a collection size of 0.
Consider the encoding of the collection size in the Guide to the Women’s Overseas Service League Records, 1910-2007:
77 linear feet (approximately 44,000 items)
Contrast this with one of the examples where the size of the collection was extracted properly by the current script:
8.4 linear feet (14 boxes)
Sometimes it feels like a game of Where’s Waldo. In this case we are simply missing the set of tags from the first example. Off I went to the EAD tag descriptions to find the guidelines for use of the tag, where I found this overview of the tag:
A wrapper element for bundling information about the appearance or construction of the described materials, such as their dimensions, a count of their quantity or statement about the space they occupy, and terms describing their genre, form, or function, as well as any other aspects of their appearance, such as color, substance, style, and technique or method of creation. The information may be presented as plain text, or it may be divided into the <dimensions>, <extent>, <genreform>, and <physfacet> subelements.
Bad news for my script logic - both versions are valid! This is a great example of how valid encoding can still present challenges. While in this example it seems just as easy to parse the version with the tags as without, it will only be through examination of a much broader sample of data that we can determine how much of a problem we have on our hands with this scenario of size data included in the tags without enclosing or tags.
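One way to tolerate both valid encodings is to read all of the text inside the physical description element instead of depending on a particular subelement. The sketch below does this with Python's standard xml.etree.ElementTree. It assumes unnamespaced EAD 2002 element names (physdesc, extent) and a size phrased in linear feet. The two sample fragments are hypothetical illustrations of the two patterns, not the actual UTSA markup.

```python
import re
import xml.etree.ElementTree as ET

# A quantity followed by "linear feet" or "linear foot".
LINEAR_FEET = re.compile(r"(\d+(?:\.\d+)?)\s*linear\s+f(?:ee|oo)t", re.IGNORECASE)

def collection_size_linear_feet(physdesc):
    """Return the linear-feet value inside a physdesc element, or None.

    itertext() walks the text of every nested element, so this works whether
    the quantity sits in an <extent> child or directly inside <physdesc>.
    """
    text = " ".join("".join(physdesc.itertext()).split())
    match = LINEAR_FEET.search(text)
    return float(match.group(1)) if match else None

plain = ET.fromstring("<physdesc>77 linear feet (approximately 44,000 items)</physdesc>")
wrapped = ET.fromstring(
    "<physdesc><extent>8.4 linear feet</extent> <extent>(14 boxes)</extent></physdesc>"
)
print(collection_size_linear_feet(plain))    # 77.0
print(collection_size_linear_feet(wrapped))  # 8.4
```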
Inclusive Dates
Twenty of the UTSA collections came through with no years. When I examined the data, I found an assortment of formats that my current script could not parse properly, including the examples below:
1917-1980 (bulk 1920-1945)
1876-1903, 1914-1919, 1940-2002
1940s, 1970s-1990s
Another encoding approach that could not be parsed was the one used for the finding aid of the Church Women United of San Antonio Records. In this case the tag is within the tag as seen here:
Church Women United of San Antonio Records,
1961-2005
Among the finding aids for which I did extract a range of inclusive date years, I also found issues with values like 1950s-1990s. The current script interpreted this to represent 1950 through 1990, but I believe it would be more properly translated as representing 1950 through 1999.
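All of these formats ultimately describe a span from an earliest year to a latest year. A simple approach is therefore to collect every four-digit year in the string, drop bulk dates, treat a decade such as 1990s as ending in 1999, and take the minimum and maximum. This is a sketch of that idea, not the ArchivesZ script. It deliberately collapses gaps, so 1876-1903, 1914-1919, 1940-2002 becomes a single 1876 to 2002 span. It also does not address the nested-tag case above.

```python
import re

# A four-digit year, optionally followed by "s" to mark a decade (e.g. 1940s).
YEAR_TOKEN = re.compile(r"\b(\d{4})(s?)\b")
# Bulk dates such as "(bulk 1920-1945)" describe a subset of the material,
# so they should not widen the overall inclusive range.
BULK_NOTE = re.compile(r"\(\s*bulk[^)]*\)", re.IGNORECASE)

def inclusive_years(text):
    """Return (first_year, last_year) for an inclusive-dates string, or None."""
    cleaned = BULK_NOTE.sub("", text)
    starts, ends = [], []
    for year, decade in YEAR_TOKEN.findall(cleaned):
        first = int(year)
        starts.append(first)
        # A decade such as "1990s" runs through its ninth year (1999).
        ends.append(first + 9 if decade else first)
    if not starts:
        return None
    return min(starts), max(ends)

for value in ["1917-1980 (bulk 1920-1945)", "1876-1903, 1914-1919, 1940-2002",
              "1940s, 1970s-1990s", "1950s-1990s"]:
    print(value, "->", inclusive_years(value))
```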
General Code Fixes
The University of Texas at San Antonio’s finding aids have provided additional examples of the following data and encoding issues already identified in earlier data sets:
Inconsistent repository titles (26 different variations of “The University of Texas at San Antonio Library”)
Titles with embedded and tagged dates
Carriage return and tab characters that need to be removed
Emphasis within a title or abstract added via a tag (such as Storyletters seen in A Guide to the Storyletters Records, 1991-2000) which interrupts extraction of text at that point
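Several of these fixes share a root cause: reading only the first text node of an element and keeping raw whitespace. The sketch below uses ElementTree's itertext() so that text inside inline tags such as <emph> is kept, and it collapses tabs and carriage returns. The repository_key helper is a hypothetical, crude normalizer for grouping repository-title variants. Abbreviations or reworded names would still need a hand-maintained lookup table.

```python
import re
import xml.etree.ElementTree as ET

def clean_text(element):
    """Return all text inside element, including text in nested inline tags,
    with tabs, carriage returns and runs of spaces collapsed to single spaces."""
    # itertext() yields the text and tails of every descendant in document
    # order, so extraction no longer stops at the first nested tag.
    return " ".join("".join(element.itertext()).split())

def repository_key(name):
    """Hypothetical comparison key: lowercase, no punctuation, no leading 'the'."""
    key = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    key = re.sub(r"^\s*the\s+", "", key)
    return " ".join(key.split())

title = ET.fromstring(
    "<unittitle>A Guide to the <emph render='italic'>Storyletters</emph> "
    "Records,\r\n\t1991-2000</unittitle>"
)
print(clean_text(title))
print(repository_key("The University of Texas at San Antonio Library"))
```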
Next Steps
This is the last data set I am analyzing before tackling actual updates to the ArchivesZ data extraction script. My next step is to review and prioritize my long to do list for updates to this script. Most of what I have found in my examination of the data sets are ways in which my script was not smart enough to handle valid variations in encoding and the tabs, carriage returns, formatting tags and special characters found throughout everyone’s XML. Yes, there are some cases in which the data itself is less than optimal (such as non-standardized repository titles) or the values challenging (so many ways to describe the size of a collection!), but overall I am optimistic about how much more I can improve the extraction script before I have to resort to hand correcting records in the database.
Thanks to everyone for your patience with these data analysis posts. Onward to programming!
http://feedproxy.google.com/~r/Spellboundblog/~3/ByaTKs5R6eQ/
Amanda Ross, project archivist for the Forest History Society, sent me 57 EAD finding aids to include in the ArchivesZ project. These are the data challenges that the current data extraction script does not address:
Titles with embedded tags or punctuation. Generally the script drops anything after it hits either, so rather than a title like William E. Towell Papers, 1941 - 1988, my database ended up only with “William E Towell Papers,” based on this encoding: Inventory of the William E. Towell Papers, 1941 - 1988
Need to handle a conversion factor for a size of “1 folder” (as found in the Inventory of the Biltmore Forest School Images, 1890 - 1988)
My script chokes on the Inclusive Year format "1910 and 1931 - 1937" (as found in the Inventory of the Alfred Cunningham Papers, 1910 and 1931 - 1937)
The presence of a character within the tag, used to force a line break, is preventing my script from extracting any size information at all (as found in the Inventory of the DeWitt Nelson Papers, 1940 - 1976)
Within the tag, my script drops everything after an tag (making for a very short abstract in the case of the Inventory of the Arthur Bernard Recknagel Auxiliary Photograph Collection, 1911 - 1947).
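Handling a size such as "1 folder" means picking a conversion factor for every unit the script might encounter. Below is a minimal sketch for single-unit size strings. The factor values in UNIT_TO_LINEAR_FEET are placeholder assumptions for illustration only. They are not archival standards and not the values ArchivesZ uses.

```python
import re

# Hypothetical conversion factors (linear feet per unit); placeholders only.
UNIT_TO_LINEAR_FEET = {
    "linear foot": 1.0, "linear feet": 1.0,
    "cubic foot": 1.0, "cubic feet": 1.0,
    "box": 0.4, "boxes": 0.4,
    "folder": 0.05, "folders": 0.05,
}

def size_in_linear_feet(size_text):
    """Convert a single-unit size such as "1 folder" or "3 boxes" to linear feet.

    Returns None when the unit is not in UNIT_TO_LINEAR_FEET.
    """
    text = " ".join(size_text.lower().split())
    # Try longer unit names first so "folders" is checked before "folder".
    for unit in sorted(UNIT_TO_LINEAR_FEET, key=len, reverse=True):
        match = re.match(r"(\d+(?:\.\d+)?)\s+" + re.escape(unit) + r"\b", text)
        if match:
            return float(match.group(1)) * UNIT_TO_LINEAR_FEET[unit]
    return None

print(size_in_linear_feet("1 folder"))
print(size_in_linear_feet("1 microfilm reel"))  # None: no factor defined
```

Returning None for unknown units keeps unconvertible sizes visible for review instead of silently recording them as 0.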
The most dramatic issue, seen across all the finding aids in this set, is that no subject data was extracted from any of the finding aids. My working theory for the moment is that this is due to the use of and tags as shown here:
Subject Headings
Audiotapes
Ainsworth, John H., 1909-
Businessmen -- United States
This is in contrast with this example of encoding from Syracuse University:
Subject and Genre Headings
Adult education
Adolphson, L. H.
Bradford, Leland Powers, 1905-
Or this sample from Oregon State University:
Aitken, Frances Alva, 1889-1970.
Oregon Agricultural College. Class of 1910.
Oregon Agricultural College--Students.
Corvallis (Or.)
Student activities--Oregon--Corvallis.
Both the Syracuse and OSU examples are handled by the current state of the data extract script.
Amanda pointed me to the NCEAD Best Practice Guidelines for EAD 2002. Down in Appendix G: How Do I Encode…, the second question down is "What if I have multi-part scope notes, biographical notes or subject headings?" followed by exactly the and tag usage as is being done for the Forest History Society finding aids. This format clearly should be handled.
So, no fun tag stats for this run, but I hope to fix my Ruby script so that the Forest History Society finding aids can be incorporated into the data set I use for testing version 2 of ArchivesZ. My Ruby script to-do list is getting quite long!
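Assume the multi-part pattern is nested <controlaccess> groups, each with its own <head> (EAD 2002 permits <controlaccess> inside <controlaccess>). Extraction can then be made indifferent to nesting by visiting every <controlaccess> in the tree and reading its direct access-term children. A sketch using ElementTree, with namespace prefixes stripped so both namespaced and plain EAD work. The sample markup and its head labels are hypothetical, not copied from the Forest History Society files.

```python
import xml.etree.ElementTree as ET

# EAD 2002 access-term elements that can appear inside <controlaccess>.
ACCESS_TERMS = {"subject", "persname", "corpname", "famname", "geogname",
                "genreform", "occupation", "function", "title", "name"}

def local_name(tag):
    """Drop a '{namespace}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]

def access_terms(root):
    """Collect access terms from every <controlaccess>, however deeply nested."""
    terms = []
    for group in root.iter():
        if local_name(group.tag) != "controlaccess":
            continue
        # Direct children only: nested groups are visited by root.iter() on
        # their own, and <head> labels are not access terms.
        for child in group:
            if local_name(child.tag) in ACCESS_TERMS:
                terms.append(" ".join("".join(child.itertext()).split()))
    return terms

SAMPLE = """<archdesc><controlaccess><head>Subject Headings</head>
  <controlaccess><head>Genres</head><genreform>Audiotapes</genreform></controlaccess>
  <controlaccess><head>Names and Topics</head>
    <persname>Ainsworth, John H., 1909-</persname>
    <subject>Businessmen -- United States</subject></controlaccess>
</controlaccess></archdesc>"""

print(access_terms(ET.fromstring(SAMPLE)))
```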
http://feedproxy.google.com/~r/Spellboundblog/~3/YXasAdfk48g/
Thanks to Archivism.net for this animated gem from DigitalPreservationEurope. Somehow they manage to include digital preservation, trusted data repositories, metadata and refreshing storage media in their story of Team Digital Preservation vs Team Chaos.
I really want a t-shirt with the Bit-Rot guy on it!
http://feedproxy.google.com/~r/Spellboundblog/~3/X-Ly3ZsvBS4/
There are still spaces available in a workshop I am giving May 6, 2009 at the University of Maryland’s iSchool. The workshop, titled Benefits of Blogging: Why you should start a blog today!, is free and open to anyone in the University of Maryland community.
This is the workshop description:
Blogging is an easy way to build your professional network, improve your writing and get your ideas out there. Information professionals need to understand how to take advantage of the promise of blogs, both to support their careers and as a tool for institutions. This workshop will be led by an active blogger who has found great success in becoming part of a broader community via her blog. Learn about free tools, things to keep in mind and why you should start a blog today.
When: 5pm Wednesday May 6, 2009
Where: iSchool Student Lab, Hornbake South room 2108
Registration: Maryland iSchool Workshop Registration
Are you interested in this session, but not affiliated with the University of Maryland? Please let me know, either via my contact form or a comment below, and I will see what I can do about putting together another session off-campus.
http://feedproxy.google.com/~r/Spellboundblog/~3/u_rVL_6Y-FE/
Gina Strack of the Utah State Archives and Records Service provided me with access to the XML of 1,196 EAD encoded finding aids. These EAD 2.0 XML files are a product of a grant funded project completed last year to migrate from EAD 1.0 finding aids. Their website includes a detailed account of the EAD Project.
These finding aids have helped me identify three types of ArchivesZ data challenges:
strange characters
broad composite subjects
determination of accurate collection size
Strange and mysterious characters!
These finding aids use a special character in place of the standard Library of Congress double dash, which normally appears between subsections of the subject heading.
An example subject from the Utah Government XML looks like this:
Women—Suffrage—Utah.
Viewing the same subject in a pure text editor (such as vi):
Women—Suffrage—Utah.
By the time it gets into my database and is pulled out via a query in MySQL Query Browser it looks like this:
Women—Suffrage—Utah.
Rather than just stripping out all instances of —, my plan is to replace them with the standard Library of Congress double dash. This will ensure that the existing code that breaks the subjects down to tags will still work.
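Once the text is decoded correctly, the replacement described above is a one-line substitution. The sketch below also undoes one common cause of this kind of garbling: UTF-8 bytes for an em dash (U+2014) being decoded as Windows-1252, which turns one dash into three unrelated characters. Whether that is what happened to the Utah data is an assumption here, not a diagnosis. The function names are hypothetical.

```python
EM_DASH = "\u2014"
LC_SUBDIVISION = "--"

def repair_mojibake(text):
    """Undo UTF-8 text that was mis-decoded as Windows-1252.

    If the text does not round-trip cleanly, it is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

def to_lc_subject(text):
    """Replace em dashes between subject subdivisions with the LC double dash."""
    return repair_mojibake(text).replace(EM_DASH, LC_SUBDIVISION)

clean = "Women\u2014Suffrage\u2014Utah."
garbled = clean.encode("utf-8").decode("cp1252")  # simulate the mis-decoding
print(to_lc_subject(clean))    # Women--Suffrage--Utah.
print(to_lc_subject(garbled))  # Women--Suffrage--Utah.
```

Because the existing code already splits subjects on the double dash, normalizing to "--" lets the rest of the pipeline stay unchanged.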
Composite Subjects
When I say “composite subject” what I mean is a subject that includes multiple very disparate terms. Rather than the Library of Congress style subjects, all aspects of which relate to the collection in question, these composite subjects cover multiple subjects which are grouped together for convenience.
This is a list of some of the most popular subjects for the Utah Gov collections:
Politics, Government, and Law
Business, Industry, Labor, and Commerce
Science, Technology, and Health
Arts, Humanities, and Social Sciences
These subjects throw a monkey wrench into my theories about decomposing subjects based on commas. The collections to which these subjects are assigned likely fit in only one of the component themes. For example, the "Inventory of Publications from Department of Technology Services, 1993-2008" is assigned the subject "Science, Technology, and Health". If I divide this subject into 3 separate tags, the Science and Health tags would be quite misleading.
So that leaves me a bit trapped. If I want to divide subjects such as “Art, Cuban, 20th century”, as I discuss in my Syracuse University post, then I end up also dividing these umbrella subjects which separate such very divergent terms with commas.
This issue goes on my list of reasons to add a repository configuration file for use by the data extraction script.
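A repository configuration file could, for example, list the umbrella subjects that should never be split on commas. The sketch below uses a hypothetical per-repository setting, shown as a Python dict; in practice it might live in YAML or JSON. The splitting rule is a simple comma split. Neither the key names nor the rule comes from ArchivesZ.

```python
# Hypothetical per-repository settings; the keys are invented for illustration.
REPOSITORY_CONFIG = {
    "utah-state-archives": {
        "unsplittable_subjects": {
            "Politics, Government, and Law",
            "Business, Industry, Labor, and Commerce",
            "Science, Technology, and Health",
            "Arts, Humanities, and Social Sciences",
        },
    },
}

def subject_to_tags(subject, repository):
    """Split a subject on commas unless the repository marks it as an
    umbrella subject that must stay whole."""
    settings = REPOSITORY_CONFIG.get(repository, {})
    if subject in settings.get("unsplittable_subjects", set()):
        return [subject]
    parts = (part.strip() for part in subject.split(","))
    return [part for part in parts if part]

print(subject_to_tags("Science, Technology, and Health", "utah-state-archives"))
print(subject_to_tags("Art, Cuban, 20th century", "syracuse"))
```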
Accurate Collection Size
In my quest to convert all sizes to linear feet, sizes such as these are challenging:
0.20 cubic foot and 1 microfilm reel
0.35 cubic foot and 2 microfilm reels
I also have situations in which sizes are specified in multiple sections of the finding aid. The Inventory of ALERT Foundation records from Governor Bangerter, 1986-1991 has a collection-level size of "0.50 cubic foot and 2 microfilm reels", but further down in this finding aid I see this:
series: ALERT Foundation records
box 1, folder 1: Documentary: "Letters from our Children," Motion picture film reel, 16mm
box 1, folder 2: Documentary: "Letters from our Children," VHS videocassette
box 1, folder 3: Documentary: "Letters from our Children," VHS videocassette
box 1, folder 4: Documentary: "Letters from our Children," VHS videocassette
So when they said 2 microfilm reels, did they really mean a 16mm motion picture film reel and a VHS videocassette? How sizes are specified in a specific repository's finding aids is another possible candidate for a repository-level configuration script.
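Compound sizes like these can at least be broken into their parts before deciding what to do with each one. Below is a minimal sketch. It assumes parts are joined by "and" or commas and that only length and volume units are converted. The 1 linear foot per cubic foot factor is a placeholder assumption. Units such as microfilm reels are returned separately rather than forced into linear feet.

```python
import re

# Placeholder assumption: 1 cubic foot of records ~ 1 linear foot of shelving.
LINEAR_FEET_PER_UNIT = {"linear foot": 1.0, "linear feet": 1.0,
                        "cubic foot": 1.0, "cubic feet": 1.0}

PART = re.compile(r"^(\d+(?:\.\d+)?)\s+(.+)$")

def split_extent(size_text):
    """Split e.g. "0.20 cubic foot and 1 microfilm reel" into a linear-feet
    total plus a list of (quantity, unit) parts that were not converted."""
    linear_feet = 0.0
    other = []
    for raw in re.split(r",|\band\b", size_text):
        part = " ".join(raw.split()).lower()
        match = PART.match(part)
        if not match:
            continue  # skip empty or unparseable fragments
        quantity, unit = float(match.group(1)), match.group(2)
        if unit in LINEAR_FEET_PER_UNIT:
            linear_feet += quantity * LINEAR_FEET_PER_UNIT[unit]
        else:
            other.append((quantity, unit))
    return linear_feet, other

print(split_extent("0.20 cubic foot and 1 microfilm reel"))
print(split_extent("0.35 cubic foot and 2 microfilm reels"))
```

Keeping the unconverted parts makes it possible to compare them against item-level descriptions later, as in the ALERT Foundation example, instead of silently dropping them.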
Tagging Statistics
Finally, here are a few tag stats:
Only 31 tags (1.5% of all Utah Government tags) are associated with 10 or more collections
1404 tags (71.5%) are assigned to only a single collection
107 collections have been assigned only 1 tag
10 collections have no subjects
Of course these statistics are based on the current incarnation of the data extraction script. After I modify the script, there will be a greater number of tags and (hopefully) more overlap of tags across multiple collections. These types of statistics should help me gauge how well my data extraction logic is working.
http://feedproxy.google.com/~r/Spellboundblog/~3/ylrA4umKkEg/
The title says it all. I won 2nd place in the “Smart Computers and Computing” section of the University of Maryland’s Graduate Research Interaction Day (GRID) for my poster ArchivesZ: Visualizing Archival Collections (what is in all those boxes?).
1st place in “Smart Computers and Computing” went to the fabulous Dave Levin for his presentation on TrInc: Small Trusted Hardware for Large Distributed Systems.
Overall, it was a great experience. I wish I could have been in multiple rooms at the same time so I could have seen more posters and presentations. I also wish I had understood that I could present either a poster or a PowerPoint deck. That was not entirely clear ahead of time. The downside of my choice was being tied to my poster, but the upside is that I still have a poster that readers like you can examine. Obviously it all worked out in the end.
A big thanks to everyone in the Graduate Student Government who worked so hard to bring this event together.
http://feedproxy.google.com/~r/Spellboundblog/~3/4f_Qpe35suE/
In the latest example of a media company finding a way to profit from its archives, Warner Brothers has launched the Warner Brothers Archive. The archive is nestled neatly within the WBshop.com website, among the TV shows and promotional merchandise. Its movie pages include everything customers have come to expect from an online shop: user reviews, video clips and ways to share links. You can browse by genre or decade. They are currently holding a vote to see which title should be added to the inventory next.
One of the films available from the archive is the 1975 action feature Doc Savage: The Man of Bronze. Embedded below is a 30-second clip showing Doc Savage entering his "Fortress of Solitude". They could have made it easier for me to embed this (I had to go figure out how to embed FLV files into this blog post), but I am happy that they let me embed it at all. If you don't see a video below, you probably need to install Adobe Flash Player. You can always go watch the clip on the Doc Savage page (click on Video Trailers & Clips).
Each film page carefully notes “This film has been manufactured from the best-quality video master currently available and has not been remastered or restored specifically for this DVD and On Demand release.” and then directs the customer to view the preview clip to evaluate the film’s quality.
The details come out when we dig into the Warner Archive FAQ. There we learn that the DVDs we can purchase for $19.95 are produced "on-demand". How are they different from the DVDs you buy at the store?
DVD’s produced on-demand are similar to, but not quite same as, DVD’s you’d buy at the local video store. DVD movies you buy at the local video outlet are manufactured from a mold via a stamping process whereas on-demand DVDs are “burned”. Each carries information read by the DVD player, but the physical properties of the two are different.
Most DVD players are compatible with both commercial DVD-Video and one or more of the “recordable DVD formats. Our on-demand DVD’s are manufactured using the most widely accepted format, DVD-R.
They also answer this question about copying the DVDs:
Q: I’m trying to make a few extra copies of my DVD, for “safe keeping” and for a surprise present to my mom. When I copied the disc it was un-playable. Why is that? And what can I do about it?
A: This DVD on-demand disc was recorded using CSS encryption. CSS is designed to prevent unauthorized reproduction of the DVD. We’re delighted that you’d like to surprise your mother with the gift of a Warner Bros classic movie. May we suggest she’d like an officially produced and packaged DVD even more? As such we welcome your visit back to the Warner.com classic store at any time.
In addition to selling DVD-Rs with CSS encryption, the archive offers many of its films as downloads. Archive movie downloads appear to cost $14.95. The Digital Products FAQ explains the details, but these are the highlights of what comes along with that $5 in savings:
Downloads are protected by DRM
Downloads only play on MS Windows boxes, with no Mac or Linux support
You can burn the movie to a CD or DVD, but they “are Digital Rights Management (DRM) protected, so you will only be able to watch the video on the computer or device on which it was originally purchased.”
I give a big thumbs up to Warner Brothers for coming up with a way to leverage their archives. I am less impressed with the non-open format and DRM restrictions they are placing on both the DVD-Rs and downloads. A model that states that a purchased download can be played as often as I want - but requires a specific operating system and only permits play on the same machine from which I made the purchase seems untenable. If I were to buy one of these films, I would spend the extra $5 and get the DVD-R which at least can be played on multiple machines, even if it can never be copied!
http://feedproxy.google.com/~r/Spellboundblog/~3/J3NBoUHA08I/
Come meet me and hear my 8-minute talk in front of a poster about ArchivesZ.
When? April 13, 2009, 1:30-3pm
What? University of Maryland’s Graduate Research Interaction Day (GRID)
Where? University of Maryland’s Stamp Student Union
My ArchivesZ poster has been assigned to the "Smart Computers and Computer Science" theme. I will be with my poster in the Benjamin Banneker B room at UMD's Stamp Student Union from 1:30 to 3pm. If you are attending GRID, please stop by and say hello!
Want a preview or can’t make it? Here is the poster in question:
http://feedproxy.google.com/~r/Spellboundblog/~3/WCbFA7iGZyw/
In celebration of Ada Lovelace Day 2009, I decided to see how many different archival resources I could dig up that document the achievements of women in technology.
My first find has me tipping my hat to IBM. They have a page dedicated to IBM Women in Technology, but the real fun is in digging through the individual profile pages listed in the IBM Women in Technology International (WITI) hall of fame. You can watch oral history interviews with women like Frances Allen, an “expert in the field of optimizing compilers”, or Caroline Kovac, who “oversees the development of cutting-edge information technology at IBM for the life sciences market”.
Beyond IBM’s offerings I ran into a classic challenge: how do you find archival collections specifically about women in technology? A visit to the American Institute of Physics’ archive led me to photo mini-exhibits of Marie Curie and Maria Goeppert Mayer. A search for “woman scientists” on the Online Archive of California (OAC) found these:
Contributions of 20th Century Women to Physics : Records of the UCLA Website 1912-2001: The records include documentation of the original papers in which discoveries were first reported, biographical material, including some photographs, and descriptions vetted by Field Editors.
Katherine Esau papers: The Katherine Esau papers represent the entire body of plant anatomy research Esau conducted from 1924 when she began research on curly top virus in sugar beets for the Spreckels Sugar Company to 1991 when she published her last article. The collection includes correspondence, research notes, photographs, biographical material, objects, and printed matter.
The challenge in finding collections like these is that you need to hunt through each institution’s collections. Looking for the records of a specific individual is easiest, but finding collections relating to women and technology in general is a lot harder. The first collection listed above from OAC has the subject “Women in physics – Archival resources” assigned to it, which seems very useful until you realize that it is the only collection assigned this subject in all of OAC.
I want to leave you with the thought that preserving the notes and writings of young, innovative women who are passionate about technology is what will let future generations read their words, just as young women today can read and be inspired by the words of Ada Lovelace.
Want to read some of Ada’s writing? Get your hands on a copy of Ada, the Enchantress of Numbers: A Selection from the Letters of Lord Byron’s Daughter and Her Description of the First Computer. Want to read something a bit more contemporary that is halfway between memoir and an eclectic visit to the depths of software programming? Then try Ellen Ullman’s Close to the Machine: Technophilia and Its Discontents.
Technorati Tag: ALD09post
http://feedproxy.google.com/~r/Spellboundblog/~3/C1pDrj-vaqM/
I received a zip file of 1,771 EAD-encoded finding aids from the kind EAD enthusiasts at the Seeley G. Mudd Manuscript Library. These finding aids came from five divisions within Princeton’s Library:
University Archives
Public Policy Papers
Manuscript Division
Latin American Ephemera Collection
Engineering Library
So, onward to the data issues and what they mean for my ever-growing ‘script fix to-do list’.
Repository Names
As we saw with the Oregon State University finding aids, the finding aids from Princeton University had a wide range of different values for repository names. In the list below we spot some issues. Some end in periods, some do not. One has extra space (probably a carriage return) in the middle. One does not include Princeton in the repository name. Once we have many repositories’ finding aids in ArchivesZ, a repository name of ‘Engineering Library’ does not tell the user enough about where those collections can be found.
Here is the list of repository titles my script extracted:
Princeton University Library. Department of Rare Books and Special Collections.
Engineering Library
Princeton University Library
Princeton University Library. Department of Rare Books and Special Collections.
Princeton University Library.
My script can handle the extra period and the extra spaces, but the non-specific name would need to ultimately be fixed on the source side.
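The cleanup half of that fix is straightforward. Here is a minimal sketch in Python (not the project's actual parsing script; the helper name normalize_repository_name is hypothetical) that collapses runs of whitespace, including newlines and tabs, and drops a trailing period:

```python
import re


def normalize_repository_name(raw: str) -> str:
    """Tidy a repository name pulled from an EAD file (hypothetical helper)."""
    # Collapse every run of whitespace (spaces, newlines, tabs, carriage returns) into one space.
    name = re.sub(r"\s+", " ", raw).strip()
    # Drop any trailing period(s), then trim again in case a space preceded them.
    return name.rstrip(".").strip()


print(normalize_repository_name("Princeton University Library."))  # Princeton University Library
```

Note that no amount of normalization turns ‘Engineering Library’ into a more specific name; that part still has to be fixed at the source.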
Collection Size
The current script assumes that there is only one extent value specified to express the size of the collection. Princeton’s finding aids showed me examples of multiple extent values. For example, the Christina Georgina Rossetti Collection has both a collection level size of 0.4 linear feet (1 archival box) as well as a 2nd extent specification corresponding to a specific folder with the value of (1 poem, 3 drawings, 1 photo, 1 incomplete article). The script must be modified to only consider the collection level size.
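One way to restrict parsing to the collection level, sketched with Python's standard ElementTree. It assumes an EAD 2002 file without a namespace, where the collection-level extent lives at archdesc/did/physdesc/extent and folder-level extents live inside components under dsc; the sample XML and the function name collection_extents are hypothetical:

```python
import xml.etree.ElementTree as ET


def collection_extents(ead_xml: str) -> list:
    """Return only the collection-level <extent> values (hypothetical helper)."""
    root = ET.fromstring(ead_xml)  # root element is <ead>
    # This path reaches archdesc/did/physdesc/extent only; extents inside
    # <dsc> components (folders, items) live deeper and are not matched.
    return [
        " ".join("".join(extent.itertext()).split())
        for extent in root.findall("./archdesc/did/physdesc/extent")
    ]


SAMPLE = """<ead><archdesc level="collection">
  <did><physdesc><extent>0.4 linear feet (1 archival box)</extent></physdesc></did>
  <dsc><c01 level="folder"><did><physdesc>
    <extent>(1 poem, 3 drawings, 1 photo, 1 incomplete article)</extent>
  </physdesc></did></c01></dsc>
</archdesc></ead>"""

print(collection_extents(SAMPLE))  # ['0.4 linear feet (1 archival box)']
```

A real ingest would also need to handle files that declare the EAD namespace, which changes the tag names ElementTree sees.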
Complicated Titles
The current script logic apparently does not handle what I would call ‘complicated collection titles’. For example, I ended up with “Edward Livingston Papers, ” as the title for a collection with a full title of Edward Livingston Papers, 1683-1877 (bulk 1764-1836). This is the way that this title is encoded:
Edward Livingston Papers, 1683-1877 (bulk 1764-1836)
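A likely culprit for a title that stops at the comma is reading only the text that comes before the first child element. Assuming the dates are nested inside unittitle as unitdate elements (the markup below is a hypothetical encoding, not copied from Princeton's file), ElementTree's itertext() recovers the whole title:

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: the dates are child elements of <unittitle>.
TITLE_XML = (
    '<unittitle>Edward Livingston Papers, '
    '<unitdate type="inclusive">1683-1877</unitdate> '
    '<unitdate type="bulk">(bulk 1764-1836)</unitdate></unittitle>'
)


def full_title(unittitle: ET.Element) -> str:
    """Join the element's own text with all nested text and tails, normalizing spaces."""
    return " ".join("".join(unittitle.itertext()).split())


title = ET.fromstring(TITLE_XML)
print(repr(title.text))   # 'Edward Livingston Papers, '  (what a .text-only parser sees)
print(full_title(title))  # Edward Livingston Papers, 1683-1877 (bulk 1764-1836)
```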
Too Many Tags
The Engineering Library’s Department of Mechanical and Aerospace Engineering Technical Reports: Finding Aid has 522 tags assigned to it! Almost all of these are the names of the authors of the individual reports. This scenario goes on the list of reasons why I might choose not to include persname subjects (at least for this version). The other option for handling this situation is to use only subjects assigned at the collection level and ignore subjects assigned at lower unit/container levels. Without the author tags, this single collection ends up with this nice, reasonable list of tags:
Fluid mechanics
Mechanical engineering
Combustion
Aerospace engineering
Propulsion systems
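A sketch of the collection-level option: read access terms only from controlaccess blocks that sit directly under archdesc, skip persname, and ignore anything inside the component hierarchy. The element names follow EAD 2002 (no namespace assumed); the set of elements kept as tags, the sample XML, and the function name are hypothetical choices:

```python
import xml.etree.ElementTree as ET

# Hypothetical choice: which controlled access elements become tags (persname is left out).
TAG_ELEMENTS = {"subject", "geogname"}


def collection_level_terms(ead_xml: str) -> list:
    root = ET.fromstring(ead_xml)  # root element is <ead>
    terms = []
    # Only <controlaccess> blocks that are direct children of <archdesc>;
    # any <controlaccess> inside <dsc> components is never visited.
    for block in root.findall("./archdesc/controlaccess"):
        # iter() walks the block in document order, including nested <controlaccess> groups.
        for element in block.iter():
            if element.tag in TAG_ELEMENTS:
                terms.append(" ".join("".join(element.itertext()).split()))
    return terms


SAMPLE = """<ead><archdesc level="collection">
  <controlaccess>
    <subject>Fluid mechanics</subject>
    <controlaccess><subject>Combustion</subject></controlaccess>
    <persname>Doe, Jane</persname>
  </controlaccess>
  <dsc><c01><controlaccess><persname>Roe, Richard</persname>
    <subject>Report-level subject</subject></controlaccess></c01></dsc>
</archdesc></ead>"""

print(collection_level_terms(SAMPLE))  # ['Fluid mechanics', 'Combustion']
```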
Year Challenges
I found two different issues related to year ranges:
Women in Argentina, VI, 1989-2001: Finding Aid: The current script does not properly extract the inclusive dates which are encoded within the titleproper tags, but rather assumes that it will be encoded using a unitdate tag.
A number of finding aids include subjects that have year spans as part of the subject. When these subjects are decomposed into tags, we end up with tags like ‘1850-1950’. Since the time period is already communicated via the inclusive dates, I will likely just drop these portions of the subjects rather than create a tag for each unique year span.
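Both fixes could look something like this in Python. The fallback pulls the last YYYY-YYYY span out of the title when no unitdate is present, and the filter drops tags that are nothing but a year or a year span; the function names and the "last span wins" rule are assumptions for illustration:

```python
import re
from typing import List, Optional, Tuple

# A tag that is only a year or a year span, e.g. "1850-1950" (hyphen or en dash).
YEAR_ONLY = re.compile(r"^\d{4}(?:\s*[-\u2013]\s*\d{4})?$")
# Any YYYY-YYYY span inside a longer string.
YEAR_SPAN = re.compile(r"(\d{4})\s*[-\u2013]\s*(\d{4})")


def inclusive_dates_from_title(title: str) -> Optional[Tuple[int, int]]:
    """Fallback when there is no unitdate: use the last year span found in the title."""
    spans = YEAR_SPAN.findall(title)
    if not spans:
        return None
    start, end = spans[-1]
    return int(start), int(end)


def drop_year_tags(tags: List[str]) -> List[str]:
    """Remove tags that carry nothing but a year range; the inclusive dates cover that."""
    return [tag for tag in tags if not YEAR_ONLY.match(tag.strip())]


print(inclusive_dates_from_title("Women in Argentina, VI, 1989-2001: Finding Aid"))  # (1989, 2001)
print(drop_year_tags(["Art", "1850-1950", "20th century"]))  # ['Art', '20th century']
```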
General Code Fixes
It is reassuring at this point to spot the same issues with data from multiple repositories. Here are data and code logic issues that I have seen elsewhere that are revalidated by Princeton’s finding aids:
Need to strip \n and \t (newline and tab) characters
Need to break subjects up based on commas
Need to drop final periods from repository names, subjects and titles
Need to handle sizes given in volumes, as in “793 volumes”; I need to pick an approach for translating from volumes to linear feet
The script to-do list is still getting longer, but I am not done cycling through new institutions’ XML files to find new issues. Want to share your institution’s EAD finding aids in XML format with the ArchivesZ project? Please drop me a line via my contact form.
Image Credit: Top image from the Seeley G. Mudd Manuscript Library homepage.
http://feedproxy.google.com/~r/Spellboundblog/~3/w6XbLnuC9NM/
The Archival Research Catalog (ARC) of the US National Archives and Records Administration (NARA) needs to be replaced. NARA has put out an official Request for Information (RFI) and plans a “Vendor Day” for April 6th with final responses required by April 24th, 2009.
This is exciting for two very different reasons:
New catalog software!
Getting to read all the gory details about ARC!
If this makes you curious, then go give the RFI a read, but here are some juicy ARC tidbits to consider:
ARC’s Logical Data Model - 20 pages worth of data model that I am sorely tempted to print out, tape together and hang on a very large wall
ARC was built as a customization of OLIB back in 2003 and has been upgraded along the way
ARC currently contains 2,478,259 archival descriptions and 8,810,938 authority records
An average of 25,000 archival descriptions are added to ARC each week
The RFI states: “NARA has outgrown the existing ARC system and requires a more robust solution that’s capable of scaling to support at least 250 million archival descriptions and links to upwards of 500 million digital copies over the next 4-7 years.” Why so many records? Because all of NARA’s partners are digitizing records so quickly that they are creating a massive backlog of documents and the future only holds more of the same.
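Some rough arithmetic shows why the current pace cannot get there. Assuming a 52-week year, and that the 250 million figure counts all descriptions (existing ones included), growing from today's 2,478,259 descriptions at 25,000 per week would take about 190 years, while hitting the target within the RFI's 4-7 year window would need roughly 680,000 to 1,190,000 new descriptions per week:

```latex
\[
\frac{250{,}000{,}000 - 2{,}478{,}259}{25{,}000\ \text{per week}}
  = \frac{247{,}521{,}741}{25{,}000} \approx 9{,}901\ \text{weeks} \approx 190\ \text{years}
\]
\[
\frac{247{,}521{,}741}{7 \times 52} \approx 680{,}000\ \text{per week},
\qquad
\frac{247{,}521{,}741}{4 \times 52} \approx 1{,}190{,}000\ \text{per week}
\]
```

That is about 27 to 48 times the current weekly rate.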
This RFI is only for planning purposes, but I will definitely be following this story as it unfolds.
http://feedproxy.google.com/~r/Spellboundblog/~3/CNkBoIGUDTM/
Sunshine Week 2009 is a national initiative spearheaded by journalists to “open a dialogue about the importance of open government and freedom of information”. The Electronic Frontier Foundation (EFF) chose to mark Sunshine Week this year by announcing the release of their new tool for searching EFF’s FOIA documents. Learn more about EFF’s efforts to make open government a reality in this EFF call to action.
The Sunshine Week blog announced the release of a 2009 Survey Of State Government Information Online. The survey results explain:
Using a standardized worksheet surveyors rated each section on its usability, looking at factors such as whether the information was clearly linked, if full reports or only summaries were available, whether viewing and/or downloading was free, and whether the data were current. The categories for the survey were selected for generally serving the overall public good — the kind of information people need for their own health and well-being and that of the community.
See the worksheet for details on the categories selected for inclusion in the survey and the results for lots of interesting tidbits about exactly which states provide access (or not) to various public information online. A few very randomly selected highlights:
Maryland: Nursing home information, mhcc.maryland.gov/consumerinfo/nhguide, got high marks for facilitating online search and for allowing users to “compare data in a variety of ways.”
Iowa: The state auditor’s office reportedly offers online more than 5,000 full reports of all its audits dating back to 2001. The audits are easily accessible from tabs on the main Web page, www.auditor.iowa.gov.
Colorado: Bridge inspection reports in Colorado are considered public, but they are not published online. Anyone who wants to see the reports is advised to file an FOI request.
All of this made me recall my blog post about the parallel goals of journalists and archivists when considering digital public records and databases. I wanted to celebrate Sunshine Week by looking for other online sources of government information. My first stop was the website of the Council of State Archivists (CoSA). They had a couple of great resources including:
A 2007 status report on the state of State Records (and it looks like a new report should be out soon - their 2008 survey just closed at the end of January 2009)
Directory of State Archives and Records Programs
Details on their Local Government Project
A bit further afield we find GovernmentDocs.org advertised as a “community government document reviewer system”. On their about page we read:
With the GovernmentDocs.org system, citizen reviewers can engage in the government accountability process like never before. Registered users can review and comment on documents, adding their insights and expertise to the work of the national nonprofit organizations which are partnering on this project. This new information then becomes instantly searchable. The text of each document is searchable, as well, thanks to a powerful Optical Character Recognition (OCR) functionality.
GovernmentDocs.org adds a powerful layer to government transparency and accountability by indexing documents in a user-friendly manner that is remarkably easy to share. Every page of every document has its own unique url, allowing you and other users to link to that page on blogs, send emails about the documents to friends, and expose the information to a wider audience.
Here is an example GovernmentDocs page taken from a request submitted by CREW (Citizens for Responsibility and Ethics in Washington) regarding the Endangered Species Act. Each GovernmentDocs page has a unique URL, full text transcription of the page and supports comments and reviews. The possibility of building up a community around these records is very real. I am curious to see how many citizen reviewers and comments are associated with these documents a year from now.
Please help celebrate Sunshine Week by exploring all these amazing resources!
http://feedproxy.google.com/~r/Spellboundblog/~3/58Qunjcy_R8/
The Syracuse University Special Collections Research Center has also been so kind as to provide the XML source files for their finding aids for use in the ArchivesZ project. I loaded 572 finding aids and no errors were generated during the parsing of the XML files.
My scripts extracted 6632 unique ‘tags’ from the subjects assigned to the finding aids. As part of parsing and loading the data for use in the visualizations, the script divides compound subjects into tags. For example, among the subjects assigned to Syracuse University finding aids we find these values (the number shown is the number of finding aids to which that subject is assigned):
Art — American — 20th century (1)
Art — Cartoonists (68)
Art — Cartoonists. (3)
Art — Exhibitions. (1)
Art — Illustrators (36)
Art — Illustrators. (1)
Art — Painters (77)
Art — Philosophy. (1)
Art — Sculpture (33)
We also find subjects whose components are separated by commas, such as these (the number listed is the total number of finding aids assigned that subject):
Art, American (33)
Art, American. (46)
Art, American, 20th century (28)
Art, American, 20th century. (31)
Art, Cuban, 20th century (1)
Art, Modern (1)
Art, French, 20th century. (1)
The goal is to capture the core ideas, that is, the overlap in subject matter among diverse collections. All of the collections with any of these subjects are about Art. With the current script, the tag Art is associated with 179 collections from Syracuse University. You can see from this tiny subset of subjects that other themes would be revealed if these subjects were decomposed more completely, and this just scratches the surface.
Out of the 6676 subjects, 5658 subjects are assigned to single collections. Out of the 6632 tags the current script extracted from those subjects, 5594 tags are assigned to single collections. In its current state, the script’s decomposition is not producing much improvement.
While currently the script does a good job with the Library of Congress double dash separation pattern, the Syracuse University data has shown me a number of other standard patterns that need to be handled which can be seen in the small sampling of art related subjects shown above. The easy one is removing periods and stripping spaces from the end of subject values. The harder change will be to implement smart separation of subjects into tags based on commas. This would need the code to only break up values while leaving and alone. I will also need to examine values from across various institutions to decide if it is better to break them up or leave them be.
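A minimal sketch of the easy rules plus a naive comma split, in Python. It splits on the Library of Congress double dash (and the long dashes it is rendered as in the lists above), then on every comma, and strips whitespace and trailing periods. The function name is hypothetical, and the comma rule is the naive version that smarter logic would need to refine, since it would also break apart inverted personal names such as the made-up "Doe, Jane":

```python
import re
from typing import List

# Library of Congress subdivisions are separated by "--"; some displays render
# them as an em dash or en dash, so accept all three.
SUBDIVISION = re.compile(r"\s*(?:--|\u2014|\u2013)\s*")


def subject_to_tags(subject: str) -> List[str]:
    tags = []
    for part in SUBDIVISION.split(subject):
        # Naive comma rule: split on every comma.
        for piece in part.split(","):
            # Normalize whitespace and drop trailing periods before keeping the tag.
            piece = " ".join(piece.split()).rstrip(".").strip()
            if piece:
                tags.append(piece)
    return tags


print(subject_to_tags("Art \u2014 Cartoonists."))        # ['Art', 'Cartoonists']
print(subject_to_tags("Art, American, 20th century."))  # ['Art', 'American', '20th century']
```

With these rules "Art — Cartoonists" and "Art — Cartoonists." collapse into the same pair of tags, which is exactly the duplication visible in the Syracuse counts.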
Other than these subject issues, there are a few other script modifications that I will need to make based on scenarios the data in the Syracuse finding aids have shown me:
Syracuse University uses an entity to populate the repository values - the current script does not handle this at all.
Ensure that single item collections are assigned a size of .25 linear feet
‘Linear ft’ must be added as another recognized abbreviation for linear feet
All these issues are being added to my master ‘to do’ list for updating the EAD parsing script. Onward to the next data set.
Want to share your institution’s EAD finding aids in XML format with the ArchivesZ project? Please drop me a line via my contact form.
Image Credit: Syracuse University image above from Syracuse University Special Collections Research Center home page.
http://feedproxy.google.com/~r/Spellboundblog/~3/BOoKWkiPmrM/
The Oregon State University Archives has generously contributed 356 of their finding aids in EAD format for use in the development of version 2 of ArchivesZ. This is my first post in what will likely be a series of behind-the-scenes looks at the challenges facing a project like ArchivesZ at the data level.
Version one of ArchivesZ only used finding aids from the University of Maryland and the Library of Congress. This was definitely a case of following the path of least resistance: I attend the University of Maryland, and the Library of Congress has a very convenient page providing links to all their finding aids’ source XML files. A key aspect of creating version 2 of ArchivesZ is making sure that the scripts that pull data from EAD XML files are robust enough to handle the encoding practices of a very diverse range of institutions.
Please keep in mind that OSU is likely to bear the brunt of many basic data issues that I would have unearthed with whatever data sets I tried first!
There are 3 crucial data elements on which the visualizations of ArchivesZ depend: subject, inclusive dates, and collection size. Each element presents unique challenges. The script parsing issues I am uncovering with the OSU finding aids are currently worst for collection size. In order to make pretty charts that let people compare the quantity of materials in each collection (or record group; please forgive my using the term ‘collection’ to mean any set of records for which a finding aid has been created), we need to be able to assign a single number to represent the size of each collection. Based on the values used in the LOC and UMD finding aids, we chose linear feet as our standard unit of measurement. So the trick is to translate whatever archivists choose to put into the element of their finding aid into some number of linear feet.
These are the size conversion rules we implemented for version 1 of ArchivesZ:
1 microfilm reel = 1 linear foot
Collections represented only by a number of items will be represented as .25 linear feet
If size only specified in number of boxes, then 1 box = .5 linear feet
When the size is given in some different types of units, they are prioritized in the following order: linear feet > boxes > microfilm reels > items
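The four rules above could be sketched in Python like this. The regular expressions are my assumptions about how extent statements are worded (the version 1 Ruby script may match them differently), and the function name is hypothetical:

```python
import re
from typing import List, Optional

NUMBER = r"(\d*\.?\d+)"
# Checked in priority order: linear feet > boxes > microfilm reels.
RULES = [
    (re.compile(NUMBER + r"\s*linear\s+(?:feet|foot)", re.I), 1.0),  # already linear feet
    (re.compile(NUMBER + r"\s*box(?:es)?\b", re.I), 0.5),             # 1 box = 0.5 linear feet
    (re.compile(NUMBER + r"\s*microfilm\s+reels?", re.I), 1.0),       # 1 reel = 1 linear foot
]
ITEMS = re.compile(NUMBER + r"\s*items?\b", re.I)


def size_in_linear_feet(extents: List[str]) -> Optional[float]:
    text = " ; ".join(extents)
    for pattern, factor in RULES:
        match = pattern.search(text)
        if match:
            return float(match.group(1)) * factor
    if ITEMS.search(text):
        return 0.25  # item-only collections get a flat 0.25 linear feet
    return None  # unit not recognized (e.g. cubic feet)


print(size_in_linear_feet(["1 linear foot", "4 boxes"]))  # 1.0
```

Under these rules a value such as "5.5 cubic feet" on its own matches nothing at all, which is why cubic feet needs to become a recognized unit.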
This works reasonably well when the physical description values are simple - it starts to fall apart when what is entered is more complicated. Here are some examples of the physical descriptions in the OSU finding aids:
Guide to the Phi Kappa Phi-OSU Chapter Records: The display in the ‘pretty’ version of the finding aid online shows this: 5.5 cubic feet (9 boxes, including 2 oversize boxes) (3 microfilm reels)
The version in the XML file is this:
5.5 cubic feet
9 boxes, including 2 oversize boxes
3 microfilm reels
With the current algorithm, this finding aid would be marked as being 3 linear feet in size. At a bare minimum, I must add ‘cubic feet’ as another unit to be converted. More difficult to discern is if I should have a value of 5.5 linear feet (assuming 1 cubic foot = 1 linear foot for the purposes of these comparisons) or a value of 8.5 linear feet (5.5 + 3 linear feet for the 3 microfilm reels). There is never going to be a perfect answer here, but clearly my logic needs to be more sophisticated than it is now.
Harvey L. McAlister Collection: The display in the pretty version of this finding aid online is this: 1 cubic foot, including 26 photographs (4 boxes, including 2 oversize boxes, and 1 map folder)
The version in the XML file is this:
1 cubic foot, including 26 photographs
4 boxes, including 2 oversize boxes, and 1 map folder
With the current algorithm, this finding aid would be marked as being 1 linear foot in size. From looking at these two examples, it would seem that this would be fine and in fact - for the purposes of calculating a comparable size - only looking at the first value might be the way to go - at least for OSU finding aids.
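If the first-value approach wins out, a sketch might read only the first extent statement and convert cubic feet one-to-one, following the assumption above that 1 cubic foot = 1 linear foot. The unit table (which also accepts "linear ft") and the function name are hypothetical:

```python
import re
from typing import List, Optional

# Hypothetical unit table; cubic feet are treated as equal to linear feet.
UNIT_FACTORS = {
    "linear feet": 1.0, "linear foot": 1.0, "linear ft": 1.0,
    "cubic feet": 1.0, "cubic foot": 1.0, "cubic ft": 1.0,
}
LEADING_MEASURE = re.compile(r"\s*(\d*\.?\d+)\s+(\w+\s+(?:feet|foot|ft)\b)", re.I)


def size_from_first_extent(extents: List[str]) -> Optional[float]:
    """Use only the first extent statement, ignoring boxes, reels and folders after it."""
    if not extents:
        return None
    match = LEADING_MEASURE.match(extents[0])
    if not match:
        return None
    unit = " ".join(match.group(2).lower().split())
    factor = UNIT_FACTORS.get(unit)
    return float(match.group(1)) * factor if factor is not None else None


print(size_from_first_extent(["5.5 cubic feet", "9 boxes, including 2 oversize boxes", "3 microfilm reels"]))  # 5.5
print(size_from_first_extent(["1 cubic foot, including 26 photographs", "4 boxes, including 2 oversize boxes, and 1 map folder"]))  # 1.0
```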
There are some other, simpler issues relating to standardization in the way that certain values are entered. For example, after ingesting 173 finding aids from OSU (the number I got through before my script flat out choked on a size designation), I ended up with five different repositories added to my REPOSITORIES table. I had expected only one. Each of these was entered as the repository name; I have included the length of each value to show how extra spaces are causing part of the problem:
Oregon State University Libraries - length 36
Oregon State University - length 23
Oregon State UniversityLibraries - length 32
Oregon State University Libraries - length 36
Oregon State University Libraries - length 33
Some of these I can handle by adding smarter trimming of extra spaces, but in this case it is clear that typos and inconsistency are also a challenge. I checked, and each of these different values within the element is used by at least 10 finding aids. Perhaps they have been inherited over time from a template?
I have considered creating a repository definition file that could be used when loading finding aids from one repository at a time. This would remove dependence on perfect replication of these sorts of values while still supplying the data needed to let people limit their searches by a named repository.
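Such a definition file might be as simple as a small JSON document supplied with each batch. In this sketch the file format and the function name are hypothetical: the canonical name comes from the file, and any whitespace-normalized variants found in the finding aids are reported back so they can be fixed at the source:

```python
import json
import re
from typing import List, Set, Tuple

# Hypothetical definition file supplied alongside one repository's batch of finding aids.
DEFINITION_JSON = '{"name": "Oregon State University Libraries"}'


def repository_for_batch(definition_json: str, names_in_ead: List[str]) -> Tuple[str, Set[str]]:
    """Return the canonical name and the raw variants (after whitespace cleanup) it replaced."""
    canonical = json.loads(definition_json)["name"]
    variants = {re.sub(r"\s+", " ", name).strip() for name in names_in_ead}
    variants.discard(canonical)
    return canonical, variants


name, leftovers = repository_for_batch(
    DEFINITION_JSON,
    ["Oregon State University Libraries ", "Oregon State  University Libraries",
     "Oregon State UniversityLibraries", "Oregon State University"],
)
print(name)              # Oregon State University Libraries
print(sorted(leftovers))  # ['Oregon State University', 'Oregon State UniversityLibraries']
```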
The last issue is the most minor. There are many \n and \t (newline and tab) characters throughout the XML documents. These I plan to simply strip out as the script parses the XML file.
A big thank you to Elizabeth Nielsen, Senior Staff Archivist at OSU Archives. Her response to my query about OSU’s comfort with my taking apart their finding aids in public on my blog was “Bring it on – we’re tough!”.
It is fascinating to dig into new finding aids and see how the parsing script handles what it finds. I plan to test the existing script on XML from more sources to see all the things that must be fixed. Then I get to wrap my head around code that someone else wrote (another member of the original ArchivesZ team wrote the version 1 ruby script). For those of you who are not programmers, you can skim through my Book Review of Dreaming in Code to get a handle on why this can be harder than it sounds like it should be.
Want to share your institution’s EAD finding aids in XML format with the ArchivesZ project? Please drop me a line via my contact form.
Image Credit: OSU Archives image above from the OSU Archives Home Page.
http://feedproxy.google.com/~r/Spellboundblog/~3/qeydTFjO9Sc/
Centropa.org features video photo montages that combine Jewish family photographs with oral history. I found my way to Centropa from the Time.com article Old Nazi News Makes Headlines in Germany which includes Kristallnacht in Words and Photographs from Centropa, but Centropa’s mission reaches beyond recalling the Holocaust. Centropa bills itself as “an interactive database of Jewish memory”.
The first oral history project that combines old family pictures with the stories that go with them, Centropa has interviewed more than 1,350 elderly Jews living in Central and Eastern Europe, the former Soviet Union, and the Sephardic communities of Greece, Turkey and the Balkans. With a database of 25,000 digitized images, we are bringing Jewish history to life in ways never done before.
Their fleet of 140 individuals conducted extensive oral interviews and digitized thousands of old family photos. They are quite intent on clarifying that they do not create videos during their sessions with their interviewees. Instead, they record audio of their multi-hour sessions, transcribe these sessions and combine them with the digitized family photos to create their movies.
The juicy center of their website is found in the Centropa Movies which are alternately billed as a “library of rescued memories” and a “digital bridge back to a world destroyed”. Their movies are also available via iTunes and on the CentropaOffice YouTube Channel. The movie I have included below tells the story of Judit Kinszki and focuses on her father Imre Kinszki, a budding photographer from Budapest, Hungary. From this movie’s Centropa Movie page you can also navigate to Judit Kinszki’s biography, the full family photo album and a study guide for this movie.
The amount of detail provided with each posted interview is really incredible. Biographies, detailed notes on each photo, the study guide, a family tree and a currently grayed out but promising link to “Discuss Movie”. This site has clearly given great thought to how to support teachers and has followed that vision through in the form of tons of supporting materials. Centropa has chosen the path of quality over quantity with the 17 movies currently posted.
Upon further reflection, I realize now that the movies are an outgrowth of the database of photographs and biographies. The detail was not added to support the videos - but rather the videos are the next step of evolution beyond the photos and interview transcripts.
In addition to the movies they offer a Recipe Archive, downloadable eBook versions of some of their interviews as well as Centropa Student, aimed at high schools in Europe, North America, and Israel. For those of you working on your own oral history projects, there is the Centropa Oral History Tool Kit, available in 5 languages. The Centropa Glossaries are less glossary and more a detailed list of people, social groups, events and terms that can be searched by country, type or keyword. Finally, don’t miss the ‘Narrated Stories and Introductions’ featured on the right sidebar on the Centropa Movies page, such as Maps, Central Europe and History or the Introduction to Centropa for US Students.
Reading Centropa’s claim that they are the first to combine family photos and oral histories made me recall the University of Alaska Fairbanks’ Project Jukebox. This project launched back in 1988 and aims to “integrate oral history recordings with associated photographs, maps, and text.” The original was written using HyperCard!
They have a map showing all the communities in Alaska currently included as part of the project. A good example of an individual photo with accompanying narration is Harry Cook in his Garden from the Kiana Village History Project. No - it isn’t as elegantly assembled as the Centropa Movies, but the intention is much the same. They use old photos as a catalyst for helping individuals being interviewed and then combine the audio and images to improve end users’ understanding of the context of individual photos.
I have signed up with Centropa to be notified when they launch the promised ‘Add Your Family Photos’ feature. Until then I will keep scanning my own family’s photos, such as the one below featuring my grandfather (back row on the right), and working my way through all the Centropa Movies and their supporting materials.
http://feedproxy.google.com/~r/Spellboundblog/~3/rFpSWTJP7vk/
I spotted the New York Times article Historical Photos in Web Archives Gain Vivid New Lives via Dan Cohen’s Twitter feed. The article is a nice treatment of the difference between the Library of Congress’s contribution of 50 photos a week to the Flickr Commons and the German Federal Archives’ contribution of 100,000 images to the Wikimedia Commons (described as “the virtual archive for material used in Wikipedia articles”).
I took a look at the details of this project, starting with the homepage of Commons:Bundesarchiv on the Wikimedia Commons. This passage explains one of the goals of the Bundesarchiv Gallery:
Very old photographs have become public domain, and events and persons of today can be photographed by Wikipedians with their digital cameras. But for the time between there is a huge gap in Wikipedia articles. The donation of Federal Archive is important to close that gap, and it is to hope that it can serve as a model to other institutions in Germany or elsewhere.
Also, each individual photo includes this disclaimer:
For documentary purposes the German Federal Archive often retained the original image captions, which may be erroneous, biased, obsolete or politically extreme. Factual corrections and alternative descriptions are encouraged separately from the original description.
There is a special category to call out instances of these types of descriptions - BArch images with biased descriptions. In my exploration, I discovered only a very few with these original image captions translated to English. One example is the photo of a single room home for a family of eleven.
In contrast to the Library of Congress addition of 50 photos a week, the German Federal Archive plans to add “a few thousand images a month”. The Commons:Bundesarchiv To Do list is also interesting reading. The To Do page includes tasks both in German and English (though the wiki discussion page is all in German). I love having the opportunity to read about issues confronting those working on this sort of project. For example - there is a discussion about how to determine if an image should remain Uncategorized. What if only 1 person out of three is tagged? Does it still ‘deserve’ to remain marked as ‘uncategorized’?
New categories created for use in this project need to use a special template so that they show up properly within the sub-categories of the Category:Images from the German Federal Archive page. For example - the page which sorts images by country has 64 sub-categories at the time of this post. A new country added using this template approach would immediately show up on the images by country sub-category page.
I will say that the learning curve for images within the Wikimedia Commons in general, and the Bundesarchiv project in particular, is much steeper than for tagging images in the Flickr Commons. There is a handy CommonSense tool (available via the ‘find categories’ tab on any image) that will suggest categories based on keywords, but even that is a bit overwhelming for a beginner.
As an example, let’s look at the image I chose for this post of two boys finishing their ice cream in 1949. Here are the categories currently assigned:
Images from the German Federal Archive, year 1949
Images from the German Federal Archive, location Berlin
History of Germany
Ice cream
Black and white photographs of children
Black and white photographs of Germany
Standing males
Photographs by Brenner
Let’s take a look at what the wiki text looks like to set these categories. First there is the special template for the project, which specifies the year and location. I believe that these are attributes uploaded with the original photograph. This gives us the first two categories in our list (emphasis mine):
{{BArch-License|
|signature=Bild 183 1984-0202-506
|batch=Bild 183
|year=1949
|month=
|location=Berlin
|PD=
}}
Then we get to the standard Wikimedia Commons categories. These are the categories most akin to tags in Flickr. These are the categories which will promote discovery of these images alongside images from other sources from across the Wikimedia Commons:
[[Category:History of Germany]]
[[Category:Ice cream]]
[[Category:Black and white photographs of children]]
[[Category:Black and white photographs of Germany]]
[[Category:Standing males]]
[[Category:Photographs by Brenner]]
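To pull both kinds of metadata out of wiki text like the example above, a couple of regular expressions go a long way. This Python sketch is not MediaWiki's parser and ignores nested templates, comments and other edge cases; the function names are hypothetical:

```python
import re
from typing import Dict, List

# "|name=value" lines inside a template call; [ \t] (not \s) keeps each match on one line.
TEMPLATE_PARAM = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*$", re.M)
# [[Category:Name]] or [[Category:Name|sort key]]
CATEGORY_LINK = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]")


def template_params(wikitext: str) -> Dict[str, str]:
    return dict(TEMPLATE_PARAM.findall(wikitext))


def categories(wikitext: str) -> List[str]:
    return CATEGORY_LINK.findall(wikitext)


WIKITEXT = """{{BArch-License|
|signature=Bild 183 1984-0202-506
|batch=Bild 183
|year=1949
|month=
|location=Berlin
|PD=
}}
[[Category:History of Germany]]
[[Category:Ice cream]]
"""

params = template_params(WIKITEXT)
print(params["year"], params["location"])  # 1949 Berlin
print(categories(WIKITEXT))  # ['History of Germany', 'Ice cream']
```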
These categories were clearly added by hand, since the original caption reads (by my rough translation) At the beach: “Is it already gone?”. I suppose I could go in and add [[Category:Beaches]], but I am honestly not sure if there is enough beach in the photo to warrant such a category.
I am very curious to see comparison stats of the assignment of categories/tags to images in both the Flickr & Wikimedia Commons a year from now. How will we measure success? How will we grade the accuracy of metadata assigned by the public? Which images will get more public views and usage - those added to the Flickr Commons or those added to the Wikimedia Commons?
For now, I am happy to set aside all these thorny questions. I am just so pleased to see a new and ambitious experiment in crowdsourcing image metadata.
http://feeds.feedburner.com/~r/Spellboundblog/~3/523164716/
The official title of this session was Getting to the Heart of Performance: Archivists as Creative Collaborators. It was a lovely change of pace. Upon entering the session, we discovered someone tuning a Chinese hammered dulcimer in the middle of a social dance floor. Our hosts were Scott Schwartz of the Sousa Archives and Center for American Music, University of Illinois, Urbana-Champaign, and Andrew M. Wentink of Middlebury College Special Collections & Archives. The goals of the session? To teach us about Asian American Jazz fusion and Tango.
Asian American Jazz Fusion
Dr. Anthony Brown, of Anthony Brown’s Asian American Orchestra, explained why there was a Chinese hammered dulcimer sitting in the middle of the room: he was going to introduce us to Asian and American Jazz fusion. As curator of the Smithsonian’s Duke Ellington Collection from 1992 to 1996, he discovered materials related to Ellington’s Far East Suite, originally composed to honor the people who welcomed Ellington during his State Department tour (cut short by Kennedy’s assassination). Brown was able to trace Ellington’s itinerary through business records and then figure out which instruments inspired the original, and used them in the Asian American Jazz Orchestra’s recording of Far East Suite. His next CD project was Monk’s Moods. The Asian American Jazz Orchestra is now celebrating its 10th anniversary with the release of a CD titled Ten.
Yangqin Zhao plays the Chinese hammered dulcimer and is the foremost performer on the instrument in the Western Hemisphere. The dulcimer traveled via the Silk Road from Persia. The Silk Road was the original information highway, the route by which East and West were connected in ancient times.
Next, a recording of Monk’s Moods on piano was played, and then Zhao performed the same piece on the Chinese hammered dulcimer. To achieve this, Brown and Zhao had to work together to translate the original arrangement. Likewise, the excerpt from Gershwin’s Rhapsody in Blue is a recomposition, reorchestrated for his orchestra. A piece of music or a dance chart cannot come to life until you breathe life into it. Enabling access to the performing arts is different.
The second piece that Zhao played was Andantino from Rhapsody in Blue. Samples of both Andantino and Monk’s Moods are available on the Ten CD page. Zhao then thanked Anthony for teaching her Jazz.
Tango
The dance portion of the session was brought to us by Richard Powers of the Stanford University Dance Division and his dance partner Joan Walden. Powers founded the Flying Cloud Academy of Vintage Dance. He holds a degree in design and creative process from Stanford, where he is an expert in 19th- and early 20th-century social dance. Stanford has an extensive collection of dance manuals, and Powers is the director of Stanford’s 70-member vintage dance ensemble.
Stanford’s Dance Division wanted Powers to make dance more visible on campus to help make sure that it didn’t get cut (partially or completely). Outreach is important: it can strengthen funding and let potential donors know about you. He recommends bringing the dance manuals in your archive back to life. With movies like Mad Hot Ballroom and Shall We Dance? and TV shows like Dancing With The Stars, the American public is predisposed right now to be interested in dancing. Most of the dances in dance manuals were meant to teach regular people to dance so they could dance with their friends. They were part of a self-improvement movement.
Think of unique ways to encourage others to use archival records. Powers encourages everyone NOT to hand it off to others. Being a non-dancer gives you a better chance at collaboration. The more we know, the harder it is to get into a true collaboration. But if something is new to you, you are more open-minded and more open to true collaboration.
There are other resources beyond dance manuals: dance magazines, etiquette books, anti-dance manuals (which sometimes describe the illicit dances that the proper dance manuals won’t mention), novels that give background, journals, diaries and letters, and iconography (lithographs, photos, drawings, etchings, sculptures) to help get the visual costuming right. Dance cards and ball programs give lots of information: when, who, what music, and maybe where. They also give you a chance to see which dances were actually popular (vs. the manuals, which were promoting dances). Then there are motion pictures from the time. So, how can we weave all of this together?
For more information about how to reconstruct dances, read Powers’ Guidelines for Dance Research and Reconstruction.
We then got a crash course in Tango history. I took notes as fast as I could, but I know I missed a lot along the way. Here are the bits I managed to get down - but don’t trust me to be an authority:
100 years ago you could find the Argentine tango in Buenos Aires or Paris. In 1908 it had just arrived in Paris from the outskirts of Buenos Aires. But that version would seem simple. And then they danced!
1st Myth of the Tango: It was born in the brothels. Powers’ informed opinion is that it was created by the poor, but that doesn’t mean they were pimps & prostitutes. Most tango scholars today believe it was created by the honest poor in the barrio.
2nd Myth of the Tango: The Tango was imported to Paris (1908-1912) and tamed by the French, who found it too passionate and made it more appropriate for the ballroom. Lots of documentation from many sources proves that the French actually ADDED passion, and that the dance was carried to Paris by young aristocrats.
The Tango was presented in response to the dance called the Apache, and the two dances exchanged influences in Paris from 1912 to 1914.
A Buenos Aires dance manual from 1914 (dated by its illustrations) called El Tango Argentino includes detailed illustrations and foot diagrams. Going back to the source shows us the meaning behind the names and the rules about steps. Most of the drama and stalking was added 15 years later.
The true roots of Tango are unknown.
The main trunk of the Tango is the version known in Paris 100 years ago, and social Tango today is still the same. The three branches of Tango are: 1) stage performance (more dramatic), 2) ballroom competition and 3) Buenos Aires, where it changes dramatically every 10 years or so.
Then they got everyone up and out on the dance floor. We went from learning history and thinking about how one might decipher dance manuals to actually learning to Tango!
My Thoughts
If you are wondering why I am posting this over four months after the conference, you can blame Beaver Archivist’s post about Dancing Archivists. It immediately made me recall the largest gathering of dancing archivists I had personally witnessed. The session itself was really great. It was so far from people sitting in silent rows staring at PowerPoint slides (not that there is anything wrong with that) that you might have thought you had wandered into the wrong conference.
It was the takeaway that was especially appealing to me. I really like the idea of finding new ways to bring performance-based archives back to life, of finding new ways to reach out to people and make the records sing and dance again. Hearing music reinterpreted and reinvented is of course fundamentally different from seeing sheet music in a glass case. What if every archives that had performance art related records found a way to have two live, participatory events each year? I can only imagine the new audience who might be drawn in to learn about what is hidden in the archives; they might just come back because it is fun. My fingers are crossed that I can get my 2nd Tango lesson in Austin, TX in August 2009.
As is the case with all my session summaries from SAA2008, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.
http://feeds.feedburner.com/~r/Spellboundblog/~3/516373667/
Amazing how much can change in 100 years. The stereograph above, from March 1909, shows African Americans driving the carriage that carried President and Mrs. Taft from the Capitol to lead the inauguration parade to the White House. On January 20th, 2009, Barack Obama will be the guest of honor. The American Folklife Center’s Inauguration 2009 Sermons and Orations Project aims to collect recordings, transcriptions and ephemera of speeches addressing the significance of the inauguration of Barack Obama as the first African American president.
It is expected that such sermons and orations will be delivered at churches, synagogues, mosques and other places of worship, as well as before humanist congregations and other secular gatherings. The American Folklife Center is seeking as wide a representation of orations as possible.
The Inauguration 2009 project is modeled after prior Library of Congress collection projects. Two great examples of these earlier projects are:
“Man-on-the-Street” Interviews Following the Attack on Pearl Harbor - features audio recordings of the reactions of more than 200 people to the Japanese attack on Pearl Harbor.
September 11, 2001, Documentary Project - includes 200 audio recordings collected between September 13, 2001 and February 13, 2002 in cities across the United States
If you want to organize a local recording, here are the basics:
Recordings must be made between Friday, January 16th and Sunday, January 25th, 2009 and postmarked by February 27, 2009.
The project website provides the required Participant Release Form for speakers, photographers and those making the recordings.
The project is accepting audio recordings, video recordings, and written texts of sermons (see their detailed specifications page for information about accepted formats). Also accepted will be accompanying ephemera such as photographs and printed programs.
If you are sending materials to the Library of Congress, they encourage you to use FedEx, UPS, or DHL, because the security screening applied to USPS packages can damage them.
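For anyone coordinating a local recording, the date rules above boil down to a simple check. This is a hypothetical helper written for illustration, assuming both the recording window and the postmark deadline are inclusive and that a package cannot be postmarked before the recording was made; the project's own specifications page remains the authority.

```python
from datetime import date

# Dates from the project rules; treated as inclusive (an assumption).
RECORDING_START = date(2009, 1, 16)
RECORDING_END = date(2009, 1, 25)
POSTMARK_DEADLINE = date(2009, 2, 27)


def submission_ok(recorded_on: date, postmarked_on: date) -> bool:
    """Hypothetical check: recorded in the window and mailed by the deadline."""
    in_window = RECORDING_START <= recorded_on <= RECORDING_END
    mailed_in_time = recorded_on <= postmarked_on <= POSTMARK_DEADLINE
    return in_window and mailed_in_time


# A sermon recorded on Inauguration Day and mailed in early February passes.
print(submission_ok(date(2009, 1, 20), date(2009, 2, 2)))  # True
```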
If you want to get a taste of other recordings held by the Library of Congress, you can spend some time browsing the fantastic list of Collections in the Archive of Folk Culture Containing Sermons and Orations provided on the project site.
So spread the word. Honor the Library of Congress’s goals by helping this collection include the perspectives of as many communities as possible. Your local religious or secular leader could have their point of view preserved as part of a snapshot of our country’s response to the Inauguration of 2009. While they hope for audio and video recordings, they are also accepting text transcriptions - so this doesn’t have to be a high tech endeavor. That said, perhaps this is the inspiration you have been waiting for to learn how to make an audio or video recording!
http://feeds.feedburner.com/~r/Spellboundblog/~3/511539363/
It is official - the panel I proposed for SAA 2009 (aka, Sustainable Archives: AUSTIN 2009) was accepted!
Title: Building, Managing and Participating in Online Communities: Avoiding Culture Shock Online
Abstract: As more archival materials move online, archivists must become adept at participating in and managing online communities. This session will discuss real world experiences of this involvement, including putting images into the Flickr Commons and links to archival materials in Wikipedia, as well as guidelines on cultural norms within online communities. We will also discuss choosing between building new communities from scratch vs joining a broader, existing community (such as the Flickr Commons).
I will be serving as session chair and moderator for our group of fabulous panelists (finances and travel plans permitting):
Deborah Wythe: talking about Flickr Commons and other Brooklyn Museum web/community projects (or whatever the latest and greatest projects are afoot at the Brooklyn Museum by the time we hit August 2009)
Ann Lally: talking about Wikipedia and blogs (co-author of: Using Wikipedia to Extend Digital Collections)
Mark Matienzo: talking about NYPL web/community projects
Seb Chan: talking about Powerhouse Museum, Flickr Commons and (maybe) blogs
The intention is for this session to begin with very brief presentations showing off the current projects at our panelists’ institutions and follow that up with lots of time for discussion and answering of questions.
We see our target audience as archivists who want to hear about real world experiences of working within existing online communities (such as Wikipedia or Flickr) and building new communities dedicated to cultural heritage materials. The session will target individuals with less experience with Web 2.0 and social media implementations, but the lessons learned should also be of interest to those already in the implementation stages of their own projects.
I will put out a call for questions as we get closer to the conference so that our group can get an idea of what people are interested in learning about specifically, so start making notes now. Hope to see you in Austin!
http://feeds.feedburner.com/~r/Spellboundblog/~3/482423279/
As has been reported around the web today, Google is now digitizing and adding magazines to Google Book Search. This follows on the heels of the recent Google Life Photo archive announcement.
I took a look around to see what I could find. I was intrigued by the fact that I couldn’t see a list of all the magazines in their collection. So I went after the information the hard way: I kept reloading the Google Book Search home page until I didn’t see any new titles displayed in their highlighted magazine section. This is what I came up with, roughly grouped by topic.
Science and technology:
The Bulletin of the Atomic Scientists: December 1945, when it started out as the Bulletin of the Atomic Scientists of Chicago, through November 1998
CIO: The Magazine for Information Executives: back to Volume 1, Number 1 from Sept/Oct 1987
Maximum PC: October 1998 through the present
Popular Science: stretching back to an issue for March of 1872 when it was known as Popular Science Monthly through to February 2008
Popular Mechanics: January 1905 through November 2005
Lifestyle and city themed:
New York Magazine: April 1968 through December 1997. Fascinating that some of the magazines still have the original mailing label on them (see this example from a July 1969 issue of New York)
Cincinnati Magazine: January 1971 through December 2005, at which point it seems to switch to being an annual city guide titled Cincinnati USA
Atlanta: January 2003 through August 2008 - and mis-titled ‘Atlants’
Indianapolis Monthly: January 1995 to the present
Cruise Travel: June 1979 through December 2007
African American:
Ebony Jr!: May 1973 through October 1985
Jet: November 1961 through October 2008
Black Digest: Named ‘Negro Digest’ from November 1961 through April 1970, then Black Digest from May 1970 through April 1976.
Health, nutrition and organic:
Women’s Health and Men’s Health: January 2006 through present. I found it very amusing to be able to scan the covers of all the issues so easily. That is true for all of these magazines, of course, but it was funny to see cover after cover of barely clad men and women exercising.
Prevention: January 2006 through the present
Better Nutrition: January 1999 through December 2004
Organic Gardening: November 2005 to the present
Vegetarian Times: March 1981 through November 2004
Sports and the outdoors:
Baseball Digest: July 1945 through October 2007
American Cowboy: May 1994 through August 2008
Bicycling, Mountain Bike and Runner’s World: January 2006 through present
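The reload-until-nothing-new method used to compile the list above is essentially sampling until saturation. Here is a minimal sketch, assuming a hypothetical `load_page` callable that returns whatever titles the home page features on a single load. The `patience` threshold is arbitrary, and rarely featured titles can still slip through, so a list gathered this way is best read as a sample rather than a complete catalog.

```python
from itertools import cycle
from typing import Callable, Iterable, Set


def collect_until_stable(load_page: Callable[[], Iterable[str]],
                         patience: int = 5) -> Set[str]:
    """Reload a page showing a rotating sample of titles until `patience`
    consecutive loads reveal nothing new."""
    seen: Set[str] = set()
    loads_without_news = 0
    while loads_without_news < patience:
        new_titles = set(load_page()) - seen
        if new_titles:
            seen |= new_titles
            loads_without_news = 0
        else:
            loads_without_news += 1
    return seen


# Hypothetical stand-in for the home page: three rotating sets of featured titles.
pages = cycle([["Jet", "Ebony Jr!"], ["Jet", "Popular Science"], ["Baseball Digest"]])
print(sorted(collect_until_stable(lambda: next(pages), patience=3)))
# ['Baseball Digest', 'Ebony Jr!', 'Jet', 'Popular Science']
```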
They of course promise more magazines on the way, so if you are reading this long after mid-December 2008, I would assume there are more magazines and more issues available now. I hope that they make it easier to browse just the magazines. Once they have a broader array of titles, how neat would it be to build a virtual newsstand for a specific week in history? It shouldn’t be hard: they have all the metadata and cover images they need.
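A virtual newsstand could start from nothing more than titles and cover dates. The sketch below is one hypothetical way to do it: for each title, show the most recent issue dated on or before the chosen day. The titles and dates in the usage example are invented, and the sketch ignores the fact that issues usually go on sale before their cover dates.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

Issue = Tuple[str, date]  # (title, cover date)


def newsstand_for(day: date, issues: Iterable[Issue]) -> Dict[str, date]:
    """For each title, pick the latest issue dated on or before `day`."""
    stand: Dict[str, date] = {}
    for title, cover_date in issues:
        if cover_date <= day and cover_date > stand.get(title, date.min):
            stand[title] = cover_date
    return stand


# Invented sample data for illustration only.
issues = [
    ("Weekly Example", date(1969, 7, 3)),
    ("Weekly Example", date(1969, 7, 10)),
    ("Weekly Example", date(1969, 7, 17)),
    ("Monthly Example", date(1969, 7, 1)),
    ("Monthly Example", date(1969, 8, 1)),
]
for title, cover in sorted(newsstand_for(date(1969, 7, 14), issues).items()):
    print(title, cover.isoformat())
```

For the week of July 14, 1969 this shows the July 10 weekly and the July 1 monthly; the cover images for those issues would then fill out the virtual stand.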
I love being able to read the magazine, advertising and all. They display the covers in batches by decade or 5-year period depending on the number of issues. I also like the Google map provided on each magazine’s ‘about’ page that shows ‘Places mentioned in this magazine’ and links you directly to the article that mentions the location marked on the map.
I think it is interesting that Google went with more of a PDF single scrolling model rather than an interface that mimics turning pages. In many issues (maybe all?) they have hot-linked the table of contents so that you can scroll down to that section instantly. You can also search within the magazine, though from my short experiments it seems that only the articles are text indexed and the advertisements are not.
Google’s current model for search is to return results for magazines mixed in with books in Google Book Search results - but they do let you limit your results to only magazines from their Advanced Search page within Google Book Search. See these results for a quick search on sunscreen in magazines.
Overall I mark this as a really nice step forward in access to old magazines. As with many visualizations, seeing the about page for any of these magazines made me ask myself new questions. It will be interesting to see how many magazines sign on to be included and how the interface evolves.
To read more about Google’s foray into magazine digitization and search take a look at:
Tech Crunch: Google Adds Print Magazines To Book Search
Official Google Blog: Search and Find Magazines on Google
Venture Beat Digital Media: Google Book Search: now with magazines!
For a really nice analysis of the information that Google provides on the magazine pages see Search Engine Land’s Google Book Search Puts Magazines Online.
http://feeds.feedburner.com/~r/Spellboundblog/~3/480285626/
As part of his portion of our SAA 2008 panel in San Francisco, Max Evans demonstrated his prototype for a new way to view an EAD finding aid. You can download his presentation from the SAA’s site: Finding Aids for the 21st Century: The Next Evolution.
Max’s prototype of Susa 2.0 is now online! He asked that I make sure you know it works best (showing all the intended mouse-over text for links) with Internet Explorer version 6.0. The prototype presents the finding aid of the Susa Young Gates Papers from the Utah State Historical Society. His design tackles the major issues that plague large finding aids normally displayed in traditional single-page layouts. Anyone who has looked at a large finding aid online has had the experience of being scrolled down somewhere in the middle and realizing they have no idea what they are looking at. What folder is this item in? What box is this folder in? Am I reading through a list of letters from 1950 or are these the ones from 1970?
Context is hard to communicate when you are dealing with lists of folders that stretch beyond the length of the screen. Max’s design uses a three-column approach to provide context from left to right. His design also gives users a way to look at the full list of either items or folders, independent of their originating containers, with each list sortable in three different ways: ‘as arranged’, alphabetically or by date. I love this page which shows how a scanned document might be displayed within the proper context of the collection - in this case, page 2 of document 1 of the General Correspondence from 1886-1909. All of these get at the heart of giving researchers more control over how to tackle the records in a collection while making sure that they don’t lose the tools that ordered documents in a folder would provide them in the research room.
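The core idea of listing items independent of their containers, sortable ‘as arranged’, alphabetically or by date, while always keeping a breadcrumb back to box and folder, can be sketched in a few lines of Python. This is not Max’s implementation (his prototype is built from static HTML pages); the `Item` fields and the sample letters are hypothetical, with only the series name borrowed from the example above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    series: str
    box: int
    folder: int
    position: int  # order within the folder, as the archivist arranged it
    title: str
    dated: date

    def context(self) -> str:
        """Breadcrumb that keeps an item tied to its box and folder."""
        return f"{self.series} > Box {self.box} > Folder {self.folder} > {self.title}"


# The three orderings described for the prototype.
SORT_KEYS = {
    "as arranged": lambda i: (i.box, i.folder, i.position),
    "alphabetically": lambda i: i.title.lower(),
    "by date": lambda i: i.dated,
}


def item_list(items, order="as arranged"):
    return sorted(items, key=SORT_KEYS[order])


SERIES = "General Correspondence, 1886-1909"
items = [  # hypothetical items
    Item(SERIES, 1, 2, 1, "Letter C", date(1890, 5, 1)),
    Item(SERIES, 1, 1, 2, "Letter A", date(1901, 3, 9)),
    Item(SERIES, 1, 1, 1, "Letter B", date(1886, 2, 14)),
]
for order in SORT_KEYS:
    print(order, [i.title for i in item_list(items, order)])
print(items[0].context())
```

Whichever order is chosen, each item still carries its box and folder, which is what keeps a researcher oriented in the middle of a long list.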
His prototype goes beyond changing how the finding aid itself is presented; it also considers how a researcher’s workflow can be improved while simplifying the record request process. The prototype gives the patron the option to request the scanning of specific folders or items. They can also add records to their ‘research cart’ to either request that the proper boxes be retrieved or to store the records in a personal research area within the archives website. Both possibilities sound useful to me.
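The research cart workflow can also be sketched. In this hypothetical model each cart entry asks either for a folder to be scanned or for its box to be retrieved, and the function collapses duplicates so staff see each folder to scan and each box to pull only once. The entry format and the `plan_requests` name are assumptions for illustration, not features documented for the prototype.

```python
from typing import Iterable, List, Tuple

# (action, box, folder); action is "scan" or "retrieve" (hypothetical format)
Request = Tuple[str, int, int]


def plan_requests(cart: Iterable[Request]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split a research cart into folders to scan and distinct boxes to pull,
    keeping the order in which things were first requested."""
    scans: List[Tuple[int, int]] = []
    boxes: List[int] = []
    for action, box, folder in cart:
        if action == "scan":
            if (box, folder) not in scans:
                scans.append((box, folder))
        elif action == "retrieve":
            if box not in boxes:  # one trip per box, however many folders
                boxes.append(box)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return scans, boxes


cart = [("retrieve", 3, 1), ("scan", 1, 4), ("retrieve", 3, 2), ("scan", 1, 4), ("retrieve", 2, 7)]
print(plan_requests(cart))  # ([(1, 4)], [3, 2])
```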
Max’s prototype is such a great example of rethinking how people are expected to work with archival records within the confines of the information we already have available in finding aids as they exist today. I highly recommend you give Susa 2.0 a look. It is a testament to Max’s incredible patience that he was able to create this prototype using over 200 separate HTML files - but it also sets the bar high for what we could be doing with our interface design!
http://feeds.feedburner.com/~r/Spellboundblog/~3/478186046/
This afternoon I realized that I had passed a new landmark here on Spellbound Blog - I have published over 100,000 words! 100,208 words in 137 posts to be exact (thank you TD Word Count plugin).
Since I managed to miss my 2 year blogiversary back in July, this seems like a fine time to thank you all for sticking with me and giving me such great feedback over the past 2+ years. Since my Happy Birthday post in July of 2007, Google Analytics tells me that I have had 14,901 unique visitors to my blog from 139 different countries and territories and Feedburner tells me that I have over 400 subscribers to my RSS feed.
So, thank you everyone for giving my posts some of your precious time. Now, onward to 200,000 words!
Image Credit: 100,000 Miles by Melissa Doroquez via Flickr.
http://feeds.feedburner.com/~r/Spellboundblog/~3/462579013/
In news that would make any fan of old photographs drool, the Official Google Blog has announced that the LIFE Photo Archive is now available on Google Image Search. The LIFE Photo Archive’s home page is neatly organized to encourage you to browse for images by decade, famous people and topics.
There really is something for everyone here. I picked this striking image of Martha Graham because I love modern dance, but there are also images of war, fashion, sports, landscapes, architecture and tons more. The images currently posted stretch from the 1750s through 2003 and include many that have never before been published.
It is also worth noting that not everything in this collection is a photograph. I found illustrated pages from books like Queen Summer by Walter Crane. I also found illustrations like this one of the ancient temple of Artemis in Ephesus, Turkey.
From the text in Google’s blog post, it sounds as if Google is doing the digitizing, while LIFE Inc (or its parent company, Time Inc) will profit from the sale of prints. The photos currently posted represent 20% of the total. Ultimately the photo archive is expected to include about 10 million images and stretch to the present day.
Time Inc has partnered with QOOP to sell framed fine art photographs via links directly from each of the LIFE photo pages within Google. Take a look at the page dedicated to selling you a framed art print of St. Moritz, Switzerland in 1947.
I had not heard of QOOP before, but they seem to have a number of options available for those who want to be their partner. QOOP also wanted me to join their ‘Social Commerce Revolution’. In order to join the revolution I had to create an account (it is free and easy, it pointed out). To create my account I had to give them my email address, password and birth date. Not so bad. On the next screen they required that I enter my mailing address and phone number. I didn’t really want to give them this information, but I was curious about this revolution I had been promised. And when I was done, a whole lot of nothing happened. I think that I am supposed to use QOOP to create and market products. Is it an affiliate program? Is it an artsier CafePress? Do I need to contribute my own images or can I use those of others? I am still not sure.
I do like their integration with the LIFE images, but I think that there is clearly more work to be done before they are going to foster a ‘Social Commerce Revolution’ anytime soon.
Time Inc’s press release includes the following details:
The LIFE Photo Archive featured on Google will be among the largest professional photography collections on the Web and one of the largest scanning projects ever undertaken. Millions of images have been scanned and made available on Google Image Search today with all 10 million images to be available in the coming months.
“For 70 years, LIFE has been about one thing, and that’s the power of photography to tell a story,” says Andy Blau, LIFE’s President. “LIFE will now reach a broader audience and engage them online with the incredible depth and breadth of the LIFE Photo Archive from serious world events, to Hollywood celebrities to whimsical photographs.” Time Inc. EVP, John Squires adds: “We’re delighted Google recognized the rich value of our photo archive and worked with us to bring it to millions of consumers. Consistent with the launch of the TIME Archive, PEOPLE Archive and the SI Vault, this initiative continues our efforts to build valuable new revenue opportunities from our rich heritage.”
All keywords are translated into 16 different languages. LIFE’s Photo Archive will be scanned and available on Google Image Search free for personal and research purposes. Copyright and ownership of all images will remain with Time Inc.
Google uses a special notation to support search across the LIFE collection - all you do is include source:life as one of your search terms within the Google Image search box. Each photo has a rich set of metadata including a description and the keywords mentioned in the press release above.
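Because the `source:life` restriction is just part of the query string, a LIFE search link can be built by URL-encoding the query. A minimal sketch, assuming the `images.google.com/images?q=` pattern seen in the ‘Feathers’ label link quoted in this post; that 2008 URL format may no longer be how Google Image Search works, and `life_search_url` is a hypothetical helper name.

```python
from urllib.parse import quote_plus


def life_search_url(query: str) -> str:
    """Build a Google Image Search link limited to the LIFE collection."""
    # safe=":" keeps "source:life" readable, matching the link in the post.
    q = quote_plus(f"{query} source:life", safe=":")
    return f"http://images.google.com/images?q={q}"


print(life_search_url("Feathers"))
# http://images.google.com/images?q=Feathers+source:life
```

Multi-word terms work the same way: spaces become `+`, so `life_search_url("Lines of People")` ends in `q=Lines+of+People+source:life`.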
When you click on one of the keywords (which are actually called ‘Labels’ in the Google interface) it submits a search within the LIFE collection - but does NOT restrict that search to only the keywords. Rather, it seems to search across all the text associated with each image. For example - the label ‘Feathers’ on assorted images links to this URL: http://images.google.com/images?q=Feathers+source:life. This search returns many images, including one of Debutante Marilyn Lowe wearing a dress made from feathers which is not in fact assigned the label ‘Feathers’, but obviously does include the word feathers in the description.
For those accustomed to hotlinked tag-like terms only retrieving content that is also assigned that term (see all the images in the Flickr Commons tagged with ‘snow’), this might be a bit confusing. Also in contrast with the Flickr Commons, Google does not offer the opportunity for users to assign additional labels/keywords to the images. If you have signed in with a Google account, you can give images within the LIFE collection a star rating. I don’t see how this is used right now, but I expect that over time they will leverage these ratings to sort the Google-hosted image search results.
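The difference between a Flickr-style tag search and the LIFE label behavior described above can be shown with a toy example. The photo records below are hypothetical (loosely modeled on the feather dress example), and plain substring matching is a stand-in for whatever full-text search Google actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Photo:
    title: str
    description: str
    labels: Set[str] = field(default_factory=set)


def _labels(photo: Photo) -> Set[str]:
    return {label.lower() for label in photo.labels}


def tag_search(photos: List[Photo], term: str) -> List[str]:
    """Flickr-style: only photos explicitly tagged with the term."""
    return [p.title for p in photos if term.lower() in _labels(p)]


def full_text_search(photos: List[Photo], term: str) -> List[str]:
    """LIFE-label style, as observed: the term may appear anywhere."""
    t = term.lower()
    return [p.title for p in photos
            if t in _labels(p) or t in p.description.lower()]


photos = [  # hypothetical records
    Photo("Hat", "Hat trimmed with ostrich plumes", {"Feathers"}),
    Photo("Gown", "Debutante wearing a dress made from feathers", {"Debutante"}),
    Photo("Street", "Crowded street at dusk", {"Cities"}),
]
print(tag_search(photos, "Feathers"))        # ['Hat']
print(full_text_search(photos, "Feathers"))  # ['Hat', 'Gown']
```

The gown shows up only in the second search, which is exactly the kind of result that surprises someone expecting a label link to behave like a Flickr tag.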
Since I spend a lot of time these days organizing controlled vocabularies, seeing the keywords assigned to these images makes me wish I could see Time Inc’s full and organized set of terms. My favorite spotted so far? Lines of People - as typified by this photograph of Models wearing checked outfits from 1958.
It will be interesting to see what other partnerships crop up in the next year to digitize other major collections. I am also very curious to know if people actually buy framed fine art prints at the current cost of $79.99 for an 8″x12″ print in a 13″x16″ frame. Who knows if Time Inc will be forthcoming about their degree of success on this front, but it will likely be a good test case for other major collections looking to recoup some of the cost of their digitization efforts and find a new revenue stream.
Image Credit: The copyright of both of the images shown above belongs to Time Inc. Please click through to view details about each image, including the photographer’s name and the option to purchase your own print of the image.
http://feeds.feedburner.com/~r/Spellboundblog/~3/461586500/
My work now includes more SEO (Search Engine Optimization), so I have added SEO-focused blogs to my RSS feed reader. Today I spotted Search Engine Land’s post Business Opportunities For Video News Archives. Stephen Baker calculates that 35 years’ worth of archive footage equals 51,100 hours of content per station. With approximately 20 stations per broadcast group, he estimates a cost of $30 million per group to digitize each broadcast group’s archive of news footage. See the original article for more details on his calculations.
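Baker's figures can be unpacked with a little arithmetic. The sketch below takes his three numbers as given and derives two things not stated in the summary above: the implied output of about 4 hours of news per station per day (assuming 365-day years) and a rough digitization cost per hour of footage. These derived values are back-of-the-envelope, not figures from the original article.

```python
# Baker's inputs, as summarized above.
YEARS = 35
HOURS_PER_STATION = 51_100
STATIONS_PER_GROUP = 20
COST_PER_GROUP = 30_000_000  # US dollars

# Derived values (back-of-the-envelope, not from the article).
hours_per_day = HOURS_PER_STATION / (YEARS * 365)  # ignores leap days
hours_per_group = HOURS_PER_STATION * STATIONS_PER_GROUP
cost_per_hour = COST_PER_GROUP / hours_per_group

print(f"{hours_per_day:.1f} hours of news per station per day")  # 4.0
print(f"{hours_per_group:,} hours per broadcast group")          # 1,022,000
print(f"${cost_per_hour:.2f} per hour of footage")               # $29.35
```

In other words, the $30 million estimate works out to roughly thirty dollars per hour of footage across a group of about a million hours.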
He then proposes 3 approaches to monetizing these efforts and leveraging the resulting digitized video:
Media-Centric Wikipedia - complete with an expectation that social media contributions would provide “scalable way for creating editorial metadata, such as descriptions and story summaries that would be costly to otherwise create”. This makes me think of Flickr Commons for video.
Education Site - akin to NBCU’s iCue site I mentioned in my post about NBC News Archive footage on Hulu. “Efforts like this provide educational/subscription opportunities as well as sponsorship/advertising opportunities—what advertiser doesn’t want to get in front of 13 - 18 year olds?”
News Site Extension - described as “bolting the news archive onto the existing site”. The major benefit of this is that “more content provides more SEO opportunity and, hence, larger audience reach.”
Baker concludes:
In a market where traditional media is struggling to create unique and compelling online experiences and business models, the archive represent a differentiator that can jump-start audience building and monetization initiatives. Not only is it an important representation of world history that must be saved for “preservation-sake”, the archive represents a large, untapped online opportunity. Who will be first to realize its potential?
The ultimate goal of all three of these scenarios is to offset the extreme expense of digitizing thousands of hours of news footage. I think it is refreshing to see a perspective from outside the cultural heritage corner of the world that still sees video archives as rich resources worth preserving. I also like seeing ideas pitched in a manner that should catch the attention of those making budgets and struggling to find funding for large digitization efforts.
Image Credit: Flickr photo OSU Spring Game 2006 Media Lineup by Chris Metcalf
http://feeds.feedburner.com/~r/Spellboundblog/~3/452618671/

