Technical Documentation
Project Code
Fundamental scholarly resources, such as editions and translations of primary texts, often continue to be cited by scholars many decades after their publication. The lifespan of digital projects is typically much shorter than that of traditional print publications. To allow this project both to reach the widest possible audience now and to have the best chance at longevity, we have designed our digital infrastructure using standards and processes that should remain usable for the longest time span we can envision, and that should keep our materials recoverable even after technological evolution leaves the current web infrastructure behind and our webpage ceases to function. To this end, we have made the following technical choices:
- All of our project code and data are fully available. You are invited to visit our GitHub repository to examine and, if you wish, download our files. (Please note that while we consider our code itself to be licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, the project team retains copyright on the text of the English translations and scholarly commentary. Further, we have been granted permission to use Spanish source texts that may be otherwise under copyright. Please contact us if you have questions about permission to use the texts outside of the context of reading them on our site.)
- The original Spanish texts and our English translations are all encoded using TEI XML markup, specifically adhering to the TEI P5 Guidelines. TEI, or Text Encoding Initiative, XML markup has been used for decades and has become the "industry standard" for scholarly digital editing projects. Because it has been so widely adopted, and because its standards have been so robustly defined and documented by the humanities scholars for whom it was developed, we are confident that texts stored in this format will remain comprehensible and technically accessible for a very long time. Other datasets created as part of this project, including biographical, geographical, and lexicographical data, are also encoded according to the same standard. Thus even if web architecture fundamentally changes, these files can still be downloaded and read locally in any program that can read a basic text file.
- Our webpages are written in HTML5, adhering as closely as possible to strict XHTML standards. We have also aimed to maximize accessibility for screen readers.
- Rather than downloading packages or calling on external libraries, we have written all of our CSS and JavaScript in-house. We have kept this code as streamlined as possible to avoid bloat.
- The XML needs to be transformed into HTML to be easily accessible on the web. To maintain direct control of the web display and avoid external dependencies, we have not used a pre-existing platform such as Drupal. Instead, we have custom-written XSLT (Extensible Stylesheet Language Transformations) files, which, like all of our code, can be accessed through our GitHub repo. Each of these stylesheets is written to pull and process data from one or more of our TEI XML files, transform it using the set of template rules in the stylesheet, and create an HTML file for the website. Whenever an XML file is updated, we re-run the relevant XSLT stylesheets to create new HTML webpages that reflect the changes.
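The idea of "template rules" can be sketched outside XSLT. The following is a minimal illustration in Python, not the project's actual stylesheet: the sample sentence, the RULES table, the CSS class names, and the function names are all assumptions made for this example. Only the TEI namespace and the element names (div, p, persName, placeName, term) come from the TEI P5 vocabulary. Each rule says which HTML element a given TEI element should become, and the renderer applies the rules recursively, much as an XSLT processor applies templates.

```python
import xml.etree.ElementTree as ET
from html import escape

# The TEI P5 namespace, in the {uri}name form that ElementTree uses.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample document; the real project files are far richer.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div n="1"><p>El <persName>Gobernador</persName> llega a <placeName>Negrete</placeName>.</p></div>
  </body></text>
</TEI>"""

# Assumed "template rules": TEI element name -> (HTML tag, CSS class or None).
RULES = {
    "div": ("section", "tei-div"),
    "p": ("p", None),
    "persName": ("span", "person"),
    "placeName": ("span", "place"),
    "term": ("span", "term"),
}


def render(elem):
    """Render one element and its descendants by applying RULES recursively."""
    local_name = elem.tag.replace(TEI, "")
    inner = escape(elem.text or "")
    for child in elem:
        # A child's tail is the text that follows it inside the parent.
        inner += render(child) + escape(child.tail or "")
    rule = RULES.get(local_name)
    if rule is None:
        # No rule for this element: keep its content, drop the wrapper.
        return inner
    tag, css_class = rule
    attr = f' class="{css_class}"' if css_class else ""
    return f"<{tag}{attr}>{inner}</{tag}>"


def tei_to_html(xml_text):
    """Parse a TEI string and render the contents of its <body>."""
    root = ET.fromstring(xml_text)
    body = root.find(f".//{TEI}body")
    return render(body).strip()


if __name__ == "__main__":
    print(tei_to_html(SAMPLE))
```

Running the script prints a single HTML section in which the person and place names are wrapped in spans that a stylesheet could color or underline. Re-running it after editing the XML regenerates the HTML, which mirrors the update cycle described above.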
For those unfamiliar with XSLT, an example should help to explain it.
- The most complex stylesheet in the first phase of our project has been the one that creates our page with the Spanish text, English translation, and supplementary notes on the Parlamento of Negrete (1803). The page is viewable here, and the XSLT stylesheet can be found here.
- To create this webpage, the stylesheet has to process data from no fewer than six separate TEI XML files: the Spanish text; the English translation; the lexicon of definitions; the personography file; the geography file; and finally a file of explanatory notes.
- The XSLT stylesheet begins by defining the overall shape of the HTML file it will produce. It then fills in the data. The principal source is the XML file of the Spanish source text. Here the text is divided up into sections tagged with <div> for division. These are numbered. For each <div> of the Spanish text, the stylesheet then finds the corresponding <div> in the English text, pulling it from that file. These are placed in two parallel columns on the webpage. The Spanish and English files are also marked up with occurrences of persons, places, and vocabulary; these have been rendered in our output using colored or underlined text.
- The third column is the most complex part. Each <div> needs a series of notes to help the reader interpret the content of that <div>. The notes are collapsed by default to save space, but each can be opened by clicking on the term, name, place name, or note. This behavior relies on the HTML5 elements <details> and <summary>, which minimizes the need for JavaScript. If there are any vocabulary terms tagged in that <div> in the XML, there will be a list of exactly those terms, in the order of first appearance in the text. These are drawn by the stylesheet from the lexicon file. In the same way, the persons named in the <div>, the places mentioned, and any specific explanatory notes are all pulled by rules in the stylesheet and inserted into the HTML right next to the sections of text to which they pertain. After the stylesheet has completed that process for <div n="1"> in the Spanish XML file, it moves on to <div n="2"> and repeats the process until it has come to the end of the last <div> (in this case, there are forty-four of them) in the Spanish XML file.
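The pairing and note-gathering steps described in this walk-through can be sketched as follows. This is an illustrative Python outline under stated assumptions, not a translation of the real stylesheet: the sample texts, the use of a ref attribute on <term>, the placeholder definitions, the plain dictionary standing in for the lexicon file, and every function name are hypothetical. It covers only vocabulary terms; persons, places, and explanatory notes would follow the same pattern.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical miniature source files (the real page has forty-four divs).
SPANISH = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div n="1"><p>Los <term ref="#cacique">caciques</term> llegan al
    <term ref="#parlamento">parlamento</term> con otros
    <term ref="#cacique">caciques</term>.</p></div>
  <div n="2"><p>Fin.</p></div>
</body>"""

ENGLISH = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div n="1"><p>The caciques arrive at the parlamento with other caciques.</p></div>
  <div n="2"><p>End.</p></div>
</body>"""

# Stand-in for the lexicon file; the definitions are placeholders.
LEXICON = {
    "cacique": "placeholder definition for cacique",
    "parlamento": "placeholder definition for parlamento",
}


def divs_by_n(xml_text):
    """Index every <div> in a document by its n attribute."""
    root = ET.fromstring(xml_text)
    return {d.get("n"): d for d in root.iter(f"{TEI}div")}


def plain_text(div):
    """Return the div's text with markup removed and whitespace collapsed."""
    return " ".join("".join(div.itertext()).split())


def terms_in_order(div):
    """List each tagged term once, in order of first appearance."""
    seen = []
    for term in div.iter(f"{TEI}term"):
        key = (term.get("ref") or "").lstrip("#")
        if key and key not in seen:
            seen.append(key)
    return seen


def note_html(key, lexicon):
    """Build a note that is collapsed by default, using <details>/<summary>."""
    definition = lexicon.get(key, "(no entry found)")
    return f"<details><summary>{escape(key)}</summary>{escape(definition)}</details>"


def build_rows(spanish_xml, english_xml, lexicon):
    """Walk the Spanish divs in order and pair each with its English div and notes."""
    english = divs_by_n(english_xml)
    rows = []
    for div in ET.fromstring(spanish_xml).iter(f"{TEI}div"):
        n = div.get("n")
        match = english.get(n)
        rows.append({
            "n": n,
            "es": plain_text(div),
            "en": plain_text(match) if match is not None else "",
            "notes": [note_html(k, lexicon) for k in terms_in_order(div)],
        })
    return rows


if __name__ == "__main__":
    for row in build_rows(SPANISH, ENGLISH, LEXICON):
        print(row["n"], "|", row["es"], "|", row["en"], "|", "".join(row["notes"]))
```

Each resulting row corresponds to one band of the three-column layout: Spanish text, English text, and the collapsed notes for that division. The Spanish file drives the loop, so a division missing from the English file yields an empty English cell rather than an error.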
This may seem an unnecessarily complicated way of generating a webpage. There are two reasons we chose to use it. First, it enables us to keep all of our data in XML format for long-term accessibility. Second, while the stylesheet written to generate the Negrete 1803 page may be complex, we intend to translate at least four additional Parlamentos documents as the project continues. While each one will get its own stylesheet, each stylesheet will be a variation on the one we have already written. The initial outlay of effort has created an efficient pipeline for creating pages for these additional documents.
Extensions
To make the project more engaging and informative, we anticipate moving beyond texts and commentary to include more extensive media. This is likely to require us to use "external dependencies", digital resources that are not run from our own server. Because we do not own the tools or the servers that host them, it will not be possible to guarantee their long-term maintenance. However, we regard the translation and commentary as the core of the project.
Development and Storage
This project is a work in progress. We are using GitHub for collaborative development of our code and also to make it publicly accessible. Further, we are using GitHub Pages to host our development site. During the project development phase, webpages will first be created on the development site before being moved to their permanent site, which will launch soon and be hosted by the University of Pittsburgh. New features and pages will be added first to the GitHub Pages development site, and in that sense it may be more advanced than the University of Pittsburgh site; but users should be aware that anything on the development site may be incomplete, experimental, or glitchy. It should be treated as a rough draft, not a finished product. If you want to cite or link to any content that appears only on the development site, please consult us first.
When the project is complete, the GitHub Pages site will remain up as a mirror site in case the University of Pittsburgh site goes down. At that time all pages and features should be the same between the two sites. The code we have created will also continue to be available in the GitHub repository for as long as GitHub maintains it, but we also intend to save a copy of the code package to our institutional repository, D-Scholarship at Pitt.