Working on your Spanish

[18 November 2017; copy edits 18-20 November]

In just over seven months, the annual Digital Humanities conference will take place in Mexico City. I doubt that I’m the only digital humanist in the world who is surreptitiously trying to improve my Spanish before next June.

If you are trying to improve your Spanish (whether for that reason or for others), here are some things I am finding useful.

First, a textbook.

  • The French publisher Assimil has a Spanish volume in their sans peine series: L’espagnol sans peine. It’s available in several versions for speakers of different languages: English (Spanish with ease), German (Spanisch ohne Mühe), Italian, Dutch, and Portuguese. The series is described, quite accurately, as suitable for “beginners and pseudo-beginners” (débutants et faux-débutants).

    I bought the book and CDs direct from the Assimil web site and had them shipped to me in the U.S. without any difficulty; the only catch for some potential buyers is that the site is in French. Period. (Wait, aren’t they specialists in books for foreign-language learners? Don’t their web people realize the firm has non-Francophone target readers? Ah, well. There are some English-language resellers one can use, or you can grab a Francophone friend and make them babysit you through finding the book you want and making your purchase.)

    I won’t attempt to describe the Assimil method here. The linguist John McWhorter has given a good account of their results in his piece on the NPR web site (and there is some useful concrete advice at the site ‘How to Learn any Language’, for those who find the books’ instructions vague). I will say that for those who like me are attempting to learn (or re-learn) a language on their own and not in a class, the Assimil sans peine series has no equal that I know of.

    I bought the combination pack with the printed book, sound CDs with recordings of the dialogues, and a CD with MPEG versions of the recordings; I have downloaded the MPEG recordings to a directory on an Android tablet, imported the directory into the Podcast Addict app as an audio book, and I use Podcast Addict to play the day’s recording on continuous loop while I fetch the paper, wash dishes, etc. Listening in a podcast player has the advantage that I can speed up the early lessons to something approaching normal conversational speed.

Second, a supply of relatively easy reading and listening material. My searches for podcasts for learners of Spanish as a foreign language turned up large numbers of results, some of which were obviously irrelevant and some of which I was able to delete without qualms after listening for a couple of minutes. I now listen to three:

  • Español automatico, prepared by a personable teacher of Spanish named Karo Martínez; in some installments, she speaks in relatively simple Spanish about assorted topics (the relative merits of the various actors who have played James Bond, the history of Catalonia, how to learn Spanish, and others; self-help topics of the how-to-be-more-organized variety are not uncommon), in others she specifically discusses issues of Spanish idiomatic usage and vocabulary. Episodes generally range from twenty to forty minutes (although when she got going on her introduction to the history of Barcelona, it ran over an hour). Transcripts and other additional materials appear to be available from the web site, some for purchase, which I hope pays for the cost of producing and publishing the podcast, and others for free (but you have to give them your email address, which induces a spasm of irrational privacy paranoia in me, so I have no idea what form the transcripts take). The only blemish for me is the repeated plea for listeners to file a five-star review of the podcast in iTunes.

  • El oso latino habla español — para mejorar su español, a quirky podcast put together in Sherbrooke, Québec, produced by a Québecois Spanish-learner named Pascal Dion and featuring the Peruvian Oswaldo Horna Montes (known, I gather, to his friends as El oso latino) as the main speaker. Episodes often include interviews with visitors from Latin America, with other Latin Americans living in Sherbrooke, or with anyone whom the host and producer think will be interesting; topics of discussion regularly include differences among varieties of Spanish and idioms peculiar to this or that regional variety. In one show, Montes narrated the preparation of a Peruvian dish for dinner. Dion appears in a regular segment on language-learner errors called Crónica del gringo, and Montes’s daughters in the segment Los chistes de Celia y de Marisol, which makes me laugh til I cry even though I have not yet understood a word of any of the jokes they have told. Lots of music. The most personal and thus the most memorable of these three podcasts. Generally around 30 minutes per episode.

  • News in slow Spanish, which is more or less what it sounds like: a weekly podcast with news stories in Spanish (there are similar News in Slow X podcasts for a number of different languages). There are several versions (intermediate vs. advanced, Castilian vs. Latin American), but oddly only one that I was able to locate from within Podcast Addict (Spain, Advanced, which proves not too advanced for me). News in Slow Spanish (Latino) does appear in the Tune-In Radio app (the only thing I miss there is continuous looping).

    I shied away from this at first; I have decades-old bad memories of unconvincing ‘newscasts’ specially for language learners filled with soft news (to give them a longer shelf life) and painful explanations of words. But I ended up trying the podcast, after I failed to find an app or podcast that would give me a conventional five- to fifteen-minute radio news broadcast once a day or once an hour (something along the lines of the NPR News app, or the Radio Canada app — still looking; if you know of anything let me know). And I was convinced. The news items are real, current, and interesting, and the editorial comment feels lively and intelligent. In the Latin-American version, particularly, it is interesting to listen to the friendly discussion between hosts with slightly different political leanings.

    I’ve been listening to the initial free portion of the podcast; a longer episode with more news items and some discussions of Spanish grammar and idioms is available as a paid service. The free portion runs about five minutes. (Every now and then there is either a slip or an intentional freebie, and the entire thirty-minute program is included.)

On easy reading material, I’m not doing too well. Children’s and young-adult books are an obvious choice, but I don’t know an easy way to know what’s worth buying and what’s not. Well, actually I do. The next time I’m at my public library I’ll ask at the information desk for recommendations.

And third, a supply of normal Spanish for listening and reading, ideally interesting and not over-challenging. Here, of course, the choice is limitless and the expanse of possibilities feels as trackless as Borges’s Library.

There is always the news (which tends to have a relatively manageable vocabulary, and to have a lot of short pieces). There are Android apps for any number of Spanish-language newspapers, most of which I haven’t heard of and some of which may or may not be worth reading. With my mind focused on Mexico City, I have looked only at apps and web sites for Mexican newspapers. A Mexican colleague (whom I thank, but who shall remain nameless here because I haven’t asked permission to name them here) has suggested:

  • Animal politico (left wing); the web site is fine on a desktop machine and a bit hard to navigate on a tablet. I didn’t find any app.
  • El Universal (center right); I did find an app but did not find it usable.
  • La Jornada (left wing); I find the Android app usable (though I wish it allowed me to adjust the font size), so I haven’t worked with the web site.

At the moment, I confess to finding Mexican newspapers slightly heavy going.

Since I’m a digital humanist, and I’m looking forward to DH 2018, I read the blog run by the Red de humanidades digitales with great interest, even if sometimes with imperfect comprehension.

Eventually, I’ll be looking for Spanish-language detective stories and the like: page-turners are a real boon for a foreign-language learner, so I will happily read many things in a foreign language whose English equivalents I wouldn’t normally be caught dead with. (I’m told that on the same principle, some adult literacy programs in this country do great work with Mickey Spillane.) Suggestions welcome.

Finding good podcasts aimed not at language learners but at intelligent adults has been a challenge, but looking around for podcasts on the sites of UNAM (Universidad Nacional Autónoma de México) and IMER (Instituto Mexicano de la Radio) has produced dividends, as have some journalistic think pieces I found on the Web on new media in Latin America. Right now my subscriptions include the following. (At the moment, all of these are tough sledding for me, but they repay repeated listening. The ability to slow the playback helps: it’s like being able to say “¡Demasiado rápido! ¡Más despacio, por favor!” and have the podcast nod and slow down.)

  • Azul Chiclamino, a podcast by Rodrigo Llop. I have no idea how to describe this; perhaps the subtitle will do: La realidad de lo absurdo. This is sometimes characterized as a humorous broadcast; for a sufficiently nuanced definition of “humor” (think Mark Twain) that’s probably true, but I find the podcast much more appealing than that label would lead me to expect.
  • Radio Ambulante, an NPR-affiliated podcast. Feels a bit like Radio Lab or This American Life, in Spanish: thoughtful, serious, well produced. Has the advantage that its stories are often about Hispanic affairs in the US, so I understand some of the background; has the drawback that its stories are often about the US, so I’m not learning about Latin America.
  • Ráfagas de pensamiento, a series of short pieces by the philosopher Ernesto Priani Saisó of UNAM, often reacting to a passage in an earlier philosopher (Nietzsche, Husserl, Leibniz, More, …). Produced with atmospheric music and read by what sound like professional voice actors. Has the disadvantage for me that the background music can make the words harder to hear; has the advantage that it’s worth listening to. Usually 3-5 minutes per piece.
  • A multipart dramatization by Radio UNAM of Así asesinaron a Trotski by Leandro Sánchez Salazar (the man in charge of the investigation of Trotsky’s murder). I have no idea what Sánchez’s book is like as a historical source, but it has the virtue of strong narrative drive (even though I already know how it turns out). I may need to read the book in order to understand some of the broadcast.

Netflix and Amazon appear to have rather thin selections when it comes to Spanish-language films but they do have some. If anyone reading this knows an effective way to search by language on either, I’m all ears; surely searching for “Almodovar” should not be the only possibility (I am going to save Buñuel for later, when my Spanish is better and I can tell the difference between surrealism and not understanding the words). YouTube has a fair bit of Spanish content, though again I have not found any good way to find it except for searching on random Spanish words. An impulsive search on “Así asesinaron a Trotski” turned up several documentaries on Trotsky’s assassination, Trotsky’s life, Trotsky and Stalin, and Ramon Mercader (the man who killed Trotsky), as well as a few seminars on Trotskyite political theory.

[Addendum: on Netflix, selecting Browse / Audio & Subtitles takes the user to an interface where one can browse items with audio, or subtitles, in a given language. This is imperfect, but probably better than nothing. Looking for something to watch in the resulting display feels like looking for a book to read in a library arranged by color; for every ten times you feel irritated by its apparently random arrangement and the inconvenience of having to click on something every time you would like more information than is given in the icon, you may once or twice feel pleased by some serendipity.]

All of this is, of course, just my two cents. As may be clear from the above, my language learning work happens mostly on an Android tablet, not on a desktop machine.

More digital than thou

[16 December 2013]

An odd thing has started happening in reviews for the Digital Humanities conference: reviewers are objecting to a paper if they think it has relevance beyond the field of DH, apparently on the grounds that the topic is then insufficiently digital. It doesn’t matter how relevant the topic is to work in DH, or how deeply embedded it is in a core DH topic like text encoding; if some reviewers don’t see a computer in the proposal, they want to exclude it from the conference.

[“You’re making this up,” said my evil twin Enrique. “That’s absurd; I don’t believe it.” Absurd, yes, but made up? no. The reviewing period for DH 2014 just ended. One paper proposal I reviewed addressed the conceptual analysis of … I’ll call the topic X, to preserve some fig leaf of anonymity here; X in turn is a core practice of many DH projects and an essential part of any account of the meaning of tag sets like that of the Text Encoding Initiative. I thought the paper constituted a useful contribution to an ongoing discussion within the framework of DH. Another reviewer found the paper interesting and thoughtful but found nothing specifically digital in the proposal [after all, X can arise in a pen and paper world, too, computers are not essential], graded it 0 for relevance to DH theory and practice, and voted to reject it. It should be submitted, this reviewer said, to a conference in [another field where X is a concern] and not in a DH conference. I showed this to Enrique. For once, he was speechless. Thank you, Reviewer 2!]

At some level, the question is what counts as work in the field of digital humanities. Is it only work in which computers figure in an essential role? Or is digital humanities concerned with the application of computers to the humanities and all the problems that arise in that effort? I think the latter. Some reviewers appear to think the former. If solving an essential problem turns out to involve considerations that are not peculiar to work with computing machines, however, what do we do? I believe that the problem remains relevant to DH because it’s rooted in DH and because those interested in doing good work in DH need the answer. The other reviewer seems to take the view that once the question becomes more general, and applicable to work that doesn’t use computers, the question is no longer peculiarly digital, and thus not sufficiently digital; they would like to exclude it from the DH conference on the grounds that it addresses a question of interest not only to DH but also to other fields.

[“Wait a second,” said Enrique. “You’re saying there’s a pattern here. Isn’t it just this one reviewer?” “No,” I said. “This line of thought also came up in at least one other review (possibly in a more benign form), and it has also been seen in past years’ reviews.” “Is that why you’re so touchy about this?” laughed Enrique. “Did they reject one of your papers on this account?” “Oh, hush.”]

The notion that there must be something about a topic that is peculiar to the digital humanities, as opposed to the broader humanities disciplines, makes sense perhaps if one believes that the digital humanities are intrinsically different from the pre-electronic humanities disciplines. On that view, any DH topic is necessarily distinct from any non-DH humanities topic, and once a topic is relevant to the broader humanistic fields (e.g. “what is the nature of X?”), it is ipso facto no longer a DH topic.

This is like arguing that papers about the concept of literary genre don’t belong at a conference about English literary history, because there is nothing peculiarly English about the notion of genre and any general discussion will (if it’s worth anything at all) also be relevant to non-English literatures. Or like trying to exclude work on the theory of computation from computer science conferences because it applies to all computation, not only to computation carried out by electronic stored-program binary digital computers.

[“I notice that leaders in the field of computer science occasionally feel obliged to remark that the name computer science is a misnomer,” said Enrique, “because computers are in no sense an essential element in the field.” “Perhaps they have the same problem, in their way,” I said. “My sympathy,” said Enrique.]

Another problem I see with the view my co-reviewer seems to hold is that some people believe that DH is not intrinsically different from the pre-electronic humanities disciplines. I didn’t get involved with computers in order to stop doing traditional philology; I got involved to do it better. Computers allow a much broader basis for our literary and linguistic argumentation, and they demand a higher degree of precision than philologists typically achieved in past decades or centuries, but I believe that digital philology is recognizably the same discipline as pre-digital philology. If the DH conference were to start refusing papers on the grounds that they are describing work that might be relevant to scholars working with pen and paper, then it would be presupposing an answer to the important question of how computers change and don’t change the world, instead of encouraging discussion of that question. (And it would be excluding people like me from the conference, or trying to.)

[“Trying in vain, I bet,” said Enrique. “Well, yeah. I’m here, and I’m not leaving.”]

These exclusionary impulses are ironic, in their way, because one of the motive forces behind the formation of scholarly organizations like the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing (now the European Association for Digital Humanities) and behind their journals and conferences was that early practitioners in the field were often made to feel unwelcome in the conferences and journals of their home disciplines. The late Paul Fortier was eloquent on the subject of the difficulties he had trying to publish his computer-assisted work on Céline in conventional journals for Romance languages and literatures; he often spoke of Computers and the Humanities as having made it possible for him and others like him to have an academic career.

It will be a sad thing if the recent growth in degree programs devoted to digital humanities turns out to result in the field of DH setting up signs at all boundaries reading “Outsiders not wanted.”

Searching for patterns in XML siblings

[9 April 2013]

I’ve been experimenting with searches for elements in XML documents whose sequences of children match certain patterns. I’ve got some interesting results, but before I can describe them in a way that will make sense for the reader, I’ll have to provide some background information.

For example, consider a TEI-encoded language corpus where each word is tagged as a w element and carries a pos attribute. At the bottom levels of the XML tree, the documents in the corpus might look like this (this extract is from COLT, the Bergen Corpus of London Teenage English, as distributed by ICAME, the International Computer Archive of Modern and Medieval English; an earlier version of COLT was also included in the British National Corpus):

<u id="345" who="14-7">
  <s n="407">
    <w pos="PPIS1">I</w>
    <w pos="VBDZ">was</w>
    <w pos="XX">n't</w>
    <w pos="JJ">sure</w>
    <w pos="DDQ">what</w>
    <w pos="TO">to</w>
    <w pos="VVI">revise</w>
    <w pos="RR">though</w>
  </s>
</u>
<u id="346" who="14-1">
  <s n="408">
    <w pos="PPIS1">I</w>
    <w pos="VV0">know</w>
    <w pos="YCOM">,</w>
    <w pos="VBZ">is</w>
    <w pos="PPH1">it</w>
    <w pos="AT">the</w>
    <w pos="JJ">whole</w>
    <w pos="JJ">bloody</w>
    <w pos="NN1">book</w>
    <w pos="CC">or</w>
    <w pos="RR">just</w>
    <w pos="AT">the</w>
    <w pos="NN2">bits</w>
    <w pos="PPHS1">she</w>
    <w pos="VVD">tested</w>
    <w pos="PPIO2">us</w>
    <w pos="RP">on</w>
    <w pos="YSTP">.</w>
  </s>
</u>

These two u (utterance) elements record part of a conversation between speaker 14-7, a 13-year-old female named Kate of unknown socio-economic background, and speaker 14-1, a 13-year-old female named Sarah in socio-economic group 2 (that’s the middle group; 1 is high, 3 is low).

Suppose our interest is piqued by the phrase “the whole bloody book” and we decide to look at other passages where we find a definite article, followed by two (or more) adjectives, followed by a noun.

Using the part-of-speech tags used here, supplied by CLAWS, the part-of-speech tagger developed at Lancaster’s UCREL (University Centre for Computer Corpus Research on Language), this amounts at a first approximation to searching for a w element with pos = AT, followed by two or more w elements with pos = JJ, followed by a w element with pos = NN1. If we want other possible determiners (“a”, “an”, “every”, “some”, etc.) and not just “the” and “no”, and other kinds of adjective, and other forms of noun, the query eventually looks like this:

(: CLAWS tags grouped into the three word classes of interest :)
let $determiner := ('AT', 'AT1', 'DD',
                    'DD1', 'DD2',
                    'DDQ', 'DDQGE', 'DDQV'),
    $adjective  := ('JJ', 'JJR', 'JJT', 'JK',
                    'MC', 'MCMC', 'MC1', 'MD'),
    $noun       := ('MC1', 'MC2', 'ND1',
                    'NN', 'NN1', 'NN2',
                    'NNJ', 'NNJ2',
                    'NNL1', 'NNL2',
                    'NNT1', 'NNT2',
                    'NNU', 'NNU1', 'NNU2',
                    'NP', 'NP1', 'NP2',
                    'NPD1', 'NPD2',
                    'NPM1', 'NPM2')

(: a determiner whose next three w siblings are adjective, adjective, noun :)
let $hits :=
    collection('COLT')
    //w[@pos = $determiner]
       [following-sibling::w[1][@pos = $adjective]
         [following-sibling::w[1][@pos = $adjective]
           [following-sibling::w[1][@pos = $noun]
       ]]]
(: for each hit, return the determiner, the text of its parent, and the parent itself :)
for $h in $hits return
  <hit doc="{base-uri($h)}">{
    $h,
    <orth>{
      normalize-space(string($h/..))
    }</orth>,
    $h/..
  }</hit>
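
For readers who want to check the logic of this query without an XQuery engine at hand, here is a minimal sketch of the same sibling search in Python, using only the standard library. Python is used purely so the example can run on its own; it is not how the search was actually carried out. The tag sets are abbreviated, the wrapping text element and the function name find_det_adj_adj_noun are hypothetical, and the sample is utterance 346 from the extract above.

```python
import xml.etree.ElementTree as ET

# Abbreviated tag sets; the full lists appear in the XQuery above.
DETERMINER = {"AT", "AT1", "DD", "DD1", "DD2"}
ADJECTIVE = {"JJ", "JJR", "JJT", "JK"}
NOUN = {"NN", "NN1", "NN2"}

# Utterance 346 from the COLT extract, wrapped in a hypothetical <text> root.
SAMPLE = """<text><u id="346" who="14-1"><s n="408">
<w pos="PPIS1">I</w><w pos="VV0">know</w><w pos="YCOM">,</w>
<w pos="VBZ">is</w><w pos="PPH1">it</w><w pos="AT">the</w>
<w pos="JJ">whole</w><w pos="JJ">bloody</w><w pos="NN1">book</w>
<w pos="CC">or</w><w pos="RR">just</w><w pos="AT">the</w>
<w pos="NN2">bits</w><w pos="PPHS1">she</w><w pos="VVD">tested</w>
<w pos="PPIO2">us</w><w pos="RP">on</w><w pos="YSTP">.</w>
</s></u></text>"""

def find_det_adj_adj_noun(root):
    """Yield runs of four sibling w elements: determiner, adjective, adjective, noun."""
    pattern = [DETERMINER, ADJECTIVE, ADJECTIVE, NOUN]
    for parent in root.iter():
        # Keep only w children, mirroring following-sibling::w[1] in the XQuery.
        words = [child for child in parent if child.tag == "w"]
        for i in range(len(words) - len(pattern) + 1):
            window = words[i:i + len(pattern)]
            if all(w.get("pos") in tags for w, tags in zip(window, pattern)):
                yield window

hits = [[w.text for w in window]
        for window in find_det_adj_adj_noun(ET.fromstring(SAMPLE))]
print(hits)  # [['the', 'whole', 'bloody', 'book']]
```

Like the XQuery, this sketch matches exactly two adjectives; the second “the” in the utterance is not a hit, because it is followed directly by a noun.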

Such searches pose several problems, for which I’ve been mulling over solutions for a while now.

  • One problem is finding a good way to express the concept of “two or more adjectives”. (The attentive reader will have noticed that the XQuery given searches for determiners followed by exactly two adjectives and a noun, not two or more adjectives.)

    To this, the obvious solution is regular expressions over w elements. The obvious problem standing in the way of this obvious solution is that XPath, XQuery, and XSLT don’t actually have support in their syntax or in their function library for regular expressions over sequences of elements, only regular expressions over sequences of characters.

  • A second problem is finding a syntax for expressing the query which ordinary working linguists will find less daunting or more convenient than XQuery.

    Why ordinary working linguists should find XQuery daunting, I don’t know, but I’m told they will. But even if one doesn’t find XQuery daunting, one may find the syntax required for sibling searches a bit cumbersome. The absence of a named axis meaning “immediate following sibling” is particularly inconvenient, because it means one must perpetually remember to add “[1]” to steps; experience shows that forgetting that predicate in even one place can lead to bewildering results. Fortunately (or not), the world in general (and even just the world of corpus linguistics) contains a large number of query languages that can be adopted or used for inspiration.

    Once such a syntax is invented or identified, of course, one will have the problem of building an evaluator for expressions in the new language, for example by transforming expressions in the new syntax into XQuery expressions which the XQuery engine or an XSLT processor evaluates, or by writing an interpreter for the new language in XQuery or XSLT.

  • A third problem is finding a good way to make the queries faster.

    I’ve been experimenting with building user-level indices to help with this. By user-level indices I mean user-constructed XML documents which serve the same purpose as dbms-managed indices: they contain a subset of the information in the primary (or ‘real’) documents, typically in a different structure, and they can make certain queries faster. They are not to be confused with the indices that most database management systems can build on their own, with or without user action. Preliminary results are encouraging.
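
To make the first of these problems concrete: one common workaround for the lack of regular expressions over elements is to encode each element as a single character and run an ordinary character regex over the resulting string. The following is a minimal sketch of that idea in Python (again chosen only so the example is runnable by itself), not a description of any solution adopted here. The one-letter codes, the abbreviated tag sets, and the names classify, search, and sentence are all assumptions made for illustration, and the second test sentence is invented, not taken from COLT.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated tag sets; the full lists appear in the XQuery above.
DETERMINER = {"AT", "AT1", "DD", "DD1", "DD2"}
ADJECTIVE = {"JJ", "JJR", "JJT", "JK"}
NOUN = {"NN", "NN1", "NN2"}

# Determiner, two or more adjectives, noun: one code letter per word.
PATTERN = re.compile(r"DA{2,}N")

def classify(pos):
    """Map a CLAWS tag to a one-letter class code (x = anything else)."""
    if pos in DETERMINER:
        return "D"
    if pos in ADJECTIVE:
        return "A"
    if pos in NOUN:
        return "N"
    return "x"

def search(s):
    """Return the word sequences in one s element that match PATTERN."""
    words = s.findall("w")
    encoded = "".join(classify(w.get("pos")) for w in words)
    # One letter per w element, so match offsets are also indexes into words.
    return [[w.text for w in words[m.start():m.end()]]
            for m in PATTERN.finditer(encoded)]

def sentence(pairs):
    """Build an s element from (pos, word) pairs, for demonstration only."""
    s = ET.Element("s")
    for pos, word in pairs:
        ET.SubElement(s, "w", pos=pos).text = word
    return s

# Part of sentence 408 from the COLT extract above.
colt = sentence([("VBZ", "is"), ("PPH1", "it"), ("AT", "the"),
                 ("JJ", "whole"), ("JJ", "bloody"), ("NN1", "book")])
# Invented test sentence (not from COLT) with three adjectives.
invented = sentence([("AT", "the"), ("JJ", "big"), ("JJ", "old"),
                     ("JJ", "red"), ("NN1", "barn")])

print(search(colt))      # [['the', 'whole', 'bloody', 'book']]
print(search(invented))  # [['the', 'big', 'old', 'red', 'barn']]
```

The encoding has an obvious limit: a tag that belongs to two classes (MC1 is listed as both adjective and noun in the query above) cannot be represented with one letter per word, so a serious version would need a richer encoding.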

More on these individual problems in other posts.
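
As a rough illustration of what a user-level index might look like, here is a small Python sketch that builds an XML index document from the primary data. The layout of the index (one entry element per part-of-speech tag, containing ref elements that point to a sentence number and a word position) is invented for this example, as are the function name build_index and the wrapping text element; the post does not say what structure the real indices have.

```python
import xml.etree.ElementTree as ET

# Utterance 346 from the COLT extract, wrapped in a hypothetical <text> root.
SAMPLE = """<text><u id="346" who="14-1"><s n="408">
<w pos="PPIS1">I</w><w pos="VV0">know</w><w pos="YCOM">,</w>
<w pos="VBZ">is</w><w pos="PPH1">it</w><w pos="AT">the</w>
<w pos="JJ">whole</w><w pos="JJ">bloody</w><w pos="NN1">book</w>
<w pos="CC">or</w><w pos="RR">just</w><w pos="AT">the</w>
<w pos="NN2">bits</w><w pos="PPHS1">she</w><w pos="VVD">tested</w>
<w pos="PPIO2">us</w><w pos="RP">on</w><w pos="YSTP">.</w>
</s></u></text>"""

def build_index(root):
    """Return an <index> element: one <entry pos="..."> per tag, each holding
    <ref s="..." n="..."/> children (sentence number, 1-based word position)."""
    index = ET.Element("index")
    entries = {}
    for s in root.iter("s"):
        for position, w in enumerate(s.findall("w"), start=1):
            pos = w.get("pos")
            if pos not in entries:
                entries[pos] = ET.SubElement(index, "entry", pos=pos)
            ET.SubElement(entries[pos], "ref", s=s.get("n"), n=str(position))
    return index

index = build_index(ET.fromstring(SAMPLE))

# Look up every occurrence of the tag AT without scanning the primary document.
refs = [(ref.get("s"), ref.get("n")) for ref in index.find("entry[@pos='AT']")]
print(refs)  # [('408', '6'), ('408', '12')]
```

With an index of this kind, a query for determiner-adjective-adjective-noun sequences could start from the entries for determiner tags and inspect only the words that follow each reference, instead of visiting every w element in the corpus. Whether that is actually faster depends on the engine and the data.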

XForms and XQuery tutorials at TEI members’ meeting

[23 August 2010]

The TEI has published a list of workshops to be offered at the TEI Members’ Meeting this November in Zadar, Croatia.

Together with Syd Bauman of Brown University, I’m offering two tutorial workshops: one on XForms and one on XQuery. Each will last a day and a half, and involve some talking heads, some group discussion, and as much hands-on work as we can manage.

There are several other very good workshops on offer: Norm Walsh on XProc, the TEI@Oxford team on the ODD system, Elena Pierazzo and Malte Rehbein on the encoding of genetic editions, and Andreas Witt et al. on TEI for transcriptions of speech.

The organizers remind me that there is an early-bird discount for those who register before 31 August. There is some chance that tutorials which fail to attract enough registrants will be canceled, so if you definitely want to come, you definitely want to register early, to help make sure your tutorial makes the cut.

Day of Digital Humanities, 18 March 2010

[17 March 2010]

Tomorrow I’ll be participating in a mass experiment with self-consciousness, the 2010 edition of the Day of Digital Humanities. The organizers have persuaded close to 150 people who self-identify with the description “digital humanist” (list) to blog, during the course of 18 March, about what it is they actually spend their time doing. “The goal of the project” (say the organizers) “is to create a web site that weaves together the journals of the participants into a picture that answers the question, ‘Just what do computing humanists really do?’”

For the day, I’ll be using the special Day of Digital Humanities blog set up for me by the organizers; the blogs of all participants are aggregated on the project site; there is also an RSS feed.