
I’ve been thinking a lot lately about journal publishing formats – specifically, what formats we should use to publish our journal content, and how to get it from manuscript to final published version. I love the idea of scholars composing their work in supportive, flexible, standard environments (hello WordPress!), but in the real world, they use a variety of tools. They scribble notes on paper, they write articles in WordPerfect, they create data tables by hitting the space bar repeatedly. They use the tools that are handy and familiar. Some of this can be chalked up to a lack of technological literacy, and some to the inherent heterogeneity of scholarship and scholars, but whatever causes it, it’s not likely to change anytime soon.

This is an especially tricky problem for library publishing programs, which are, in general, working on a shoestring. We more often see our role as providing infrastructure and consulting than labor-intensive services like copy- and layout editing. That, too, is not likely to change, and maybe it shouldn’t. After all, part of our job is to model low-cost, low-barrier, usually open access publishing. Still, we have a responsibility to make our outputs standardized, accessible, and preservable if we possibly can.

I was inspired to think about this problem in a new way by a post on a totally different topic by Tim McCormick (If You Can’t Hear Anything Nice, Don’t Hear Anything: Robustness vs. Civility of Networks). In it, Tim invokes the Robustness Principle, and gives a pithy example:

… you should send documents that you believe to be strictly compliant with standards, but accept documents that diverge from standard in common ways.

Tim was talking about civility on Twitter, but his description of the Robustness Principle struck me as an excellent lens for viewing library publishing inputs and outputs. Maybe, instead of seeing non-standard inputs as a frustrating problem that we could avoid if only we had better instructions/stricter standards/magic wands, we could see them as one half of a robust publishing infrastructure. We could try to set up our systems to take in the glorious mess that is a bunch of manuscripts and reveal them to the world as the standards-compliant content we want to have in our discovery systems.

The challenge is how to do this in a way that doesn’t replicate traditional, expensive, labor-intensive publishing workflows. The challenge is to find and adapt and build tools that will do the heavy lifting for us. As an example, a group of developers from the Public Knowledge Project is working on an automated XML parsing tool that will take a Word manuscript and encode its content in standard XML. The rationale for the project is described briefly in their January 2013 newsletter:

This ought to be very valuable, as XML production lately often falls to cheap, outsourced labour, which is untenable for many of our lower-budget journals and unsustainable besides. This is not new technology, but it will finally allow PKP’s editors to produce structured, reflowable content with no more difficulty than is currently required to produce a PDF.
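To make the idea concrete: a .docx manuscript is just a zip archive whose main content lives in word/document.xml, so even a small script can pull the text out and re-emit it as structured XML. The sketch below is not PKP’s actual tool (and a real converter would map Word styles to semantic tags such as JATS sections and titles); it’s a minimal illustration, using only Python’s standard library, of what “take a Word manuscript and encode its content in standard XML” involves.

```python
# Minimal sketch of Word-to-XML conversion. NOT PKP's parsing tool --
# just an illustration of the general idea, using only the standard library.
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_to_xml(path):
    """Extract paragraph text from a .docx and re-emit it as simple XML.

    A production tool would inspect paragraph styles to produce semantic
    markup (e.g., JATS <sec>/<title>); here every paragraph becomes <p>.
    """
    with zipfile.ZipFile(path) as zf:
        tree = ET.fromstring(zf.read("word/document.xml"))
    article = ET.Element("article")
    body = ET.SubElement(article, "body")
    for para in tree.iter(W + "p"):
        # Each w:p paragraph holds runs (w:r) containing text nodes (w:t)
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if text.strip():
            ET.SubElement(body, "p").text = text
    return ET.tostring(article, encoding="unicode")
```

The hard part, of course, isn’t extracting the text – it’s inferring structure (headings, citations, tables) from formatting cues, which is exactly the labor the PKP project aims to automate.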

This is the kind of tool that should be part of a robust (or, rather, Robust) library publishing infrastructure.

What else could we make? What other areas need this kind of support? As individual programs our resources are pretty limited, but there are more and more opportunities (THATCamp Publishing and the nascent Library Publishing Coalition come to mind) for us to pool our brains, and our code, and our efforts to make these kinds of ideas a reality.