Bible Markup Pattern: A Proposal

23 comments | Posted: 26 May 08 in Tutorials, by Michael Montgomery

What is the best way to mark up text from the Bible? I’ve been thinking about this question for more than a year, and in that time I have sought an answer, or at least a proposal.

If you’re impatient, here are the examples: Exodus 20, Psalm 23, Matthew 5, and Matthew 6:5-15.

Caveats

Please remember this article presents only a proposal for consideration. I am no theologian; nor am I a renowned expert on front-end code. Nor does every page of every site I’ve ever built have “best” markup, though I do make every effort to achieve it.

However, it seems to me that—of all texts in the world—it should be important to make an effort toward best practices in marking up the Bible.

Of course, I welcome corrections and commentary, any thoughts or additions.

Background

Some good work has been done on citation of online Bible quotes, including BibleRef and OpenBible.info. There are even some WordPress plugins for BibleRef.

Bibleref is a simple approach to automatically identifying Bible references [in] web pages.

I use and recommend BibleRef, which is a foundational proposal that focuses directly on the citation, for example: <cite class="bibleref">2 Tim 3:16</cite>. All of the present examples use the BibleRef citation format.

But I wanted something more comprehensive, that would help with marking up entire biblical texts or even a whole Bible.

Bibleref is part of a general movement toward markup that expresses more semantic, rather than presentational, element.

So, my question is broader than citation format: what elements should we use, as best practice?

What’s “Best”?

When considering what may be a “best” way to mark up the Bible, several requirements or principles come to mind.

First, it means using (X)HTML in a way that is valid, minimal, and semantic. In this case, the term “valid” means essentially “meeting the requirements set by the W3C of the specification selected by the Doctype of that document.”

“Minimal” refers to adding as few elements and attributes as reasonably possible to the text itself, while preserving its structure.

And for purposes of this article, “semantic” means a few things: using meaningful elements that match each portion of the text, and communicate its functional meaning. In other words, if some text is a primary heading, use an <h1> (heading) element; if it’s a paragraph, use a <p> (paragraph) element; if a block quote, use a <blockquote> element, etc.
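As a minimal sketch of that idea (the heading text and the choice of passage are only placeholders, not taken from the linked examples), semantic markup of a short passage might look something like this:

```html
<!-- Sketch only: each element mirrors the role of the text it holds. -->
<h1>The Gospel According to Matthew</h1>

<!-- Ordinary running text is a paragraph. -->
<p>And he opened his mouth, and taught them, saying,</p>

<!-- A quoted block of speech is a block quote. -->
<blockquote>
  <p>Blessed are the poor in spirit: for theirs is the kingdom of heaven.</p>
</blockquote>
```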

The word “semantic” also mandates general web standards principles, including:

CMS-Friendly

Another principle for my project is to recognize the ubiquity of the content management system. Most current content on the web is no longer in static web pages; rather, it is stored in a database and presented dynamically when someone asks for it.

So, the four examples I’ve prepared are all marked up using the excellent Textile syntax, the “humane Web text generator”.

Not a Microformat

I support and use Microformats, but it should be noted that this Bible markup pattern is not a Microformat, and for various reasons it probably never will be.

Current Practice

I did some research on how some publishers present these versions on the web, looking at as many of the four examples, in the four versions, as were available on these five sites: Bible Gateway, English Standard Version, eBible, YouVersion, and WEB Bible.

The markup was about what you might expect from large sites with big content management systems. With very few exceptions, the markup of the pages was invalid, not minimal, and not semantic.

However, they all at least declared a Doctype, and compared to many enormous commercial sites, most of the markup was rather clean. In fact, almost all used heading elements well, and used some arrangement of mainly paragraph elements with some class attributes. Depending on the content management system, my impression is that much of this Bible text markup could be much improved without undue effort.

Thing One, and Thing Two

Some informal study indicates there seem to be two basic semantic types or “genres” of biblical text.

The first type can be categorized as prose, and includes paragraphs, lists, and block quotes. The second type may be called verse, which includes poems, songs, and other lyrical matter.

I realize this may be gross over-simplification, but when constructing any taxonomy there is a tension in selecting the number of categories. In this case, I propose simply two categories, which adds semantic richness while being simple enough for this introductory article.

The two main genres in the Bible are narrative and poetry.
Editors’ Preface to the ESV Literary Study Bible

Even a simple two-category taxonomy can yield powerful results: pick up a Bible and compare Genesis to Psalms, which contain mostly prose and verse respectively. It’s obvious they are presented differently, and this basic presentational character can be preserved in the markup.

Examples

As examples of this proposed Bible markup pattern, it seemed appropriate to use passages from both Old and New Testaments, including prose, block quotes, and verse.

I selected four texts as examples, which are posted on my blog: Exodus 20, Psalm 23, Matthew 5, and Matthew 6:5-15. These passages are also known as the Ten Commandments, the Beatitudes, the Lord’s Prayer, and … the Twenty-Third Psalm.

Each example is presented in a different version: Exodus 20 in the NIV, Psalm 23 in the ESV, Matthew 5 in the WEB, and Matthew 6:5-15 in the KJV.

Overview: The Method

  1. Bible: The first step is simple: a container element, such as a <div>, gets a class attribute of “bible”.
  2. Headings: The second step is also easy: Headings use heading elements, such as <h2>, <h3>, etc.
  3. Paragraphs: Put almost everything else in a paragraph.
  4. Block quotes: Surround one or more paragraphs of a block quote with <blockquote> and </blockquote> tags.
  5. Verse numbers and Footnotes: At first, I thought it would be clever to use ordered lists for the verse numbers, but that only works if everyone begins all their Bible quotations from the beginning of a chapter. So, verse numbers are <sup> elements. Similarly, footnote reference numbers are <sup> elements with a class attribute of “footnote”.
  6. Verse, Poetry, etc.: As between “prose” and “verse,” I chose to make “prose” the default. If some or all of a passage is verse, then enclose that text with an element (such as a <span>) with a class attribute of “verse”.
  7. Additional: There are a few additional aspects, including div elements to enclose multi-paragraph “stanzas” of verse (see Psalm 23), and the markup of the footnotes.
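The linked examples remain the reference for this pattern. Purely as a rough sketch of how the steps above could fit together, the fragment below uses the class names “bible”, “footnote”, and “verse” from the list; everything else (the heading levels, the placement of the stanza div, the way the footnote itself is listed, and the placeholder footnote text) is an assumption made for illustration.

```html
<!-- Sketch only; details beyond the class names "bible", "footnote",
     and "verse" are assumptions, not the markup of the linked examples. -->
<div class="bible">
  <h2>Matthew 6</h2>
  <h3>The Lord's Prayer</h3>

  <!-- Prose is the default: plain paragraphs, verse numbers in <sup>. -->
  <p><sup>9</sup> After this manner therefore pray ye:</p>

  <!-- Quoted speech: one or more paragraphs inside a block quote. -->
  <blockquote>
    <p>Our Father which art in heaven, Hallowed be thy name.
    <sup>10</sup> Thy kingdom come. Thy will be done in earth,
    as it is in heaven.<sup class="footnote">1</sup></p>
  </blockquote>

  <h2>Psalm 23</h2>

  <!-- Poetry: an element with class "verse" wraps the stanza. -->
  <div class="verse">
    <p><sup>1</sup> The LORD is my shepherd; I shall not want.</p>
    <p><sup>2</sup> He maketh me to lie down in green pastures:
    he leadeth me beside the still waters.</p>
  </div>

  <!-- Hypothetical footnote listing; the text is a placeholder. -->
  <p class="footnote"><sup class="footnote">1</sup> Placeholder footnote text.</p>
</div>
```

Because verse numbers are inline <sup> elements, a paragraph can carry several of them, and a quotation can start at any verse without renumbering.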

The Styles

For ease of reference, the Bible style information is embedded in the head of the examples, so you can simply view the page source.

In actual use, they should be included in a separate CSS file with the other styles for that page. A sample CSS file of the Bible styles is available for download.
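The downloadable file is the reference here. Purely as an illustration of the kind of rules involved, with every selector and value below an assumption rather than the contents of that file, embedded Bible styles could look something like this:

```html
<!-- Hypothetical rules; the real sample CSS file may differ entirely. -->
<style type="text/css">
  /* Keep verse numbers small and raised without disturbing line height. */
  .bible sup {
    font-size: 0.7em;
    line-height: 0;
    vertical-align: super;
  }

  /* Distinguish footnote references from verse numbers. */
  .bible sup.footnote {
    font-style: italic;
  }

  /* Indent poetry so it reads differently from prose. */
  .bible .verse {
    margin-left: 2em;
  }
</style>
```

Scoping every rule under the “bible” class keeps these styles from leaking into the rest of the page, which matters once they move into a shared CSS file.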

The Future

Some thoughts for the future:

Let the discussion begin….

Discuss This Topic

  1. 1 Nathan Smith

    Great write-up. One little nitpick though: Selah isn’t so much an affirmation, as an indicator for the reader to pause, and reflect on what was being said.

    http://en.wikipedia.org/wiki/Selah

    Selah = “Breathe, you reader, soak it in.” Almost like saying “dramatic pause” in a stage play. That being said, perhaps it could be marked up the same as Amen, which would be an affirmation, such as: “it is so” or “let it become such.”

     
  2. 2 Robert

    I really like this layout that you are proposing here Michael. Great write-up!

     
  3. 3 Jina Bolton

    Hey Michael —

    Cool idea and project to take on. I have some ideas:

    Some things you might consider: maybe class=“bible” could be something a little less specific, just so other religions could use similar markup patterns (religious opinions aside). Something like “scripture”? Regarding Microformats, I doubt that marking up a bible would ever be considered, but it might be if it was made to be used for any religion.

    As for the superscript verse numbers, it does seem that this is the best solution we have available. However, I’m a little concerned about how that would be read on a screen reader. For example: What’s the difference between a 2 as in a second verse, versus a 2 as in a second footnote? Perhaps there could be more text (either in a span that gets hidden, or possibly just in a title attribute?) describing that difference (verse 2, or footnote 2), that just wouldn’t be visible to regular browsers?

    Also, where we use all-caps for “LORD”, maybe an em or strong would work, so that screenreaders can apply the proper emphasis needed? We could possibly use a similar treatment for “red text”, where some Bibles print the words of Christ in red. Anyway, not exactly sure what would be the best way to do that.

    Great job. :)

     
  4. 4 Daniel Bergey

    Very cool … I may use this markup pattern in an upcoming project. I’ll keep an eye out for updates and changes, too.

    Slightly off-topic, but do you know of any web services that offer APIs to fetch bible texts from multiple versions? I know gnpcb.org offers ESV access, but biblegateway.com does not.

     
  5. 5 Austin from Boston

    Hi Michael,

    I really liked your initial idea of formatting the verses as LIs within an OL. If you can generate dynamic CSS, it’s simple to set a start # on the resulting list for excerpts. But the format ultimately is not conducive to copying and pasting, since CSS rarely comes along for the ride. Also, for what it’s worth, scripture is not a list. Technology should affirm that, where possible. :)

    So what are verse numbers, semantically? Why not labels? How about label class=verse for=john_3_16. Then you can say in CSS that label.verse{ vertical-align: super }, and put each verse in a span with a unique id, for easy DOM access in a rich UI application as well as correct screen-reader interpretation. As far as the initial verse number, just add the class=“first” and then say label.first.verse{ display: none } in your CSS (this may give IE6 users some hiccups, but won’t break usability). Footnotes should be links, with CSS class footnote as you say. Cross-references should also be links.

    What to do with all the textual conventions that have grown up? Italicization for words inserted by the translators, small caps for the use of LORD indicating Adonai, words of Christ in red? All of these should be represented semantically, of course. For LORD (and other invocations of the holy name) there is usually an extended name in the original that may be of interest to the user. How about classing LORD as an ABBR abbreviation element with a title of “Adonai”, “YHWH”, or “Elohim”. Then you can class the abbreviation class=“holy-name” with CSS for small caps ( http://joeclark.org/standards/small-caps.html ). Backreferences to the Hebrew Bible in the New Testament texts should be links (class=quotation), and probably should also be styled as small caps because many readers are accustomed to it.
    Jesus speaks in both distinct paragraphs and snippets within verses, so class “jesus-speaking” or “words-of-christ” should be applied to P tags or spans as necessary.
    Insertions may be better represented by grayed text than by the traditional italics. CSS class “translator-insertion” makes this easy to explore. Alternative word choices can live in span titles instead of footnotes.

    A real challenge in creating a standard markup is creating something that can be extended beyond the creator’s original purpose. If I am running a client that is using your Bible markup as a source and parsing that data, how much work do I have to do? How much does your markup help me if I do not have the css? If I do a fulltext search for “wine” and want to know how many references to that word exist, grouped by book, a really solid markup makes it easy for me to do that based on XPATH or css selectors. Then manipulation can occur in the browser — which is the target application.

     
  6. 6 Ben Carlson

    Interesting for sure. Wouldn’t <h> tags be dependent on how a site is built though? Like if my site, before I get into the content (or these Bible verses), uses <h1> through <h4>, I’d be left with <h5> and <h6> for the Bible markup, right?

    Also what about the red-lettered edition? =)

     
  7. 7 Matt

    Very interesting proposal Michael. I think you may be on to something here. Thanks for the CSS file as well.

     
  8. 8 Michael Montgomery

    Wow, thanks for the responses. And the great ideas & questions!

    @Nathan: Thanks for explaining. “Dramatic pause,” indeed (I’ll never forget it now!)

    @Jina: Good point; hadn’t thought about a screen reader with “verse 2” and “footnote 2”. Even more complicated is that I think some (study) Bibles use numbered footnotes for cross-references, and lettered footnotes for commentary(!) Which I didn’t yet attempt to accommodate.

    As far as all-caps, that’s currently an artifact of methodology: copy-paste the text, and Textile automatically applies class=“caps” to any string of uppercase letters. Small-caps would be better, so it’s not an optimal solution yet.

    @Daniel: Not sure. Might try eBible.com or YouVersion, but I think they may have signed copyright deals. Does it have to be an API, or can you just use the downloadable texts for e-Sword? Might ask here in the Godbit forums.

    @Austin: Interesting. I think <label> may need to be associated with a <form> control, and may be an inline element, so it wouldn’t be valid to use it by itself. Italics would make sense; it’s just that the versions I used didn’t include any. Lots more to think about in your comment…

    @Ben: Personally, I’d just allow the heading levels of the Bible text and the rest of the page to overlap.

    @Robert & Matt: Thanks!

     
  9. 9 Carl Camera

    Michael. Excellent work and research here, and I support the goal of standardizing markup surrounding the Bible. One suggestion for folks is to think outside the book, meaning that books need to incorporate footnotes at the bottom of pages to provide additional information, but online we can provide a richer user experience. For instance, rather than a superscript linking to a footnote linking to a cross-referenced passage, we can link directly from the superscript (or text) to the cross-referenced passage. I’m guessing copyrighted texts would not care for any alteration.

    Mike, I would suggest you mark up Acts 2, where the author quotes Peter quoting Joel. I think this would be one of the more difficult passages to mark up semantically.

     
  10. 10 Jon

    Hey Michael,

    Interesting write-up… FWIW, I work for Gospel Communications (we produce biblegateway.com) and Daniel is correct that we don’t have a public API. It mostly has to do with restrictions from the publishers who kindly donate electronic copies of their Bibles for us to publish on the site.

    Have you looked into OSIS? While not exactly a “gold standard”, when we receive rights to use a new translation it is generally very helpful if it’s OSIS-flavored XML. Have a look at http://bibletechnologies.net/ and review the standard. Having dealt with dozens of different languages and translations, and with a few dozen more waiting in the wings, it’s very helpful to have a standard like OSIS gaining adoption among publishers.

    Hope that’s informative, if nothing else.

    Jon

     
  11. 11 Jonathan Devine

    Michael,

    I like the direction you are headed. Austin’s suggestions are interesting, but the matter of “LORD”, “Adonai”, “Elohim”, etc. could lead to murky waters.

     
  12. 12 Michael Montgomery

    @Jon: Thanks for the quick feedback. Yes, I did look at OSIS about a year ago—it’s a full-blown XML schema, and is obviously the result of a lot of work.

    I was more interested in enabling authors (for example bloggers, and online versions of books) to publish biblical text with markup that is valid, minimal and semantic.

    I’d be interested to talk more about BibleGateway, so feel free to email: m aht michael montgomery daught net

    @Jonathan: Thanks.

     
  13. 13 Michael Montgomery

    Update: The ESV has improved their markup, and it now validates XHTML 1.0 Transitional: http://www.gnpcb.org/esv/

    Some recognition for good work!

     
  14. 14 Kat

    It’s thrilling to see people interested in doing things in a way which brings excellence to the world. Not the easiest way. Not necessarily the cheapest or fastest way, but in a way that will build a solid foundation for the future. I guess that’s kind of what the bible is about… Great work Mike.

     
  15. 15 Jack

    I too have been thinking about this, but on a smaller scale than Michael. I mark up sermon notes from Sunday morning services. In the Sunday morning service, the notes are a half-sheet of letter-sized paper with major points, scripture references, and some blanks to allow people to interact (e.g. Jesus is the (blank), we are the (blank). Answers: vine; branches).

    So I have been struggling with smaller pieces of scripture, whereas Michael appears to be wanting to mark up whole books.

    While I like the idea of the ordered list to mark up verses, it means you will lose the paragraphs that translations like the New International are broken into. I don’t like the idea of inserting the verse numbers into the text without some way of delineating them as separate. Most printed Bibles go for a smaller font size, but on the web we need a semantic way to handle verse numbers. How do Braille Bibles handle verse numbers?

     
  16. 16 Jack

    @Austin: I tried your idea for the label tag, and it is not required to be in a form, but since it is an inline element, it must be within a block element such as a p tag or a div tag for strict document types. However, with a Transitional document type, the idea does validate.

     
  17. 17 Michael Montgomery

    @Jack You’re absolutely correct: Bible markup should be sufficiently flexible to accommodate paragraphs that include multiple verse numbers. (In other words, each verse number should not necessarily begin a new line.)

    Good news: the current markup proposal does preserve paragraphs! See Matthew 5. I’m proposing that the verse numbers be marked up inside superscript elements. Of course, superscripts are inline elements by default, and you can put several of them together in a paragraph.

    Bad news: I showed no examples directly in this article. A big oversight (huge!), and I apologize.

    This does present an opportunity for me to prepare a follow-up article, with examples this time. I’m also working with Carl on answering his Acts 2 challenge. Stay tuned…

     
  18. 18 Robert

    Keep in mind that added class names do not always have to be tied to specific elements, so preserving paragraphs or any element is rather simple. Also, I think the Acts 2 challenge really isn’t a difficult one if the markup is expanded.

    I’ve been working on a Bible microformat for a little over a year, as I think we can do better than a single classname of bibleref, which really doesn’t tell a parser very much. Regardless of whether some institution accepts the microformat or not doesn’t matter to me. The fact that I could write a parser and aggregate the data does – having a single format to draw from.

    I really like what you’ve started here Michael, but I think marking up the Bible Reference must be done better: a single class [bibleref] just doesn’t cut it from a programmer’s point of view.

    To help spur this thought process along, I’ve written up examples of how I propose to mark up Biblical references and have included some other examples as well. This was originally going to be a post here, but I think that would end up dividing the spectrum instead of coming together and solving this problem.

    Here are those examples: http://tools.robertrevans.com/bible_markup.html

     
  19. 19 Michael Montgomery

    @Robert: The additional fidelity of a more detailed citation format is interesting (if it’s necessary for parsing; I’m no hard-core programmer).

    One thing worth checking out for ideas and a more detailed approach is OSIS, which is a full-blown XML schema for Bible texts.

    As far as marking up the text (not the citation), my purpose was to encourage valid, minimal & semantic markup of Bible texts, for general web pages and blogs, etc. So, I guess I’m a little cautious about complexity. Wouldn’t lang=“en-us” be set in the head of the document? Does class=“verse” add anything to the paragraph element? Is the sup element sufficient to mark up verse numbers, or does it really need class=“verse-number”?

    By the way, I wanted to confirm whether you received my e-mails?

     
  20. 20 Robert

    @Michael: I did check out the OSIS, but it’s more complex than needed, but necessary for XML.

    The first issue that arises with bibleref is that it brings many points of failure for a parser. It’s a single class for a citation that contains at least three parts (Book, Chapter, Verse).

    Think about it like this, we have the adr microformat, what if it was just adr and did not include street-address, region, postal, etc.? This shifts the burden to the parser to strip out the various aspects of the address.

    Same with bibleref, the parser must accurately find the Book, Chapter, and Verse. What the parser receives is 2Tim3:1, when we could do something more along the lines of Book: 2 Tim, Chapter: 3, Verse: 1. Giving proper descriptions for each aspect really reduces many points of failure and lifts the burden off of the parser to know things it really shouldn’t have to know.

    As for lang=“en-us”, it’s a valid attribute, but is primarily used with the html element. I’d personally find it valuable when parsing to have some identifier to let me know what language the text is in. That way I can disseminate it to the proper audience at a later time.

    The added classes (verse, verse-number) are again for easy pulling. If adding two classes can save you 10 lines or more of code, then in my opinion it’s worth it.

    Keep in mind that the parser could be in JavaScript, which would make grabbing the elements without those class-names easy, but if it were written in C, those class-names become a lot more valuable.

    I got one email from you, were there more?

     
  21. 21 Michael Montgomery

    The proposal for changing the “citation part” is interesting.

    I’ll keep working on the “passages part”…

     
  22. 22 John Dyer

    One major thing to keep in mind as this kind of thing moves forward is the difficulty of book names and chapter order for computer readable/parsable markup:

    - Different languages and styles write book names differently
    - If you went to a numbered book system, you’d need to take into account the apocryphal books
    - The book/chapter/verse numbering differs in Hebrew and English

     
  23. 23 David Reimer

    Great initiative, great article, great comment thread. What more can you ask for? :)

    Good to see OSIS mentioned. It would be worth knowing about ThML (= Theological Markup Language), an XML format used by the Christian Classics Ethereal Library. The Definition describes how it handles scripture citations (around paragraph 8). Possibly not precisely what Michael is after, but another participant in this sort of discussion.

     

Comments closed after 2 weeks.