Gentlemen Prefer Blondes: A Critical Edition

Production Details

Document Sources

This digital critical edition of Gentlemen Prefer Blondes is built on seven documents representing the two important published editions of Anita Loos’s novel: the six issues of Harper’s Bazar in which the novel originally appeared in monthly installments between March and August 1925, and a first printing of the first Boni & Liveright book edition, published in November 1925. The copies of the magazine and book digitized for this edition are held by the University Library at the University of Illinois, with access to the first printing of the first edition courtesy of the Rare Book & Manuscript Library (RBML). RBML holds three copies of the first printing, two of which were digitized for this edition: one copy supplied the book itself, including all interior content and the hardback binding, and the other was used only for its dust jacket, which was in better condition. The Editor’s Introduction discusses the choice of editions and the consequences of those decisions at greater length.

Digitization and File Preparation of Page Images

The University of Illinois Library’s Digitization Services provided high-quality digital images of the documents, producing access- and preservation-quality TIFF images that were shared with the Editor. Photoshop was used to crop the illustrations from the access TIFFs, and the cropped illustrations were saved as JPG files for web display. A Research and Production Assistant also used Adobe to convert and arrange the access TIFFs into one PDF per chapter for the book and one PDF per chapter for the magazine (using only the pages containing the novel’s text and illustrations). Digitization Services also provided a fuller PDF of each magazine issue; however, only the novel’s contents went through the OCR correction process described below.

Optical Character Recognition (OCR) and Text Correction

The Research and Production Assistant then used ABBYY FineReader 12 to create and correct OCR text for each chapter of the book; ABBYY FineReader 15 was used for the same process with the magazine page images. OCR correction involved first adjusting the bounding boxes so that they correctly marked the placement of illustrations and text blocks, and then correcting the recognized text in ABBYY’s side-by-side view, which allows the text to be edited in a separate pane next to the PDF page image. These OCR-corrected PDFs are provided in this edition for readers who wish to see facsimile images of the original print context, including the full magazine contents that appeared alongside Blondes.
     For the presentation of the critical edition, the OCR text was extracted from the PDFs into two plain text files. In the process, running headers, page numbers, and artificial line breaks were removed. A second Research and Production Assistant then completed a second round of OCR correction, proofing the text files against the page images of the book and the magazine. Finally, the Editor completed a third and final round of correction and proofing against the page images. The collation process provided an additional check on the accuracy of the text: in a small number of cases it identified apparent spelling differences between the magazine and book printings that did not fit established patterns, prompting a check against the page images. The final corrected text of the book version is used for the primary reading view of the present Scalar edition.
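The cleanup step described above (removing running headers, page numbers, and artificial line breaks) can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used for this edition: it assumes that each running header and page number occupies a line of its own, that paragraphs are separated by blank lines, and that the running headers are known in advance. The function name `clean_extracted_text` is hypothetical.

```python
import re

def clean_extracted_text(raw, running_headers):
    """Remove running headers, page numbers, and hard line breaks.

    Hypothetical helper. Assumes headers and page numbers sit on lines
    of their own and that paragraphs are separated by blank lines.
    """
    headers = {h.strip() for h in running_headers}
    paragraphs, current = [], []
    for line in raw.splitlines():
        text = line.strip()
        if text in headers or re.fullmatch(r"\d+", text):
            continue  # running header or bare page number
        if text:
            current.append(text)  # line belongs to the current paragraph
        elif current:
            paragraphs.append(" ".join(current))  # blank line ends a paragraph
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

Words hyphenated across a line break are not rejoined by this sketch, which is one reason the manual proofing rounds described above remain necessary.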

Collation

For collation, the Editor and the second Research and Production Assistant used the Windows desktop version of Juxta, an open-source tool designed for this kind of work that allows side-by-side comparison of texts. Juxta was late in its life cycle and ceased to be available after this phase of the work was completed. The Editor considered using Adobe Acrobat’s text comparison feature instead; however, after examining the collation output of each, the Editor judged Juxta’s collation to be much more usable. Specifically, where text has been revised rather than simply deleted or added, Juxta errs on the side of identifying smaller portions of revised text, calling out separately any intervening words and phrases that remain the same, whereas Acrobat errs on the side of identifying broader “revised” passages despite some repeated material. Both tendencies required attention before the variants could be given a final interpretation for the variants view of the present edition. It was far easier, however, to spot places where Juxta unnecessarily identified a single word as unchanged within a much larger passage that was entirely rewritten (in which case, by the Editor’s interpretation, the word should be considered part of the revised passage) than to spot places where Acrobat unnecessarily conflated passages that the Editor would interpret as separately revised, with some unrevised text between them.
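The interpretive adjustment described above, in which an isolated unchanged word inside an otherwise rewritten passage is folded into the revision, can be expressed as a simple rule. The Python sketch below is hypothetical: it does not read Juxta output, and it assumes the collation has already been reduced to a list of `(tag, magazine_text, book_text)` segments whose tags loosely follow the naming of Python’s `difflib` opcodes (`equal`, `replace`, `delete`, `insert`). The function name and the `max_words` threshold are illustrative assumptions; in the edition itself these judgments were made by the Editor.

```python
def absorb_short_matches(segments, max_words=1):
    """Fold short unchanged runs that sit between two revisions into one revision.

    segments: list of (tag, magazine_text, book_text) tuples, where tag is
    "equal", "replace", "delete", or "insert" (a hypothetical format).
    """
    # Pass 1: relabel short "equal" runs flanked on both sides by changes.
    marked = []
    for i, (tag, mag, book) in enumerate(segments):
        flanked = (
            0 < i < len(segments) - 1
            and segments[i - 1][0] != "equal"
            and segments[i + 1][0] != "equal"
        )
        if tag == "equal" and flanked and len(mag.split()) <= max_words:
            tag = "replace"
        marked.append((tag, mag, book))

    # Pass 2: merge any run of adjacent changed segments into one "replace".
    merged = []
    for tag, mag, book in marked:
        if tag != "equal" and merged and merged[-1][0] != "equal":
            _, prev_mag, prev_book = merged[-1]
            merged[-1] = ("replace", prev_mag + mag, prev_book + book)
        else:
            merged.append((tag, mag, book))
    return merged
```

With the default `max_words=1`, invented sample segments for “it was” revised to “that”, an unchanged “the”, and “best day.” revised to “finest evening.” collapse into a single revision, “it was the best day.” becoming “that the finest evening.”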
     Following the Editor’s directions, the Research and Production Assistant used the Juxta collation to create the original HTML markup. This markup uses the text of the revised book version as the base text, with HTML span tags placed around variant text and given one of four class attribute values. The first two are complementary: an “added” value marks text in the book that was not present at that location in the magazine, whereas a “removed” value marks magazine text at that location that was deleted when the book was published. In some cases, a word, phrase, or one or more sentences could be both removed and added because the content was moved between the editions; in these cases, a “move” class attribute value was also assigned to both spans. In most cases (the exception being highly distinctive phrasing), words or phrases were counted as “moved” only if they stayed within the same sentence. Finally, where a paragraph break was added or deleted in the production of the print book edition, a pilcrow symbol (¶) was inserted at the appropriate point in the text and enclosed in a span tag with a class attribute value of “para”.
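To make the markup scheme concrete, the Python sketch below turns a list of collation segments into book-based HTML using the four class values described above. The segment format `(kind, text, moved)` and the function name `to_markup` are hypothetical; in the edition, the markup was created by hand from the Juxta collation rather than generated by a script.

```python
from html import escape

def to_markup(segments):
    """Render hypothetical (kind, text, moved) segments as span-tagged HTML.

    kind is "same", "removed", "added", or "para"; moved marks text that
    was relocated between the editions and so also gets the "move" class.
    """
    parts = []
    for kind, text, moved in segments:
        if kind == "same":
            parts.append(escape(text))  # shared text needs no span
        elif kind == "para":
            parts.append('<span class="para">\u00b6</span>')  # changed paragraph break
        elif kind in ("removed", "added"):
            classes = kind + (" move" if moved else "")
            parts.append(f'<span class="{classes}">{escape(text)}</span>')
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return "".join(parts)
```

For invented sample segments (not quotations from the novel), `to_markup([("same", "I went to ", False), ("removed", "Paris", True), ("added", "London", True)])` produces `I went to <span class="removed move">Paris</span><span class="added move">London</span>`.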

Derivative Markup and Presentation

An initial approach to the presentation used the original markup to provide two distinct views of each chapter, one with magazine-centric and one with book-centric markup. These views showed all variants, with content excluded from one version struck through and content unique to a version double-underlined. However, this approach created extreme visual clutter and posed serious accessibility problems for readers using screen readers.
     Instead, the fully collated textual data were adapted to create the final presentation of the text from the two documents. Each chapter page shows the two versions side by side in columns using an HTML data table. The data table has two advantages. First, it keeps related portions of the text side by side despite lengthy cuts or additions. Second, it divides the text into small chunks that can be compared more easily with a screen reader, mitigating as far as possible the inherent difficulty of making variants legible to screen-reader users. In practical terms, each table cell contains one paragraph, with a handful of exceptions where the paragraph breaks themselves changed between the published versions. In those cases, all paragraphs in the affected sequence in either version are included in the same table row.
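A minimal Python sketch of the row structure described above follows. It assumes each row has already been aligned as a pair of paragraph lists (magazine, book) whose contents are already-marked-up HTML, so a single row may hold several paragraphs where the paragraph breaks changed. The function name and column labels are hypothetical; `scope="col"` on the header cells is standard HTML that helps screen readers associate each cell with its column.

```python
from html import escape

def side_by_side_table(rows, caption):
    """Build an HTML data table with one row per aligned paragraph group.

    rows: list of (magazine_paragraphs, book_paragraphs) pairs; each
    paragraph is assumed to be already-marked-up HTML, so it is not escaped.
    """
    def cell(paragraphs):
        # Several paragraphs share one cell where the paragraph breaks changed.
        return "".join(f"<p>{p}</p>" for p in paragraphs)

    lines = [
        "<table>",
        f"  <caption>{escape(caption)}</caption>",
        '  <tr><th scope="col">Magazine (1925)</th>'
        '<th scope="col">Book (1925)</th></tr>',
    ]
    for magazine, book in rows:
        lines.append(f"  <tr><td>{cell(magazine)}</td><td>{cell(book)}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Keeping every paragraph of an affected sequence in one row means a screen reader moving down the table always reaches the corresponding magazine and book text together.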
     CSS rules were written to style the spans according to their class attribute values. Text unique to one document was highlighted in one color, while moved text and paragraph changes were highlighted in a contrasting color.

Presentation of Image Variants

A comparison of the Ralph Barton illustrations as they appeared in the magazine and book versions is presented on a separate page of the present edition, highlighting differences in sizing and shading between the two print versions. The second Research and Production Assistant used Photoshop to pair each cropped illustration from the magazine with its counterpart from the book in a side-by-side view. The Editor took physical measurements of the illustrations in both versions (included in the media metadata), and this information was first used to check the dimensions of the TIFF files. In the case of the magazine, the dimensions of the digitized files were not quite correct (although their width-to-height proportions were), so the magazine page TIFFs were corrected for size and the illustrations re-cropped for use in the side-by-side view.
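The size check rests on simple arithmetic: an illustration measured at w inches wide and displayed at a fixed resolution of r pixels per inch should be w × r pixels wide, with the height scaled by the same factor so that the proportions are preserved. The Python sketch below illustrates that calculation only; the function name, the 300 ppi default, and the sample measurements are hypothetical and are not the edition’s actual values or workflow.

```python
def corrected_size(pixel_w, pixel_h, measured_width_in, ppi=300):
    """Pixel dimensions at which an image matches its measured physical width.

    Hypothetical helper. Height is scaled by the same factor as width,
    preserving the width-to-height proportions of the original scan.
    """
    new_w = round(measured_width_in * ppi)
    new_h = round(pixel_h * measured_width_in * ppi / pixel_w)
    return new_w, new_h

# Invented sample measurements for one illustration in each version.
magazine = corrected_size(2000, 2600, 4.5)  # 4.5 in wide in print
book = corrected_size(1800, 2400, 3.0)      # 3.0 in wide in print
```

Because the same resolution is applied to both versions, the ratio of their pixel widths equals the ratio of their measured widths (1.5 in this invented example), which is the property the side-by-side view depends on.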
     Users of the edition should note that the size at which the illustrations appear on screen depends on several factors, including the file itself, the screen size of the device, and the way Scalar displays the image. The illustrations therefore may not appear on screen at the same size as in the print originals. However, the method described above should ensure that the relative sizes of the illustrations in the side-by-side view reflect the actual variation in the original documents.

Production Files

Users of the edition who would like a machine-readable version of the variants may extract the HTML encoding and CSS from the website or download (forthcoming) stand-alone files for the original collation markup and the presentation view; these files will be collected at the bottom of this page in the final published version of this edition.
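For readers extracting the variants themselves, the sketch below uses Python’s standard `html.parser` module to collect the text of each span together with its class values. It assumes the class names described above and that spans are not nested; the class name `VariantExtractor` is hypothetical, and the sample markup in the usage note is invented.

```python
from html.parser import HTMLParser

class VariantExtractor(HTMLParser):
    """Collect (class_values, text) pairs for each <span> in the markup.

    Hypothetical helper; assumes spans are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.variants = []     # list of (tuple_of_class_values, text)
        self._classes = None   # class values of the open span, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            cls = dict(attrs).get("class") or ""
            self._classes = tuple(cls.split())
            self._buffer = []

    def handle_data(self, data):
        if self._classes is not None:
            self._buffer.append(data)  # text inside the open span

    def handle_endtag(self, tag):
        if tag == "span" and self._classes is not None:
            self.variants.append((self._classes, "".join(self._buffer)))
            self._classes = None
```

After `parser = VariantExtractor()`, `parser.feed(markup)`, and `parser.close()`, `parser.variants` lists each variant with its classes, so filtering for `"move" in classes` would isolate the moved passages.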