On recording and producing a demanding audiobook

Learnings from The Harmonium Handbook, now on Audible, with comments on AI audiobooks

Sep 28, 2024

(Original version first published 20 Sept 2024 on the Crystal Clarity Publisher’s Blog; this present version has a few edits, a couple additional images, and an added section on AI-generated audiobooks. Also bless the Algorithm Angels, the Digital Devas, or whatever you’d like to call them by selecting the “heart” icon ❤️ even if you’re not a subscriber. It helps!)

I’m delighted to announce the release of the audiobook version of The Harmonium Handbook: Owning, Playing, and Maintaining the Indian Reed Organ, which first appeared in print (under my spiritual+legal/non-fiction name, Satyaki Kraig Brockschmidt), over 20 years ago.

Because I recorded all the audio and did quite a bit of the post-production myself to help reduce costs for my publisher, Crystal Clarity, I thought to share a few details and learnings from its production. Narrating the text was demanding enough in itself; what made The Harmonium Handbook unique was the challenge, and the opportunity, of incorporating additional sounds and music. This is a book about a musical instrument, after all, and although some parts of the book work best in print, other parts lend themselves very well to audio.

What is a harmonium?

Before I go into specifics about the demands of recording the audiobook, though, you might want to know what this instrument called the harmonium actually is. Simply put, harmoniums are small, hand-pumped reed organs that were originally Western instruments but that are now made pretty much exclusively in India.

*A typical Indian harmonium; this particular instrument is an older Bina model, probably from the 1970s, that I recently restored.*

To play, you pump the rear bellows to inflate a second bellows inside the instrument. This action builds air pressure in the internal bellows thanks to a couple of simple leather flaps that serve as one-way valves. Provided you have an air stop open (that is, you’ve pulled out one or more of the appropriate knobs on the front), depressing a key then allows that pressurized air to flow past the reeds for that key, producing sound. Most harmoniums have two sets of reeds an octave apart, making for a richer, fuller sound. Here’s a short clip of Thou Art My Life, music written by Paramhansa Yogananda for a poem by Rabindranath Tagore.

The harmonium is also one of the simplest instruments to play because unlike almost every other instrument you don’t have to coordinate your two hands. Playing a guitar, violin, or other stringed instrument, that is, requires that you synchronize how you hold the strings with one hand and how you strum, pluck, or bow with your other. Wind instruments are even more challenging, as you have to coordinate both hands along with the breath. On the harmonium, however, the two bellows design means that the pumping action is quite separate from playing notes on the keyboard—not entirely independent, but very little coordination is needed. That’s why The Harmonium Handbook can give playing instructions in a single chapter.

As the book describes in Chapter 1, the harmonium has its origins in ancient Chinese mouth organs that were brought back to Europe by Marco Polo. Europeans eventually adapted the free-reed principles of the mouth organ to a keyboard instrument and built many different models ranging from simple hand-pumped versions to foot-pumped models with multiple keyboards (manuals).

Here I’m playing an old-style foot-pump harmonium at the Black Swan Inn, Tilton, New Hampshire, September 2002.

Alongside the sewing machine, the harmonium was one of the big consumer technologies in the 1800s for two reasons: (1) it was easy to play, as I’ve described, and (2) also unlike most instruments, it produces a continuous sound. Both factors made it ideal for playing and singing hymns and other devotional music in small chapels and the home, where a full-on pipe organ was too expensive or patently impractical.

The instrument was so popular that at one point there were over 100 manufacturers in the United States alone. Alas, in the early 20th century electric organs displaced the harmonium, and it fell out of favor. But by that time missionaries had carried harmoniums to India, where they took root and remain a favorite to this day.

But I won’t spoil the whole story—you can read (or hear) more in Chapter 1 of the book.

Could it be an audiobook?

When Narayan Ramano, the publishing manager at Crystal Clarity, first asked if I could do an audiobook, I hesitated. The Harmonium Handbook is something of a technical manual, as befits its title. It contains quite a few photographs and diagrams, especially in Chapter 4, which covers how harmoniums are put together, and Chapter 5, which is a troubleshooting and maintenance guide. Could that sort of material work in audio?

At the same time, Chapters 2 and 3, which cover playing the instrument, lend themselves quite well to audio. Indeed, much of the content is better communicated through sound than through printed words. For example, it’s much easier to understand the different playing styles with audio than it is to understand them through printed descriptions.

So, it was worth giving it a go. A few initial tests proved promising, and once we decided to include a companion PDF with the audiobook that contains all the pictures and illustrations, I no longer worried about listeners losing track of details that are still best communicated visually. We really could have the best of both worlds. I simply had to collect all those images into a single document and edit the recorded text to refer to them accordingly.

What helped this process tremendously was that I read through the text aloud and made whatever edits were needed so that the audio narration could refer to the companion material. You’ll notice those changes if you happen to follow along in the printed book as you listen to the audio.

Recording narration is a performance

Before starting this project, I didn’t appreciate just how much recording an audiobook is a performance. I’m not even talking about the kind of performances that voice actors do for different characters in a book of fiction; that’s another level entirely. No, I’m talking only about the demands that reading a book aloud, with all the necessary breathing and enunciation, makes on one’s physical body. For, like public speaking and singing, in both of which I have ample experience, vocal narration takes a lot of lung work.

Thus, I learned a few things during the recording process:

It worked best for me to stand, pace myself, and read from a screen as if reading from a teleprompter.
When I started, I tried to record the whole first chapter all at once but ran myself out of breath and had to start over. Eventually I learned that it was best to record about 6-8 minutes at a stretch and then rest my voice for about twice that time. During that rest I did some preliminary editing on what I’d just recorded because I could easily remember where I’d made mistakes.
The voice can and will get tired: don’t expect to do too many hours all at once unless you’re accustomed to long-duration public speaking.
You will get tongue-tied. Consider practicing a few vocal exercises before a recording. I also had to practice pronouncing a few names in Chapter 1, such as those of the Pansymphonicon, Uranion, Poikilorgue, Royal Seraphine, and Aeolomelodicon. With names like that of French inventor Alexandre Debain, I edited my narration text with visual clues to the pronunciation, such as “AlexANDrehh DeBAH[n]”. (I hope I did Debain justice: my Germanic genes struggled a bit with the French inflections.)
You’ll also encounter certain phrases that the eyes read perfectly well on the page but end up being much harder to say out loud. Some phrases also come across as awkward when spoken. Thus, I changed a few words in the text to make them roll off the tongue more easily.
Because an audiobook is a studio recording and not a live performance, there’s no room for error. In other words, when—not if—you flub up you have to do retakes. When you make a mistake, always allow a little blank space in the recording before continuing—that silence is easy to see in an audio waveform. Then back up a sentence or to another point with a clear pause and pick up the narration from there. This way it’s easy for the editor to cut out the partial sentence with the mistake and use your clean version instead.

One other practical tip: give each recorded segment a sequential filename (like 01-04 for the fourth segment in Chapter 1). This makes it easy to import the segments into the audio editor and keep them lined up in order.
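For anyone who likes to script such things, here is a minimal Python sketch of that naming scheme. The function name `segment_name` and the `.wav` extension are my own assumptions for illustration; the only thing taken from the project is the zero-padded chapter-segment pattern like 01-04.

```python
def segment_name(chapter: int, segment: int, ext: str = "wav") -> str:
    """Build a zero-padded name such as '01-04.wav' (chapter 1, segment 4).

    The extension is an assumption; use whatever your recorder produces.
    """
    return f"{chapter:02d}-{segment:02d}.{ext}"


# Names created in any order...
names = [
    segment_name(1, 10),
    segment_name(2, 1),
    segment_name(1, 4),
    segment_name(1, 2),
]

# ...sort into recording order with a plain alphabetical sort,
# because the zero padding keeps '01-10' after '01-04'.
ordered = sorted(names)
print(ordered)
```

The zero padding is the important part: without it, a name like 1-10 would sort ahead of 1-4 in most file listings.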

Recording equipment

What kind of equipment did I use to record the narration? The answer might surprise you because the usual advice is to use an expensive microphone with a pop filter and some other fancy bits. I didn’t have such items when I started the project, so I first did some tests with equipment I already had on hand: a computer headset, a lavalier microphone, and a $10 wired headset plugged into my modest Android phone (a Blade X1 5G if you must know). I then sent the recordings to Crystal Clarity’s audio engineer to check if any of them might work. It turned out that the latter option—the inexpensive wired headset plugged into my phone—worked so well that I recorded the whole narration that way.

I also used my phone to record sounds from the instrument, everything from played notes to buzzing reeds and little clicks and squeaks that I wanted to include in the troubleshooting sections of Chapter 5. In these cases I simply held the phone close to the origin of the sound and used its built-in microphone. Again, the phone rose to the challenge, providing recordings with little to no extraneous noise.

For the audiobook, I also played some passages on the harmonium and sang as well. For these segments, I again made a series of test recordings with my phone mounted on a tripod and placed at different distances from both the harmonium and my mouth. The engineer and I found the placement that resulted in a good balance.

Audio editing software

From my days at Microsoft, I retained a license for an older version of a program called Camtasia. Camtasia is primarily meant for screen recording but also has a good audio editor, meaning that it lets you stretch out the audio waveform so you can see every detail down to the millisecond—every pop, every breath, every lip smack, every extraneous noise. By listening to the audio as you watch the time bar pass over the waveform, you quickly learn what specific noises look like visually. Editors like Camtasia then let you easily snip out segments of any length. As long as those segments are surrounded by silence, you won’t hear any artifacts of the editing in the final production.

A before-and-after of a short clip in Camtasia. In the top image you can see a few extraneous mouth noises in the waveform, which are edited out in the bottom image.

Had I used a better microphone with a pop filter, I might not have needed to clip out as many little bits. But I was recording a relatively short book; had I been doing War and Peace or Les Misérables I would have optimized the recording process more to reduce the editing time.

Another helpful tip is to learn your editor’s keyboard shortcuts for the operations you do most often. In Camtasia I first learned one for “Ripple Delete,” which both deletes a selected segment and then shifts the remaining audio to eliminate the gap. Then I learned a shortcut to select everything to the right of the time bar, which made it easy for me to slide everything over to make a gap for an added sound (or, in a few cases, a correction that I re-recorded). These two keyboard shortcuts alone greatly improved my efficiency.

Recording and mixing in added sounds

As mentioned earlier, what also made The Harmonium Handbook interesting, and what adds tremendous value to the audiobook version, are all the instances where it’s helpful to hear what the narration is trying to describe in words. This is, again, a book about a musical instrument. When I rehearsed the narration and made edits to the text, I also identified any little sound or extra audio clip that I wanted to include, such as playing and singing certain pieces or capturing noises caused by malfunctions. Again, being able to hear the different playing styles that I talk about in Chapter 3 adds tremendous value to the audiobook, so much so that readers who own the printed book will likely want the audiobook as well.

A view in Camtasia of a section of the waveform for Chapter 5 showing the narration along the lower track and added sounds on the upper track.

The trickiest sounds to capture were those produced by maladjusted instruments. When I originally wrote The Harmonium Handbook, I was importing a dozen or more instruments from India every three months. I imported somewhere around 120 between 1998 and 2004 and every last one of them needed some kind of tweaking. That’s how I learned what could go wrong.

When recording the audiobook in early 2024, however, I didn’t have such an ample supply of needy subjects on hand. I had to instead inflict many of those problems on my personal harmonium. But rest easy—it’s back to its good form.

Once I collected all the sounds, I arranged the short clips in Camtasia on a parallel track to the narration. I then made appropriate gaps in the narration to accommodate the sounds so the two tracks would play together seamlessly. I delivered both tracks separately to the engineer so he could do the final balancing and mixing.

How long did it take?

As I worked on recording and then editing one section of a chapter at a time, I learned that it took 15-20 minutes for every 250 words of narration, to which was added the time to record, edit, and mix in the added sounds. Altogether I worked on this project part-time for about six weeks, so I’m estimating that I spent a total of about 130 hours on the 29,000-word book.
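If you want to estimate the narration share for a book of a different length, the rate above can be turned into a small calculation. This is only a rough sketch that takes the figures at face value (15-20 minutes per 250 words, 29,000 words); the function name `narration_hours` is hypothetical.

```python
def narration_hours(words: int, minutes_per_block: float, block_words: int = 250) -> float:
    """Hours to record and edit the narration at a given rate per block of words."""
    blocks = words / block_words
    return blocks * minutes_per_block / 60


WORDS = 29_000  # length of the book as stated above

low = narration_hours(WORDS, 15)   # faster end of the 15-20 minute range
high = narration_hours(WORDS, 20)  # slower end of the range

print(f"Narration recording and editing: {low:.0f} to {high:.0f} hours")
```

Read literally, that puts narration recording and editing at roughly 29 to 39 hours, with the rest of the approximately 130 hours going to everything else: rehearsing and revising the text, recording the added sounds, and arranging the mix.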

In the end, I’m delighted with how the audiobook of The Harmonium Handbook turned out and that you can enjoy it now as well!

What about AI-generated audiobooks?

(This section does not appear in the post on the Crystal Clarity Blog.)

Having done a full audiobook the old-fashioned human way, I can understand why authors are attracted to producing an audiobook with AI. Not every author is comfortable recording themselves and not every author has a voice that necessarily works well on audio. For myself, I gained plenty of public-speaking experience during my years at Microsoft and have done enough singing to know that I could trust my voice through the process.

Furthermore, it’s expensive to have a human record an audiobook, especially if you have to hire a voice actor. BookBaby, for example, according to https://www.bookbaby.com/audiobooks, would charge the following for the 29,000 words of The Harmonium Handbook:

Author-narrated (you record and hire the editor): $500 plus $0.14 per word. Total for 29,000 words = $4560
Voice actor-narrated: $500 + $0.16/word = $5140 for 29,000 words.
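
These totals are just a base fee plus a per-word rate, so they are easy to recompute for a book of another length. Here is a small sketch using only the two rates quoted above; the function name `quote_dollars` is hypothetical, and actual pricing may well have changed since I looked.

```python
def quote_dollars(base_fee: int, cents_per_word: int, words: int) -> int:
    """Total in whole dollars: a base fee plus a per-word rate given in cents."""
    return base_fee + (cents_per_word * words) // 100


WORDS = 29_000  # The Harmonium Handbook

author_narrated = quote_dollars(500, 14, WORDS)  # $500 + $0.14/word
voice_actor = quote_dollars(500, 16, WORDS)      # $500 + $0.16/word

print(f"Author-narrated: ${author_narrated}")
print(f"Voice actor-narrated: ${voice_actor}")
```

Working in cents per word keeps the arithmetic in whole numbers and avoids rounding surprises.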

Obviously, a lot of the cost is in the editing rather than the narration, which is exactly why I did so much of my own editing for The Harmonium Handbook.

BookBaby also offers a service that uses samples of your own voice to train an AI system to produce text-to-speech narration in your “own voice.” It’s less expensive than the human-read options above ($2000 + $0.06/word = $3740 for 29,000 words), making it an attractive option.

In the case of The Harmonium Handbook, too, I must note that AI could have been used for only the narration. I would have still spent time to rehearse and edit the manuscript to change the references to the companion PDF. I would also still have had to record, edit, and produce all the added sounds, and then the project would still have required a lot of editing to produce the mix. For this specific project, then, using AI tools probably would have increased the overall cost.

What’s lost in the process of using AI?

Cost and technical considerations aside, I must also ask: what’s lost in the process of using AI for audiobooks? Training an AI with your voice might produce an audiobook that more or less sounds like you, but let’s be honest: it’s not you at all. It’s a fabrication. I might even go so far as to say it’s a deception, even a disingenuous lie, because it lacks not only the author’s consciousness but any consciousness at all. Humans, as spiritual beings, respond to consciousness as much as we respond to sense inputs, and thus using an AI cuts out an entire dimension of communication. (I think this is also true of writing.)

For that reason, I prefer hearing a human voice actor to AI-generated audio, even to an AI using the author’s voice, because an AI, being a computational engine, cannot feel what’s going on. Perhaps that’s not so important with non-fiction, but with fiction, and especially fiction that involves spirituality and mystical realism as I discuss here on Deus in Fabula, an unconscious, unfeeling AI simply cannot bring human sensibilities to the narration. Perhaps computational tools may someday be capable of this, but with the present technology, no.

And if I have a choice, I prefer an audiobook read by the author because the author will understand nuances of the text that a voice actor could perform only with coaching from the author (otherwise it’s presumption). For example, there are a few places in The Harmonium Handbook where I knew I was injecting a little humor, so I could chuckle a little with the words. I don’t think AI is ever going to do that sort of thing unless also trained by the author, and even then, it’s still a function of consciousness and not computation.

All that said, I can understand an author using AI with a stock voice to at least get something out there to start generating revenue. I would suggest, then, that authors save revenue from sales of such an audiobook and invest it in an author- or voice actor-narrated version. I think your book, and your readers/listeners, deserve it.

What do you think?