I think I found the absolute worst Creative Commons book to attempt to adapt into an EPUB - and then I did that.


Here is something I made: the proper e-book that Bokselskap used to make their published version of Norske hexeformularer og magiske opskrifter. The work took about 5 years to complete and probably wasn’t worth it, but I’m way too proud of it not to write a long blog post complaining about the work.
If you aren’t interested in reading why it was such a bad idea to try to digitise this book specifically, and just want to read the book, look here:
https://www.bokselskap.no/boker/hexeformularer/tittelside

Please take note of the mention on the title page.

Somewhere around 2016-2017 I became aware of the book Hexeformularer og Magiske Opskrifter by Dr Anton Christian Bang. Published in 1901/1902, the book collects every spell Dr Bang could get his hands on from Norwegian sources. Going through everything from court documents, newspapers, ethnographic studies and grimoires (svartebøker) to anywhere else such things were published, he created a magnificent document showing witchcraft as practised in Norway.

These recipes, instructions and descriptions give a fascinating insight into both the weirdness and the recognisability of the wants of someone living 100-300 years ago. While there are even spells dating back to the 13th century, most are from the 1700s and 1800s.

The Norwegian National Library (NB) has the book downloadable for free as a PDF, and I spent a lot of time enjoying this fantastic scan. It is such a weird and wonderful collection. My bachelor’s thesis was on the creation of e-books, so I thought this could be a fun project to turn into an e-book.

When I talk about an e-book, it might be good to describe what I actually mean by that. By an e-book, I mean a book in a reflowable format. Reflowable means that the book can be read on any number of different screens, sizes and aspect ratios. That means that things like a Word document or a PDF cannot be described as an e-book, because they are locked to a specific page size. Formats such as EPUB, MOBI, AZW, TXT or HTML change where the lines in a paragraph break to fit the screen they are shown on: the text reflows.

The book that NB had uploaded was a scan of the original book. They had used Optical Character Recognition (OCR) to "read" the text and added that text to the PDF. The OCR has some problems, though.

It produces a lot of spelling mistakes and missing text, misreads splotches as letters, etc.

It can't tell the difference between a paragraph continuing on the next line and a paragraph ending.
So it took me a lot of work to correct the OCR output into a proper book.
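
To give an idea of the paragraph problem, here is a minimal Python sketch of the kind of heuristic one could try. This is not what I actually used, and the function name `join_ocr_lines` and the `short_ratio` threshold are made up for illustration. It assumes a paragraph ends when a line finishes with sentence punctuation and is clearly shorter than the widest line on the page. It will get plenty of cases wrong (it also strips real hyphens at line ends), which is exactly why manual correction was needed.

```python
def join_ocr_lines(lines, short_ratio=0.8):
    """Merge OCR text lines into paragraphs using a crude heuristic.

    Assumption: a line that ends a sentence AND is noticeably shorter
    than the widest line is probably the last line of a paragraph.
    """
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines:
        return []
    width = max(len(ln) for ln in lines)

    paragraphs, current = [], ""
    for ln in lines:
        if current.endswith("-"):
            # Rejoin a word that was hyphenated across the line break.
            current = current[:-1] + ln
        elif current:
            current += " " + ln
        else:
            current = ln

        ends_sentence = ln.endswith((".", "!", "?", ":"))
        is_short = len(ln) < short_ratio * width
        if ends_sentence and is_short:
            paragraphs.append(current)
            current = ""

    if current:  # keep whatever is left as a final paragraph
        paragraphs.append(current)
    return paragraphs
```

A last line that happens to fill the full width will be glued to the next paragraph, so the output still has to be proofread against the scan.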

Then came formatting. E-books are good at long texts like a novel. They can also do poems somewhat well, as long as all lines have the same indent. Hexeformularer’s don’t, so for every line where the indent is different you have to program that manually.
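
As an illustration of what "program that manually" means, here is a small Python sketch that turns lines with different leading whitespace into XHTML with one left margin per line. The function name `verse_to_xhtml`, the `verse` class and the half-em-per-space ratio are my own assumptions for this example, not the markup in the published file.

```python
from html import escape


def verse_to_xhtml(lines, em_per_space=0.5):
    """Emit one <p> per verse line, indented by its leading spaces.

    Assumption: each leading space in the transcription stands for
    `em_per_space` em of left margin in the e-book.
    """
    out = ['<div class="verse">']
    for raw in lines:
        text = raw.lstrip(" ")
        indent = (len(raw) - len(text)) * em_per_space
        # Escape &, < and > so the line is safe inside XHTML.
        out.append(
            f'  <p style="margin:0 0 0 {indent:g}em">{escape(text)}</p>'
        )
    out.append("</div>")
    return "\n".join(out)


print(verse_to_xhtml(["First line", "    indented line", "  half indent"]))
```

In a real EPUB you would probably move the margins into a stylesheet with a handful of indent classes, but the amount of per-line bookkeeping stays the same.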

Then you have the symbols. Most books use the normal alphabet that is available on a keyboard. When you deal with a large collection of spells, you don’t get off that easily. There are also straight-up drawings, which you can add into the text easily enough. Of course, then you have to describe the drawing in the alt-text, so that anyone who for some reason can't see it can use the file as well, which is difficult when the drawing is as complicated and as nonsensical as some of these are. Then you have random symbols in the middle of the text. First you look up whether there is an actual Unicode symbol for the thing you are looking for. Unicode has a lot of symbols available, and unless you are an expert in symbols used in 17th and 18th century spells you have to take time every time a new symbol shows up. If the symbol is there, you have to put it in and note it for later use in case it reoccurs. If it is not, you have to take a screenshot and use that as the symbol. And that one will definitely reoccur, so you note that in the same list of weird symbols.
Then you have that one pound-symbol that for some reason is different from every other pound-symbol in the entire book. You go into a spiral trying to find out what that means, until you realise that it was either a mistake caused when a physical type got mixed into the wrong box, or they just ran out of the normal ones. This is something that happened regularly back in the days when physical type was used.
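
The "list of weird symbols" described above can be sketched in Python roughly like this. It uses the standard `unicodedata` module for the lookup; the `SymbolRegistry` class, its method names, the labels and the image path are all hypothetical and only there to show the idea. My actual list was far less organised.

```python
import unicodedata
from html import escape


def find_by_name(name):
    """Return the character with this official Unicode name, or None."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


class SymbolRegistry:
    """Remember how each odd symbol was solved so it can be reused."""

    def __init__(self):
        self.entries = {}

    def add_char(self, label, char):
        # The symbol exists in Unicode: store it with its official name.
        self.entries[label] = {
            "kind": "unicode",
            "char": char,
            "name": unicodedata.name(char, "UNNAMED"),
        }

    def add_image(self, label, filename, alt):
        # No Unicode match: fall back to a cropped image plus alt-text.
        self.entries[label] = {"kind": "image", "file": filename, "alt": alt}

    def render(self, label):
        entry = self.entries[label]
        if entry["kind"] == "unicode":
            return entry["char"]
        src = escape(entry["file"], quote=True)
        alt = escape(entry["alt"], quote=True)
        return f'<img src="{src}" alt="{alt}"/>'


registry = SymbolRegistry()
registry.add_char("dagger", find_by_name("DAGGER"))
registry.add_image("sigil-1", "images/sigil-1.png", "circle with a cross inside")
print(registry.render("dagger"), registry.render("sigil-1"))
```

The catch is that looking a symbol up by name only works if you already know what it is called, which is the part that takes the time.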

After all that I sent it to Bokselskap.no who were happy to publish it. They did some extra jiggery-pokery to the code and went through it an extra time and the result is what you can see on their webpage.

So in the end, if you download this book from bokselskap.no in the future, you will see a small credit with the sentence “Digitaliseringen er basert på fil mottatt fra Sindre Hovland Søreide.” (The digitisation is based on a file received from Sindre Hovland Søreide.) Maybe not worth the 5-6 years of on-and-off work, but there you go. I personally couldn’t be happier.

