I think I found the absolute worst Creative Commons book to try to adapt into an EPUB, and then I went and did it.
Here is something I made: the proper e-book that Bokselskap used to make their published version of Norske hexeformularer og magiske opskrifter. It took about five years to complete and probably wasn’t worth it, but I’m way too proud of it not to write a long blog post complaining about the work.
If you aren’t interested in reading why it was such a bad idea to digitise this particular book, and just want to read the book itself, look here:
https://www.bokselskap.no/boker/hexeformularer/tittelside
Please take note of the credit on the title page.
Somewhere around 2016-2017 I became aware of the book Hexeformularer og Magiske Opskrifter by Dr Anton Christian Bang. Published in 1901/1902, the book collects every spell Dr Bang could get his hands on from Norwegian sources. Drawing on everything from court documents, newspapers, ethnographic studies and grimoires (svartebøker, "black books") to anywhere else such things were published, he created a magnificent record of witchcraft as it was practised in Norway.
These recipes, instructions and descriptions give a fascinating insight into the weirdness, and the recognisability, of what people living 100 to 300 years ago wanted. While some spells date as far back as the 13th century, most are from the 1700s and 1800s.
The National Library of Norway (Nasjonalbiblioteket, NB) offers the book as a free PDF download, and I spent a lot of time enjoying this fantastic scan. It is such a weird and wonderful collection. My bachelor’s thesis was on creating e-books, so I thought turning this one into an e-book could be a fun project.
When I talk about an e-book, it might be good to describe what I actually mean by that. By an e-book, I mean a book in a reflowable format. Reflowable means that the book can be read on screens of any size and aspect ratio. That means something like a Word document or a PDF cannot be described as an e-book, because it is locked to a specific page size. In formats such as EPUB, MOBI, AZW, TXT or HTML, the line breaks within a paragraph change to fit whatever screen the text is shown on: the text reflows.
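To make "reflow" a bit more concrete, here is a toy sketch in Python. It is only an illustration: real e-readers lay text out with their own rendering engines and fonts rather than by counting characters, but wrapping the same sentence at two assumed screen widths shows the idea of one text producing different line breaks.

```python
import textwrap

# One paragraph of text; in a reflowable format it has no fixed line breaks.
paragraph = ("Published in 1901/1902, the book collects every spell "
             "Dr Bang could get his hands on from Norwegian sources.")

def reflow(text: str, width: int) -> list[str]:
    """Break one paragraph into lines no longer than `width` characters."""
    return textwrap.wrap(text, width=width)

# The same text, "shown" on a wide and a narrow screen.
for width in (60, 30):
    print(f"--- screen {width} characters wide ---")
    print("\n".join(reflow(paragraph, width)))
```

A PDF, by contrast, stores the line breaks from one fixed page, which is why it cannot do this.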
The book NB had uploaded was a scan of the original. They had used optical character recognition (OCR) to "read" the text and added that text layer to the PDF. The OCR has some problems, though. The text it produces is full of spelling mistakes and missing text, and it misreads splotches as letters. It also can’t tell the difference between a paragraph continuing on the next line and a paragraph ending. So it took me a lot of work to correct the OCR output into a proper book.
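As a hypothetical sketch (not necessarily how this book was fixed), the paragraph problem could be roughed out with a simple heuristic in Python. The function name and the 80% threshold are my own illustrative assumptions: a line that is clearly shorter than a full line and ends in sentence punctuation is treated as the end of a paragraph, and a word hyphenated at the end of a line is glued back together.

```python
SENTENCE_END = (".", "!", "?", ":")

def join_ocr_lines(lines: list[str], short_ratio: float = 0.8) -> list[str]:
    """Merge hard-wrapped OCR lines into paragraphs (hypothetical heuristic)."""
    # Drop trailing whitespace and any empty lines the OCR produced.
    cleaned = [line.rstrip() for line in lines if line.strip()]
    if not cleaned:
        return []
    # Treat the longest line as the full width of the printed column.
    full_width = max(len(line) for line in cleaned)
    paragraphs: list[str] = []
    current = ""
    for line in cleaned:
        if current.endswith("-"):
            # Word split across the line break: drop the hyphen, join the halves.
            current = current[:-1] + line.lstrip()
        elif current:
            current += " " + line.lstrip()
        else:
            current = line.lstrip()
        # A short line ending a sentence is assumed to close the paragraph.
        is_short = len(line) < short_ratio * full_width
        if is_short and line.endswith(SENTENCE_END):
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs
```

A heuristic like this only gets you part of the way: a full-width line can still end a paragraph, and real hyphenated words get merged, so every result would still need checking against the scan.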
Then came formatting. E-books are good at long texts like novels. They can also handle poems reasonably well, as long as all lines have the same indent. Hexeformularer doesn’t, so for every line where the indent is different, you have to code that indent by hand.
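EPUB content is XHTML styled with CSS, so one way to handle varying indents is a CSS class per indent level. The sketch below is hypothetical: it assumes each input line marks its indent with two leading spaces per level, and the `indent-N` class names and 2em step are made up for illustration, not taken from the published file.

```python
from html import escape

def indent_level(line: str, spaces_per_level: int = 2) -> int:
    """Count leading spaces and convert them to an indent level."""
    return (len(line) - len(line.lstrip(" "))) // spaces_per_level

def lines_to_xhtml(lines: list[str]) -> tuple[str, str]:
    """Return an XHTML body fragment and the CSS for every indent level used."""
    levels = sorted({indent_level(line) for line in lines})
    # One rule per level; margin-left shifts the whole line, including wrapped parts.
    css = "\n".join(
        f".indent-{lvl} {{ margin-left: {lvl * 2}em; }}" for lvl in levels
    )
    # One <p> per line, with special characters such as & escaped for XHTML.
    body = "\n".join(
        f'<p class="indent-{indent_level(line)}">{escape(line.strip())}</p>'
        for line in lines
    )
    return body, css

body, css = lines_to_xhtml(["First line", "  indented line", "    further indented"])
print(css)
print(body)
```

Using `margin-left` rather than `text-indent` means that when a long line reflows onto a second line on a narrow screen, the continuation stays at the same indent.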
Then you have the symbols. Most books only use the normal alphabet available on a keyboard. That is not the case when you deal with a large collection of spells. Some of them are straight-up drawings, which you can add into the text easily enough as images. Of course, you then have to describe each drawing in the alt text, so that anyone who for some reason can’t see it can use the file as well, which is difficult when the drawing is as complicated and nonsensical as some of these are. Then you have random symbols in the middle of the text. First, you check whether there is an actual Unicode character for the symbol you are looking at. Unicode has a lot of symbols, and unless you are an expert in the symbols used in spells from the 1700s and 1800s, you have to take time to look up every new one that shows up. If the symbol exists, you put it in and note it down in case it reoccurs. If it does not, you have to take a screenshot and use that as the symbol. That one will definitely reoccur, so you note it in the same list of weird symbols.
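The symbol routine above boils down to a lookup table with an image fallback. Here is a minimal Python sketch, assuming a hypothetical list keyed by made-up names: each entry stores the Unicode character name if one was found (resolved with the standard `unicodedata.lookup`) plus a description to reuse as alt text, and anything without a Unicode match becomes an `<img>` pointing at a screenshot in an assumed `images/symbols` folder.

```python
import unicodedata
from html import escape
from typing import Optional

# Hypothetical running list of odd symbols met while proofreading:
# key -> (Unicode character name, or None if none was found; alt-text description)
SYMBOLS: dict[str, tuple[Optional[str], str]] = {
    "sun": ("SUN", "astronomical sign for the sun"),
    "mercury": ("MERCURY", "astronomical sign for Mercury"),
    "odd-cross-3": (None, "a cross with three bars and hooked ends"),  # made-up example
}

def render_symbol(key: str, image_dir: str = "images/symbols") -> str:
    """Return the Unicode character if one is known, otherwise an <img> fallback."""
    name, description = SYMBOLS[key]
    if name is not None:
        try:
            return unicodedata.lookup(name)
        except KeyError:
            pass  # name not in this Python's Unicode database: use the image instead
    return f'<img src="{image_dir}/{escape(key)}.png" alt="{escape(description)}"/>'

for key in SYMBOLS:
    print(key, "->", render_symbol(key))
```

Even when a Unicode character exists, the reader’s device still needs a font that contains the glyph for it to display.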
Then you have that one pound symbol that for some reason is different from every other pound symbol in the entire book. You go into a spiral trying to find out what it means, until you realise that it was either a mistake, where a piece of physical type had been mixed into the wrong box, or the printers simply ran out of the normal ones. Both happened regularly back in the days when books were set with physical type.
After all that, I sent it to Bokselskap.no, who were happy to publish it. They did some extra jiggery-pokery with the code and went through it once more, and the result is what you can see on their website.
So in the end, if you download this book from bokselskap.no, you will see a small credit with the sentence “Digitaliseringen er basert på fil mottatt fra Sindre Hovland Søreide.” (“The digitisation is based on a file received from Sindre Hovland Søreide.”) Maybe not worth the five to six years of on-and-off work, but there you go. I personally couldn’t be happier.