Have you ever heard of the Voynich manuscript? It’s a mysterious book, possibly created in the early 15th century, that contains weird illustrations and text written in an unknown script and language. Since its rediscovery in 1912 by Wilfrid Voynich, it has eluded the decipherment attempts of generations of cryptographers. The Voynich manuscript is a fascinating piece of history that has inspired many novels, games and films. Amateur cryptographers can find the latest news and research on the Voynich manuscript and other uncracked ciphers on Nick Pelling’s blog Cipher Mysteries. He’s also the author of the readable non-fiction book The Curse of the Voynich.
To celebrate the publication of the 500th Sandra and Woo strip, I have decided to publish “my own Voynich manuscript”. So here it is, The Book of Woo! As you can see, it resembles the Voynich manuscript in several ways. But of course we couldn’t create 240 pages, 4 had to be enough. Unlike the Voynich manuscript, The Book of Woo definitely contains sensible information that can be deciphered. I guarantee it ;-). And I will pay the person who is able to provide a decipherment that’s sufficiently close to the plain text a reward of $500. Send your decipherment attempt(s) to novil@gmx.de. I would also love to hear about your general ideas or statistical analyses that you carried out. There is no deadline. I will not publish the solution until at least strip #1000.
But be warned: It’s a huge challenge and I don’t expect to receive a valid decipherment at all. It’s primarily a work of art, not a puzzle for the general public. I believe that only experienced and dedicated code breakers have the chance to succeed. A lot of time was spent on the encryption. If you think you can simply carry out a frequency analysis on the letters and be able to reconstruct the English or German plain text this way, well, that’s just a waste of time. However, to make things a little easier, I want to give you the following hints:
- The encryption isn’t based on an algorithm only suitable for computers which executes a loop 100 times or something like that.
- The encryption isn’t based on some sort of device or mechanism that is hard to get.
- No “classical” steganographic method was used since that would just be impossibly hard to crack.
- The plain text is some sort of literature, as one can guess from Woo’s comment and the illustrations. A lot of time went into the plain text as well, it’s not just a copy of the first page of Rascal or something like that.
You can download larger versions of the four pages of the Book of Woo here:
[Update: 10 August 2013] Everybody who is seriously interested in deciphering The Book of Woo should read the comment section. There is a lot of interesting information in it.
[Update: 31 March 2015] The Book of Woo Wiki, now maintained by our reader Chris, also contains valuable information for anyone who’s trying to break the code. In case the wiki should go offline sometime in the future, I created a complete backup of the wiki’s content on 31 March 2015.
In other news, the winners of the Sandra and Woo and Gaia fanart contest 2013 have been posted.
Thanks to everyone who participated!
- Sandra: Hey, Woo, what are you writing?
- Woo: Oh, just a little story.
- Sandra: Really? Can I have a look?
- Woo: Sure, I’ve just finished it.
- Sandra: What in Voynich’s name…?!
If anyone wants to partner with me to figure it out, e-mail me at utuy400@yahoo.com
I looked at the Voynich manuscript and, you know what, I realized it bears a striking resemblance to modern science and the modern English language, which is strange because the manuscript itself is hundreds of years old. I just can’t get over how familiar it looks.
Hello!
Just a few thingies:
* The word “raccoon” or “coon” won’t necessarily appear in the text. As stated above, in comic 0494 we see the goddess use the periphrasis “the masked ones” (“maskierten”).
* On a linguistic level, a brief examination of the code reveals a few basic facts:
- The language appears to be written regularly from left to right (nothing too earth-shattering here).
- The same word-coding system seems to be used throughout the text (for instance, it’s not a code with a progressive shift from one sentence to another).
- It looks like an inflectional language with affixes: at first glance, one can detect various suffixes (look at the last three characters of the second word on the first page), which could mark a plural form, a verb form, etc.
The first page image reminded me of Ratatoskr climbing Yggdrasil, and the bird could then be the unnamed eagle. I do not think that would make much sense if you combine it with the other pages, but it could mean that Novil used mythology for the story (or stories).
Maybe the spaces in this text stand for a character of the plain text, and one of the characters (the most frequent one) is the actual space.
That would hide a lot of easy-to-spot letter combinations.
I’m making remarkable progress on my grammar approach, but I’m going to keep quiet about that for now. However, I did want to point out a slight mistake that I made and other people might make. On line 15, there is no space between the 4th character and the 5th; that’s one long word. More specifically, one 6-letter word, which has a 95% chance of being a verb.
@ someboddy:
LOL Howard The Duck had a better love story than Twilight and Hunger Games combined.
I love this comic! 😀
I programmed a little cipher helper tool. Just paste the ciphertext into the top box and click the read-text button to start replacing letters with whatever you choose. It does have a couple of bugs. First, it will crash if you click the button with no text in it. Second, if you replace a letter with empty text, it will crash the next time you try to change that letter. I’ll fix those bugs later.
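The tool itself isn’t posted here, but its core is a simple lookup. Below is a minimal Python sketch. It assumes the ciphertext has already been transliterated into ASCII symbols, like the character version used further down in this thread. The function name `apply_guesses` is hypothetical. Symbols with no guess, or with an empty guess, are shown as `-` instead of causing a crash:

```python
def apply_guesses(ciphertext: str, guesses: dict, unknown: str = "-") -> str:
    """Replace each cipher symbol with its guessed plaintext letter.

    Spaces and ordinary punctuation pass through unchanged. A symbol with
    no guess, or with an empty-string guess, is shown as `unknown`.
    """
    passthrough = set(" .,:;!?\"'")
    out = []
    for ch in ciphertext:
        if ch in passthrough:
            out.append(ch)
        else:
            # `or` also catches an empty-string guess instead of failing on it
            out.append(guesses.get(ch) or unknown)
    return "".join(out)


# Guesses as in the statistics posted in this thread: "=" -> e, "c" -> h, "h" -> r
print(apply_guesses("sn c=h=", {"=": "e", "c": "h", "h": "r"}))  # -- here
```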
Amaterasu….
Is no one else going to point out the fact that the goddess raccoon looks like a raccoon version of Okami?
Well, I surely want to decode it, but I am sure that even if I tried my best, it would take me years. I want to see what it means as soon as possible; I’m really, REALLY excited. Anyway, since you upload two strips per week, strip #1000 will come in 250 weeks. A year has 52.177457 weeks, so that gives 4.79134121082 years, or more precisely 4 years, 9 months, 14 days, 21 hours, 11 minutes and 17 seconds. Well, my calculator makes a sad face, and that is IF they are all released on time. So please, PLEASE, someone decode it. I can’t wait that long.
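The first part of that estimate is easy to reproduce. The quick Python check below uses the commenter’s own assumptions: strip #500 has just been published, two strips appear per week, and a year has 52.177457 weeks. The further split into months, days and hours depends on how long you take a month to be, so it isn’t reproduced here:

```python
strips_left = 1000 - 500       # strips between #500 and #1000
strips_per_week = 2            # assumed update rate
weeks_per_year = 52.177457     # 365.2422 days / 7, the commenter's figure

weeks = strips_left / strips_per_week
years = weeks / weeks_per_year
print(weeks, round(years, 4))  # 250.0 4.7913
```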
@ Phlosioneer:
I’ve been using that technique with my program, with little luck.
The problem I’m having is the sheer number of combinations of seemingly impossible letter groupings. Take, for example, the quoted text with the transcoded phrase “sn c=h=”.
The T-like thing, which is the “=” letter here, is programmatically the most common letter in the script. This means it “should” be a vowel or something like a t or s. In the “sn” part, the n is also one of the most common letters, so it too should be a vowel or the like. Since “here” is an obvious choice for this word, I tried it. However, there are plenty of other LONGER words that start with the “c=h=” combination, some in the 11- and 12-letter range, which makes the reading “here” improbable at best and likely impossible.
I’ll paste in a list of statistics based on the text and the replacement version so you can see what I mean.
Letters By Percentage:
(e) =: 7.26230291447683
(-) n: 6.97563306258958
(-) j: 6.83229813664596
(-) d: 6.73674151935021
(-) p: 5.01672240802676
(-) w: 5.01672240802676
(-) m: 5.01672240802676
(-) r: 5.01672240802676
(-) z: 4.73005255613951
(-) v: 4.58671763019589
(-) a: 3.63115145723841
(-) y: 3.53559483994267
(-) e: 2.96225513616818
(-) u: 2.96225513616818
(-) x: 2.91447682752031
(-) s: 2.81892021022456
(-) l: 2.67558528428094
(-) i: 2.58002866698519
(-) >: 2.53225035833731
(-) $: 2.15002388915432
(-) o: 2.0066889632107
(-) #: 1.81557572861921
(r) h: 1.76779741997133
(-) g: 1.76779741997133
(h) c: 1.62446249402771
(-) b: 1.24223602484472
(-) f: 1.19445771619685
(-) q: 1.0989010989011
(-) t: 0.62111801242236
(-) k: 0.525561395126612
(-) &: 0.382226469182991
First Letter By Count:
(-) i: 40
(-) l: 36
(-) s: 31
(-) g: 23
(-) x: 21
(h) c: 20
(-) #: 16
(-) >: 15
(e) =: 15
(-) v: 14
(-) y: 14
(-) $: 14
(-) d: 14
(-) r: 13
(-) f: 13
(-) p: 10
(-) &: 8
(-) e: 8
(-) m: 8
(-) n: 6
(-) u: 6
(-) o: 4
(r) h: 4
(-) t: 3
(-) a: 3
(-) j: 2
(-) k: 2
(-) q: 2
Last Letter By Count:
(-) n: 83
(-) j: 66
(e) =: 51
(-) d: 49
(-) m: 29
(-) a: 26
(-) v: 15
(-) e: 13
(-) u: 12
(-) y: 9
(-) &: 8
(-) o: 2
(-) q: 1
(-) w: 1
with 1 letters by count:
(-) &: 8
(-) v: 1
with 2 letters by count:
(--) sn: 5
(-e) s=: 2
(--) >n: 2
(--) $j: 1
(--) xj: 1
with 3 letters by count:
(--e) em=: 4
(---) dpd: 4
(-e-) t=m: 3
(e-e) =r=: 3
(---) qgd: 2
(---) kda: 2
(e--) =rv: 2
(---) #ua: 2
(---) fjo: 1
(---) jpe: 1
(---) dpy: 1
(---) rua: 1
(---) n#v: 1
with 4 letters by count:
(----) gdod: 6
(----) ibrn: 5
(here) c=h=: 5
(----) iuad: 4
(----) fdlj: 3
(e-h-) =mcv: 3
(-e--) #=in: 3
(----) rnrn: 3
(--h-) sbcv: 3
(----) $ypj: 3
(----) svrn: 3
(-e-e) #=s=: 2
(----) gdle: 2
(e--e) =m>=: 2
(----) lqpj: 2
(----) daxd: 2
(----) fd$d: 2
(---e) lem=: 2
(----) fqpy: 2
(---e) rb#=: 2
(----) ld$d: 2
(----) vzmn: 2
(-e-e) i=s=: 2
(--r-) ivhn: 2
(----) ljfy: 2
(----) fqgy: 2
(----) supj: 2
(-e-e) s=s=: 2
(----) $ern: 2
(----) lyoj: 2
(----) odle: 1
(----) lqae: 1
(he--) c=iu: 1
(----) mnsn: 1
(---e) nmi=: 1
(----) uxya: 1
(----) upja: 1
(re--) h=iu: 1
(h---) cu$e: 1
(-ere) r=h=: 1
(---e) xe#=: 1
(----) xdpj: 1
(----) #brv: 1
(----) xjaj: 1
(-e-e) >=m=: 1
(---e) ae#=: 1
(-e--) r=>v: 1
(----) odpq: 1
(----) ge#n: 1
(----) yw$d: 1
(----) ge#v: 1
(----) xdad: 1
(----) $qgy: 1
(----) odod: 1
(----) dagy: 1
(r---) hnrv: 1
(h---) cufj: 1
(----) >nmn: 1
(----) pqfd: 1
(----) $d$d: 1
with 5 letters by count:
(h----) cvm>u: 7
(-----) gyaxe: 5
(-----) xjpja: 2
(-----) >nrnm: 2
(-----) adaxd: 2
(-----) xjw$j: 1
(--e--) vz=rv: 1
(-----) $jwpj: 1
(-----) vziuo: 1
(-e--e) m=m>=: 1
(-----) iuopj: 1
(-----) yawxj: 1
(-----) vzuad: 1
(-e---) s=z>n: 1
(-----) ywfem: 1
(-----) rb#nm: 1
(----e) >nzs=: 1
(--re-) vzh=m: 1
(-e---) m=inm: 1
(-----) dpdld: 1
used twice in a row by count:
Story with replacements:
–r— ——- —- —-: —— ——- —- ——– –r-. re—e —- ——e ——— – – —–e. h—- —– —- —- ——- –e— ———– — ——- —- —– –r-. — ——- —- —-: ——- —- ——–. –e ——- -e-e —— – here here–e- -e-e e– -e-e— here-he—e — ——- —- —. —– here h—- ——- ——- –e ——- —– ——– ——- ——. —- —e— —- —e–e —- –e–e—- —- —— —- e-e —– —-. —– ——- –e -e–. h—- ——— —e ——- —- ——- —- –e—- —— ———- here–e —-.——- —- ——- –e— -e-e here–e —-. -e ——- -e– —- ——— —- ——- —- ——- h—- —.–r— ——- h— —e—- ——- —- ——- –r— e—– ——— – —–e. — ——— —- — —-r— e-h-. -e——–e — ——— —-: —- ——- —– —- —-r— e– ——- —e –r—- —— —- —-. ——— —- ——- ——– —e—— –h—- –h—- —e—— -e— -e– —-. -e— –r—- —- ———– ——- —-. –r— e—– – —–e— r— here -e-e —- —-. ——— —- ——— –e –r—e ——— —— —— – ——— —– ——–.—- —e— —- —e–e ——— ——— – ——— —- ——–. –r—— – —–e— ——- –h- —–e— ——- —- –rere e-h-.h—- ——- —– ——- –r——— ———– — —- ——- ——- h—- —. —— —-. —- h—- —- —- -e–e- ———– —- —- ——— ————e —-e —e— —e —-.-e—- e–e –re- –e–. e—-e— — —e e-h-. -e- ——- ———-. —– ———- — —e. -e- –h—- —– — here—. -e- –h—- —– —- ——-. — ——- —– —–e—- —– –r—. — —– ——- ——- ——- -e-e-e ———- h— —- ——-. –r— –h- e-h—- -e-e— —–e—- ——— —- ——– ——– — —– —- ———- —e—.-e- —- ———– -e e-h- —– —— – ——. —- h—- ———- —— —- -ere —— re– —- —— —- —— — —- —- — —- –r— -e—– -e–e —-. ——— —- ——–. — –r—— here —e–e— -e– —- here —e–e— –r—– –r—– —- e-e. — ——— —-. —e –h-. -e-e ——- -e-e ——- e–e. he– —– -e-e–e —— ——- —-! —– ——–e -e–e— ———— —-! —– -e-e–e -e–e— -e-e e-e —-. —e re——– — ——– — —-. ———– ——–e — ——- —- —! ——– ——— – ——- ——!
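Jamie’s program isn’t included in the thread, so its exact counting rules are unknown. Below is a rough Python equivalent of the tables above. It assumes the ASCII transliteration and treats anything outside the symbol set as punctuation. That symbol set is itself an assumption, built from the characters that appear in the tables:

```python
from collections import Counter

# Assumed symbol set of the transliteration; everything else counts as punctuation.
SYMBOLS = set("abcdefghijklmnopqrstuvwxyz=>$#&")


def letter_stats(text: str):
    """Return letter percentages, first/last-letter counts, words grouped by
    length, and symbols that appear twice in a row inside a word."""
    words = ["".join(ch for ch in w if ch in SYMBOLS) for w in text.split()]
    words = [w for w in words if w]
    letters = Counter(ch for w in words for ch in w)
    total = sum(letters.values())
    percent = {ch: 100 * n / total for ch, n in letters.most_common()}
    first = Counter(w[0] for w in words)
    last = Counter(w[-1] for w in words)
    by_length = {}
    for w in words:
        by_length.setdefault(len(w), Counter())[w] += 1
    doubled = Counter(a for w in words for a, b in zip(w, w[1:]) if a == b)
    return percent, first, last, by_length, doubled
```

Called on the full transliteration, `by_length[4].most_common()` would give a four-letter table like the one above. An empty `doubled` corresponds to the empty “used twice in a row” list.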
I translated the code at the bottom of the pages of Artemis Fowl before The Artemis Fowl Files came out with the code key inside. But that was only because Artemis Fowl contained a letter that Artemis had received, written in the language I was translating from, with a translation of it on the next page. That let me flip back and forth and figure out which symbols matched which letters.
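This is what cryptographers call a known-plaintext attack: once a ciphertext and its translation can be lined up, a simple substitution key can be read straight off the pair. Below is a minimal Python sketch. It assumes one symbol per letter and identical spacing in both texts. The function name is hypothetical:

```python
def recover_key(cipher: str, plain: str) -> dict:
    """Build a symbol -> letter table from an aligned cipher/plain pair.

    Raises ValueError on a conflict, which means the texts aren't aligned
    or the cipher isn't a simple one-to-one substitution.
    """
    if len(cipher) != len(plain):
        raise ValueError("texts must line up character by character")
    key = {}
    for c, p in zip(cipher, plain):
        if c == " " and p == " ":
            continue  # word breaks carry no key information
        if key.setdefault(c, p) != p:
            raise ValueError(f"symbol {c!r} maps to both {key[c]!r} and {p!r}")
    return key


print(recover_key("xq zxq", "ab cab"))  # {'x': 'a', 'q': 'b', 'z': 'c'}
```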
abowden wrote:
So that just makes The Book of Woo even more like the Voynich Manuscript than it already was! 🙂
@ Jamie:
You don’t quite understand my approach, Jamie. The letters themselves are meaningless to me; I’m only concerned with the patterns they create. Using your example of the word c=h=, the two “=” characters might not refer to the same letter, and the c and h characters might very well refer to the same letter. However, meaning can still be found. My analysis is as follows:
First, there is another word with it in quotes. It would be a fair guess that the quotes refer to a name, like saying “He called himself ‘the exterminator’.” Therefore, the two words together probably form a noun phrase.
Next, let’s look at all the other places and forms that word can take. It appears two other times in the same sentence: once at word #7, and once at word #11. Each time, it has suffixes attached to it. I haven’t done a complete analysis because there are more fruitful words I could be spending my time on. Of particular interest, though, is that there is a suffixed version of “c=h=” immediately after the quotes, and then again in the same sentence, which is a very curious pattern. I was thinking it could perhaps be a shortened form of a name, like saying “The God ‘Seeohm’ Seeohmhtlahmakasay”. Perhaps a bit informal, but a theory nonetheless. This is Woo we’re talking about.
Performing a brief suffix search for those particular suffixes doesn’t turn up any other words with those suffixes. Therefore, they probably don’t indicate case. That means “c=h=” can be used as a sub-word, like “where” in “wherever”, or “whom” in “whomsoever.”
Using this as a hunch, we can scan the text for other appearances of those suffixes as proper words, and we might get lucky. I haven’t done this on that particular word yet, but a quick search through my notes doesn’t yield anything promising.
Next, we can look at surrounding words. The two-letter word immediately before it in the quotes, “sn”, is much more fruitful. It starts several sentences, and is often used before one word in particular which I think is a verb. Further, it features in the three-word sentence in line 5 of page 4, which greatly helps in determining its likely part of speech. Personally, I think it’s a pronoun or definite article of some sort. The latter would fit well with our initial idea that the words in quotes form a noun phrase. It is also found in a four-word clause in line 3 of page 2, right before the colon, which would reinforce this idea.
If you continue this analysis for as many words as possible throughout the text, you will start to get a surprisingly good idea of what the sentence structure looks like.
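The bookkeeping behind this kind of analysis can be tabulated mechanically, leaving the interpretation to you: which words open sentences, and which words appear before and after each word. The Python sketch below assumes the text has already been split into sentences and words. The toy English sentences are purely illustrative:

```python
from collections import Counter, defaultdict


def contexts(sentences: list):
    """For each word, count its left and right neighbours, and count
    how often each word opens a sentence."""
    before = defaultdict(Counter)
    after = defaultdict(Counter)
    starts = Counter()
    for sent in sentences:
        if sent:
            starts[sent[0]] += 1
        for left, right in zip(sent, sent[1:]):
            after[left][right] += 1
            before[right][left] += 1
    return before, after, starts


toy = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
before, after, starts = contexts(toy)
print(starts.most_common(1))  # [('the', 2)]
print(dict(before["sat"]))    # {'cat': 1, 'dog': 1}
```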
As a final note, remember that anything could be a null character, null sequence, diphthong, or the like, particularly in relation to word conjugations and endings, as that is an easy exploit for cryptanalysts when spaces are preserved. Novil has taken deliberate steps to thwart frequency analysis; I don’t recommend using it except after finding all the null characters / patterns or breaking a layer of encipherment.
@ Jamie:
Also, I recommend doing it by hand. Computers are tools, not cryptanalysts. You will find many more patterns if you start circling, highlighting, and drawing connections. There are several obscure patterns I’ve noticed already that may be artifacts of an underlying layer of encipherment. There are repeating patterns of ABAB, ABCB, ABCBDBE, and ABCBDEB words that shouldn’t be happening.
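Patterns like ABAB or ABCB can be computed for every word automatically, which makes it easy to find all words that share a shape. A Python sketch follows; the function names are hypothetical. The sample words are taken from the four-letter table posted earlier in the thread:

```python
from collections import defaultdict


def pattern(word: str) -> str:
    """Label each new symbol A, B, C, ... in order of first appearance."""
    seen = {}
    for ch in word:
        if ch not in seen:
            seen[ch] = chr(ord("A") + len(seen))
    return "".join(seen[ch] for ch in word)


def group_by_pattern(words) -> dict:
    """Group words that share the same repetition pattern."""
    groups = defaultdict(set)
    for w in words:
        groups[pattern(w)].add(w)
    return groups


print(pattern("c=h="), pattern("rnrn"))  # ABCB ABAB
groups = group_by_pattern(["c=h=", "gdod", "rnrn", "s=s=", "ibrn"])
print(sorted(groups["ABAB"]))            # ['rnrn', 's=s=']
```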
Also, for the community at large, I have four things to report.
1) There are most likely more than 31 characters in this alphabet, because the odds of all 26 letters being used are high but not 100%. It’s entirely possible there are no z’s or x’s, for example.
2) Some kind of substitution cipher is at work. I know this because there is a letter frequency (that is, some letters appear much more often than others, in an exponential curve). If it weren’t a substitution cipher, then the letters would be evenly distributed. Further, the enciphering of one character to another occurs (at least mostly) independently of the letters near it and of its position in the word. I know this because I’ve found 4- and 5-letter words with both suffixes AND prefixes.
3) If there are null characters, they follow a very defined pattern based on word length, neighboring letters, and/or word conjugation. There are NO known instances of a word being “interrupted” with a random letter, which means that when a word is repeated, all of the copies have the same null characters in the same places.
4) The strange circle-loop-circle character is indeclinable. Because of this, it is probably a noun which doesn’t make sense in the singular or plural, or a name, or a grammar construct (like a preposition, pronoun, etc). There is also a possibility it is a red herring and is actually a null character.
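A standard complement to point 2 is the index of coincidence, which the commenter didn’t use. It is the probability that two symbols drawn at random from the text are identical. Any one-to-one substitution leaves it unchanged, so English enciphered that way stays near the English value of about 0.067. Uniformly random text over 26 letters gives 1/26 ≈ 0.038, and polyalphabetic ciphers pull the value toward that random end. Transposition doesn’t change it either, so on its own it can’t tell substitution from transposition. Below is a minimal Python sketch. Pass it the symbols with spaces and punctuation stripped; transliteration symbols such as = or & count like any other letter:

```python
from collections import Counter


def index_of_coincidence(symbols) -> float:
    """Chance that two symbols picked at random (without replacement) match."""
    counts = Counter(symbols)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two symbols")
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


print(index_of_coincidence("aabb"))  # 0.3333333333333333
```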
Another comic I’ve seen about the Voynich Manuscript: http://xkcd.com/593/
Something I thought I should point out:
I actually built an alphabet / language of my own recently. The language, when spoken, would sound nearly the same as it would in English, but to throw off potential code-crackers the language is written as pronounced.
Also, my alphabet contains separate symbols for “long vowels” and “short vowels”, as well as separate symbols for certain letter combinations like “TH”, “SH”, and “CH”. Every symbol represents only one sound, and there are no redundant symbols.
So, for example, the word “chimpanzee” would be written out as “CH, short I, M, P, short A, N, Z, long I”, for a total of 8 characters versus the original 10. This would also work for German: “äußerst” (extremely) would be written as “long O, long E, S, R, S, T” (please correct me if I mispronounced that; I’m limited largely to Google Translate for my German 🙂 ).
My alphabet ended up containing 30 symbols (not including numbers and punctuation) and did not include characters such as C (which is written as S or K), Q (written as Ku), or X (Ks).
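Only the spelling-level part of this scheme (one symbol per letter combination) can be shown with a simple lookup; respelling by sound would need a pronunciation dictionary. The Python sketch below uses a hypothetical digraph table. In it, “ee” stands in for the long vowel, so that the example reproduces the 10-to-8 count above:

```python
# Hypothetical table: letter pairs that get a symbol of their own.
DIGRAPHS = {"ch", "sh", "th", "ee"}


def tokenize(word: str) -> list:
    """Split a word into symbols, taking a listed digraph whenever one starts here."""
    out, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(pair)
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


print(tokenize("chimpanzee"))  # ['ch', 'i', 'm', 'p', 'a', 'n', 'z', 'ee']
```

A greedy left-to-right match like this mis-splits words such as “hothouse” (it produces a “th” that isn’t pronounced as one). That is exactly the kind of ambiguity a pronunciation-based alphabet avoids.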
I have no idea how similar/dissimilar Woo’s code may be, but it’s food for thought.
I have no idea if this is helpful or not but because there are extra characters (GREAT ideas for what they could be by the way) it could be that some characters are like switches in that they change between different encryption systems with each appearance. hope this helps!
@ Jamie:
only 14 last letters? with 6 of them dominating that…
@ Phlosioneer:
I agree the approach must be something other than just letter replacement. I did some analysis with the program I made, and the letter distribution is telling: pasting any English text into the program shows a letter distribution far different from this one. As far as I can tell, this language also lacks any kind of plural letter, and it has zero doubled letters.
I could pretty easily change the program to look at whole patterns, ignoring seemingly obvious punctuation. Actually, as I was writing this, I thought I would, and did so. In a bit I’ll update my name link to the new program, which can replace whole words with some other text 😛
For fun, I went and counted how many times the “words” are used, and interestingly enough the distribution is very spread out, much more so than if, say, I pasted the Declaration of Independence in there.
In typical English I’d get something like 4-5% on words like of, the, and, to. In this instance, though, the most used “word” is that squiggly-line thing, which is & in the character version, and even that is only 2% of the words in the text. I’ll post the top 10 with the summarized ending particles.
Words By Percentage:
(-) &: 2.19178082191781
(-) cvm>u: 1.91780821917808
(-) gdod: 1.64383561643836
(-) lehvrn: 1.64383561643836
(-) c=h=: 1.36986301369863
(-) gyaxe: 1.36986301369863
(-) svrnzrn: 1.36986301369863
(-) ibrn: 1.36986301369863
(-) iuoypjwpj: 1.36986301369863
(-) sn: 1.36986301369863
Ending Particles By Count (ignores singles):
(----) wpem: 3
(----) svt=: 2
(----) wlja: 2
(---) zrn: 18
(---) wpj: 17
(---) wxj: 10
(---) z>n: 7
(---) wpd: 5
(---) zr=: 4
(---) w$j: 3
(---) zs=: 3
(---) wya: 2
(---) pja: 2
(---) rnm: 2
(--) pj: 2
I find it very interesting that short words (2-3 letters) account for fewer of the words than longer ones (4+ letters). That’s the opposite of what you see in normal English.
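The word-level tallies are just as easy to reproduce. Running the same function on an English sample (the Declaration of Independence, say) gives the comparison made above. A Python sketch follows; the function name is hypothetical. The toy word list borrows a few words from the tables:

```python
from collections import Counter


def word_stats(words: list, top_n: int = 10):
    """Top words as percentages of all words, plus a histogram of word lengths."""
    counts = Counter(words)
    total = len(words)
    top = [(w, 100 * n / total) for w, n in counts.most_common(top_n)]
    lengths = Counter(len(w) for w in words)
    return top, lengths


top, lengths = word_stats(["&", "gdod", "gdod", "sn", "cvm>u"])
print(top[0])                   # ('gdod', 40.0)
print(sorted(lengths.items()))  # [(1, 1), (2, 1), (4, 2), (5, 1)]
```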
Code updated. @Phill I’m not a fan of so much drawing. I tend to be spontaneous in my replacements and hate all the erasing. Computers allow me to just replace and remove at will with high speed 🙂
@ nebosuke:
Yeah, only 14 last letters; pretty interesting, right? I’m using the character version of the text as provided. I’m assuming it’s pretty accurate given the detail. I did some minor checking myself with no complaints.
In a typical English sentence, more than half the words would end in e, s, t, or d, but that doesn’t appear to be how it works in this language. I think Phlosioneer is on the right track; any other solution wouldn’t explain the very strange patterns that emerge when frequency analysis is applied.
I did forget to point out one thing, though. The alpha character before the o-loop thing is the only single-character word that is never repeated. It is, however, used to start various words and inside words. If this were a character replacement thing, I’d say it was an “A”.
One other matter for thought: this could be a system in which some characters are merged into one letter. That would explain the seemingly impossible patterns that occur when you look at every character.
More importantly, though, you can note that nowhere in the entire script is a single character used as a suffix more than once, and I find that VERY odd. I simply cannot imagine a suffix like the letter S being absent from a document.
It’s great to see that some of you are still trying hard to break the code. Certainly, some also don’t want to publish their results and work alone.
Those are some interesting results that have been posted on the second comment page. But of course all the data is meaningless if you’re not able to draw the right conclusions from it.
I sent an email to the NSA about this. Hey, they’re all into decoding, they have those super-Cray computers, and if they have a free minute or two . . . wonder if I or you will hear from them?
The Voynich manuscript is obviously a rule book for a role playing game:
http://xkcd.com/593/
@ Novil:
I know how fun it is to watch people break your codes. I was the head of the FYE page for my school, and I would watch people agonize over seemingly complicated codes when I had simply Caesar-shifted an entire essay that didn’t contain the letter “e”. You must be smiling from ear to ear right now.
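For anyone who hasn’t met the trick: a Caesar shift moves every letter the same distance along the alphabet. It is normally broken in seconds by finding the ciphertext letter that stands in for “e”. A lipogram (a text that avoids “e”) removes that foothold. A minimal Python sketch:

```python
import string


def caesar(text: str, shift: int) -> str:
    """Shift letters by `shift` places (negative to undo); other characters are left alone."""
    shift %= 26
    low, up = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(low + up, low[shift:] + low[:shift] + up[shift:] + up[:shift])
    return text.translate(table)


secret = caesar("A lipogram avoids that symbol", 3)
print(secret)                  # D olsrjudp dyrlgv wkdw vbpero
print("h" in secret.lower())   # False: with no "e" in the input, its image "h" never appears
```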
Quick question, though. If you were to try to decrypt the Book of Woo yourself, having not written it, do you think you could do it? Making us wait 4.5 years for a stated solution is quite the declaration about the security of the code.
Thought I would share some interesting finds. After rereading the cipher hints, I decided to look at some classical ciphers, like the German Übchi and double transposition among many others, to see if they would affect the letter distribution when encrypted. Funnily enough, even though these are supposedly very difficult ciphers, neither of these wartime ciphers changed the actual distribution of the letters in a known sentence. All they did was scatter them around a bunch.
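That result is expected: a transposition cipher only reorders the letters, so their counts can’t change. Übchi, for instance, was a double columnar transposition, which is just the single transposition below applied twice with two keys. A Python sketch follows. The keys are arbitrary examples, and ties between repeated key letters are broken left to right, which is one common convention:

```python
from collections import Counter


def columnar(text: str, key: str) -> str:
    """Write `text` in rows under `key`, then read the columns off in the
    alphabetical order of the key letters (ties broken left to right)."""
    cols = len(key)
    order = sorted(range(cols), key=lambda i: (key[i], i))
    return "".join(text[i::cols] for i in order)


plain = "attackatdawn"
double = columnar(columnar(plain, "zebra"), "cipher")
print(double)                             # cntktaaatdaw
print(Counter(double) == Counter(plain))  # True
```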
Based on this finding, I started looking for ciphers that WOULD change the letter distribution and stumbled on a cipher algorithm called “Vigenère”. After failing with those wartime classics, I wasn’t expecting to find any cipher that would change the letter distribution, but this one actually did it and changed the distribution entirely! I’ll be trying this one now to bring the letter distribution back to “normal”, and then a simple letter replacement should finish the job.
Oh, and this encryption can supposedly be done with pen and paper, and it keeps punctuation marks!
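For reference, Vigenère shifts each letter by the corresponding letter of a repeating key (a = 0, b = 1, ...). The same plaintext letter can therefore come out several different ways, which is what flattens the frequency curve. Below is a minimal Python sketch. It assumes a plain a-z alphabet and the common pen-and-paper convention that spaces and punctuation are copied through without using up a key letter:

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Vigenère cipher over a-z; non-letters pass through and don't advance the key."""
    if not (key.isascii() and key.isalpha()):
        raise ValueError("the key must consist of letters a-z only")
    shifts = [ord(k) - ord("a") for k in key.lower()]
    out, j = [], 0
    for ch in text:
        if ch.isascii() and ch.isalpha():
            s = shifts[j % len(shifts)]
            if decrypt:
                s = -s
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + s) % 26 + base))
            j += 1  # only letters consume key positions
        else:
            out.append(ch)
    return "".join(out)


secret = vigenere("Attack at dawn!", "lemon")
print(secret)                                   # Lxfopv ef rnhr!
print(vigenere(secret, "lemon", decrypt=True))  # Attack at dawn!
```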
OK, according to my “research” (actually just staring dumbly at the wall of text), I can say the following. There are 2 “special” characters that are almost always part of the suffix(es): w and z.
There is a subset of 14 letters I called “singles”, and a subset of another 14 letters that are always followed by one of the “singles”, forming around 47 pairs over the whole text. And there is the & symbol, which is always followed by one of two patterns.
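Observations like “always followed by one of the singles” can be checked mechanically for every symbol at once. A Python sketch follows, with hypothetical names and toy words. On the real text, `group` would be the 14 “singles”:

```python
from collections import defaultdict


def successors(words) -> dict:
    """Map each symbol to the set of symbols that follow it inside words."""
    nxt = defaultdict(set)
    for w in words:
        for a, b in zip(w, w[1:]):
            nxt[a].add(b)
    return nxt


def always_followed_by(words, group: set) -> set:
    """Symbols that never end a word and are only ever followed by `group` members."""
    enders = {w[-1] for w in words if w}
    return {a for a, bs in successors(words).items() if bs <= group and a not in enders}


print(always_followed_by(["ab", "cab", "ad"], {"b", "d"}))  # {'a'}
```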
There are at least two words made of two identical parts, rnrn and s=s=. If we assume that the text is English, that makes it quite unlikely that each pair encodes the same set of letters, since the words that match the doubling pattern, like “couscous” seem out of place. Though it may be possible that these words are exclamations or names.
Right now I just hope this is not a permutation cypher, i.e. that the symbols correspond roughly to groups of letters…
Right now, thanks to the distribution of the letters, there is very good evidence that this particular cipher is based on the Vigenère algorithm, which uses a repeating passcode to encode the alphabet. One of the interesting weaknesses of this algorithm comes from the fact that the passcode repeats. A repeated stretch of plaintext will also repeat in the encoded form whenever both copies line up with the same part of the passcode.
The clearest case of this is on the third page, where the word that looks close to “the” is repeated. There is a nearly identical piece on this page, and the spacing between the two is 25 characters. The other spacings are 40 and 42? Hard to say whether the 42 one is legit.
My money is on the passcode being simple rather than complex. It could be 25 characters long, but more likely it’s a divisor of that, like 5 characters. I could definitely use some help coming up with a passcode. Also, if any of you want to try this technique, the spacing counts ONLY the letters! When counting, do not include punctuation or spacing of any kind.
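This is essentially the Kasiski examination. The Python sketch below counts only symbols, as described above. The symbol set is a parameter, so the transliteration’s extra characters such as = or $ can be included. The function names are hypothetical. One accidental repeat can spoil a single greatest common divisor, so the sketch also tallies which small key lengths divide the most spacings:

```python
from collections import Counter, defaultdict
from functools import reduce
from math import gcd
import string


def repeat_spacings(text: str, n: int = 3, symbols: str = string.ascii_lowercase) -> dict:
    """Distances between successive occurrences of each repeated n-gram,
    counting only characters in `symbols` (spaces and punctuation skipped)."""
    letters = "".join(ch for ch in text.lower() if ch in symbols)
    positions = defaultdict(list)
    for i in range(len(letters) - n + 1):
        positions[letters[i:i + n]].append(i)
    return {g: [b - a for a, b in zip(p, p[1:])] for g, p in positions.items() if len(p) > 1}


def key_length_votes(gaps, max_len: int = 20) -> Counter:
    """How many spacings each candidate key length (2..max_len) divides."""
    return Counter(k for d in gaps for k in range(2, max_len + 1) if d % k == 0)


print(repeat_spacings("abcxyz abcqqq, abc"))             # {'abc': [6, 6]}
print(reduce(gcd, [25, 40]), reduce(gcd, [25, 40, 42]))  # 5 1
```

With the spacings reported above, gcd(25, 40) = 5, which matches the guess of a 5-letter key. Including the uncertain 42 drops the gcd to 1. In the vote tally for 25, 40 and 42, the lengths 2 and 5 tie with two votes each.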
See this wiki page for information. http://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher
Since I’m nearly convinced of this possibility, I need to account for the extra letters. I believe that there are not more than 26 letters here, and that what we see are perhaps capital and lowercase versions of certain letters. There should be some sort of glyph resemblance if that’s true.
Thoughts people?
Some interesting news as I’m fiddling with it.
If the other letters are just “extra” (perhaps German in origin) and not encoded, then the passcode is 12 letters long. “The Book of Woo” just so happens to fit this, but it doesn’t work.
“Book of Sandra” comes slightly closer to lining things up, but it’s not perfect.
@ Jamie:
10000
@ Jamie:
The more I play around with it, the more it seems that there are too many letters, and that needs to be fixed before it’s possible to continue deciphering.
@ Jamie: I’m not sure of your next step, but if it involves trying to find a pass phrase consisting of 42 letters, please stop this insanity.
@ nebosuke:
Unfortunately, a polyalphabetic cipher can’t have a passcode with numbers, as they are invalid.
@ Novil:
No not 42 but a common repeating pattern is around 5 letters.
My real issue is that I can’t tell from the symbols whether it’s really a 31-character alphabet or a 26-letter alphabet with some extras tossed in there, like capitals and whatnot. Either way, I’m too tired to think about it ^^; *collapses into bed*
@ Novil:
The right conclusions being “Novil first wrote text, flipping a coin each word to decide whether to use English or German version, and each sentence to decide which grammar to use. He then compressed the result and applied RSA encoding, so that the text statistics became close to uniform distribution. Then he thought up all statistics we have noticed, and developed the coding scheme to make the resulting text have those statistics to throw us off the trail.” Note that all your hints are still valid in this case 🙂
So, could you tell us at least if this code can be deciphered using finite state automaton with less than a billion states?
kurokotetsu wrote:
I did think that would probably be an issue, but all the same…
It might be phonetic/syllabic in combination with the 5-bit idea from before, or use ideas from Morse or braille. That is, most characters code for actual sounds, but there are one or two that are modifiers for rarer symbols. Those modifiers would either act directly on the symbol that comes immediately before or after them, or switch a second “code page” in and out. In the first case, you might see the modifier repeated in an alternating pattern where two or more of the modified characters follow each other. The former behaviour is how I sort of remember braille and Morse working (and UTF-8?); the latter is how Baudot and UTF-16 (and EBCDIC?) operate, IIRC.
By use of this, you can effectively double or treble the breadth of your symbology for relatively little cost, without having to overload on characters. That works so long as one of two conditions holds. Either the “secondary” characters are expected to be quite rarely used (for the “direct acting” variant: J, Q, X and Z, rare punctuation, single numbers, accented or diphthong-type sounds). Or you don’t expect frequent transitions between the two pages: keep the second one exclusively for numbers and punctuation, as per Baudot, or use some advanced frequency analysis to make a kind of “Dvorak version” of your language, where each page holds the sounds most commonly found next to each other and switching is only needed for the rarer combinations…
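The “second code page” idea works like the letters/figures shift in Baudot-style teleprinter codes. Below is a toy Python decoder showing both variants described above. A locking shift switches pages until the next shift symbol; a one-off modifier affects only the symbol after it. Every symbol and page here is hypothetical:

```python
# Hypothetical code pages: the same symbol means different things on each page.
PAGES = {
    "L": {"1": "a", "2": "b", "3": "c"},  # primary page
    "F": {"1": "7", "2": "8", "3": "9"},  # secondary page
}
LOCKING_SHIFTS = {"L", "F"}  # switch page until the next shift symbol
ONE_OFF = "^"                # read only the next symbol from the other page


def decode(symbols: str) -> str:
    page, out, borrow = "L", [], False
    for s in symbols:
        if s in LOCKING_SHIFTS:
            page = s
        elif s == ONE_OFF:
            borrow = True
        else:
            current = ("F" if page == "L" else "L") if borrow else page
            out.append(PAGES[current][s])
            borrow = False
    return "".join(out)


print(decode("12F12L3"))  # ab78c
print(decode("1^23"))     # a8c
```

The trade-off described above shows up directly: every shift costs a symbol of its own, so the scheme only pays off when switches are rare.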
I’m not sure what the frequencies are in the IPA phonetic alphabet, nor in e.g. the Japanese kana. But it’s worth bearing in mind there are only about 46 to 52 kana (the exact number depends on whose version you believe). On top of those come seven additional modifier symbols that are added on sort of like accents, and so can take up an additional symbol space by themselves. This could quite happily translate into a 60- or 62-symbol code space (2 tables of 30 or 31, with one or two spaces reserved as modifiers), while even leaving room for an additional symbol to signify “number” or “punctuation”. Remember, we might not even need to include numbers or punctuation marks in this code, and spaces seem to mean what they usually do (at least, that’s the assumption at this point, which may well be false). Designers of real-world encodings (for computers etc.) usually have to include them; we can therefore cram a little more information into the same number of “bits”.
IPA, on the face of it, looks rather more complex, but when you break it down there are fewer than 50 actual base characters, plus a handful of modifying accents. The full 100-and-some symbol codespace is made up of various combinations of single or paired base characters, plus the accents. And that’s to cover the whole breadth of (human…) language. If you’re looking to represent the sounds of a single language, or indeed a single particularly accented dialect, you can probably cram that down further and make it fit quite nicely. Most “normal” scripts, after all, have various gaps in them peculiar to the language which gave them birth; hence the need for the IPA in the first place. In fact, it’s possible to simplify it still further and come up with a somewhat coarser, pidgin-like representation, which is much less nuanced and relies on context and interpretation when read out (Hebrew is a bit like that, AFAIK…).
And this is without entertaining the idea that it might only be a -limited- phonetic or syllabic version. That could mean splitting certain otherwise falsely homophonic letters into the distinct sounds they make when actually vocalised in different contexts (o vs oo vs u vs “long o”, as represented by an overscored character in e.g. romanisations of Japanese). It could also mean combining two-letter combinations (sh, ph, ch etc.) into a single character, as seen in Spanish with ñ (“ny” in English), Icelandic (þ, “th”), Greek (various ones I haven’t got ASCII codes for), or indeed German with ß (“ss”). Or both.
Might even be Cyrillic o_O
Or… hey… what script is used in Burma? There’s more than just Roman, Greek/Cyrillic, and Japanese in the world after all. There’s Thai, Korean, and a few others, none of which are 100% directly transliteratable.
*googles*
Burmese script has 33 characters in it… hmm…! How many unique symbols were actually determined in this script, again?
====
Also, in terms of additional layers of brain hurty-ness… Is there any chance that it’s written in the Native North American language which gave rise to SOTLMSK in the first place? There may be a translation step into $YOURLOCALLANGUAGE -after- the literal deciphering is through, after all.
(Cherokee? Sioux? Other? I can’t remember what it was, any more…)
Or at the very least, there may be a few words scattered in there that call back to it. What’s the matching word for “Raccoon”, for example?
=======
Is it worth me putting a small-stakes bet on it being, e.g. Cherokee, transliterated into Burmese script, and then letter-substituted using an invented symbol set which either ignores the accents and diacritics, or represents them in a non-obvious manner?
Jamie wrote:
“Sandra and Woo” = 14 characters (as is “Sandra und Woo”)
“Traffic lights sequenced for 50mph are also sequenced for 150…”
14 x 3 = 42
*spooky music plays*
… well OK, I’m not sure how you include a space as part of the encoding passphrase, but it’s a start, is it not? 😀
*starts looking for some other relevant text which is 14 characters long without the spaces included*
Phlosioneer wrote:
…or it could be representative of some longer word, a 2- or 3-character one, say. Perhaps “and” or “und”, given its general similarity to an ampersand. Or it could be something random like “in” (bilingual), “as” (if we assume English), etc…
Orrrrrrrrrr, in order to make it HARDER, he might have made it German on purpose, so that only a relatively small number of readers would have an easier time of it. Also, it’s rather easier to come up with these things in your native tongue… particularly if you don’t want your strip’s translator being a security weak point…!
@ Satsuoni:
I think this, amongst other things found on Google whilst looking for e.g. “english words with repeating letter pairs”, might be useful.
As for those words themselves, if they’re not some kind of deceptive artefact that’s arisen purely as a result of a kooky encoding scheme, they could be someone’s name? Isis, Lele, Lolo, Toto, Tete, Meme, Bebe, Dodo etc. Even if the idea is “it’s not a simple letter substitution cipher”, if the text has been translated and/or transliterated first, simple names like that, which have no literal meaning, might come out the other side written in a very similar manner. (And of course, if there’s some kind of linguistic scramble going on, Toto, Tete, Meme and Bebe are valid words in other languages, particularly French… Wasn’t Sid’s mother French?)
@ Jamie:
I tried to find the key length for a Vigenère cipher with the words ‘lehvrn’ and ‘svrnzrn’. While it first seemed to be 7 characters long, there are also distances that are prime numbers (43, 103, 257 …). No better results with or without spaces and punctuation.
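What's described here is essentially a Kasiski examination: find repeated fragments in the ciphertext, measure the distances between repeats, and see which candidate key lengths divide many of those distances. Below is a minimal sketch, assuming the ciphertext has already been transcribed into a plain string (the function names are hypothetical):

```python
from collections import Counter, defaultdict


def repeat_distances(ciphertext, min_len=3):
    """Distances between consecutive occurrences of each repeated substring."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - min_len + 1):
        positions[ciphertext[i:i + min_len]].append(i)
    distances = []
    for pos in positions.values():
        for a, b in zip(pos, pos[1:]):
            distances.append(b - a)
    return distances


def factor_counts(distances, max_key=20):
    """Count how many distances each candidate key length (2..max_key) divides."""
    counts = Counter()
    for d in distances:
        for k in range(2, max_key + 1):
            if d % k == 0:
                counts[k] += 1
    return counts
```

With distances 14, 21 and 43, key length 7 divides two of them, but 43 (a prime) supports no key length up to 20 at all. Repeats at prime distances are usually chance coincidences or a sign that the ciphertext is not a plain periodic cipher over the assumed symbols.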
I think that, to identify the first encryption method, it is not that important to know the exact number of different characters, as long as repetitions can be found, but I have only rather limited knowledge of decryption.
@Satsuoni
Novil said it can be solved without a computer or other expensive hardware.
@ Melkor849:
That’s basically the same as my original example of a near-English semi-phonetic alphabet … (I trust, nuncle, it’s the one used by Adam Warren in his Dirty Pair books, in particular “Sim Hell”). You don’t go so far as making a full IPA representation; you just sort of take it back towards Greek, almost, nixing both digraphs/diphthongs and letters which make different sounds depending on context.
30-odd letters seems about right. I bet if you were to go back and review you could add a couple to that (…and there’s no reason at all that every last one would end up being used, anyway).
@ Satsuoni:
One has to wonder, given someone else’s comment that the text seems to be made up of simple, monolithic sentences, whether what we see as full stops (periods) are actually, say, commas. That could be inspired by how “continental European” numbers are written with commas as the decimal separator and periods as the thousands separator, whereas in much of the rest of the world (including the UK, America, etc.) it’s the reverse. And, in this world of commas without periods, your “suffixes” themselves might actually represent ends of sentences or other pauses. The stops might even be accents rather than punctuation… o_O
(And if I revisit my crazy “5-bit code actually being a 6- or 7-bit one split up” idea… the “10000” at the top of one page could hint at some kind of XOR cypher, not uncommon in digital encryption, that flips every fifth bit… presumably at whichever stage makes the output vary, rather than just undergoing a normal substitution… I haven’t had my lunch, so I’m not thinking clearly enough to work that one out. Plus, the limited number of word-ending “letters”, and the “suffix” letters, could be artefacts of the final letter of each word generally being truncated, with its least significant bits effectively zeroed as a result (or changing 1-0-0-0-0 on a loop…). Just by sheer dumb luck and chance (and stats), this might have ended up with such a limited set, and the ones that are coded for just by the MSBs with zeroed LSBs will end up being the most common… this could also be an “in”…)
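Both halves of this idea can be checked quickly on bit strings. Below is a minimal sketch, assuming symbols are represented as 5-bit codes and the "10000" is used as a repeating XOR key (both are assumptions from the comment above, not established facts; the function names are hypothetical):

```python
def xor_every_fifth_bit(bits, pattern="10000"):
    """XOR a '0'/'1' string with a repeating key; '10000' flips bits 0, 5, 10, ..."""
    key = (pattern * (len(bits) // len(pattern) + 1))[:len(bits)]
    return "".join("1" if b != k else "0" for b, k in zip(bits, key))


def zero_low_bits(code, n_low=2, width=5):
    """Keep only the high (width - n_low) bits of a width-bit code."""
    mask = ((1 << width) - 1) ^ ((1 << n_low) - 1)  # e.g. 0b11100 for 5 bits, 2 low
    return code & mask
```

One consequence is worth noting: if the symbols really are aligned 5-bit codes, a period-5 key flips the top bit of every symbol in the same way, which is just another fixed substitution. The output only varies by position when the key period is out of step with the symbol width, as it would be with 6- or 7-bit codes split up. Zeroing the two low bits of a 5-bit code leaves only 8 of the 32 possible values, which would produce a small set of word-ending "letters".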
@ Mac Johnson:
…they sure do.
Also, there are 31 of them.
Nice catch.
(or… wait… am I misunderstanding your post?)
@ nebosuke:
I don’t know (I would guess somewhere around 0.5% to 2%); you can check the German Wikipedia for text samples.
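To check a guess like this against a real sample, for example text copied from a German Wikipedia article, a quick frequency count is enough. Below is a minimal sketch (`letter_frequencies` is a hypothetical helper). It uses `lower()` rather than `casefold()` on purpose, because `casefold()` would turn ß into "ss" and hide exactly the letters that distinguish German.

```python
from collections import Counter


def letter_frequencies(text):
    """Percentage frequency of each letter; umlauts and ß count as separate letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {ch: 100 * n / total for ch, n in counts.most_common()}
```

Running it on a few thousand words of sample text gives a far better estimate than a short snippet.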
@ Novil:
Give it to CHARITY? Oh no, not charity! What has charity done to deserve it?
Nah, just kidding, that sounds lurvely. Just as long as it’s not the Salvation Army, which is purely EVIL.
@ tahrey:
As previously said, spaces are not used, so it’s still only 12. The passcode would look like this:
thebookofwoo (note that capitals are not used either)
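Assuming the key is applied as an ordinary Vigenère key over A-Z (an assumption; the thread hasn't established it), a passphrase is turned into shifts by dropping spaces and capitals and numbering the letters a=0 to z=25. A minimal sketch (`key_shifts` is a hypothetical helper; it also drops anything outside A-Z, such as umlauts):

```python
def key_shifts(passphrase):
    """Vigenère shifts for a passphrase: lowercase, keep a-z only, a=0 ... z=25."""
    return [ord(ch) - ord("a") for ch in passphrase.lower() if "a" <= ch <= "z"]
```

This confirms the count: "Sandra and Woo" is 14 characters with spaces but gives a 12-letter key, the same length as "thebookofwoo", whose key begins with the shifts 19, 7, 4 (t, h, e).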
My current pet theory, after a night’s sleep, is to replace the “extra” letters with something (anything, really) and try some deciphering codes on them. If I’m successful, I’ll end up with text which is mostly correct, with some typos.
Of course, once I do the decryption, it will be a simple letter-replacement job afterwards. The goal, at least, is to see a normal letter distribution after trying a passcode.
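A common yardstick for a "normal letter distribution" is the index of coincidence: the chance that two letters drawn at random from the text are the same. Commonly quoted values are about 0.067 for English and roughly 0.076 for German, versus about 0.038 (1/26) for uniformly random letters, so a candidate passcode that pushes the value up towards the language figure is promising. A minimal sketch (`index_of_coincidence` is a hypothetical helper):

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are identical."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
```

The statistic only makes sense on a few hundred letters or more; on short fragments it fluctuates too much to tell languages apart.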
I just got a mail by a reader who made the first significant breakthrough in deciphering the Book of Woo!
@ Ghusk:
I think we are seeing prime numbers in the cipher key length because of the extra symbols. I want to try some tests where I just replace the 5 unknown extras with something random from the alphabet and then try to decode it with the tool.
I expect that if I do this, the result will likely be normal plaintext English but with typos. I think I can use this to figure out if there are any unencoded symbols in the text.
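The plan of substituting placeholders for the extra symbols and then decoding can be sketched as follows, assuming a transcription where 26 symbols map to A-Z and the extras appear as other marks (the "#" below is purely hypothetical, as are the function and parameter names):

```python
def vigenere_decrypt(ciphertext, key, placeholder_map=None):
    """Decrypt A-Z Vigenère text after mapping unknown extra symbols to letters.

    Characters that are still not a-z (spaces, punctuation) pass through
    unchanged and do not advance the key.
    """
    placeholder_map = placeholder_map or {}
    out, k = [], 0
    for ch in ciphertext:
        ch = placeholder_map.get(ch, ch).lower()
        if "a" <= ch <= "z":
            shift = ord(key[k % len(key)].lower()) - ord("a")
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
            k += 1
        else:
            out.append(ch)
    return "".join(out)
```

In a Vigenère cipher a wrong guess for a placeholder garbles only that one letter, which is why the expected result is "English with typos". With the textbook key "lemon", `vigenere_decrypt("lxfopv ef rnh#", "lemon", {"#": "r"})` gives "attack at dawn", while guessing "a" for "#" gives "attack at daww".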
@ Novil:
Already? Aww 🙁 Share? 🙂
By the way, I am almost certain that there is a typo on the fourth page, but not quite.
OK, quick summary of what we have found out so far: the code is probably in both German and English; the German alphabet has thirty characters, and the code alphabet has 31. There appear to be no null characters, except possibly the loop with two circles. Anything I missed?
Well, here is a scale off a red herring: this alphabet can be divided as 1 ( (8×4,1) (8×4,1)). Don’t know if that means anything important.
A couple more interesting stats. This is a count of how many words there are of each length (in letters).
Word length (letters) | The Book of Woo | Declaration of Independence
1 | 9 | 8
2 | 11 | 108
3 | 27 | 127
4 | 109 | 77
5 | 33 | 64
6 | 33 | 66
7 | 67 | 54
8 | 18 | 30
9 | 29 | 29
10 | 16 | 28
11 | 9 | 16
12 | 3 | 5
13 | 1 | 4
14 | - | 5
As you can clearly see, two- and three-letter words dominate the English language. In The Book of Woo, the only dominant word length is 4, followed by 7 letters. I think this is some evidence that there are letter pairs which really stand for just 1 letter.
Thoughts?
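For anyone who wants to reproduce counts like these on other plain texts, here is a minimal sketch (`word_length_histogram` is a hypothetical helper). It treats any run of letters as a word, so "self-evident" counts as two words and digits are ignored. The original counts may have been made differently.

```python
import re
from collections import Counter


def word_length_histogram(text):
    """Map word length (in letters) to the number of words of that length."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters only
    return dict(sorted(Counter(len(w) for w in words).items()))
```

For example, "We hold these truths to be self-evident" gives {2: 3, 4: 2, 5: 1, 6: 1, 7: 1}. Comparing histograms like this for English and German samples, before and after collapsing letter pairs, is one way to test the pairs-as-one-letter hypothesis.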
To be honest, I think things are getting a bit overcomplicated. The past few comments seem to be nothing but replacing gibberish with gibberish.
Ecodude wrote:
Yeah. If you begin to see prime numbers in the cipher text and consider 42 character long pass phrases, it’s probably time for another approach. *slightly exaggerated*
This really does seem like a type of retelling of the Prometheus myth from the perspective of raccoons…
Tried a 26-character shift cipher anyway, with no results that could be words in English or German.
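If "26-character shift cipher" means trying every Caesar shift over A-Z (that reading is an assumption), the brute force looks like the minimal sketch below (`caesar_shifts` is a hypothetical helper). It only covers the 26-letter Latin alphabet. A shift over the 30-letter German alphabet (with ä, ö, ü, ß) would give different candidates and would need its own alphabet string.

```python
def caesar_shifts(text):
    """Yield (shift, candidate) for all 26 Caesar shifts of the a-z letters in text."""
    for shift in range(26):
        candidate = "".join(
            chr((ord(ch) - ord("a") - shift) % 26 + ord("a")) if "a" <= ch <= "z" else ch
            for ch in text.lower()
        )
        yield shift, candidate
```

For example, shift 3 turns "khoor" into "hello".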
@ Ecodude:
To be honest, I’m taking everything with a bowl of salt until another comic or two is posted. By then, whoever is still posting here will probably be serious enough about solving this cooperatively to have a mature, consistent back-and-forth discussion of the pros and cons of different approaches. For now, there are just too many people posting, and not enough certainty that they’re serious, to bother responding with critiques or advice beyond what would help the community at large too.
@ tahrey:
Dude, there are 174 comments and counting on the English page, and the German page levelled out at 14 two days ago. Novil wants *someone* to solve it, and considering Novil’s announcement that someone has sent him (Is Novil a guy? I don’t know, sorry.) actual progress, odds are in English’s favor.
Is there a reason this comic isn’t viewable on an iPhone or is it just me?