Have you ever heard of the Voynich manuscript? It’s a mysterious book, possibly created in the early 15th century, that contains weird illustrations and text written in an unknown script and language. Since its rediscovery in 1912 by Wilfrid Voynich, it has eluded the decipherment attempts of generations of cryptographers. The Voynich manuscript is a fascinating piece of history that has inspired many novels, games and films. Amateur cryptographers can find the latest news and research on the Voynich manuscript and other uncracked ciphers on Nick Pelling’s blog Cipher Mysteries. He’s also the author of the readable non-fiction book The Curse of the Voynich.
To celebrate the publication of the 500th Sandra and Woo strip, I have decided to publish “my own Voynich manuscript”. So here it is, The Book of Woo! As you can see, it resembles the Voynich manuscript in several ways. But of course we couldn’t create 240 pages, 4 had to be enough. Unlike the Voynich manuscript, The Book of Woo definitely contains sensible information that can be deciphered. I guarantee it ;-). And I will pay the person who is able to provide a decipherment that’s sufficiently close to the plain text a reward of $500. Send your decipherment attempt(s) to novil@gmx.de. I would also love to hear about your general ideas or statistical analyses that you carried out. There is no deadline. I will not publish the solution until at least strip #1000.
But be warned: It’s a huge challenge and I don’t expect to receive a valid decipherment at all. It’s primarily a work of art, not a puzzle for the general public. I believe that only experienced and dedicated code breakers have the chance to succeed. A lot of time was spent on the encryption. If you think you can simply carry out a frequency analysis on the letters and be able to reconstruct the English or German plain text this way, well, that’s just a waste of time. However, to make things a little easier, I want to give you the following hints:
- The encryption isn’t based on an algorithm only suitable for computers which executes a loop 100 times or something like that.
- The encryption isn’t based on some sort of device or mechanism that is hard to get.
- No “classical” steganographic method was used since that would just be impossibly hard to crack.
- The plain text is some sort of literature, as one can guess from Woo’s comment and the illustrations. A lot of time went into the plain text as well, it’s not just a copy of the first page of Rascal or something like that.
You can download larger versions of the four pages of the Book of Woo here:
[Update: 10 August 2013] Everybody who is seriously interested in deciphering The Book of Woo should read the comment section. There is a lot of interesting information in it.
[Update: 31 March 2015] The Book of Woo Wiki, now maintained by our reader Chris, also contains valuabable information for anyone who’s trying to break the code. In case the wiki should go offline sometime in the future, I created a complete backup of the wiki’s content on 31 March 2015.
In other news, the winners of the Sandra and Woo and Gaia fanart contest 2013 have been posted.
Thanks to everyone who participated!
- Sandra: Hey, Woo, what are you writing?
- Woo: Oh, just a little story.
- Sandra: Really? Can I have a look?
- Woo: Sure, I’ve just finished it.
- Sandra: What in Voynich’s name…?!
|
Hi all..
I know this may be coming a bit late for some folks, but unfortunately I didn’t discover this strip until fairly recently. I’m hoping not everyone out there has given up already..
In any case, if there’s still anybody out there interested, I’ve decided to set up a Book of Woo Wiki!
I originally started this as just an attempt to organize my own notes about the current state of things and potentially useful tidbits of information as I read through the forum here, but then I figured that it might be a useful resource for other people too. Everyone is welcome to contribute..
It currently has a summary of what is generally known/suspected/etc, as well as copies of all of the different common transliterations. I’m planning on going through in the next few days and fleshing it out with more info pulled from the comments here, as well as some of my own work. Any comments/suggestions/corrections, please let me know!
Hiho.. Next order of business :)..
Given what has been learned by Satsuoni about the alphabet duplication, I would like to propose a new character-mapping/transliteration of the source text:
I have taken Ryan’s original transcription (with a small change of swapping the meanings of “s” and “$”), and then have taken the “second” set of characters, and mapped them to be the same letter as their equivalent in the other set, but upper-case instead of lower-case. This results in (what I think are) several useful properties:
1. It can still represent the full range of glyphs in a one-to-one mapping (like Ryan’s original), but:
2. All word characters (except “&”) translate to ASCII letter characters. This makes it much easier to use with existing text-processing tools and libraries, and also (in my opinion) makes it easier to read (and detect patterns) by human eyes.
3. Reducing the character set by “Satsuoni-folding” is a simple (and intuitive) matter of simply converting the text to either all-lower or all-upper case. (I would recommend a general convention of lower case, but it probably doesn’t matter that much)
Anyway, this is the mapping I’ve been using myself for a little bit now, and I’ve just posted it on the wiki (http://bookofwoo.foogod.com/wiki/Transliterations#Foogod) for anyone else who wants to play with it.
(I have also just recently put together a font file of all of the Book of Woo character glyphs using this mapping, which is also available from the wiki)
The summer is coming up, and I managed to secure grant money working on AI learning and database analysis. So that means that I’ll be able to set loose some serious pattern-recognition software loose on the cipher and see what it comes up with.
@ Foogod:
I, for one, will definately be using that wiki, and post anything I find.
I have… well, as close to given up as to make no difference, I guess.
You might want to add to “what you know” the tips I have received from Novil so long ago. They have been included in my comments, but here is a whole list for clarity:
– As you can see there are no super complicated encryption steps involved, just a good number of them.
– Maybe it’s just my personal preference, but a transcription where = is the most common letter seems unintuitive to me.
– Another reader posted a very curious result pretty early. It’s still present in the new transcription, but not many code breakers seem to have understood its significance.
– Almost nobody seems to consider a particular method of obfuscating text.
This was received on fourth of August, so the referred comments would be before that.
As you can see, it is very cryptic.
Also, there are 16 symbols in the transcription, since ‘&’ is included. Roughly divided into 8 consonant-like, 5 vowel-like, 1 intermediate and one suffix-maker (‘z’), and &
I’ve been checking back here every now and then to see if anyone makes a breakthrough.
Are we just going to ignore the decipher I came up with around page 5?
@ Charles:
Check the wiki. http://bookofwoo.foogod.com/wiki/Main_Page
I know some things about classical cryptography. I’ve studied it since I was in 8th Grade and I am now in my twenties. One thing that has made me laugh is seeing people so easily dismiss the idea of it being as easy as a substitution cipher. And yet, after studying the pages, it seems to be simply that. It’s like a geometric simple substitution cipher, Pigpen cipher, which replaces the plaintext with ciphertext symbols, only this ciphertext is a lot more elaborate and the symbols don’t have a system like the Pigpen cipher does. In this way, it acts like the simple monoalphabetic substitution cipher from Simon Singh’s Cipher Challenge, which replaced the plaintext letter with another letter from the alphabet randomly . Now, the thought did come to mind that the cipher used might be like ASCII, replacing uppercase letters, lowercase letters, numbers, and punctuation marks with a symbol, which would mean there are somewhere around 95 ciphertext symbols, making it much more complex than a simple substitution cipher where there are only 26 ciphertext symbols. However, this idea was immediately rejected when I saw the quotation marks on the first page, the colon on the second, the exclamation points on the last page, and the periods on all four pages. I firmly stand behind my belief that this is a simple graphic substitution cipher.
@ jackoftrades:
There are more patterns in the text than such a cipher should produce. First, there are two alphabets with a character that switches usage. Once that is removed from the text, letters alternate between three sub-alphabets in a very complex but clearly meaningful pattern. Pigpen cipher may be a step – but there’s a lot more steps between the cipher text and plain text that need to be figured out too.
Phlosioneer wrote:
Ah, I see something more or less the same as what I discovered has been added to the working theories section.
Here’s hoping some progress can be made.
@ Charles:
Indeed. This cipher is tricky to crack, but it will eventually crumble under the enormous weight of our expert cryptanalysts! All… 3(?) of us.
Just my two cents here. I have little skill in these things but was curious about two things.
1) Has anyone tried using German or English phrasing for a match against the repeated phrases on page 3? There are two sentences with nearly identical characters. Since we know that the final product is a story, the pictures probably give some sort of clue as to what is happening. Maybe if we can crack just the one phrase the rest can come from that?
2) What progress has been made looking at the characters as syllabic instead of character translations? German and English share a common ancestor language and therefore share a number of the same sounds, so both languages could be used in the original for a single phonetic cypher.
Sorry if I’m introducing red herrings. I’ve only just discovered this and I’m now slightly obsessed.
@ neoladycasa:
Everything’s a red herring unless it’s right.
That being said, I’m more concerned with “words” that are the same two characters. Until there’s an explanation for that, I’m stuck.
Charles wrote:
It is quite vexing. I’m making educated guesses about what parts of speech words are using repeated phrases; but until I have something substantiated by evidence, I’m going to keep all that private.
One thing I am rather confident about at this point is that words are composed, at least in part, of sub-words tacked on the end to add meaning. E.g. the 3rd person, plural, past tense of “to be” tacked on the verb “run” might mean “they ran”.
Has it been considered that the symbols that indicate a shift of alphabet could actually be part of the alphabet? That is, a particular letter, when used, shifts the alphabet and what are currently considered shift symbols also have a letter meaning?
This is my first post here and although I tried to read all the previous posts I might have missed something. So I apologize if I’m stating something obvious or previously posted.
As it stands the original alphabet of 31 letters is reduced to 2 (identical) alpabets of 15 letters with 2 (corresponding) letters from each alphabet indicating a switch to the other alphabet.
Somehow these 15 characters (or 14 if the switching characters are not part of the text) need to be transformed to 26 characters (or thereabouts).
Could it be that some letters are represented by just 1 character and other letters are represented by 2 characters?
To be more specific: 8 letters could be represented by 8 characters, 6 letters could be represented by a combination of 3×3 characters and 12 letters could be represented by a combination of 4×4 characters. That gives a total of 26 letters.
The combination of 3×3 characters only can give 6 letters because nowhere in the text there are two identical characters following each other. Likewise with the combination of 4×4 characters.
Am I making any sense?
Well, I can answer myself that I was not…..
Hello,
is the author still reading this forum?
I have got two questions which I think do not reveal to much and do not make it any easier to solve it:
1. Is the cipher deterministic?
I mean that for example using random anagrams of words makes the cipher nondeterministic since you can not decide whether “amry” is “army” or “mary” and then even if you know the password/key there would exist several possible plaintext. On the other hand deterministic cipher is for example columnar transposition or monoalphabetic substitution, when if you know password then there is just one correct plaintext.
2. Can you read the plaintext straight away from the ciphertext or you need to rewrite (manipulate) the ciphertext once or several times to get to text that can be read?
For example monoalphabetic substition is such a cipher that can be read straight away, if you got used to the substitution key, no matter how extravagant symbols you use. On other other hand columnar transposition is unreadable even if you know the key, you need to rewrite (manipulate) it twice first to be able to read it. Double columnar transposition needs the ciphertext to be rewritten four times to be able to read it.
Just posted this to the working theories page on the wiki:
W appears to be used as a dash connecting smaller words to make larger ones. Several four and three letter standalone words in the text appear as a part of larger words within the text. Smaller words isolated and found used as a prefix before W on the first page of The Book of Woo include UAD, LYOJ, LDSD, and GDOD.
W may also be a space character introduced at some point during encryption to throw off word length analysis by creating a large word which is actually two short words in the plain text.
Viktor wrote:
Contrary to your assumption, this piece of information would reveal *very much*.
Statistical analysis has shown that the cipher does not behave like normal English/German text. This should answer your question.
Thank you for your answers.
Novil wrote:
There are ciphers that mask statistical properties of a language and anyway plaintext can be read straight away from the ciphertext, if you know how to read (if you get used to the cipher), the simple substitution is not the only such cipher.
But I take your answer as you cannot read the plaintext straight away from ciphertext even if you know how the encryption was performed.
If I wanted to write a secret book by hand (not using computer) I would use such a cipher that could be read straight away form ciphertext. I do not see too much sense in writing a secret book and then if I wanted to read it I would need to spend hours to decode just one page. This is also my opinion on Voynich manuscript, if it indeed contains any meaningful information then I think it can be read easily without manipulation of ciphertext, once you know how encryption was performed and how to read it.
The fact that plaintext can be read straight away from ciphertext does not mean that the cipher must be easy to crack.
Viktor wrote:
Your question is not very clear – nobody stops you from readin plaintext from columnar transposition, if you train yourself to read letters in the different order, say, every n-th letter with a given offset – a task that is easier than learning a new set of symbols (or new meaning for the old ones)
The difference between this code and the normally used ones is that this one was specifically designed as a piece of art, so reading it was, presumably, not a direct intention, especially compared to difficulty. As was mentioned by Novil several times in the above comments, the code involved several coding phases (about 5?) (that can be done by hand), of which we have decoded, as far as I can tell, 1.
Satsuoni wrote:
This is not true. It is beyond human capabilities to read (irregular) columnar transposition straight away from ciphertext even if you know the key (longer than, say, 10) that was used for encryption. It is not like to read every n-th letter as you said, this is not columnar transposition. On the other hand as a child almost everybody made his own symbols for letters and was able to read his/her encrypted texts, that is pretty easy. Even if there are five phases of coding, the cipher still could be such that it would be possible to read it straight away.
But discussing “what is possible to read and what is not” was not my intention. I just wanted to know whether this particular cipher is such that it can be read straight away if you know how the encryption was performed. It would be nice if it was like this, but nothing is wrong even if it is not. I am not complaining.
My apologies if this theory has already been thrown out there and shot down – or even so obvious it didn’t need mentioning – but in any case I didn’t find it in the wiki anywhere (I would have posted this there, but the relevant discussion page hasn’t been used since May). If it’s at all worth taking into account, it would explain why (POSSIBLE SPOILER) there are only, for all intents and purposes, fifteen +1 characters in the cypher (end spoiler).
What if the translation is phonetic? For instance, a hard C and a K being represented by the same glyph, or the T H string by a single glyph akin to the thorn rune, in German V and F could be the same glyph, and so on. This may also account for the curve of the letter frequency falling more or less between both English and German, as such a simplification would tend to iron out any extremities in either language.
Although the word lengths make this next possibility more unlikely, it may also be worth considering that the vowels have been condensed or even eliminated, as in an abjad.
Again, I apologize if this is an old idea. I just figured since there’s already something of a collaboration happening I may as well throw in my two cents’ worth.
Viktor wrote:
Um… First, I must admit that I didn’t, and that I, in fact, cannot “read the code straight away” in case of simple substitution if the substitution uses alphabet I know, it is very distracting.
Second, I meant distinct phases (different, or at least non-commutative(?)). Like, for example, “write every symbol if Huffman-coded binary, then take every octet, square it and swap with a neighbor chosen on the number of 1-s in it, then take every 4 bits and encode them with 16 symbols, then make an alphabet swap for half of text.” Anyway, you are right and that is not the point. There is little doubt (in my mind) that this code involves transposition in some way at least on some stage, but possibly on syllables/roots rather than lone symbols.
S10nn4CHu1GH1M wrote:
It has been suggested, but not followed through, AFAIK. Personally, I don’t know languages in question well enough to work with such an idea, so, if you can come up with reasonable mapping, go ahead.
Did it occur to anybody that each symbol might not represent a letter, but a whole word, phrase or cluster of letters? Or at least, a few letters may represent important words, like one for the goddess of raccoons’ name. Anyway, I think if that is not the case, I am mostly a fan of the ’26 english letters and four German extra ones (don’t know how to make them on a keyboard, but the ones with two dots above them and the one which means a double ‘s’). Other than that, I have to point out the letter 10000 above the picture of the half dead raccoon, although I don’t know what it means.
The story is probably going to be something along the lines of: ‘The goddess of raccoons was being harrassed by wolves and hawks, so climbed a tree to escape. She then gave cunning and agility to her subjects to help them likewise outwit the terrors of the world. But yet another terror came when the humans got involved, capturing raccoons and making them as good as dead. However, when fire breathing birds attacked, it was the raccoons who were sent on some kind of mission in return for their freedom?’
Alright, I give up, I have no idea how I would go about this.
@ Kerilithia:
except it may be in german. 😉
10000 is a year, I think. Not much part of the rest except to mark date.
@ Garrett Williams:
1 year later, the game is still on.
Heh… Maybe coments will help, but too many of them.
Somehow i didn’t notice this first time I read it first time.
I don’t know if it will help, but i guess it’s just made with English and German. Probably one paragraph is in one language, but i could be wrong. It would make text possible to read, and same time “frequency analize” for full text would be useless.
The switching between sets of characters sort of reminds me of changing positions when playing an instrument. Maybe it’s like that and an alphabet is used to express something the other can not, like concepts or names.
I’m apologize if this is an old idea or blatantly wrong, but maybe it helped?
A couple of thoughts here:
First off, Initially I thought maybe the character set changed based on English or German but then, looking at where the switches lie, there were a number of similarities….
I noticed that the word ” lUOYPJ” shows up an AWFUL lot, in both iterations of the language. What’s doublely unique about this is there’s ALWAYS a switch when this happens. It’s always lUOYPJ or Luoypj.
This makes me think that this is an important name of some sort, or even just one of the character’s names. What if the switch happens when we meet a character in the story?
Looking at some of the other “Switches,” it seems the pair GYAXu”/”gyaxU” and “uad”/”UAD” follows the same rule.
“Fua lyoj”/”fUA LYOJ”, YWLu/ywlU, sUPJ/Supj and “WPua” “wpUA” have more letters attached to one or the other side.
LQAuw is unique, in that it only appears singly – no reverse pair
Anyway, could these have significance somehow? What if, out of reverence, when a person is mentioned, we get a tense change?
@ Rourkie:
That’s because ‘U’ and ‘u’ ARE the switches, each ‘U’ changes the case. Every word containing letter u will be on a switch
@ StarGateTABC:
Oh, right. Of course. **Feels ridiculous now*
Still, the frequency in which Luoypj shows up near the end could still be a hint, right…? Maybe? (Probably not…)
@ Satsuoni:
I just joined to the team with actually some knowledge from studies about cryptoanalysis.. And I’m not offended. I know that i already SHOULD crack it. It shouldn’t be that hard. With steps you already have done on wiki, this hard work and some think boring steps, I was sure it’s easy now. And i failed. After few hours last night, when i couldnt believe that i failed – I failed.
Something is wrong with this…
I believe i made some progress with things i get from you. I didnt’ even find Potbelly Hill nor Bauchigen Hugel, so maybe I did this progress wrong… But can I add what i did to wiki? Where i should add this, I’m not sure about things i done, we didn’t talk about this…
Damn, can’t I edit my post?
I have a question… is there a way to show all post on the one page? Right now I’m thinking about reading them, but I need to print them all for this, i couldn’t force myself to read more than one page on the computer.
Woah…
One day and some new stuff on wiki i didn’t read. I guess i was all wrong, you guys go in totall other direction.
Rourkie wrote:
Actually it appears right behind & symbol almost every time. If it’s correct that & is a full word (maybe the seeothalamwhatever’s name) it might be an adjective. Also it’s usually suffixed with either xj or pj, so they might be some kind of camparation suffix (cold -> coldER -> coldEST).
That actualy makes sence, now that I think about it.
I got a feeling that if those theories which say there are 2 alphabets used could be right. Using the key letters to know when to switch could be right but instead of just 2 English alphabets perhaps one is English and the other is a German alphabet. Then in the occurrence of the switch letters could designate when to use which alphabet.
@ g:
Impossible! Rather, it appears to be made up of glyphs! =P
The story being told seems to be…a story being told.
I noted that the first four words were bolded, with a colon following–
As if it were a title.
The next two words are in a lighter font, as if italicized–like a subtitle.
(E.g. “The Story Of God: Her Struggles”.)
The rest seems to be setting up the story, giving a one-paragraph introduction.
It includes a word in quotation marks, after another colon, and is likely either a character muttering a short phrase or the name of something.
Then, you see near the bottom of the page a second paragraph. This is something I’d think was almost poetic, like a children’s story, in placement and formatting.
The story continues on the second page, setting up a new scene (this the extra colon), and elaborating on it.
And then, yet again, there’s two sentences near the bottom, as if telling a poem.
The third page uses a different formatting, with three sentences (one of them short) used to produce a vivid set-up of the imagery.
The bottom of that page is basically either a list of things or a bunch of verses.
And the fourth page seems to serve as the epilogue, wrapping things up with an exclamation mark. The final sentence (broken into three lines) could very well be along the lines of “That’s The End!”
One thing that should be noted about this is that if you observe it, it seems to actually be telling elements of Woo’s story. The third page has a vivid image of the hellish cage Woo was trapped in by his first owner, and on the other side is the celestial home of Lily yet with signs of human technology, likely symbolizing his two homes.
Another thing that occurs to me is that Woo, being a raccoon that is fluent in human English, likely draws from it in his writing, but that his writing may also take from the forest environment. Basically, I don’t think it’s meant to be much of a code, so much as it is just normal writing. Writing that happens to appear as if a code, but is actually just creative writing.
However, this of course does nothing to actually solve the story.
But that is the basis of my own theory–just a story using a language we don’t know, not an elaborate code. The writing might use a clever trick (on/off switch, as confirmed), serving to give the writing variety, but I like the idea that it’s mainly just a way to avoid repetition in writing the tale.
The good news is, I have found a pencil-and-paper cipher that produces a frequency distribution that looks very similar to the one we’re seeing. The bad news is, I haven’t yet figured out how to break it.
I have a Perl script that can take arbitrary English-language text (modifying it to also do German would not be difficult) and enciphers it. I’ll upload this to the wiki some time soon — probably not this morning, because I need to get ready for work soon, but maybe tonight or tomorrow.
The short version is, first you use something similar to a polybius square, only slightly larger, to reduce the number of symbols to maybe eight or so, with the side effect of flattening the frequency distribution significantly (among other things, because you can assign more than one box in the square to really common characters). This encodes each character as a digraph, row and column coordinates in the square. Then, as a second step, you use a one-to-many substitution cipher, arranged so that one possible coordinate has only one character it can be rendered as, another has two, another has three, and so on, and you choose from the possibilities for each in a way that favors ones earlier in its list of possibilities.
This may not be exactly what was done to the Book of Woo. I probably have some of the details slightly wrong. But I suspect that examining my cipher may lead us to think in the right kinds of directions, and perhaps a method that can break my cipher, as soon as we find one, can be similarly applied to the Book of Woo cipher — or the layer of it that we’re currently looking at, anyway.
Like I said, I’ll upload my enciphering code to the Wiki soon. Maybe we can make some progress on this thing.
@ Jonadab:
I’ve made my working theory, and the Perl script that implements a cipher that may be similar (or, at least, produces a similar frequency distribution), on the wiki, here:
http://bookofwoo.foogod.com/wiki/Working_Theories#Polybius_Square_plus_Multi-Option_Substitution
Translation:
(Accuracy of Translation presumed to be within 84%)
Page 1:
Teh frist chaptet: Goddess is seeing shadow wolf!
Seeolamaskay vas valking a park through house when she seeded shadwow wolfes and eagels!
“We eated you Seeolamask!” The shadow eagels and wolfs crieded!
“You can tries to do taht!” Slotahmathkay sayded, and katateed them in theirs hearts! But there vas more shadwo wolps too! So she has to run to oter place!
“Oh no they is going to eat Woo too, Nooooo!” Slayohathalcasque sayded, and she climbded up tree!
But weasils ten steal teh sun and brun down tree! And seeliokaska paws is burnded!
Page 2:
Caprer 2: Seeohlalay finds powars! Or DOES SHE!?
Seeohthalkamask has to finded powars, so she can fighteded teh evil shadow wolfves! So she gather teh hand of vecna, the sickle of ra, and teh wifi signal of ixbalanque. So she gains teh powars of lazors!
Seoothlakay shootededs teh lazors at teh shadow volfes and they is died, but oter raccoons say. “You no our gods no more Sooathalokaski! We now is worship JESUS!” And Seeothalayka vas RAGE!
She shooteded all of teh blasfemars vith lazors and stuff, and tehyd all died and all vas tobe happy.
Page 3:
Chapper 4: Teh trilbulations of Gods!
Seeoothalmaske now is uncontesteded supreem gods, but she not happy, because evil sientist god captour her and puteded her in evils cage!
He perform evil experiment on Sleeothacasque, and turn her into sekelton! But becaz she godis, she REGNARATE!
Seeloothamakakay not like be captive of teh scaintist god, so she run to PARADAIS!
Page 4:
Capter 4: TEH FALL OF PARADAIS!
Seeothalmaskakay rans to paradais, and teh hunam god sayded to her! “You can no stay here seelothakay!” And Simeothakakay sayded unto him “But I must or evil sience guy captored me!” And human godz not like sience, so he say “Ok u cans stay but u must make SEKS vith me!”
Seelothamaskay thoughted it vas ulgy to make sex with human god person, but she has to SAVE WOO OR HE EATED BY SHADOW WOLVES! So she say “Oki” And she droped her tail, and tehy maked tse sex, and it GOOOOOD!
Ten Slothamaskakay bringded all racoons into Paredise but one eatded teh knowledge fruit, and teh hmuan god sayded! “Now I throw u all out!”
And he sent vultureman to throwded racuns out of paradises, and then felt beast from LotR breathded fires on paradais and all vas layed to Burnination!
z separates tokens which can be “words” themselves. Thus it is a space or some prefix/suffix indicator. A strange case is the word /z=r=i= in the fourth line. After converting z into space appear a single /, which is never seen alone.
Maybe it is a typo and it actually is vz=r=i=? Since both v and =r=i= are words.
I have found an intriguing tokenization.
Most tokens are of two characters, with few single-character tokens.
Character z is always a single token.
Every two-character token ends with one of: = n v / b
Each of these five characters can also be a single-character token.
And the special character m, which can be a single-character token or begin some two-character token.
The existing two-character tokens are (in the first line I give the ones that can make words by themselves, maybe with repetition and maybe after z-removal):
rn, >n, s=, r=, sn, mn, t/, cb
>=, ib, #=, h=, i/, rb, c=, sv, t=, m/, #n, hv, c/, s/, #v, in, m=, mv, sb, cv, i=, rv, #b, iv, hn, >/, >v, #/, r/, tv
The existence of five special characters hints that they could be the vowels.
Then, that the tokens end always with a vowel hints to some phonetic encoding.
The special character m never starts a word, but it can end them. It could be a modifier of the previous token.
The token z is very special; it never starts or ends a word, so it could even be a space.
As the “vowels” always end a token and m always starts a token, there is not ambiguity in the parsing of the text into tokens.
Here I paste the tokens separated by apostrophe (‘):
i/’hv’rn sv’rn’z’rn mn’sn v’z’mn: i/’hv’rn sv’rn’z’rn c/’#n >/’#=’z’#/’m iv’hn. h=’m’z’r=
sv’rn ib’rn’z’r= /’m=’z’rn’z’t/ v & >n’sv’t=. cv’m’>/ m=’m’>= h=’rb in’#v >n’mn’z’r=
/’m=’z’rn /’>v’m’z’r/’m’z’rn #/’m iv’hn’z’rn #b’cv v’z’#/’m iv’hn. #/’m iv’hn’z’rn c/’#v
v’z’mn: /’z’=’r=’i= v’z’s= s=’s=’z’in’m. /’m= sv’rn’z’rn >=’m= v’z’mn’sn “sn c=’h=”
c=’h=’z’t=’m i=’s= =’rv i=’s=’z’rn c=’h=’z’c=’m’z’s= >n ib’m/’z’>n >n’mn =’r=. cv’m’>/
c=’h= cv’m’>/ c=’h=’z’s= rv’cv’z’r= /’m= sv’rn’z’rn rb’#n’m v’z’#v’m’z’>n s/’rn’z’v’m
>=’#=’r=. c=’h= >/’m=’z’v’m i/’m=
>/’m=’z’r= i/’m= /’m=’z’r=’z’#/’m
iv’hn n’#v’z’rn s/’rn =’r=
v’z’/’m= sv’rn. cv’m’>/
ib’m/’z’r= /’m= r=’>v.
cv’m’>/ >n’sv’t=’z’r=
m/’#= ib’m/’z’rn >=’m=
v’z’m=’in’m c=’h= /’m=’z’r/’m
c=’m’z’s= >n’sv’t=’z’r/’m
c=’h=’z’s= ib’m/.
>n’mn’z’>n c=’h=
ib’rn’z’r= /’m=’z’rn
#=’s= c=’h=’z’s= >n’mn.
s= >n’mn’z’rn #=’in
ib’rn r/’m’z’s=’z’>n
#=’in ib’rn’z’>n
#b’rv r=’i/’z’>n
cv’m’>/ =’rv.
—
i/’hv’rn sv’rn’z’rn c/’#n >/’#=’z’#/’m iv’hn’z’rn s=’s= sv’rn’z’>n i/’hv’rn =’rv’z’>n
in’m’>n’m’z’>n & >n’sv’t=. sn i/’hv’rn’z’rn i=’s= =’r= v’z’i/’hv’rn =’m’cv.
m=’in’m’z’mn’z’r= sn i/’hv’rn’z’rn i=’s=:
sv’rn ib’rn’z’rn >n’rn’m rb’#=
v’z’i/’hv’rn =’rv #v’>v’#v’m rb’#=
iv’hn’z’>n /’m=’z’>n sv’rn ib’rn.
c=’m’z’mn’z’r= h=’h= iv’hn’z’rn
c=’h=’z’r/’m rb’#=’z’>n’z’sn sb’cv’z’rn
in’cv’rv’m v’z’m=’in’m’z’>n s=’z’>n
#=’in ib’rn. m=’in’m iv’hn’z’>n sv’rn
ib’rn’z’rn’z’r/’m r=’h=’z’sn sb’cv.
i/’hv’rn =’rn’z’>n & >n’sv’t=’z’rn hn’rv
c=’h= i=’s= sv’rn ib’rn. i/’hv’rn’z’rn
#b’cv v’z’mn’sn’z’>n /’m= iv’hn’z’r=
i/’hv’rn’z’>n in’m’>n’m =’rv’z’>n &
>n’sv’t=’z’rn =’r=’i= v’z’s=’z’in’m.
c=’h= >/’m=’z’v’m i/’m= >/’m=’z’r=
i/’hv’rn’z’>n in’m’>n’m’z’>n &
>n’sv’t=’z’rn =’m’>= v’z’>=’#=’r=.
i/’hv’rn’z’>n & >n’sv’t=’z’rn #v’>v’#v’m sb’cv
v’z’rb’#=’z’rn #v’>v’#v’m rn’rn v’z’h=’h= =’m’cv.
—
cv’m’>/ in’#v’z’>n m=’m’>=
h=’rb’z’r= i/’hv’rn’z’mn’z’rn
sb’in’z’r/’m’z’s= >n #=’in
ib’rn’z’>n r=’i/’z’>n cv’m’>/
=’rv. =’r=’z’rn =’m’cv.
c=’i/ cv’m’>/
in’#v ib’m/
10000
r=’z’t=’m i/’m=’z’h=’m’z’rn
#=’s= c=’h= in’m’>n’m’z’v’m
in’m’>n’m’z’r/’m’z’s= >n’z’s=
mv’m=’z’>n n’m’i= ib’rn.
t=’m’z’rn =’m’>= v’z’h=’m v’z’=’rv. =’rv’z’r=’z’rn r/’m >/’#= =’m’cv.
t=’m i/’m=’z’rn c=’h=’z’s=’z’sn. sn’z’rn c=’h=’z’s=’z’>n t=’m i/’m=.
t=’m sb’cv’z’rn >n’rn’m n’#v c=’h=’z’sn.
t=’m sb’cv’z’rn >n’rn’m /’rn’m c=’h=’z’sn. t=’m rn’rn’z’rn >n’rn’m
v’z’i/’m=’z’r/’m >n’z’sn i/’hv’rn. sn i/’hv’rn sb’cv’z’rn #=’s=’z’t/
v’z’>n’rn’m >=’#=’r= v’z’s/’rn’z’r/’m c/’s/ ib’rn s/’m’ib’c=. i/’hv’rn sb’cv
=’m’cv’z’rn #=’s=’z’t/ v’z’>/’m=’z’#n’m r/’m’z’s=’z’>n #=’in ib’rn’z’c=’m
#n’hv’m’z’v’m #n’hv’m’z’>n #b’rv =’r=’z’>n’z’t=’m i/’m=’z’mn.
—
t=’m i/’m= sb’cv’z’rn’z’r/’m s= =’m’cv v’z’i/’hv’rn’z’>n & >n’sv’t=. c=’i/ cv’m’>/
iv’rn’z’>n’z’s= =’rv’z’r= h=’i/ r=’h= n’#v’z’v’m h=’i/ s/’rn n’#v’z’rn /’>v’m v’z’c/’s/
b’c= c=’h= >=’rn =’r= c=’h= i/’hv’rn >=’rn’z’>n m=’m’>= rn’rn. i/’hv’rn’z’rn =’m’>=
v’z’s=’z’in’m. sn i/’hv’rn’z’rn c=’h= n’m’i=’z’s=’z’>n #=’in ib’rn c=’h= n’m’i=’z’s=’z’>n
#n’hv’m’z’v’m #n’hv’m’z’>n #b’rv =’r=. sn i/’hv’rn’z’rn #=’s= i/’m= sb’cv. s=’s=
sn’tv’z’v’m s=’s= sv’rn’z’rn =’m’>=. c=’i/ cv’m’>/ #=’s=’z’r= i/’hv’rn rn’rn’z’rn
iv’hn! cv’m’>/ mn’z’#n’m’z’r= m=’m’>=’z’>n i/’hv’rn’z’sn’z’rn ib’rn! cv’m’>/ #=’s=’z’r=
m=’m’>=’z’rn #=’s= =’r= rn’rn. rb’#= h=’m’z’rn’z’r/’m b’c= #n’hv’m’z’>n n’r/ s/’rn.
rb’#=’z’cb’z’h=’m rn’z’r/’m’z’s= >n sb’in’z’v’m s/’rn =’r=! cv’m’>/’z’>n
i/’hv’rn’z’>n &
>n’sv’t=’z’rn’z’in’m!
@ Cristóbal:
Your descriptions of the tokens remind me of syllabaries, and in particular the Japanese syllabary. There you will find five vowels, a special character n, which can be pronounced m if it is followed by a labial consonant, e.g. tempura, and the remaining syllables are consonant vowel combinations.
Here’s an example of the syllabary (there are two – this particular one is katakana, but the hiragana syllabary would use the same syllables). Note in particular that some syllables have a voiced and an unvoiced version, e.g. sa/za, etc, and some syllables don’t exist in the syllabary at all, e.g. we, wi, and wu, and ye and yi. (I gather at one time they did exist, but spelling regularization over the past hundred years has eliminated them.)
http://www.realkana.com/katakana/
@ Helge:
My guess is that the encryption steps at a minimum were as follows:
1) transcribe the plain text to kana
2) transcribe the kana to romaji
3) perform a substitution cypher on the romaji
There might have been additional substitution cyphers, of course, or even actual encryption, but if step 1 was involved anywhere in there, it probably was step 1, followed by step 2. I don’t think it is possible to do full character encryption without wiping out the token pattern that the previous poster discovered.
The thing about step 1 is that it means you will not find plain text English or German helpful for frequency analysis, since transcribing into kana has to follow certain rules.
fight – faito (final consonants are the -o or -u syllable)
arbeit – arubaito (consonant clusters use the -u syllable)
god – goodo (long vowels are doubled)
schwimmen – shiu(i)mmen (some syllables don’t exist at all and must be dummied up)
There are probably other points I’m not thinking of.
@ Helge:
“fight – faito (final consonants are the -o or -u syllable)”
Yep, I messed that one up: fu(a)ito is the correct transliteration. There is no fa in Japanese. ^_^
@ Helge:
In any event, there are online tools available to do this. Using the ones at http://www.sljfaq.org/ to convert the first two sentences of Moby Dick, I get the following text:
kooru・mii・isyumiieru・somu・yaazu・agoo・nebaa・maindo・hau・rongu・purisaisurii・habingu・rittaru・oo・noo・manii・in・mai・paasu・ando・nasingu・patexikyuraa・toxuu・intoresuto・mii・on・syoo・ai・sooto・ai・uddo・seiru・abauto・ei・rittaru・ando・sii・zaa・uxooterii・paato・obu・zaa・waarudo
Note there’s no punctuation. Somehow “watery” became uxooterii. Ishmael is now isyumiieru. There are actually a couple of different rules for romaji transliteration. The ones I used were Nippon style and wapuro (word processor), which doesn’t use diacrits. (Diacrits would have increased the number of vowel-like characters.)
So I’m not the only one who noticed the similarity to the Japanese alphabet.
However, there’s a major flaw in your theory, Helge:
If the first step was to change to Japanese (or anything other language), you couldn’t claim the plain text was English or German.
Novil, I have a question you might answer.
Is it necessary to translate the text to another language at any point in the decryption/encryption process?
I figured it’s worth asking.
I don’t think Helge is actually suggesting the plaintext was translated into Japanese. More like transliterated. By changing a few consonants and adding vowels between them, you can make text of any language spellable using the Japanese syllabic alphabet.
For example, Japanese has a lot of loan words from English that do just this. Bathroom becomes Basuruumu, because Japanese doesn’t have a “th” sound, it becomes an ‘s’, a vowel is added between the ‘s’ and the ‘r’ and a vowel is added to the end of the ‘m’ to make it a proper syllable.
Charles wrote:
Why not? In that case, English or German would be the “most” plain text.
You should know how good you are: i started thinking of that exact manuscript by the second paagd of The Book of Woo
@ Ryan:
You’re forgetting the fact that you’re quoting something that may be directly related to this. “In other news, great events cast their shadows (far) ahead. If you know where to look for them.” Guess where you need to look for them.
@ Rourkie:
I would say that if the same few words occur just before/after a switch, that is likely to be meaningful. The name of a person (or god) is as good a guess as any. I wouldn’t rule out a set phrase, though, such as “Peace Be Upon Him.” or “So it is writ.”
Would it be possible that the letters are flipped? Like this:
dluow ti eb elbissop taht eht srettel era deppilf?
@ sylphidHeather:
We don’t even know what language it’s in.
@ Luke:
haha perfect!
You know how there has been discussion of two sets of letters? What if the manuscript is in both english and german? Possibly one of the sets corresponds to each? No evidence to back this aside from us knowing that the plain text is in both languages. I have zero experience in this but maybe to provoke some thought?
I’ve resigned myself to just waiting for the next hint.
Although given the last “hint”, I don’t have high hopes.