Have you ever heard of the Voynich manuscript? It’s a mysterious book, possibly created in the early 15th century, that contains weird illustrations and text written in an unknown script and language. Since its rediscovery in 1912 by Wilfrid Voynich, it has eluded the decipherment attempts of generations of cryptographers. The Voynich manuscript is a fascinating piece of history that has inspired many novels, games and films. Amateur cryptographers can find the latest news and research on the Voynich manuscript and other uncracked ciphers on Nick Pelling’s blog Cipher Mysteries. He’s also the author of the readable non-fiction book The Curse of the Voynich.
To celebrate the publication of the 500th Sandra and Woo strip, I have decided to publish “my own Voynich manuscript”. So here it is, The Book of Woo! As you can see, it resembles the Voynich manuscript in several ways. But of course we couldn’t create 240 pages, 4 had to be enough. Unlike the Voynich manuscript, The Book of Woo definitely contains sensible information that can be deciphered. I guarantee it ;-). And I will pay the person who is able to provide a decipherment that’s sufficiently close to the plain text a reward of $500. Send your decipherment attempt(s) to novil@gmx.de. I would also love to hear about your general ideas or statistical analyses that you carried out. There is no deadline. I will not publish the solution until at least strip #1000.
But be warned: It’s a huge challenge and I don’t expect to receive a valid decipherment at all. It’s primarily a work of art, not a puzzle for the general public. I believe that only experienced and dedicated code breakers have the chance to succeed. A lot of time was spent on the encryption. If you think you can simply carry out a frequency analysis on the letters and be able to reconstruct the English or German plain text this way, well, that’s just a waste of time. However, to make things a little easier, I want to give you the following hints:
- The encryption isn’t based on an algorithm only suitable for computers which executes a loop 100 times or something like that.
- The encryption isn’t based on some sort of device or mechanism that is hard to get.
- No “classical” steganographic method was used since that would just be impossibly hard to crack.
- The plain text is some sort of literature, as one can guess from Woo’s comment and the illustrations. A lot of time went into the plain text as well; it’s not just a copy of the first page of Rascal or something like that.
You can download larger versions of the four pages of the Book of Woo here:
[Update: 10 August 2013] Everybody who is seriously interested in deciphering The Book of Woo should read the comment section. There is a lot of interesting information in it.
[Update: 31 March 2015] The Book of Woo Wiki, now maintained by our reader Chris, also contains valuable information for anyone who’s trying to break the code. In case the wiki should go offline sometime in the future, I created a complete backup of the wiki’s content on 31 March 2015.
In other news, the winners of the Sandra and Woo and Gaia fanart contest 2013 have been posted.
Thanks to everyone who participated!
- Sandra: Hey, Woo, what are you writing?
- Woo: Oh, just a little story.
- Sandra: Really? Can I have a look?
- Woo: Sure, I’ve just finished it.
- Sandra: What in Voynich’s name…?!
Charles wrote:
Uh, well, I assumed one was a Vigenère, and given the text alignment, assumed the first three words would still be Sandra and Woo. So, I tried Sandra and Woo as the password.
Here’s a trick:
Go to Rumkin Cipher Tools (google it), and head over to the Vigenère section.
Here’s a Vigenère cipher to plug in.
Steltk agu Eqy akv bgbrbwqe!
Now, assuming “Sandra and Woo” are the first three words, you can turn on decode, and plug that in as your password. The password reveals itself in the decoded text. Copy/paste the password in the decoded text into the password box for the full reveal.
Most Vigenère ciphers aren’t that simple, but that one was. I’d use a similar methodology for the one-time pad that isn’t random.
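The known-plaintext trick described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Rumkin tool itself; the function names are made up for this example, and it assumes a plain Vigenère over A–Z in which non-letters are skipped and do not consume key letters.

```python
from itertools import cycle


def letters(text: str) -> list[str]:
    """Keep only the alphabetic characters, upper-cased."""
    return [ch for ch in text.upper() if ch.isalpha()]


def key_stream(cipher: str, known_plain: str) -> str:
    """Key letters implied by lining up known plaintext with the ciphertext."""
    return "".join(
        chr((ord(c) - ord(p)) % 26 + ord("A"))
        for c, p in zip(letters(cipher), letters(known_plain))
    )


def vigenere_decrypt(cipher: str, key: str) -> str:
    """Decrypt a Vigenère text, leaving spaces and punctuation untouched."""
    key_letters = cycle(letters(key))
    out = []
    for ch in cipher:
        if ch.isalpha():
            shift = ord(next(key_letters)) - ord("A")
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base - shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


cipher = "Steltk agu Eqy akv bgbrbwqe!"
# The repeating key shows up in the stream recovered from the guessed words.
print(key_stream(cipher, "Sandra and Woo"))  # ATRICKATRICK
print(vigenere_decrypt(cipher, "ATRICK"))    # Sandra and Woo are terrific!
```

The key stream only repeats visibly because the guessed plaintext is longer than the key; with a key longer than the crib this approach would recover just a fragment of it.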
Anyways, so that was pretty easy. Turning back to the one that was solved in 2012 with that password, I identified the code that was solved in 2012 as a keyed Vigenère rather than a cryptogram with the same password.
So, then I just looked for the cipher type that matched the other two phrases with the password in hand.
All of it pretty easy. The other type was a bifid.
Now, on the last one, on a hunch it looked like a table, so I set one up. A table set up for Sandra and Woo as a password looks like this:
S A N D R
W O B C E
F G H I K
L M P Q T
U V X Y Z
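For anyone who wants to rebuild such a table for other passwords, here is a minimal sketch. It assumes the usual convention of a 25-letter alphabet with J folded into I; the function name is just for illustration.

```python
import string


def keyed_square(keyword: str) -> list[str]:
    """Build a 5x5 keyed square: keyword letters first, then the rest of A-Z (no J)."""
    alphabet = string.ascii_uppercase.replace("J", "")
    seen: list[str] = []
    for ch in keyword.upper() + alphabet:
        if ch == "J":
            ch = "I"  # assumption: J shares a cell with I
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return ["".join(seen[row * 5:(row + 1) * 5]) for row in range(5)]


for row in keyed_square("Sandra and Woo"):
    print(" ".join(row))
```

With the password "Sandra and Woo" this prints the same five rows as the table above.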
At this point, I noted the extra “is”, so plugged that into my data set. “Sandra and woo is”. I find that all the enciphered characters are on the same row as the plain-text, so I was hunting for a methodology behind how many movements were being made. Haven’t found a pattern, but kinda hacked the “always” by assuming each enciphered letter was also on the same line as its plain text. Only one word fits into those possibilities (that I could find). And the last word looks to have an “ing” ending.
That’s where I stopped. I looked at all 10-character English words ending in ING and narrowed down the hunt to, well, 0. Going forwards and backwards, “EARMARKED” was the closest word, but it doesn’t fit in there. So, maybe a spelling mistake or something.
Sorry, keyed Caesar.
The last one’s playfair.
“Sandra and Woo is always surprising”
Based on all available data, I’m going to recommend the following assumptions:
1) The Book of Woo does not use Vigenère, Bifid, Caesar, or Playfair ciphers.
2) The password used for at least one layer will either be “Sandra and Woo” or “The Book of Woo”.
Charles wrote:
Awesome. I didn’t even look there due to the character pairing. Well done!
Charles wrote:
So, since I’ve been fascinatedly reading all the comments, I now have to add a bit here as well. While that plaintext looks and sounds right and I can come up with the same with a playfair table of
S A N D R
W O B C E
F G H I K
L M P Q T
U V X Y Z
there is the troubling case of the ‘hh’ in the encrypted text. If I understood the rules correctly, there shouldn’t be the possibility of two similar consecutive characters in the cipher text, should there?
If I replace ‘Woo’ with ‘Woxo’ and ‘always surprising’ with ‘alwaysx surprising’ in the (assumed) plaintext and apply above table, I end up with
ANDRSNNDSC BVCGANUFDVNU WSNTDKDFAH
in comparison to the original encrypted text of
ANDRSNNDSC HHFDSMOSUD WSNTDKDFAH
Taking the password to be ‘Sandra and Woo is’ the table becomes
S A N D R
W O I B C
E F G H K
L M P Q T
U V X Y Z
with the encrypted text being
ANDRSNND SBIVIBANUEDVNU WSNT NCNWIP
in comparison to the original encrypted text
ANDRSNND SCHHFDSMOSUD WSNT DKDFAH
which differs even more than with the first passphrase
Where am I wrong?
Oliver
@ rockus:
“is” isn’t part of the password; it was added to the “known plaintext”.
I’m still uncertain about “hh” in the playfair, but the “ss” in “always surprising” is okay as the two Ss don’t fall into a single digram. Still, there’s no question that Charles’ answer is correct as you can punch the crypt text and the password in the Playfair applet and it solves.
There’s a secondary option with playfair where instead of placing an x between two consecutive letters, you shift them down and to the right. So “oo” becomes “hh”.
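To check this by hand, here is a rough Python sketch of Playfair decryption with that doubled-letter variant. It hard-codes the "Sandra and Woo" table from the earlier comments and assumes the doubled letter is moved exactly one cell down and one cell right (wrapping around), so decryption moves it one cell up and one cell left. It is not the applet's code, just an illustration.

```python
SQUARE = ["SANDR", "WOBCE", "FGHIK", "LMPQT", "UVXYZ"]
POS = {ch: (r, c) for r, row in enumerate(SQUARE) for c, ch in enumerate(row)}


def playfair_decrypt(cipher: str) -> str:
    """Decrypt Playfair digrams; identical pairs use the 'down and right' variant."""
    text = [ch for ch in cipher.upper() if ch in POS]
    out = []
    for a, b in zip(text[0::2], text[1::2]):
        (ra, ca), (rb, cb) = POS[a], POS[b]
        if a == b:
            # Variant rule: undo the diagonal shift for a doubled letter.
            ch = SQUARE[(ra - 1) % 5][(ca - 1) % 5]
            out += [ch, ch]
        elif ra == rb:
            # Same row: each letter moves one cell to the left.
            out += [SQUARE[ra][(ca - 1) % 5], SQUARE[rb][(cb - 1) % 5]]
        elif ca == cb:
            # Same column: each letter moves one cell up.
            out += [SQUARE[(ra - 1) % 5][ca], SQUARE[(rb - 1) % 5][cb]]
        else:
            # Rectangle: keep the row, swap the columns.
            out += [SQUARE[ra][cb], SQUARE[rb][ca]]
    return "".join(out)


print(playfair_decrypt("ANDRSNNDSC HHFDSMOSUD WSNTDKDFAH"))
# SANDRAANDWOOISALWAYSSURPRISING
```

The "HH" pair sits at row 3, column 3 of the table, and one step up and left lands on "O", which gives the "OO" of "Woo".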
Thanks!
Great, such an insane number of comments here, how is anyone supposed to read them all? Especially if your English is poor?
The transcription(!) is great too. That must have been an enormous amount of work, but work that paid off, because now we can run all kinds of statistical tests on the data. Cool, thanks.
If you want to hear my 5 cents:
The author already wrote that it won’t be easy (and yet quite a few commenters here still want to find the E by frequency analysis; reading saves time!)
A complex encryption method, even if it is only a 5-column polyalphabetic cipher, normally produces a total jumble of characters.
And that is exactly what we do NOT have here!
The code text is BEAUTIFUL!
This fact should be at the very top of the list in every decryption attempt, because it is rare!
My tip:
To get such beauty out of a plain text and still make it “hard”, “transposition” (see Wikipedia) is the best fit.
If you swap letters instead of replacing them, you can even deliberately improve the look of the script, so the code text can end up more beautiful than the plain text, while transpositions are by nature always very hard to crack.
If I were Oliver Knörzer, that’s how I would have done it.
Charles wrote:
Ah, that explains it. So my comment is content-less. Ignore 😉
Thanks,
Oliver
Charles wrote:
However, if I put the first line of the transcribed ciphertext into http://practicalcryptography.com/ciphers/classical-era/vigenere-gronsfeld-and-autokey/ with the passphrase ‘sandrwo’, it yields
ibpanjslesenjungsaqmrrkyjwsbunbnwsdjssqtyhyuofshoncjfiiiodrcscqnhrttdhynbn
which contains ‘lesen’ and ‘jung’, German for ‘to read’ and ‘young’. Strikes me as too much of a coincidence…
Oliver
@ rockus:
Within a stream of ~75 letters it’s not unlikely to find two short random words.
…but it’s highly unlikely that the transcribed text got the letters correct on the first shot.
The correct mapping of the glyphs to letters will only be done after the text is solved.
@ Phil:
Novil already confirmed the letters in at least one of the transcriptions.
The text is not a Playfair cipher, at least not at the stage it’s at right now. I went through all 550 million combinations of a 5×5 Playfair with no real results. I sort of expected as much given the letter count frequencies seen in this text.
I think there are probably two levels of encryption left. One is likely checkerboard-like or some other substitution, followed by letter substitution or something like Playfair.
I’ll probably have to go back to my method of removing letters from the transcription and then comparing the resulting word length frequencies, in hopes of finding an operator symbol which reduces to one letter when it’s followed by another character, or something similar. I can’t really see any other way around that, unless the text is simply artificially spaced, in which case the word lengths wouldn’t matter.
Cool: are we happy that Novil’s “i/hvrn svrnzrn mnsn vzmn: i/hvrn svrnzrn c/#n >/#=z#/m ivhn. h=mzr= svrn ibrnzr= /m=zrnzt/” is the correct transcription? The post was a little cryptic.
@ Phil:
Ciphers tend to be cryptic.
Thank you; I’ll be here all week!
….In all seriousness, though, yes. Novil gave (as far as he knows) the correct transcription.
@ Charles:
Haha! I guess it was a little more cryptic than non-cryptic. 🙂
Thanks man.
Just an FYI: I’ve updated my tool and decided to throw in a helper function for finding possible operator symbols. At that point it will not function as a replace-text tool; it will simply remove 1 letter from the text and 2 letters from the cipher text and spit out word-length counts sorted from 2 letters down. One-letter words are pretty rare, so I’m skipping those.
Funnily enough, doing this does seem to point out that there are two possible operators. All the best word length counts come from having two letters removed, not just one. There are also plenty of combinations which don’t leave too many one-letter words.
Could be off on this thinking not really sure anymore *shrugs*
Maybe some sort of padded hex?
“c=h=” is an interesting word.
You have [“sn c=h=”] where it appears to be a noun
and then a sentence which starts
[cvm>/ c=h= cvm>/] where it doesn’t appear to be an adjective or adverb
and then a sentence which starts
[c=h= >/m=zvm i/m=] where it appears to be some sort of article
(Ignoring &)
Perhaps there is a “word length shift”? For example, one-letter words may be expressed as 2 or 3 letters, 2 as 4, two threes or a three and then a four to a seven? It would explain the bizarre length distribution, especially the mini peak at 7.
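The word-length distribution is easy to tally for anyone who wants to test ideas like this. Below is a minimal sketch, using only the one line of transcription that Novil confirmed earlier in the thread as sample data. The `drop` parameter is a made-up convenience for removing candidate operator glyphs before counting, in the spirit of the tool mentioned above; it is not that tool.

```python
from collections import Counter

# The line of transcription quoted (and confirmed) earlier in the comments.
SAMPLE = ("i/hvrn svrnzrn mnsn vzmn: i/hvrn svrnzrn c/#n >/#=z#/m ivhn. "
          "h=mzr= svrn ibrnzr= /m=zrnzt/")


def word_lengths(text: str, drop: str = "") -> Counter:
    """Count word lengths after stripping '.'/':' and removing any glyphs in `drop`."""
    words = [w.strip(".:") for w in text.split()]
    cleaned = ["".join(ch for ch in w if ch not in drop) for w in words]
    return Counter(len(w) for w in cleaned if w)


print(sorted(word_lengths(SAMPLE).items()))
# [(4, 5), (6, 3), (7, 3), (8, 1), (9, 1)]
print(sorted(word_lengths(SAMPLE, drop="z").items()))
# [(3, 1), (4, 4), (5, 1), (6, 5), (7, 2)]
```

One line is far too little text to say anything about the real distribution; the same function would need to be run over the full transcription.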
Kyrene wrote:
Ya, that’s where I am also. Except, there are only 2 one-letter words in English, and apparently none in German. So I think there is probably 1 letter = 1 letter, 2 letters = 1 letter, and possibly 3 letters = 1 letter encoding.
The “v” is probably an i
One of the following is an “a”: sn, s=, or >n. I kinda like “sn” for “a”.
The other two have single letter encodings and represent two-letter words.
I absolutely don’t know if that will work, but that’s where I am in my thinking.
Looking at the matrix of character-pairs and sorting them a bit, I noticed that there seem to be three groups of characters
c#ihsrt>
v=nlb
z(space)
and m seems to be the odd one out.
Within each group they’re never paired together
c#ihsrt> is only followed by one of v=nlb
v=nlb can be followed by a character from any other group
z and space seem to behave the same way, only ever following v=nlb and m
But if z is another space (if indeed space is a space), then we get rather a lot of very short words.
Another thing I noticed is that if v starts a word, it’s always followed by z (except in the one case where v occurs as a single letter ‘word’).
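These pairing rules can be re-checked with a small follower table. The sketch below runs on the single line of transcription that Novil confirmed earlier in the thread, in which "/" is written where this comment writes "l"; that mapping is an assumption based on comparing the quoted phrases. The function name is only illustrative.

```python
from collections import defaultdict

SAMPLE = ("i/hvrn svrnzrn mnsn vzmn: i/hvrn svrnzrn c/#n >/#=z#/m ivhn. "
          "h=mzr= svrn ibrnzr= /m=zrnzt/")

CONSONANT_LIKE = "#>chirst"   # first group, without the odd one out "m"
VOWEL_LIKE = "=/bnv"          # second group, with "/" standing in for "l"


def followers(text: str) -> dict[str, set[str]]:
    """Map each character to the set of characters seen directly after it."""
    cleaned = text.replace(".", "").replace(":", "")
    table: dict[str, set[str]] = defaultdict(set)
    for a, b in zip(cleaned, cleaned[1:]):
        table[a].add(b)
    return dict(table)


table = followers(SAMPLE)
for glyph in CONSONANT_LIKE:
    print(glyph, "->", "".join(sorted(table.get(glyph, set()))))

# Which characters ever come directly before "z"?
print("before z:", "".join(sorted(g for g, f in table.items() if "z" in f)))
```

On this one line, every glyph of the first group is followed only by glyphs of the second group, and "z" only ever follows the second group or "m", which matches the observations above. The full text would of course be the real test.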
And if anyone’s interested (and even if they’re not), here are the longest repeating phrases (each occurring twice):
#nhvmzvm #nhvmz>n #brv (neither starts nor ends a sentence)
sn ilhvrnzrn i=s= (starts a sentence once, in the middle the other)
t=m sbcvzrn >nrnm … c=h=zsn (complete sentence, but with a different word on the …)
r=ilz>n cvm>l =rv (ends the sentence in both cases)
ilhvrn svrnzrn cl#n >l#=z#lm (starts the sentence in both cases)
c=h= >lm=zvm ilm= >lm=zr= (starts the sentence in both cases)
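A list like this can be reproduced with a short script that counts repeated word sequences. The version below is a simple sketch, not the method actually used for the list above: it ignores sentence boundaries, strips only "." and ":" from words, and is demonstrated on the one confirmed line of transcription from earlier in the thread.

```python
from collections import Counter

SAMPLE = ("i/hvrn svrnzrn mnsn vzmn: i/hvrn svrnzrn c/#n >/#=z#/m ivhn. "
          "h=mzr= svrn ibrnzr= /m=zrnzt/")


def repeated_phrases(text: str, min_words: int = 2) -> dict[str, int]:
    """Return every word sequence of at least `min_words` words that occurs more than once."""
    words = [w.strip(".:") for w in text.split()]
    found: dict[str, int] = {}
    for n in range(min_words, len(words)):
        grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
        for gram, count in grams.items():
            if count > 1:
                found[" ".join(gram)] = count
    return found


print(repeated_phrases(SAMPLE))
# {'i/hvrn svrnzrn': 2}
```

Shorter repeats that are contained in a longer one are reported as well, so on the full text the output would need to be filtered down to the longest phrases.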
@ Jan-Willem:
Yes, see older comments 🙂 Mine on page 2 in particular.
Z is almost certainly a spacing/order operator. One thing many people seem to not consider is the fact that permutation is possible, such as swapping first syllables of words, then gluing some words together, etc.
Personally, I think “sn” is something like “Ze” (The) or “Di” (Die) articles. Might be wrong, of course.
As before, there are not enough letters for simple substitution. Or vigenere, for that matter. We don’t even know if the final alphabet is 26 or 30 chars long.
“c=h=” is weird. It is often, though not always paired with “sn” in the sentence somewhere, like:
“snzrn c=h=zs=z>n t=m i/~=”
“t=m sbcvzrn >nrnm n#v c=h=zsn”
and “sn c=h=”, of course: ” vz~nsn “sn c=h=” c=h=zt=m”, note “~nsn” before quote marks. (Sorry about ~, it is another “m”)
Right, time for another hint from Novil (running out of those):
“- Almost nobody seems to consider a particular method of obfuscating text.”
@ Satsuoni:
Consider?
It’s hard to consider something you don’t know…
@ Satsuoni:
I certainly hope it’s not obfuscated in the traditional method of scrambling letters. If it is, I have no idea how we are going to decipher the text, given that no words could be used to get back to the right alphabet. We could easily end up with “the” scrambled to “het” and encoded as “abc”, and have no way of tracking it back save guessing letter frequencies, which would be massively time consuming. I’d certainly give it up lol. This of course assumes we get past the letter problems we have already.
What I’m really hoping is Novil encoded the text as some kind of number game. At least as some way to explain why we have 15 symbols when more realistically we should see something like 20 or more.
I’d like to think the number 15 isn’t coincidental. Perhaps the encoded letters are in hex and sometimes the hex number is in the ten’s spot. This could be done with decimal too but with of course 1 and 2 in the ten’s spot. Decimal gives up a lot more symbols though as you can only have 9 in 1 digit land.
Satsuoni wrote:
I think the wording of this tip was not perfect since I had to be extremely vague. Better ignore it.
Hi, this is the first time I’ve chimed in on this (btw, Thomas J. God is a different being who probably already knows all the answers). I’ve noticed that there is a limited set of rules that seems to govern what glyphs can appear where.
First, there are glyphs that I consider to be in “Group 1,” or “G1”: # > c h i m r s t. G1 glyphs begin most words, and never end them.
Then, there are glyphs that I am calling “Group 2,” or “G2”: = b e n v. Whenever a G1 glyph appears, it is always followed by a G2. G2 glyphs can sometimes begin words and can sometimes appear without a G1 preceding them.
Then there are m and z. The z glyph never begins or ends a word, and never appears as the second-to-last glyph of a word. It seems to function as a prefix for G1G2 digraphs, and sometimes for lone G2 glyphs. It seems that m can appear both as a G1 glyph and also as a suffix for G1G2 digraphs and lone G2 glyphs.
This means that all glyph groups in the text fall into one of these categories:
G1G2 digraphs
zG1G2 trigraphs
G1G2m trigraphs
zG1G2m tetragraphs
G2 unigraphs
zG2 digraphs
G2m digraphs
zG2m trigraphs
Or, if you look at it a slightly different way, suppose G1 simply contains the potential for being absent, or an invisible glyph, if you will. This means that all glyph groups that appear in the text consist of G1G2 digraphs with potential z prefixes and m suffixes. There are 50 possible G1G2 combinations, and this combined with z prefixes, m suffixes, and both means that there could be up to 200 different n-graphs in the text. However, there seem to be a lot of combinations allowed by my theory that don’t appear — right now I’m seeing something like 81 n-graphs in the text. One problem, as Satsuoni has pointed out, is the fact that the m glyph appears both in G1 and as a suffix. This makes it difficult to parse a G1G2mG2 pattern — is the m a suffix to the first G1G2, or is it the G1 glyph connected with the second G2?
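To make this model concrete, here is a rough Python sketch that splits a word into such glyph groups and enumerates the 200 theoretical combinations. It uses "/" in the G2 set, as in the line of transcription Novil confirmed, where this comment writes "e". The ambiguity with "m" is resolved by one arbitrary rule that is purely an assumption of this sketch: an "m" counts as a suffix unless it is directly followed by a G2 glyph, in which case it is treated as the G1 of the next group.

```python
import re
from itertools import product

G1 = "#>chimrst"
G2 = "=/bnv"

# optional z prefix, optional G1, mandatory G2, optional m suffix
# (the m is only taken as a suffix when no G2 glyph follows it)
GROUP = re.compile(r"z?[#>chimrst]?[=/bnv](?:m(?![=/bnv]))?")


def split_groups(word: str) -> list[str]:
    """Split a word into glyph groups; fail loudly if the model does not cover it."""
    groups = GROUP.findall(word)
    if "".join(groups) != word:
        raise ValueError(f"word does not fit the G1G2 model: {word!r}")
    return groups


def possible_groups() -> list[str]:
    """All groups the model allows: (absent or G1) + G2, with optional z and m."""
    cores = [g1 + g2 for g1, g2 in product(["", *G1], G2)]
    return [pre + core + suf
            for pre in ("", "z") for core in cores for suf in ("", "m")]


print(split_groups("h=mzr="))      # ['h=m', 'zr=']
print(split_groups("/m=zrnzt/"))   # ['/', 'm=', 'zrn', 'zt/']
print(len(possible_groups()))      # 200
```

Flipping the rule for "m" (always a G1 when possible, or always a suffix when possible) changes how G1G2mG2 patterns are split, so group counts obtained this way depend on that choice.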
One question is then what the z prefix and m suffix mean. Do they modify the G1G2 pattern, or do they simply produce a totally new n-graph with a completely separate meaning?
Then there’s the issue of what these glyph groups or n-graphs represent. Obviously there are more of them than there are letters in both German and English, so what do they mean? Since there are so few 1-glyph words (only “v”, considering the “&” symbol to be something else entirely), this suggests that single letters have been replaced with multi-letter patterns (a multiliteral cipher). However, with so many n-graphs available, it’s also possible for there to be common multi-letter combinations that have been replaced with separate n-graphs. Perhaps one n-graph means “t,” another means “o,” and another one means “to.” It’s also quite possible that more than one of them translates to the same symbol or group of symbols in the next layer — there might be multiple n-graphs that mean “e.” Also, there might be “meta” n-graphs that mean “double the previous letter,” “ignore the last n-graph,” or the like.
I just thought I would present my thoughts — right now my main interest is not in winning the prize, but in finding out what the text means so I can read Woo’s story.
@ Thomas J. Lee:
Thanks, Thomas J. Lee (who’s totally not Thomas J. God in disguise)!
So does the G2 thing mean that the words ‘I’ and ‘a’ are likely “b” “e” “n” or “v”?
@ Mr. Random:
> So does the G2 thing mean that the words ‘I’ and ‘a’ are likely “b” “e” “n” or “v”?
There’s no reason to assume that. The word “a,” for example, could perhaps be mapped to “v,” yes, but it could also be mapped to “sn,” or “em=.” (I’m not suggesting that this is what the actual meanings are; I’m just saying that it could map to a unigraph, digraph or even trigraph.) Or we could have more than one mapping for it. Also, there might be different mappings for the letter “a” within a word, as opposed to “a” as a stand-alone word. This isn’t over yet.
Ok, so I’m way behind the curve on this, and my apologies in advance if this has already been suggested and ruled out, but…
It was pointed out earlier that there seemed to be two separate 16-character symbol sets, and two symbols that switched between which of them was in use. Adding in the possibility that the symbols are actually digits rather than letters, and that characters are pairs of symbols rather than individuals…
Is it possible that, instead of the two symbols sets being merely alternates for each other, that the use of one symbol set or the other indicates the use of a different cypher?
If so, that means any analysis of the whole text using the “consolidated” value set, assuming a correlation between the two symbol sets can be made definitively, would only cause more confusion rather than clarity. It would mean that the portions of text in each of the two character sets need to be worked on separately…
@DanialArin
If you compare the two sets of characters, they behave pretty much the same statistically. So one character from one set will precede or follow the same characters as its alternate precedes/follows the equivalent alternates, with roughly the same frequencies. So while it doesn’t rule it out entirely it seems unlikely that they’re two different ciphers.
I’ve also considered splitting the text in two along those lines (thinking there might be a German and English text interwoven), but on a character or word level I can’t find enough of a difference to suggest that’s the case.
@ DanialArin:
After the initial conversion, there’s nothing special about that character. What you’re suggesting could just as easily occur with any other character as well.
Seriously people, any inspiration? The first layer was an alphabet switch, and at the second we are stuck. It is probably not a “classical cipher” to begin with, but another command-like structure change like the first one. Novil mentioned that there are “several”, and we are just 1 in so far 🙁
There is a lot of redundancy in the text, like some letter combinations only ever being followed by certain other letter combos, etc. (“#/m ivhn”, for example, with or without z). One has to wonder how many words are in each sentence, or whether the text really is that repetitive.
Satsuoni wrote:
Thomas J. Lee definitely seems to be on to something; the G1/G2 pattern is significant enough to not appear to be a random distribution. I have been working on it off and on.
Setting aside m and z, the following digrams appear and will flesh out what, at a glance, is somewhere around 65% of the text. The digrams are as follows:
#/, #=, #b, #n, #v
>/, >=, >n, >v
c/, c=, cv
h=, hn, hv
i/, i=, ib, in, iv
r/, r=, rb, rn, rv
s/, s=, sb, sn, sv
t/, t=, tv
So of these, there are 33 (and if #, >, c, h, i, r, s, t are “carriers”, a total of 40 possible). The original text has 15 characters. Somewhere between these gives us a nice character set, so I’m looking at whether the text is a blend of unigraphs and digrams.
m and z appear to be unigraphs.
@ Phil:
I’ve been looking at these bigram distributions for weeks now (see my earlier posts with matrices). M can be considered a unigram or can be split into bigram/trigram depending on position (whether it is before 5 “vowels” or 8 “consonants”, for lack of a better term). Z is almost certainly a spacing command of some kind (again, see my earlier posts; there are just too many patterns that exist with “z” before them *and* standalone, complete with spaces). Spaces may be a red herring or a command in and of themselves.
Of some interest are sentences:
t=m i/~=zrn c=h=zs=zsn
snzrn c=h=zs=z>n t=m i/~=
If you look at them carefully, one can see that they contain almost the same bigrams with the exception of “>n” in the second sentence, and have the last “sn” and first “t=m i/~=” swapped (“>n ” inserted between). The sentences are adjacent. Which may be poetic style “They fought as hell. As hell, they fought.”, or indicative of the fact that we don’t know anything <_<
The most frequent prefix groups (assuming prefix coding) are (~ is "m"):
33 ~=
39 r=
42 i/
43 s=
65 >n
125 rn
The most frequent di/trigrams after “z” are:
12 r/m
19 s=
19 r=
41 >n
51 rn
The last vague, vague hint from Novil reads:
“- Maybe it’s just my personal preference, but a transcription where = is the most common letter seems unintuitive to me.”
And that is all that I know 🙁 I feel stupid.
@ Satsuoni:
I’ve had a thought that perhaps “z” is not a spacing command but rather a suffix indicator — perhaps all the “zrn,” “zrem,” “z>n,” etc. groupings are the equivalents of things like “ed,” “ing,” “s”, “tion,” “ly,” “ish,” or even apostrophe+s. There are, after all, no apostrophes at all in the text, which would be strange if it’s English (but not strange if it’s German). Come to think of it, there are no commas in the text either, which would be strange in either English or German.
I haven’t been considering “z” as a space indicator simply because it seems to me that we’re dealing with a multiliteral cipher, so if each grouping of 2 or more symbols maps to only 1 letter, we would end up with a text without any words longer than 5 letters or so — and if “z” is a space, that would break them up into even smaller words. Although it’s possible that the text was deliberately written entirely of very short words, it doesn’t seem very likely. Now, if we’ve got something more complex than a multiliteral cipher, where 1/2/3/4-symbol groups map to 1/2/3-letter groups in the end (or something like that), then we’ve got a stronger case for “z” being some kind of space, but I’m still not convinced.
@ Thomas J. Lee:
Yes, at first, I thought they were suffixes too, but then I noticed that a lot of those “suffixes” exist as standalone words. To wit, nearly everything after “vz” is a complete word that exists somewhere else, made of two “syllables”.
Also:
c=h=zr/m
r/m >/#= =mcv
r/mzs=z>n #=in
>n #=in
c=mzs=
s= =mcv vzi/hvrnz>n
Etc. As you can see, unless those suffixes a) can somehow be standalone words or b) read differently without z but still form the same chunks of phrases standalone (see second example), I don’t see how they can be suffixes. As for word length, it is still possible that some spaces have been messed with, and are not, in fact, spaces at all.
Mind you “almost certain” doesn’t mean “certain”. Still, if they were suffixes, why use “z” at all when normal pairs would do just as well?
@ Satsuoni:
If the spacing has been messed with somehow, though, wouldn’t we see inconsistent spacing around the punctuation?
Although perhaps the punctuation has been messed with too, because there are no commas at all, no apostrophes, and no hyphens.
I think we have a more complex scheme here than just glyph group-to-letter mapping, though, because what are we to make of words like “s=s=,” “rnrn,” “rnrnzrn,” etc.? What words in English or German consist of only two repeated letters? Perhaps they’re two repeated letter groups, but there aren’t a lot of words like that either. “Couscous,” someone pointed out earlier. So if there is some sort of rotation of mapping, what pattern does it follow?
Does a doubled glyph group mean something else? What if a doubled glyph group means a comma, no matter what the glyph group is? Why does a “z” never begin a word? Why are G1 glyphs sometimes omitted, but when they appear, why are they always followed by G2 glyphs? Why does “m” sometimes act like a G1 glyph but sometimes as a suffix that follows a G2 glyph? This is frustrating, yes.
Thomas J. Lee wrote:
Indeed, I almost suspected z of being a comma – though that makes text look very strange, so probably not. I was also the one to comb the dictionaries for doubled-group words, of which there are very few in both languages, and none that fit well with the text (“baba”, “papa”, “couscous”, “tzeetzee” about sum it up, though there is a German verb “nennen”). So either there is rotation, or some glyphs can split down the middle, or they are in the wrong place and “z” swaps part of the string around, spaces and all.
Out of desperation, I pulled out all “vowels” (G2? the 5 ones), and split the results along the z. It was weird, and I don’t think it is useful, but here, just in case:
ihrsr r~s ~ihrsr rc#># #ihh rsrir r~ r t>stc>~>hri#>~ r~ r> r r#ih r#c #ih#ih rc# ~ ri sss i~sr r>~ ~sschch tisris rch c s>i~ >>~rc>chc>ch src r~sr rr# # >sr >#rch>~ i~>~ ri~~ r #ih# rsrr ~src>i~ r~r>c>>st r~#i~ r>~ ~ich~ rc s>st rch si~>~ >chir r~ r#sch s>~s>~ r#iirr s >#iir >#rri >c>rihrsr rc#># #ih rsssr >ihrr >i> >>stsihr risr ihrc~i ~ rsihr rissrir r>rr# ihrr#>#r#ih >~ >srirc ~ rhhih rch rr# > ssc ricr ~i >s >#iir~iih >srir r rrh sscihrr >>st rhrchissririhr r#c ~s >~ih rihr >i>r >>st rri s ich>~ i~>~ rihr >i> >>st r> >#rihr >>st r#>#sc r# r#>#rr hhcc>i# >~>hr rihr ~ rsi r s>#iir >ri >c>rr rccic>i#i~r ti~ h r#schi> i> r s> s~~ >iirt r> h rr r rr>#cti~ rch s ss rch s >ti~tsc r>r#ch stsc r>rrch strr r>r i~ r> sihrsihrsc r#s t >r>#r sr rcsirsicihrscc r#s t >~ #r s >#iir c#h #h >#rr > ti~ ~ti~sc r rsc ihr >>stcic>ir > sr rhirh# hisr# r> cscch>rrchihr>r >~>rrihr r> s isihr rchi s >#iirchi s >#h #h >#rrsihr r#si~scssst sssr r>cic>#s rihrrr rihc>~ # r~> >ihr s rirc>#s r~> r#srrrr#h r rc#h >rsrr# c hr r s>si srrc> >ihr >>st r i
As for small sections of words being words in and of themselves, almost ALL small words are unrelated parts of larger words. If you pick a small word, you can almost certainly find a larger word that contains it — except maybe “qi”.
@ Phil:
If the smaller words are in the middle, sure. But how many one- or two-letter words also happen to be valid *suffixes* of yet other, larger words which exist in text without suffixes at all? Slim to none, as far as I can tell. Admittedly, I can be wrong. That still doesn’t explain why there are whole chunks of sentences identical with the exception of “z” being stuck in front of it. Still, that is my theory for now. The more approaches are covered the better.
I agree.
As to small words that are suffixes to non-suffixed words, I only mention these examples because they are really fun: critical, titan, bear, mayas, cartel, heroin. There are others for sure.
I don’t know if it helps, and I don’t recall anyone noticing this before, but the symbol variously denoted as “&” or “oPo” is always followed by “>nsvt=” (or a word beginning with that pattern), all eight times it appears. Also, every time it appears except for the first time, it is preceded by a word ending in “z>n”. If someone’s noticed this and mentioned it, then I need to read more carefully. 🙂
Also, I don’t know if it helps, but a word with a “b”/”q” in it will never have a “t”/”k” in it, and vice versa. These are two relatively low-frequency symbols, though, so that might not be a surprise, statistically speaking.
Has anyone noticed that on the first line of the text, right after the first colon, the words “iuoypj $ypjwpj” seem to be written noticeably lighter? Does this mean anything? Perhaps this is a hint for the “folding” of the glyph alphabet, since these two words seem to be the same as the initial “ievhrn svrnzrn.”
I’ve been experimenting with the index of coincidence of the various glyph groups when the text is broken into rows of various column widths, with spaces and punctuation excluded. In the case that “z” and suffix-“m” are considered part of the glyph groups, the index is maximized when the row width is 26 groups. (I tested 2 through 30.) When such prefixes and suffixes are left out of the glyph groups, a row width of 26 still produces a very high index, but 28 is slightly higher. I don’t know what this means, but this sort of analysis is often used to break Vigenere ciphers by then using letter-frequency correlations to find the most likely key letters in each column. With a multiliteral cipher as this appears to be, though, we can’t use correlations like that, because there’s no well-defined order for the glyph groups. Still, though, it’s interesting that the rows are relatively similar to each other when you put the glyph groups into a 26- or 28-column matrix.
I did a similar thing with just the raw glyphs, and I found that a matrix 5 glyphs wide had the highest index of coincidence. That might just be a coincidence, so to speak.
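For anyone who wants to repeat this kind of test, here is a minimal sketch of an index-of-coincidence scan over column widths. It is the textbook version (average the index over the columns of a matrix of the given width) and only an assumption about how the numbers above were computed. It works on any sequence of symbols, so it can be fed raw glyphs or a list of glyph groups.

```python
from collections import Counter
from typing import Hashable, Sequence


def index_of_coincidence(symbols: Sequence[Hashable]) -> float:
    """Chance that two symbols drawn without replacement are identical."""
    n = len(symbols)
    if n < 2:
        return 0.0
    counts = Counter(symbols)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def column_ioc(symbols: Sequence[Hashable], width: int) -> float:
    """Average index of coincidence over the columns of a width-column matrix."""
    columns = [symbols[i::width] for i in range(width)]
    scores = [index_of_coincidence(col) for col in columns if len(col) > 1]
    return sum(scores) / len(scores) if scores else 0.0


def scan_widths(symbols: Sequence[Hashable], widths: range) -> list[tuple[int, float]]:
    """Return (width, score) pairs sorted from highest to lowest score."""
    return sorted(((w, column_ioc(symbols, w)) for w in widths),
                  key=lambda pair: pair[1], reverse=True)


# Toy example: a text with period 2 scores highest at even widths.
print(scan_widths("abababababab", range(2, 5))[0][0])  # 2
```

Note that multiples of the true period score just as high as the period itself, and that with wide matrices and short texts the columns get so short that a high score may mean very little.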
@ Thomas J. Lee:
I have noticed the “>nsvt=” thing (it helped me with matching alphabets), but I don’t think I ever posted it, since I never figured out why it was there.
The index of coincidence is interesting, but I am not sure how to use it with the broken text. No real experience with it, I guess. I’ll read up.
@ Satsuoni:
I don’t know why “z>n & >nsvt=” is so frequent either, but I thought I’d post in case someone had an idea.
The index of coincidence is good for finding repeating patterns, but not necessarily for telling us what to do with them. :/ I don’t have a lot of experience with it either.
Perhaps the two charsets are different languages? Novil admitted to using English and German, which are (somewhat) similar in word lengths, which could have been obscured by length shifts. Sorry if I am way behind everyone else.
@ Satsuoni:
At this point I have to agree with you that the ‘z’ glyph (and its related ‘w’ glyph in the other glyph set) must be a space. The pattern of what characters are found preceding and following it (and what characters are never found preceding/following it) is the same as the space’s. Also, “bare” G2 glyphs (=benv, or =b/nv) occur only at the beginnings of words (i.e. after a space) or after a ‘z’, which also supports the “z=space” hypothesis.
But if we replace ‘z’ with a space everywhere, there are quite a lot of spaces and very short words. The frequency of spaces soars far beyond what it should be for either English or German. I think, then, that what we have is a situation where something else is acting as a “real” space, and that perhaps the ‘z’ glyphs and the spaces in the original text are all basically null characters — either that, or there are no spaces, and we’ll have to supply the word breaks ourselves. I’m not sure what that means for the punctuation in the text.
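The effect of the “z=space” hypothesis on word statistics can be measured directly. The sketch below is illustrative only: `word_stats` is a made-up helper, and `text` stands for whatever transcription you are working with. It reports the fraction of separator positions and the mean word length for a chosen set of glyphs treated as spaces.

```python
def word_stats(text, space_glyphs=" "):
    """Treat every glyph in `space_glyphs` as a space and summarise the resulting words."""
    if not text:
        return {"space_fraction": 0.0, "mean_word_length": 0.0, "word_count": 0}
    normalized = "".join(" " if ch in space_glyphs else ch for ch in text)
    words = normalized.split()
    separators = normalized.count(" ")
    mean_length = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "space_fraction": separators / len(normalized),
        "mean_word_length": mean_length,
        "word_count": len(words),
    }
```

Comparing `word_stats(text, " ")` with `word_stats(text, " z")` shows how far the space frequency rises and how much the average word shrinks once ‘z’ is counted as a separator. Note that with digraph encoding, word length should be measured in glyph groups rather than raw glyphs before comparing against English or German.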
Your ‘~’ character (the ‘suffix’ version of the ‘m’ glyph) still falls within words, of course. I begin to wonder, though, whether it might be the other side of the coin of the “bare” G2 glyphs, a G1 glyph with no trailing G2 glyph. I’m still looking at this as a pattern of digraphs consisting of G1 glyphs (#>cihmrst) followed by G2 glyphs (=benv), with variations. Those variations seem to be a case where there’s a G2 without a leading G1 (which now only occur at the beginnings of words, if ‘z’ is treated as a space) or where there’s a G1 without a trailing G2 (where the G1 is ‘m’; this can occur in the middle of words or at the end).
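The digraph reading described above can be turned into a small tokenizer. This is a sketch under the assumptions stated in the comment (G1 = “#>cihmrst”, G2 = “=benv”, ‘z’ already replaced by a space); the function name and token labels are made up for illustration.

```python
G1 = set("#>cihmrst")  # leading glyphs, as listed in the comment
G2 = set("=benv")      # trailing glyphs, as listed in the comment


def tokenize(word):
    """Split a word into G1+G2 pairs, bare G1 glyphs, bare G2 glyphs, and anything else."""
    tokens = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in G1 and nxt in G2:
            tokens.append(("pair", ch + nxt))
            i += 2
        elif ch in G1:
            tokens.append(("bare_g1", ch))
            i += 1
        elif ch in G2:
            tokens.append(("bare_g2", ch))
            i += 1
        else:
            tokens.append(("other", ch))
            i += 1
    return tokens
```

Running it over every word and tallying where `bare_g1` and `bare_g2` tokens fall (word start, middle, end) is a quick way to check the claims about where the unpaired glyphs occur.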
Possibly interesting tidbit: Computer analysis finds the longest recurring string is “rem s= >n #=in ibrn” (treating ‘z’ glyphs as spaces), which occurs 3 times in the text.
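A search like this is easy to reproduce. The sketch below is not the program used for the result above, just one simple way to do it: it binary-searches on the substring length, which works because any prefix of a repeated string is repeated at least as often. Overlapping occurrences are counted, and `text` is assumed to be the transcription with ‘z’ already replaced by spaces.

```python
from collections import Counter


def repeated_of_length(text, length, min_count):
    """All substrings of the given length that occur at least min_count times, sorted."""
    counts = Counter(text[i:i + length] for i in range(len(text) - length + 1))
    return sorted(s for s, c in counts.items() if c >= min_count)


def longest_repeated_substring(text, min_count=2):
    """Longest substring occurring at least min_count times ('' if there is none)."""
    lo, hi = 0, len(text)  # lo is the longest length known to have a repeat
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = repeated_of_length(text, mid, min_count)
        if found:
            best = found[0]  # alphabetically first, for a deterministic result
            lo = mid
        else:
            hi = mid - 1
    return best
```

Calling it with `min_count=3` gives the longest string that occurs at least three times, which is the comparison relevant to the tidbit above.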
I just happened to stumble upon this comic while browsing the net and became fascinated with the Book of Woo and the deciphering attempts. I’m not entirely sure I’ve followed the progress completely, but if I understand correctly you’ve got an ‘encoded’ message with 15 characters that needs to map to 26-30 characters?
I am just throwing out an idea here, but if you’re going to map a lesser number of characters into a greater number it seems you’ll need multiple lookup tables and some rule or signal character(s) that tells you which table to look in. Is it possible that there is a table of the 14 most common letters, and a table with the 12-14 least common letters, and one character that tells you which table to look in? If the tables are divided by letter frequency you’d rarely need to look in the second table, so maybe one of the ‘odd’ characters like ‘m’ or ‘z’ is the signal to look in the second table rather than the first?
Just my wild speculation. 🙂
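To make the two-table idea concrete, here is a toy version. Everything in it is an assumption for illustration: the 14 cipher symbols, the choice of “z” as the shift signal, and the split of the alphabet by a rough English frequency ordering. It says nothing about how The Book of Woo was actually encrypted.

```python
SYMBOLS = "abcdefghijklmn"  # 14 hypothetical cipher symbols
SHIFT = "z"                 # hypothetical signal: "next symbol uses the rare table"
COMMON = "ETAOINSHRDLCUM"   # 14 more frequent letters (rough English ordering)
RARE = "WFGYPBVKJXQZ"       # 12 less frequent letters


def encode(plain):
    """Encode letters with the two tables; anything that is not A-Z is dropped."""
    out = []
    for ch in plain.upper():
        if ch in COMMON:
            out.append(SYMBOLS[COMMON.index(ch)])
        elif ch in RARE:
            out.append(SHIFT + SYMBOLS[RARE.index(ch)])
    return "".join(out)


def decode(cipher):
    """Invert encode(): a SHIFT symbol switches tables for the next symbol only."""
    out = []
    table = COMMON
    for ch in cipher:
        if ch == SHIFT:
            table = RARE
            continue
        idx = SYMBOLS.index(ch)  # raises ValueError on an unknown symbol
        if idx >= len(table):
            raise ValueError(f"symbol {ch!r} has no entry in the selected table")
        out.append(table[idx])
        table = COMMON
    return "".join(out)
```

A scheme like this would make the shift symbol fairly frequent and always followed by another symbol, never by itself or by a word end, which is a pattern that could be checked against the behaviour of ‘m’ or ‘z’ in the text.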
@ Thomas J. Lee:
Maybe you are right about the “m”, but wouldn’t that make it a G2 glyph that just happens to be inside words sometimes? That would make glyph sequences a prefix code, which is nice.
As for “z”, I am sure there is more going on with it (as in, I think it messes with the word order somehow, in addition to being a space-like thing), but damned if I can figure out how…
So no more insights from me for now 🙁
And another week (or so) with little to no results. I blame my lack of knowledge in all things cryptographical <_<
Seriously, where are the professionals? Did everybody just give up?
@ Satsuoni:
If you want a professional’s help, find one yourself!
Lol, no. I am just surprised that after the comments on the first page, including a blog post from a person who seems to know a bit about Voynich, the majority of the visible progress came from a person whose knowledge of cryptanalysis is nil (me). I had expected them to have cracked it already…
Anyhow, sorry if I offended anybody, I was just frustrated XD