Code Pages and Kohuepts: The Chaos of 8 Bit Extended ASCII

“Dylan Beattie”

“But it’s plain text! What do you mean it looks weird?”

Friends, let’s take a walk around the wonderful world of code pages, the cause of more encoding headaches, bizarre punctuation and inventive workarounds than just about anything else in IT history – and along the way, we’ll meet some other…

34 Comments

  1. So you are the plain text guy that shows up multiple times in my feed😂. Anyway, I've recently heard a new story about this. For some context, there used to be a popular game genre on the Chinese-language internet called 魔塔 (Magic Tower), and one of the famous games in mainland China was made in Hong Kong. Because back then the mainland mostly used GBK encoding while HK used Big5, the text turned into gibberish, and somehow someone on the internet was dedicated enough to just memorize and read that gibberish in order to progress in the game.

  2. Hi @Dylan… I've heard you talk on this subject a couple'a times now… but you never use the term "Mojibake" (文字化け) and I've always thought you should… 'cus it's a cool word. 😉

  3. Two things re code pages:
    You need at least EGA graphics on a PC to use code pages.

    Microsoft/IBM forced users in Sweden to do all the "load code page" crap to be able to select the correct keyboard layout, even though the default "CP437" already had all characters that we really needed. Everything about code pages in DOS has an aura of "pointy haired boss"…

  4. Great video. These problems are still with us. I recently bought a new laptop and during the installation process it seemed to think I had a UK keyboard and rendered the @ as a ", which made it very difficult for me to enter my email address. I figured it out.

  5. On the keyboard trickery, the video game Library of Ruina has a mid-lategame boss whose name, in the original Korean, is a jumbled mess of Latin characters. And in English, their name is a jumbled mess of Korean letters. As you can guess, their name is concealed using the "type in the wrong keyboard" method, with the other translations, Chinese and Japanese, also using the same method in their own keyboards, so that their name is gibberish-but-"typable" in all languages.

  6. It still is a problem with compilers and editors if you use non-ASCII characters (like ä, ö, ü, ß, copyright, trademark etc.) in string constants (or even in comments). The editor might automatically switch to UTF-8 (Notepad++ does that), which the compiler takes for standard ASCII and chokes on. Usually you get garbage at some point. I got used to embedding such characters as hexadecimal escape codes to avoid that pitfall.

  7. That WordStar screenshot is such a goldmine of nostalgia, I used a lot of different CP/M and DOS machines back in the olden days and they all had their differences and "killer apps"… but WordStar was the ONE constant. At college we had realised that the CP/M text editor was the ninth circle of hell and some bright spark realised you could use WordStar in "non document mode" as quite a decent text editor and so, until Microsoft put a cut down version of QBASIC in DOS-5 and called it `EDIT`, WordStar followed me around for many years.

  8. Ш щаеут ащкпуе ещ срфтпу дфтпгфпу уізусшфддн црут Ш іуфкср штащкьфешщт щт Пщщпду. Щр тщ тще фпфшт!

  9. Lovely historical info trove. In the 1960s I grew up on Dartmouth Timesharing Basic on a TeleType ASR 33, so I got to know 6-bit ASCII pretty well. Fast forward to 2004 trying to maintain a Spanish website on JDK1.4, which didn't support UTF-8 in property files. Had to copy and paste UTF-8 from Word documents into an app that converted UTF-8 to Unicode backslash escape characters. You've nicely covered quite an historical odyssey from Baudot to ASCII to EBCDIC to Code Pages and finally Unicode. Thank you, sir!

  10. Totally understand your view on "pound sign". But as an American of Gen X age and having had a grandmother who worked for a "baby" Bell telephone company, # is the "pound key" to me. AT&T and the Bell system instructed us to use the "pound key" for specific dialing situations with touch tone phones.

  11. Yes, before codepages we had some "adjustments" to e.g. Swedish. One such was ISO ESC 2/8 4/7 which actually included the Swedish variants "inside" of ASCII, in the Swedish computer ABC80. And if I remember correctly the same code was used in what was equivalent to the English Prestel, videotext or whatever they were called. Yes, that also had consequences. Early in the internet when we still had that awful ugly quoted printable hack, the mess got so bad that we gave up and spelled our texts with a and o instead of å, ä and ö. Mostly the text was still readable, due to context …

  12. Well done, Dylan!

    ASCII was a good solution considering the technical constraints of the time. It just shouldn't have lived that long. We went through Commodore ASCII (aka not really ASCII at all) and a few other proprietary variants, more official extensions such as the three dozen variants of ISO-8859-random_number plus a bunch of national standards and of course the code pages, Amiga ASCII, ATASCII aka Atari ASCII, EBCDIC (punch card compatible but not ASCII-like) and more. And that was after having survived Baudot code and what not in the teleprinter age. ASCII was always a standard that was typographically impoverished, just barely good enough – it doesn't even fully cover the character set used in an average newspaper, such as proper “quotes”, the cent symbol ¢ and more.

  13. I'm just happy I never have to do a latin-1 to utf-8 database conversion again.

    I'm also happy I never have to fix utf-8 stored with latin-1 connection to utf-8 stored with utf-8 connection again.

  14. For Greek we had codepage 737, or iso8859-7, but since the whole thing was a mess, you had to use a Greek font, run a VGA glyph-replacement program, and earlier computers were ASCII-only anyway, my generation of computer geeks used to type Greek with Latin characters. We called it "greeklish" and that's what we de facto used online to communicate. In later years subsequent generations started taking offence at people using greeklish on forums since they grew up with Unicode and never had to get used to reading greeklish, but for some of us, having to switch keyboard layouts mid-sentence to type an English term is just unbearable, so we keep using greeklish 🙂 Also I type at half the rate in Greek, since I never got used to it…. unbearable.

  15. 11:11 On a related note, some systems (eg search-engines, auto-correct, etc.) can detect other errors like when you type something with your fingers shifted a key to the left or right. It's a lot of work to encode all of the possible mistakes that could be made to accommodate errors seamlessly, but perhaps it's a job well-suited for machine-learning. 🤔 (On the other hand, look at the mess that Microsoft made by making IE correct for sloppy web-developers. 😒)

  16. Neat fact about the keyboard layout thing. In the 2002 movie "The Bourne Identity" the protagonist assumes the fake identity of a Russian citizen named "Foma Kiniaev". He gets a fake Russian passport but his Russian passport in Cyrillic says "Ащьф Лштшфум". The prop department just set their keyboard to Russian and wrote "Foma Kiniaev" as if it was a qwerty keyboard.

    Turns out it was actually quite realistic. A few years back a guy tried to present a fake Israeli passport in Barbados under the name "Assulin Hormoz", but instead of "Hormoz" his surname in the passport in Hebrew was also typed as if it was Latin so it became "יםרצםז", which was further mangled by being rendered backwards as "זםצרםי" (bidirectional text is something you haven't covered and it's a whole other can of worms). There were also several other Hebrew mistakes in the passport such as text rendered upside down or similar looking letters being mixed up.
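
Several of the comments above describe the same two mechanisms: bytes written in one encoding and read back in another, and text typed with the wrong keyboard layout active. A minimal Python sketch of both follows. The helper names (`misread`, `repair_latin1_utf8`, `type_on_russian_layout`) are hypothetical and made up for illustration, and the layout table assumes the standard Russian ЙЦУКЕН layout and covers letter keys only.

```python
# Illustrative helpers only; none of these names come from the video or the comments.

def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one encoding, then decode the same bytes as another."""
    return text.encode(written_as).decode(read_as, errors="replace")


def repair_latin1_utf8(garbled: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1."""
    return garbled.encode("latin-1").decode("utf-8")


# QWERTY letter keys and the letters on the same physical keys of the
# Russian layout (assumption: punctuation keys are left untouched).
_LATIN = "qwertyuiopasdfghjklzxcvbnm"
_CYRILLIC = "йцукенгшщзфывапролдячсмить"
_TO_RUSSIAN = str.maketrans(_LATIN + _LATIN.upper(),
                            _CYRILLIC + _CYRILLIC.upper())


def type_on_russian_layout(text: str) -> str:
    """Return what appears if QWERTY keystrokes hit a Russian layout."""
    return text.translate(_TO_RUSSIAN)


if __name__ == "__main__":
    # Big5 bytes read as GBK: still "Chinese-looking", but not the title.
    print(misread("魔塔", "big5", "gbk"))

    # UTF-8 read through a Latin-1 connection, then repaired.
    garbled = misread("å ä ö", "utf-8", "latin-1")
    print(garbled)                      # Ã¥ Ã¤ Ã¶
    print(repair_latin1_utf8(garbled))  # å ä ö

    # The passport prop from the comment above.
    print(type_on_russian_layout("Foma Kiniaev"))  # Ащьф Лштшфум
```

The repair step only works when no bytes were lost along the way. If the garbled text passed through a lossy conversion, or was really decoded as Windows-1252 rather than Latin-1, the round trip can fail or give different results.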
