The Unicat Programming Language

Published on 23 August 2023 (Updated: 09 September 2023)

Welcome to the Unicat page! Here, you'll find a description of the language as well as a list of sample programs in that language.

This article was written by:

Description

Introduction

If you love cats and you love emojis, then you'll love this language, since it is composed entirely of cat emojis! But seriously, this is an esoteric language, so it is designed to be challenging, and doing anything useful in it is difficult.

The Unicat language was created by gemdude46. Sadly, there was no documentation that described how to use it, so in order to learn this language, I had to study the code. It is written in Python 2, which reached end-of-life on January 1, 2020. Fortunately, I had a Python 2.7 interpreter, so I was able to run the example programs and step through them in the Python 2.7 debugger to work out the parts of the code that I couldn't figure out just by reading it.

Character Set

There are 9 cat emojis that make up the character set for this language. Each one of these characters represents a byte code from "0" to "8":

| Emoji | Byte Code | Emoji type |
|-------|-----------|------------|
| 😸 | "0" | Grinning Cat with Smiling Eyes |
| 😹 | "1" | Cat with Tears of Joy |
| 😺 | "2" | Grinning Cat |
| 😻 | "3" | Smiling Cat with Heart-Eyes |
| 😼 | "4" | Cat with Wry Smile |
| 😽 | "5" | Kissing Cat |
| 😾 | "6" | Pouting Cat |
| 😿 | "7" | Crying Cat |
| 🙀 | "8" | Weary Cat |

Anything that is not a cat emoji is ignored, so code comments can be placed within code without affecting the operation.
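As a rough illustration of the paragraph above, here is a minimal Python sketch of how source text could be reduced to byte codes. The names `EMOJI_TO_CODE` and `to_byte_code` are made up for this example and are not taken from the Unicat interpreter.

```python
# Hypothetical helper, not part of the Unicat interpreter.
# The nine cat emojis are consecutive code points U+1F638..U+1F640, in the
# same order as the character set table, so byte code i is chr(0x1F638 + i).
EMOJI_TO_CODE = {chr(0x1F638 + i): str(i) for i in range(9)}


def to_byte_code(source: str) -> str:
    """Keep only cat emojis and translate each one to its byte code digit."""
    return "".join(EMOJI_TO_CODE[ch] for ch in source if ch in EMOJI_TO_CODE)


# Text between the emojis acts as a comment and is dropped.
print(to_byte_code("😻😹 store a value 😹😾🙀🙀"))  # prints 311688
```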

Memory

Memory is made of an arbitrarily large set of addresses that are capable of storing arbitrarily large values. In other words, it is a python dictionary where the key is the address and the value is the integer contents of that address. Any uninitialized memory address is assumed to be zero.

Memory address -1 is a special address that contains the address of the current instruction to execute. This is initialized to -1 and is incremented before each instruction is executed.

It should be noted that the instruction memory and the memory that the program uses are entirely separate, so a program cannot modify itself.
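The memory model described above can be sketched in a few lines of Python. This is only an illustration based on the description in this section, not the interpreter's actual code; `new_memory` and `fetch_address` are hypothetical names.

```python
from collections import defaultdict


def new_memory():
    """Create program memory: every address reads as 0 until written."""
    memory = defaultdict(int)
    memory[-1] = -1  # address -1 holds the current instruction address
    return memory


def fetch_address(memory):
    """Increment the instruction address, then return it."""
    memory[-1] += 1
    return memory[-1]


memory = new_memory()
memory[10**30] = 2**200        # addresses and values can be arbitrarily large
print(memory[12345])           # prints 0 (never written)
print(fetch_address(memory))   # prints 0 (address of the first instruction)
print(fetch_address(memory))   # prints 1
```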

Numbers

Numbers that are used in instructions are represented in octal (base 8), and each digit is converted to the corresponding emoji. The number is terminated with 🙀 ("8"). If the number is negative, the next emoji is 😿 ("7"). Otherwise, the next emoji may be any other cat emoji. For the sake of brevity, let's just use 🙀 ("8") for positive numbers. For example:

| Decimal Value | Octal Value | Emojis | Byte Code |
|---------------|-------------|--------|-----------|
| 457 | 0o711 | 😿😹😹🙀🙀 | "71188" |
| -345 | -0o531 | 😽😻😹🙀😿 | "53187" |

The 0o prefix is just a common way of representing octal values in several programming languages, such as Python and JavaScript, to name a few.

Any character needs to be represented by its ASCII (or Unicode) value. For example, the ASCII value of "H" is 72 (0o110), so "H" is represented as this:

😹😹😸🙀🙀 ("11088")

One of the oddities of the way numbers are handled is that if there are no more emojis to process, the number is interpreted as 1337. Why 1337? Well, all I can figure is that this is the leetspeak spelling of the word "elite".
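The number format can be made concrete with a small encoder and decoder that work on byte code strings. This is a sketch based on the rules in this section, not the interpreter's own code: the function names are hypothetical, and treating an empty digit run as 0 and requiring a terminator to be present are assumptions of this example.

```python
def encode_number(value: int) -> str:
    """Return the byte code for `value`: octal digits, "8", then a sign marker."""
    digits = format(abs(value), "o")           # octal digits without the 0o prefix
    return digits + ("87" if value < 0 else "88")


def decode_number(code: str, pos: int = 0):
    """Decode one number starting at `pos`; return (value, next position)."""
    if pos >= len(code):
        return 1337, pos                       # nothing left to read
    end = code.index("8", pos)                 # assumes a terminator is present
    value = int(code[pos:end] or "0", 8)       # assumption: no digits means 0
    negative = code[end + 1:end + 2] == "7"    # "7" after the terminator means negative
    return (-value if negative else value), end + 2


print(encode_number(457))        # prints 71188
print(encode_number(-345))       # prints 53187
print(decode_number("11088"))    # prints (72, 5)
print(decode_number(""))         # prints (1337, 0)
```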

Instructions

Unicat supports these 12 instructions:

| Mnemonic | Arguments | Emojis | Byte Code |
|----------|-----------|--------|-----------|
| asgnlit | MEMADDR VALUE | 😻😹 | "31" |
| jumpif> | MEMADDR INSADDR | 😽😿 | "57" |
| echovar | MEMADDR | 😽😼 | "54" |
| echoval | MEMADDR | 😼😼 | "44" |
| pointer | MEMADDR | 😼😾 | "46" |
| randomb | MEMADDR | 🙀😻 | "83" |
| inputst | MEMADDR | 😺😼 | "24" |
| applop- | MEMADDR1 MEMADDR2 | 😿🙀😺 | "782" |
| applop* | MEMADDR1 MEMADDR2 | 😿🙀🙀 | "788" |
| applop/ | MEMADDR1 MEMADDR2 | 😿🙀😿 | "787" |
| applop+ | MEMADDR1 MEMADDR2 | 😿🙀😸 | "780" (*) |
| diepgrm | | 🙀🙀 | "88" |

(*) applop+ can actually be represented by any of the following:

However, let's just stick with what is in the above table.

If an invalid instruction is detected, the program will just jump back to the beginning, creating an infinite loop. This also happens if there are no more instructions to process.

The subsequent sections describe each of the instructions and their corresponding arguments.

asgnlit

The asgnlit instruction stores a value to a memory address. It takes two arguments:

For example, let's set memory address 14 (0o16) to -8 (-0o10):

😻😹 😹😾🙀🙀 😹😸🙀😿 ("31 1688 1087")

Note that the spaces are just added to delineate the mnemonic from the arguments.

If MEMADDR is -1, the VALUE is the instruction address minus 1. This acts like an unconditional jump to instruction address VALUE plus 1.

For example, let's jump to instruction address 22 (0o26). Since the VALUE is the instruction address minus 1, this will need to be represented as 21 (0o25):

😻😹 😹🙀😿 😺😽🙀🙀 ("31 187 2588")
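Translating instructions by hand is error-prone, so a tiny assembler helps when checking examples like the two above. The sketch below is hypothetical (the names `OPCODES`, `assemble`, and `to_emojis` are not from the Unicat source); the opcode strings are copied from the instruction table, and only positive numbers terminated with "88" and negative numbers terminated with "87" are produced.

```python
# Hypothetical mini-assembler; opcode byte codes come from the instruction table.
OPCODES = {
    "asgnlit": "31", "jumpif>": "57", "echovar": "54", "echoval": "44",
    "pointer": "46", "randomb": "83", "inputst": "24", "applop-": "782",
    "applop*": "788", "applop/": "787", "applop+": "780", "diepgrm": "88",
}


def encode_number(value: int) -> str:
    """Octal digits, the "8" terminator, then "7" if negative or "8" if not."""
    return format(abs(value), "o") + ("87" if value < 0 else "88")


def assemble(mnemonic: str, *args: int) -> str:
    """Return the byte code for one instruction and its numeric arguments."""
    return OPCODES[mnemonic] + "".join(encode_number(arg) for arg in args)


def to_emojis(byte_code: str) -> str:
    """Map each byte code digit to its cat emoji (U+1F638 is byte code 0)."""
    return "".join(chr(0x1F638 + int(digit)) for digit in byte_code)


print(assemble("asgnlit", 14, -8))             # prints 3116881087
print(assemble("asgnlit", -1, 21))             # prints 311872588
print(to_emojis(assemble("asgnlit", 14, -8)))  # the emoji form of the first example
```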

jumpif>

The jumpif> instruction compares a value of a memory address to zero. If it is greater than zero, then the instruction address is changed to the specified value. Otherwise, the instruction address goes to the next address. This acts like a conditional jump. It takes two arguments:

For example, let's jump to instruction address 30 (0o36) if the value of memory address 16 (0o20) is greater than 0. The instruction address needs to be represented as 29 (0o35):

😽😿 😺😸🙀🙀 😻😽🙀🙀 ("57 2088 3588")
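The "minus 1" rule for both kinds of jump follows from the instruction address being incremented before each instruction is executed. Here is a minimal simulation of that behavior; the function names are hypothetical, and the code models only what the text describes rather than the interpreter itself.

```python
from collections import defaultdict


def asgnlit(mem, memaddr, value):
    """Store a literal; writing to address -1 acts as an unconditional jump."""
    mem[memaddr] = value


def jumpif_gt(mem, memaddr, insaddr):
    """Change the instruction address only when mem[memaddr] is greater than 0."""
    if mem[memaddr] > 0:
        mem[-1] = insaddr


def fetch_address(mem):
    """Increment the instruction address before executing, as described above."""
    mem[-1] += 1
    return mem[-1]


mem = defaultdict(int)
mem[-1] = 4                 # pretend instruction 4 is currently executing

asgnlit(mem, -1, 21)
print(fetch_address(mem))   # prints 22

mem[16] = 3
jumpif_gt(mem, 16, 29)
print(fetch_address(mem))   # prints 30

mem[16] = 0
jumpif_gt(mem, 16, 29)      # condition is false, so no jump happens
print(fetch_address(mem))   # prints 31
```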

echovar

The echovar instruction outputs the value of a memory address as ASCII/Unicode to standard out. It takes a single argument:

For example, let's output the value of memory address 7 (0o7):

😽😼 😿🙀🙀 ("54 788")

If memory address 7 contains 117, a u is output, since 117 is the ASCII code for u.

echoval

The echoval instruction outputs the value of a memory address as digits to standard out. It takes a single argument:

For example, let's output the value of memory address 6 (0o6):

😼😼 😾🙀🙀 ("44 688")

If memory address 6 contains 10, then 10 is output.
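The difference between echovar and echoval comes down to character output versus digit output. The sketch below uses hypothetical function names and assumes that nothing extra, such as a trailing newline, is written after the value.

```python
import io
import sys


def echovar(mem, memaddr, out=sys.stdout):
    """Write the value as a single character (its ASCII/Unicode code point)."""
    out.write(chr(mem[memaddr]))


def echoval(mem, memaddr, out=sys.stdout):
    """Write the value as decimal digits."""
    out.write(str(mem[memaddr]))


mem = {7: 117, 6: 10}
buffer = io.StringIO()      # capture output so the result can be inspected
echovar(mem, 7, buffer)
echoval(mem, 6, buffer)
print(buffer.getvalue())    # prints u10
```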

pointer

The pointer instruction treats the value of a memory address as a pointer to a second memory address and stores the value of that second address in the first one. It takes a single argument:

This is best explained with a diagram:

          Before                           After
    ===================              ===================

    | Memory  |       |              | Memory  |       |
    | Address | Value |              | Address | Value |
    +---------+-------+              +---------+-------+
    | 3       | 7     | ---    =>    | 3       | 5     |
    +---------+-------+   |          +---------+-------+ 
--> | 7       | 5     |   |          | 7       | 5     |
|                         |
---------------------------

In this diagram, memory address 3 contains a value of 7. The value of memory address 7 is 5. Therefore, memory address 3 will contain 5, and memory address 7 remains unchanged.

For the above case, the instruction would look like this:

😼😾 😻🙀🙀 ("46 388")
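In Python terms, the pointer instruction is a single dereference. The sketch below reproduces the diagram's values; `pointer` is a hypothetical function name, and memory is modeled as a `defaultdict` so that uninitialized addresses read as zero.

```python
from collections import defaultdict


def pointer(mem, memaddr):
    """Replace the value at `memaddr` with the value it points to."""
    mem[memaddr] = mem[mem[memaddr]]


mem = defaultdict(int, {3: 7, 7: 5})
pointer(mem, 3)
print(mem[3], mem[7])   # prints 5 5
```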

randomb

The randomb instruction sets a memory address to a random boolean value of 0 or 1. It takes a single argument:

For example, let's set a random boolean value to address 14 (0o16):

🙀😻 😹😾🙀🙀 ("83 1688")

inputst

The inputst instruction reads a line from standard in and stores the ASCII/Unicode value of each character in consecutive memory addresses, starting at the specified memory address. Finally, a 0 value is stored in the next memory address to null-terminate the line. It takes a single argument:

For example, let's store the input line to memory address 8 (0o10):

😺😼 😹😸🙀🙀 ("24 1088")

If the input line is "Hello" followed by a newline ("\n"), this is the result:

where "\0" is called the NUL character, which is used in other languages like C to indicate the end of a string value.
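The storage layout used by inputst can be sketched as follows. Note the assumption: this section does not say whether the trailing newline itself is stored, and this example assumes it is stripped before storing. The function name and the `line` parameter (standing in for the text read from standard in) are hypothetical.

```python
from collections import defaultdict


def inputst(mem, memaddr, line):
    """Store one code point per address, then a 0 terminator."""
    text = line.rstrip("\n")              # assumption: the newline is not stored
    for offset, char in enumerate(text):
        mem[memaddr + offset] = ord(char)
    mem[memaddr + len(text)] = 0          # null terminator after the last character


mem = defaultdict(int)
inputst(mem, 8, "Hello\n")
print([mem[address] for address in range(8, 14)])
# prints [72, 101, 108, 108, 111, 0]
```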

applop

The applop instructions take two arguments:

Each one performs the following operation, depending upon the mnemonic:

| Mnemonic | Result |
|----------|--------|
| applop+ | Value of MEMADDR1 equals value of MEMADDR1 plus value of MEMADDR2 |
| applop- | Value of MEMADDR1 equals value of MEMADDR1 minus value of MEMADDR2 |
| applop* | Value of MEMADDR1 equals value of MEMADDR1 times value of MEMADDR2 |
| applop/ | Value of MEMADDR1 equals quotient of value of MEMADDR1 divided by value of MEMADDR2 |

For example, suppose the value of memory address 9 (0o11) is 55 and the value of memory address 7 (0o7) is 10:

| Mnemonic | Emojis | Byte Code | Result |
|----------|--------|-----------|--------|
| applop+ | 😿🙀😸 😹😹🙀🙀 😿🙀🙀 | "780 1188 788" | Memory address 9 value is 55 + 10 = 65 |
| applop- | 😿🙀😺 😹😹🙀🙀 😿🙀🙀 | "782 1188 788" | Memory address 9 value is 55 - 10 = 45 |
| applop* | 😿🙀🙀 😹😹🙀🙀 😿🙀🙀 | "788 1188 788" | Memory address 9 value is 55 * 10 = 550 |
| applop/ | 😿🙀😿 😹😹🙀🙀 😿🙀🙀 | "787 1188 788" | Memory address 9 value is floor(55 / 10) = 5 |
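The four applop variants can be modeled with a lookup table of operators. This is a sketch with hypothetical names; division uses Python's floor division, matching the floor(55 / 10) result shown above.

```python
import operator
from collections import defaultdict

# Hypothetical operator table keyed by the last character of the mnemonic.
APPLOP = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,   # integer (floor) division
}


def applop(mem, op, memaddr1, memaddr2):
    """Store (value at memaddr1) op (value at memaddr2) back into memaddr1."""
    mem[memaddr1] = APPLOP[op](mem[memaddr1], mem[memaddr2])


for op in "+-*/":
    mem = defaultdict(int, {9: 55, 7: 10})
    applop(mem, op, 9, 7)
    print(op, mem[9])
# prints:
# + 65
# - 45
# * 550
# / 5
```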

diepgrm

The diepgrm instruction exits the program. It has no arguments. The program must contain at least one of these somewhere, or the program will go into an infinite loop.

Conclusion

Programming in Unicat is really challenging. It feels more like programming in machine language without the benefit of an assembler. Instead of translating instructions into hexadecimal, you have to translate them into cat emojis, some of which look very similar. If your editor has zoom capability, I highly recommend using that.

When you run your program, there is a good chance it will lock up. Here are some of the reasons why (other than the obvious logic error):

By the way, I got tired of using Python 2 and ported Unicat to Python 3. I also fixed a couple of bugs in the original Python 2 implementation. The code is available on PyPI, and the source code is available on GitHub. It has some limited debugging capability, which uses the Python debugger to do the heavy lifting.

In the meantime, happy coding 😸!
