The Unicat Programming Language

Published on 23 August 2023 (Updated: 09 September 2023)

Welcome to the Unicat page! Here, you'll find a description of the language as well as a list of sample programs in that language.

This article was written by:

rzuckerm

Description

Introduction

If you love cats and you love emojis, then you'll love this language since it is entirely composed of cat emojis! But seriously, this is an esoteric language, so it is designed to be challenging, and doing anything useful with it is difficult.

The Unicat language was created by gemdude46. Sadly, there was no documentation that described how to use it. In order to learn this language, I had to study the code. It is written in Python 2, which reached End-of-Life on January 1, 2020. Fortunately, I had a Python 2.7 interpreter, so I was able to run the example programs and step through them in the Python 2.7 debugger to understand the parts of the code that I couldn't figure out just by reading it.

Character Set

There are 9 cat emojis that make up the character set for this language. Each of these characters represents a byte code from "0" to "8":

Emoji	Byte Code	Emoji type
😸	`"0"`	Grinning Cat with Smiling Eyes
😹	`"1"`	Cat with Tears of Joy
😺	`"2"`	Grinning Cat
😻	`"3"`	Smiling Cat with Heart-Eyes
😼	`"4"`	Cat with Wry Smile
😽	`"5"`	Kissing Cat
😾	`"6"`	Pouting Cat
😿	`"7"`	Crying Cat
🙀	`"8"`	Weary Cat

Anything that is not a cat emoji is ignored, so code comments can be placed within code without affecting the operation.
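To make the mapping concrete, here is a minimal Python sketch of how source text could be reduced to byte codes. The helper name `to_byte_code` is my own invention for illustration and is not taken from the Unicat implementation.

```python
# Illustrative only: `to_byte_code` is a made-up helper, not part of Unicat itself.
CATS = "😸😹😺😻😼😽😾😿🙀"  # position in this string = byte code "0".."8"

def to_byte_code(source):
    """Keep only cat emojis and translate each one to its byte code digit."""
    return "".join(str(CATS.index(ch)) for ch in source if ch in CATS)

# The words in between act as comments and are dropped.
print(to_byte_code("😻😹 store into 😹😾🙀🙀 the value 😹😸🙀😿"))  # prints 3116881087
```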

Memory

Memory is made of an arbitrarily large set of addresses that are capable of storing arbitrarily large values. In other words, it is a python dictionary where the key is the address and the value is the integer contents of that address. Any uninitialized memory address is assumed to be zero.

Memory address -1 is a special address that contains the address of the current instruction to execute. This is initialized to -1 and is incremented before each instruction is executed.

It should be noted that the instruction memory and the memory that the program uses are entirely separate, so a program cannot modify itself.
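As a rough picture of this memory model, here is a small Python sketch. The names `memory` and `read` are hypothetical; only the behavior (a dictionary, zero for uninitialized addresses, the instruction address stored at -1) comes from the description above.

```python
# Illustrative sketch: `memory` and `read` are made-up names.
memory = {-1: -1}  # address -1 holds the instruction address, initialized to -1

def read(addr):
    """Uninitialized addresses read as zero."""
    return memory.get(addr, 0)

memory[10**30] = 2**200  # addresses and values can be arbitrarily large
memory[-1] += 1          # incremented before each instruction is executed

print(read(42))    # prints 0
print(memory[-1])  # prints 0
```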

Numbers

Numbers that are used in instructions are represented in octal (base 8), and each digit is converted to the corresponding emoji. The digits are terminated with 🙀 ("8"), followed by one more emoji that gives the sign: 😿 ("7") if the number is negative, or any other cat emoji if it is positive. For the sake of brevity, let's just use 🙀 ("8") for positive numbers. For example:

Decimal Value	Octal Value	Emojis	Byte Code
457	`0o711`	😿😹😹🙀🙀	`"71188"`
-345	`-0o531`	😽😻😹🙀😿	`"53187"`

The 0o prefix is just a common way of representing octal values in several programming languages, such as Python and Rust, to name a few.

Any character needs to be represented by its ASCII (or Unicode) value. For example, the ASCII value of "H" is 72 (0o110), so "H" is represented as this:

😹😹😸🙀🙀 ("11088")
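The encoding above is mechanical enough to script. Here is a minimal Python sketch that follows the rules as described in this section; the function names `encode_number`, `to_emojis`, and `decode_number` are my own, and the decoder assumes well-formed input.

```python
# Illustrative helpers; none of these names come from the Unicat implementation.
CATS = "😸😹😺😻😼😽😾😿🙀"

def encode_number(value):
    """Return the byte code for an integer: octal digits, "8", then the sign."""
    sign = "7" if value < 0 else "8"  # "8" is used for positive numbers by convention
    return format(abs(value), "o") + "8" + sign

def to_emojis(byte_code):
    """Translate byte code digits "0".."8" to cat emojis."""
    return "".join(CATS[int(digit)] for digit in byte_code)

def decode_number(byte_code, pos=0):
    """Read one number starting at pos; return (value, position after it)."""
    value = 0
    while byte_code[pos] != "8":
        value = value * 8 + int(byte_code[pos])
        pos += 1
    negative = byte_code[pos + 1] == "7"  # any other digit means positive
    return (-value if negative else value), pos + 2

print(encode_number(457), to_emojis(encode_number(457)))    # 71188 😿😹😹🙀🙀
print(encode_number(-345), to_emojis(encode_number(-345)))  # 53187 😽😻😹🙀😿
print(to_emojis(encode_number(ord("H"))))                   # 😹😹😸🙀🙀
print(decode_number("53187"))                               # (-345, 5)
```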

One of the oddities of the way numbers are handled is that if there are no more emojis to process, the number is interpreted as 1337. Why 1337? Well, all I can figure is that 1337 is the leetspeak spelling of "leet", which is short for "elite".

Instructions

Unicat supports these 12 instructions:

Mnemonic	Arguments	Emojis	Byte Code
`asgnlit`	`MEMADDR` `VALUE`	😻😹	`"31"`
`jumpif>`	`MEMADDR` `INSADDR`	😽😿	`"57"`
`echovar`	`MEMADDR`	😽😼	`"54"`
`echoval`	`MEMADDR`	😼😼	`"44"`
`pointer`	`MEMADDR`	😼😾	`"46"`
`randomb`	`MEMADDR`	🙀😻	`"83"`
`inputst`	`MEMADDR`	😺😼	`"24"`
`applop-`	`MEMADDR1` `MEMADDR2`	😿🙀😺	`"782"`
`applop*`	`MEMADDR1` `MEMADDR2`	😿🙀🙀	`"788"`
`applop/`	`MEMADDR1` `MEMADDR2`	😿🙀😿	`"787"`
`applop+`	`MEMADDR1` `MEMADDR2`	😿🙀😸	`"780"` (*)
`diepgrm`		🙀🙀	`"88"`

(*) applop+ can actually be represented by any of the following:

😿🙀😹 ("781")
😿🙀😻 ("783")
😿🙀😼 ("784")
😿🙀😽 ("785")
😿🙀😾 ("786")

However, let's just stick with what is in the above table.
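Since translating by hand is error-prone, the instruction table can be turned into a tiny assembler. This is only a sketch under my own naming (`OPCODES`, `encode_number`, `assemble`); it is not how the Unicat interpreter itself is organized.

```python
# Illustrative assembler sketch; all names here are hypothetical.
CATS = "😸😹😺😻😼😽😾😿🙀"

OPCODES = {
    "asgnlit": "31", "jumpif>": "57", "echovar": "54", "echoval": "44",
    "pointer": "46", "randomb": "83", "inputst": "24",
    "applop-": "782", "applop*": "788", "applop/": "787", "applop+": "780",
    "diepgrm": "88",
}

def encode_number(value):
    """Octal digits, then "8", then "7" for negative or "8" for positive."""
    return format(abs(value), "o") + "8" + ("7" if value < 0 else "8")

def assemble(mnemonic, *args):
    """Return the emojis for one instruction and its numeric arguments."""
    byte_code = OPCODES[mnemonic] + "".join(encode_number(arg) for arg in args)
    return "".join(CATS[int(digit)] for digit in byte_code)

print(assemble("asgnlit", 14, -8))  # 😻😹😹😾🙀🙀😹😸🙀😿
print(assemble("diepgrm"))          # 🙀🙀
```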

If an invalid instruction is detected, the program will just jump back to the beginning, creating an infinite loop. This also happens if there are no more instructions to process.

The subsequent sections describe each of the instructions and their corresponding arguments.

`asgnlit`

The asgnlit instruction stores a value to a memory address. It takes two arguments:

MEMADDR - The memory address to store the value
VALUE - The value to store

For example, let's set memory address 14 (0o16) to -8 (-0o10):

😻😹 😹😾🙀🙀 😹😸🙀😿 ("31 1688 1087")

Note that the spaces are just added to delineate the mnemonic from the arguments.

If MEMADDR is -1, the VALUE is the instruction address minus 1. This acts like an unconditional jump to instruction address VALUE plus 1.

For example, let's jump to instruction address 22 (0o26). Since the VALUE is the instruction address minus 1, this will need to be represented as 21 (0o25):

😻😹 😹🙀😿 😺😽🙀🙀 ("31 187 2588")

`jumpif>`

The jumpif> instruction compares the value of a memory address to zero. If it is greater than zero, then the instruction address is changed to the specified value. Otherwise, execution continues with the next instruction. This acts like a conditional jump. It takes two arguments:

MEMADDR - The memory address to compare
INSADDR - The instruction address minus 1

For example, let's jump to instruction address 30 (0o36) if the value of memory address 16 (0o20) is greater than 0. The instruction address needs to be represented as 29 (0o35):

😽😿 😺😸🙀🙀 😻😽🙀🙀 ("57 2088 3588")
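The "minus 1" rule for both kinds of jump falls out of the instruction address being incremented before each instruction runs. Here is a toy Python sketch of that loop. It assumes instructions are already decoded into `(mnemonic, args)` tuples and that instruction addresses count whole instructions; the `run` function and the `max_steps` guard are my own simplifications, and unlike the real interpreter it simply stops when it runs off the end.

```python
# Toy execution loop; `run` and `max_steps` are hypothetical simplifications.
def run(program, max_steps=100):
    """Execute decoded instructions; return (visited addresses, memory)."""
    memory = {-1: -1}
    visited = []
    for _ in range(max_steps):
        memory[-1] += 1  # incremented before each instruction is executed
        address = memory[-1]
        if address >= len(program):
            break  # simplification: the real interpreter starts over instead
        visited.append(address)
        mnemonic, args = program[address]
        if mnemonic == "asgnlit":
            memory[args[0]] = args[1]  # writing to address -1 is a jump
        elif mnemonic == "jumpif>":
            if memory.get(args[0], 0) > 0:
                memory[-1] = args[1]
        elif mnemonic == "diepgrm":
            break
    return visited, memory

conditional = [
    ("asgnlit", (0, 1)),    # 0: memory address 0 = 1
    ("jumpif>", (0, 3)),    # 1: jump to instruction 4, written as 3
    ("asgnlit", (1, 111)),  # 2: skipped
    ("asgnlit", (1, 222)),  # 3: skipped
    ("diepgrm", ()),        # 4: exit
]
print(run(conditional)[0])  # prints [0, 1, 4]

unconditional = [
    ("asgnlit", (-1, 1)),   # 0: jump to instruction 2, written as 1
    ("asgnlit", (5, 9)),    # 1: skipped
    ("diepgrm", ()),        # 2: exit
]
print(run(unconditional)[0])  # prints [0, 2]
```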

`echovar`

The echovar instruction outputs the value of a memory address as ASCII/Unicode to standard out. It takes a single argument:

MEMADDR - The memory address to output as ASCII/Unicode

For example, let's output the value of memory address 7 (0o7):

😽😼 😿🙀🙀 ("54 788")

If memory address 7 contains 117, a u is output since 117 is the ASCII code for u.

`echoval`

The echoval instruction outputs the value of a memory address as digits to standard out. It takes a single argument:

MEMADDR - The memory address to output as digits

For example, let's output the value of memory address 6 (0o6):

😼😼 😾🙀🙀 ("44 688")

If memory address 6 contains 10, then 10 is output.
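The difference between echovar and echoval comes down to how the stored integer is turned into text. A minimal Python sketch, assuming a dictionary-based memory; the functions here return the text rather than writing it to standard out, and their names simply mirror the mnemonics.

```python
# Illustrative sketch; these functions only mirror the mnemonics by name.
def echovar(memory, addr):
    """Text produced for the value interpreted as a character code."""
    return chr(memory.get(addr, 0))

def echoval(memory, addr):
    """Text produced for the value written out as decimal digits."""
    return str(memory.get(addr, 0))

memory = {6: 10, 7: 117}
print(echovar(memory, 7))  # prints u
print(echoval(memory, 6))  # prints 10
```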

`pointer`

The pointer instruction treats a memory address as a pointer to a second memory address and stores the value of that second memory address in the first one. It takes a single argument:

MEMADDR - The memory address to use as a pointer

This is best explained with a diagram:

          Before                           After
    ===================              ===================

    | Memory  |       |              | Memory  |       |
    | Address | Value |              | Address | Value |
    +---------+-------+              +---------+-------+
    | 3       | 7     | ---    =>    | 3       | 5     |
    +---------+-------+   |          +---------+-------+ 
--> | 7       | 5     |   |          | 7       | 5     |
|                         |
---------------------------

In this diagram, memory address 3 contains a value of 7. The value of memory address 7 is 5. Therefore, memory address 3 will contain 5, and memory address 7 remains unchanged.

For the above case, the instruction would look like this:

😼😾 😻🙀🙀 ("46 388")
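In Python terms, the diagram amounts to a double lookup. This is just a sketch with a hypothetical `pointer` function over a dictionary-based memory:

```python
# Illustrative sketch of the dereference described above.
def pointer(memory, addr):
    """Replace the value at addr with the value it points to."""
    target = memory.get(addr, 0)
    memory[addr] = memory.get(target, 0)

memory = {3: 7, 7: 5}
pointer(memory, 3)
print(memory)  # prints {3: 5, 7: 5}
```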

`randomb`

The randomb instruction sets a memory address to a random boolean value of 0 or 1. It takes a single argument:

MEMADDR - The memory address to store the random boolean value

For example, let's store a random boolean value in memory address 14 (0o16):

🙀😻 😹😾🙀🙀 ("83 1688")
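A rough Python equivalent is shown below. The use of `random.randint` with equal odds for 0 and 1 is an assumption on my part; the description above only says the value is a random 0 or 1.

```python
import random

# Illustrative sketch; the equal odds for 0 and 1 are an assumption.
def randomb(memory, addr):
    """Store a random 0 or 1 at addr."""
    memory[addr] = random.randint(0, 1)

memory = {}
randomb(memory, 14)
print(memory[14] in (0, 1))  # prints True
```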

`inputst`

The inputst instruction reads a line from standard in and stores the ASCII/Unicode value of each character in consecutive memory addresses, starting at the specified memory address. Finally, a 0 value is stored in the next memory address to null-terminate the line. It takes a single argument:

MEMADDR - The starting memory address to store the input line

For example, let's store the input line to memory address 8 (0o10):

😺😼 😹😸🙀🙀 ("24 1088")

If the input line is "Hello" followed by a newline ("\n"), this is the result:

Memory address 8 = "H" (72)
Memory address 9 = "e" (101)
Memory address 10 = "l" (108)
Memory address 11 = "l" (108)
Memory address 12 = "o" (111)
Memory address 13 = "\n" (10)
Memory address 14 = "\0" (0)

where "\0" is called the NUL character, which is used in other languages like C to indicate the end of a string value.
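The memory layout above can be reproduced with a few lines of Python. This sketch takes the line as a parameter instead of reading standard in, and the function name just mirrors the mnemonic:

```python
# Illustrative sketch; the line is passed in rather than read from standard in.
def inputst(memory, addr, line):
    """Store each character code of line from addr onward, then a 0."""
    for offset, ch in enumerate(line):
        memory[addr + offset] = ord(ch)
    memory[addr + len(line)] = 0  # null terminator

memory = {}
inputst(memory, 8, "Hello\n")
print(memory)
# prints {8: 72, 9: 101, 10: 108, 11: 108, 12: 111, 13: 10, 14: 0}
```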

`applop`

The applop instructions take two arguments:

MEMADDR1 - The first memory address
MEMADDR2 - The second memory address

It performs the following operation depending upon the mnemonic:

Mnemonic	Result
`applop+`	Value of `MEMADDR1` equals value of `MEMADDR1` plus value of `MEMADDR2`
`applop-`	Value of `MEMADDR1` equals value of `MEMADDR1` minus value of `MEMADDR2`
`applop*`	Value of `MEMADDR1` equals value of `MEMADDR1` times value of `MEMADDR2`
`applop/`	Value of `MEMADDR1` equals quotient of value of `MEMADDR1` divided by value of `MEMADDR2`

For example, suppose the value of memory address 9 (0o11) is 55 and the value of memory address 7 (0o7) is 10:

Mnemonic	Emojis	Byte Code	Result
`applop+`	😿🙀😸 😹😹🙀🙀 😿🙀🙀	`"780 1188 788"`	Memory address 9 value is `55 + 10 = 65`
`applop-`	😿🙀😺 😹😹🙀🙀 😿🙀🙀	`"782 1188 788"`	Memory address 9 value is `55 - 10 = 45`
`applop*`	😿🙀🙀 😹😹🙀🙀 😿🙀🙀	`"788 1188 788"`	Memory address 9 value is `55 * 10 = 550`
`applop/`	😿🙀😿 😹😹🙀🙀 😿🙀🙀	`"787 1188 788"`	Memory address 9 value is `floor(55 / 10) = 5`
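The four operations can be sketched in Python with a small dispatch table. The `applop` function and `OPS` table are hypothetical names, and floor division is used for `/` to match the `floor(55 / 10)` result in the table.

```python
import operator

# Illustrative sketch; `OPS` and `applop` are made-up names.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,  # integer quotient, as in floor(55 / 10) = 5
}

def applop(memory, op, addr1, addr2):
    """Store (value at addr1) op (value at addr2) back into addr1."""
    memory[addr1] = OPS[op](memory.get(addr1, 0), memory.get(addr2, 0))

for op in "+-*/":
    memory = {9: 55, 7: 10}
    applop(memory, op, 9, 7)
    print(op, memory[9])
# prints:
# + 65
# - 45
# * 550
# / 5
```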

`diepgrm`

The diepgrm instruction exits the program. It has no arguments. The program must contain at least one of these somewhere, or the program will go into an infinite loop.

Conclusion

Programming in Unicat is really challenging. It feels more like programming in machine language without the benefit of an assembler. Instead of translating instructions into hexadecimal, you have to translate them into cat emojis, some of which look very similar. If your editor has zoom capability, I highly recommend using that.

When you run your program, there is a good chance it will lock up. Here are some of the reasons why (other than the obvious logic error):

You did not translate the instructions into the right emojis.
You forgot to terminate a number with the right emojis.
You forgot to add a diepgrm instruction.
Your jump instructions are not going to the right place.

By the way, I got tired of using Python 2 and ported Unicat to Python 3. I also fixed a couple of bugs in the original Python 2 implementation. The code is available on PyPI, and the source code is available on GitHub. It has some limited debugging capability, which uses the Python debugger to do the heavy lifting.

In the meantime, happy coding 😸!

Articles

There are 4 articles: