Let me share a story of my experience participating in a demo contest (programs that generate audio-visual series, the main feature of which is an extremely small size – dozens of kibibytes or even less) organized by Elinext.

Disclosure: This is a Sponsored Article

In the process of general discussion, someone suggested a unique, for this demo world, idea: write a program in some scripting language. The fact is that all demos are compressed by the packer to reduce the size (and unpacked when executed). And the text is compressed much better than binary code. If the interpreter is very small, it can give a significant advantage.

Because of my previous front-end experience, I immediately came up with the idea to minify the code – remove spaces and optional elements, to shorten the length of identifiers. After all, compression preserves all information, and many elements of the syntax are not necessary.

But even so, most of the existing languages are not designed for this kind of optimization – obviously, they have many elements that are necessary for human understanding, not for the machine. So what if we develop a language specifically designed for minification?

I decided not to participate in that contest, but I kept thinking about the idea. After all, it can find use in more practical purposes than the demo – in the front-end world, the volume of client scripts is still extremely important and if it can be reduced, this solution may prove justified in at least some cases.

This is how I started working on a language prototype.

Key features

Key features of my language are the following: dynamic typing, functional paradigm, no delimiters and class identifiers.

I chose dynamic typing because static requires type description that uses our valuable space.

Functional paradigm was chosen because it is more expressive (allows a smaller amount of code to express more algorithms).

I will dwell on the last two in more detail.

Separators absence

Let us look at the following code:

add(2, multiple(2, 2))

The number of arguments that each function expects (function arity) is known in advance. Brackets and commas are needed only for readability. Let us remove them:

add 2 multiple 2 2

We will do the same with statements, simply considering them as functions:

+ 2 * 2 2

In this case, statements do not need to have priority – the order of operations is set by the order of the transcription since for function command, all of its parameters must be calculated first. So the equation above will return 6. To get an 8, the code should be written like this:

* 2 + 2 2

Identifier types

Since my statements are normal functions, I decided to expand it and allowed any punctuation characters to be used in the identifiers.

And then I decided to divide all the identifiers into two types: alphabetic and punctuation.

The fact is that in any language, identifiers need to be separated by something – either by other elements of the syntax or by whitespace characters. Otherwise, there will be an ambiguity: xy is the identifier of xy or the two identifiers x and y?

However, having two disjoint classes of identifiers, I can weaken this requirement: identifiers of different classes can be written together and there will be no ambiguity.

And since my first type is alphabetic only – it does not include figures. This allows us to write numbers together with any type of identifiers.

Take this expression for example:

foo($(5))

In my language it can be written as follows:

foo$5

Solving the problem

A number of interesting problems arose during the development of the language, which I did not expect, but was able to resolve: function commands without parameters and building an AST code structure.

Function commands without parameters

Since I do not have a special syntax for function command, as in other languages, functions without parameters should be opened simply by specifying their name:

fn answer()
  42
;

answer

So it worked, but only on one level of nesting. What if this function returns another one?

fn answer()
  fn()
    42
  ;
;

answer

Then the result will be a closure, not an internal value. And how to cause this closure? Its name was already indicated after all.

I had to use the approach from the Clojure language – trampoline: any values after their calculation falls into a special cycle, which cyclically calls closures that do not require parameters until the result is something else. Thus, the result of the second example above will also be 42, as in the first.

AST code structure building

In order to implement the minification, you must be able to determine the structure of the code without executing it.

This is not difficult when we know the number of all the function parameters. Take a look at this code:

add 2 multiple 2 2

It has the following structure:

However, as soon as I start returning closures, ambiguity appears:

fn add(x y)
  + x y
;

fn increase(x)
  + x 1
;

fn foo(n)
  if == n 2
    add
    increase
;

foo x …

How many parameters should be passed to the foo function command result? This can only be determined during the execution of the code, but not at the stage of its parsing. And this makes it impossible to implement the minification.

In order to solve this problem, I extended the typification to a semi-static one: types now only need to be specified in functions, while the role of the type is indicated only by the required number of arguments for the function itself, and for its result, if it is a closure.

fn make_adder(bias):2
  fn(x y)
    + bias + x y
  ;
;

make_adder 1 2 2

The definition of the make_adder function explicitly states that its result is a closure and expects two parameters. Therefore, it is easy to construct the structure of the last call:

Common features

The language has the following features: nil, floating-point numbers, lists, hash tables and closures. The rows are implemented based on the lists. Logical values are absent – certain values of other types are considered false, while all the others are true.

The language has a set of built-in functions: basic mathematical functions and operations (including bit arithmetic), functions for working with lists and hash tables, basic I / O.

The language supports modularity by dynamically loading source files. It supports caching, the vendor directory, and path search specified through a special environment variable.

The language also has a small standard library containing a functional style module for working with lists and a unit testing module.

Benchmark

To rate the possibilities of minification I decided to compare my language with JavaScript. For this, I wrote the same program in both languages.

As a task, I chose the algorithm for comparing Virtual DOM trees, based on these articles:

1. How to write your own Virtual DOM.
2. Write your Virtual DOM 2: Props & Events.

However in these materials, during the comparison, the real DOM changes immediately, where I simply generated a list of required changes with the address of the node to which they refer.

JavaScript version:

function make_node(type, properties, ...children) {
  return {
    'type': type,
    'properties': properties || {},
    'children': children,
  }
}

function make_difference(path, action, ...parameters) {
  return {
    'path': path,
    'action': action,
    'parameters': parameters,
  }
}

function is_different(node_1, node_2) {
  return typeof node_1 !== typeof node_2
    || (typeof node_1 === 'object'
      ? node_1['type'] !== node_2['type']
      : node_1 !== node_2)
}

function compare_property(path, name, old_value, new_value) {
  let difference
  if (!new_value) {
    difference = make_difference(path, 'remove_property', name)
  } else if (!old_value || new_value !== old_value) {
    difference = make_difference(path, 'set_property', name, new_value)
  }

  return difference
}

function compare_properties(path, old_properties, new_properties) {
  const properties = Object.assign({}, old_properties, new_properties)
  return Object.keys(properties)
    .map(name => compare_property(path, name, old_properties[name], new_properties[name]))
    .filter(difference => difference)
}

function compare_nodes(old_node, new_node, index=0, path=[]) {
  let differences = []
  if (!old_node) {
    differences.push(make_difference(path, 'create', new_node))
  } else if (!new_node) {
    differences.push(make_difference(path, 'remove', index))
  } else if (is_different(old_node, new_node)) {
    differences.push(make_difference(path, 'replace', index, new_node))
  } else if (old_node['type']) {
    const child_path = [...path, old_node['type']]
    const properties_differences = compare_properties(child_path, old_node['properties'], new_node['properties'])
    differences.push(...properties_differences)

    const maximal_children_length = Math.max(old_node['children'].length, new_node['children'].length)
    for (let i = 0; i < maximal_children_length; i++) {
      const child_differences = compare_nodes(old_node['children'][i], new_node['children'][i], i, child_path)
      differences.push(...child_differences)
    }
  }

  return differences
}

module['exports'] = {
  'make_node': make_node,
  'compare_nodes': compare_nodes,
}
My version:

let map:2 load "std/list/map";
let filter:2 load "std/list/filter";
let zip_longest:3 load "std/list/zip_longest";
let reduce:3 load "std/list/reduce";

fn make_node(kind properties children)
    #"type" kind
    #"properties" properties
    #"children" children
{};

fn make_difference(path action parameters)
    #"path" path
    #"action" action
    #"parameters" parameters
{};

fn is_different(node_i node_ii)
    || != type node_i type node_ii fn()
        if == "hash" type node_i
            fn() != ."type"node_i ."type"node_ii;
            fn() != node_i node_ii;
    ;
;

fn compare_property(path name old_value new_value)
    if !new_value
        fn() make_difference path "remove_property" ,name[];
        fn()
            if || !old_value != new_value old_value
                fn() make_difference path "set_property" ,name,new_value[];
                fn() nil;
        ;
;

fn compare_properties(path old_properties new_properties)
    let properties + old_properties new_properties;
    let differences map keys properties fn(name)
        compare_property path name .name old_properties .name new_properties
    ;;
    filter differences fn(difference)
        difference
    ;
;

fn compare_nodes(old_node new_node index path)
    if !old_node
        fn() , make_difference path "create" ,new_node[] [];
        fn()
            if !new_node
                fn() , make_difference path "remove" ,index[] [];
                fn()
                    if is_different old_node new_node
                        fn() , make_difference path "replace" ,index,new_node[] [];
                        fn()
                            if == "hash" type old_node
                                fn()
                                    let child_path + path , ."type"old_node [];
                                    let properties_differences
                                        compare_properties
                                            child_path
                                            ."properties"old_node
                                            ."properties"new_node
                                    ;
                                    let children_pairs
                                        zip_longest
                                            ."children"old_node
                                            ."children"new_node
                                            fn(node_i node_ii)
                                                ,node_i,node_ii[]
                                            ;
                                    ;
                                    let children_differences
                                        let result
                                            reduce {} children_pairs fn(result children_pair)
                                                let index ?? ."index"result 0;
                                                let differences
                                                    compare_nodes
                                                        .0 children_pair
                                                        .1 children_pair
                                                        index
                                                        child_path
                                                ;

                                                #"differences" + ?? ."differences"result [] differences
                                                #"index" ++ index
                                            {};
                                        ;

                                        ?? ."differences"result []
                                    ;

                                    + properties_differences children_differences
                                ;
                                fn() [];
                        ;
                ;
        ;
;

#"make_node" fn(kind properties children)
    make_node kind properties children
;
#"compare_nodes" fn(old_node new_node index path)
    compare_nodes old_node new_node index path
;
{}

I minified the JavaScript version using Google Closure Compiler (JavaScript-version), and my version language – manually.

Results were following:

Parameter JavaScript My language
Full version size 2398 B 2827 B
Minification version size 794 B 872 B
Size saving 66.89% 69.16%

 

 

Outcome

In order for my idea to succeed, I had to drastically surpass JavaScript’s compression capability, since you also need space for the interpreter itself. And the final result was miles away from it.

Thus, the experiment ended in failure. My general idea for the language did not return the expected benefit.

Software repository

The source code of the interpreter (implemented in Python), standard library and examples, as well as documentation are all available in the software repository under the MIT license.

LEAVE A REPLY

Please enter your comment!
Please enter your name here