Thursday, October 16, 2014

What's new: ELENA 1.9.17 - generic methods

In this post I will discuss the generic methods introduced in version 1.9.17.

Let's start with two basic things. Any message in the language consists of a verb (a predefined action: get, set, insert, add, ...), a signature (user defined) and a number of parameters.

For example in the following expression

aBinary insert &index:0 &literal:"0"

a message

insert&index&literal[2] 

is used. insert is the verb, index&literal is the signature (consisting of two subjects: index and literal) and 2 is the number of parameters.

Or in this message call

anS length

get&length[0] is used

Secondly, a message can be created dynamically at run time by combining a signature symbol and a generic message (a message without a signature):

anS ~ %length get.

These two principles are used in the generic method implementation. A generic method may respond to any message with the same verb and the same number of parameters, whatever its signature. The original signature is preserved and can be used inside the method.

Let's consider a simple example. Suppose we have the class containing coordinates

#class Point
{
   #field theX.
   #field theY.

   #constructor new &x:anX &y:anY
   [
       theX := anX.
       theY := anY.  
   ]

   #method x = theX.

   #method y = theY.
      
   #method set &x:anX
   [
       theX := anX.
   ]
      
   #method set &y:anY
   [
       theY := anY.
   ]

   #method clone = Point new &x:theX &y:theY.

   #method literal = "Point(x:" + theX literal + ", y:"
                          + theY literal + ")".
}

Let's create a variable which can contain a point

#class PointVariable
{
   #field thePoint.

   #constructor new &point:aPoint
   [
      thePoint := aPoint.
   ] 

   #method value = thePoint.
}

And let's define the operations with the point coordinates. We will use generic methods

#class PointVariable                                                                
{  
   ...

   #method(generic) append : aValue
   [
      thePoint~$subject set:(thePoint~$subject get + aValue).
   ]
       
   #method(generic) reduce : aValue
   [
      thePoint~$subject set:(thePoint~$subject get - aValue).
   ]
       
   #method(generic) multiplyBy : aValue
   [
      thePoint~$subject set:(thePoint~$subject get * aValue).
   ]
       
   #method(generic) divideInto : aValue
   [
      thePoint~$subject set:(thePoint~$subject get / aValue).
   ]
}

How does it work? Let's consider a simple use case

   #var aVar := PointVariable new 
                   &point:(Point new &x:1 &y:1).

   aVar append &x:2.
   
   console writeLine:(aVar value literal).

When the append&x[1] message is sent to an instance of PointVariable, the class dispatcher first tries to resolve the message directly. If no match is found, it calls the generic method append, and the built-in variable $subject contains our original signature, %x. So inside the method

thePoint~ %x get 

is similar to

thePoint x

and

thePoint ~%x set:aValue

is similar to

thePoint set &x:aValue

Note that our example will work with a Point class containing an arbitrary number of coordinates (one, two, three and so on).
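For readers coming from other languages, the idea can be illustrated with a rough Python analogy. This is not ELENA code and says nothing about how the ELENA dispatcher is implemented; the keyword argument name merely stands in for the subject carried by $subject, and the class names just mirror the example above.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointVariable:
    def __init__(self, point):
        self.value = point

    def append(self, **subjects):
        # One method serves every subject: the keyword name (x, y, ...)
        # plays the role of the signature stored in $subject.
        for subject, amount in subjects.items():
            current = getattr(self.value, subject)          # like thePoint~$subject get
            setattr(self.value, subject, current + amount)  # like thePoint~$subject set:


a_var = PointVariable(Point(1, 1))
a_var.append(x=2)
print(a_var.value.x, a_var.value.y)  # prints "3 1"
```

As in the ELENA version, nothing in append names a particular coordinate, so the same code would work for a point with a z field as well.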

This principle is used in system'dynamic'DynamicStruct class.

   #var r1 := system'dynamic'DynamicStruct new.
   #var r2 := system'dynamic'DynamicStruct new.

   r1 set &Price:20.5r set &Count:3.
   r2 set &Name:"John" set &LastName:"Smith".
   r1 set &Supplier:r2.

Monday, October 6, 2014

Lexical Structure

An ELENA module consists of one or more source files. A source file is an ordered sequence of Unicode characters (usually encoded with the UTF-8 encoding).

There are several sequences of input elements: white space, comments and tokens. The tokens are the identifiers, keywords, literals, operators and punctuators.

The raw input stream of Unicode characters is reduced by ELENA DFA into a sequence of <input elements>.

	<input> :
			{ <input element> }*
		
	<input element> :
			<white space>
			<comment>
			<token>
			
	<token> :
			<identifier>
			<full identifier>
			<local identifier>
			<keyword>
			<literal>
			<operator-or-punctuator>

Of these basic elements, only tokens are significant in the syntactic grammar of an ELENA program.

White space

ELENA white space consists of the space, the horizontal tab and the line terminators. White space is used to separate tokens.

	<white space> :
		SP (space)
		HT (horizontal tab)
		CR (return)
		LF (new line)

Comments

ELENA uses C++-style comments:

   /* block comment */

   // end-of-line comment

	<comment> :
		<block comment>
		<end-of-line comment>
		
	<block comment> :
		'/' '*' <block comment tail>
		
	<end-of-line comment> :	
		'/' '/' { <not line terminator> }*
		
	<block comment tail> :
		'*' <block comment star tail> 
                <not star> <block comment tail>
		
	<block comment star tail> :
		'/' 
                '*' <block comment star tail> 
                <neither star nor slash> <block comment tail>
		
	<not star> :
		any Unicode character except '*'
		
	<neither star nor slash> :
		any Unicode character except '*' and '/'

	<not line terminator> :
		any symbol except CR and LF

ELENA comments do not nest. Comments do not occur inside string literals
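The comment rules above can be sketched as a small scanner. This is an illustrative Python sketch only, not the ELENA DFA; the function name strip_comments is hypothetical, and the handling of string literals assumes the doubled-quote rule described in the string literal section.

```python
def strip_comments(source: str) -> str:
    """Remove /* ... */ and // comments, leaving string literals untouched."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == '"':
            # Copy a string literal verbatim; a doubled quote ("") stays inside it.
            j = i + 1
            while j < n:
                if source[j] == '"':
                    if j + 1 < n and source[j + 1] == '"':
                        j += 2
                        continue
                    break
                j += 1
            out.append(source[i:j + 1])
            i = j + 1
        elif source.startswith("/*", i):
            # Block comments do not nest: stop at the first "*/".
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif source.startswith("//", i):
            # End-of-line comment: skip up to (not including) CR or LF.
            while i < n and source[i] not in "\r\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because the scanner stops at the first "*/", an inner "/*" has no special meaning, which is exactly what "comments do not nest" implies.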

Identifiers

An identifier is a sequence of letters, underscores and digits starting with a letter or an underscore. The identifier length is restricted in the current compiler design (255 characters at most).

	<identifier> :
		<letter> { <letter or digit> }*
		
	<letter> :
		Unicode character except white space, 
                        punctuator or operator
		'_'
		
	<letter or digit> :
		<letter>
		Digit 0-9

ELENA identifiers are case sensitive.

Full identifiers

A full identifier is a sequence of identifiers separated with "'" characters. It consists of a namespace and a proper name. A full identifier length is restricted in the current compiler design (maximal 255 characters)

	<full identifier> :
		[ <name space> ]? "'" <identifier>		
		
	<name space> :
		<identifier> [ "'" { <identifier> } ]*

Local identifiers

A local identifier is a sequence of letters, underscore and digits starting with '$' character. A local identifier length is restricted in the current compiler design (maximal 255 characters)

	<local identifier> :
		'$' <identifier>

Keywords

A keyword is a sequence of letters starting with the '#' character. Currently only the following keywords are used, though others are reserved for future use: #class, #symbol, #static, #field, #method, #constructor, #var, #loop, #define, #type, #throw, #break. Keywords can be placed only at the beginning of a statement.

	<keyword> :
		'#' { <letter> }+
	
	<letter> :
		Unicode characters

Literals

A literal is the source code representation of a value.

	<literal> :
		<integer>
		<float>
		<string>

Integer literals

An integer literal may be expressed in decimal (base 10) or hexadecimal (base 16).

	<integer> :
		<decimal integer>
		<hexadecimal integer>
		
	<decimal integer> :
		[ <sign> ] { <digit> }+

	<sign> :
		"+"
		"-"
		
	<digit> :
		digit 0-9
		
	<hexadecimal integer> :
		<digit> <digit or hexdigit>* 'h'
		
	<digit or hexdigit> :
		<digit>		
		one of following character - 
                       a b c d e f A B C D E F
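The integer grammar above can be checked with two regular expressions. The following Python sketch is illustrative only (parse_integer is a hypothetical helper, not part of the ELENA compiler); it follows the productions literally, so a hexadecimal literal must start with a decimal digit and end with 'h'.

```python
import re

# <decimal integer> : [ <sign> ] { <digit> }+
DECIMAL = re.compile(r"[+-]?[0-9]+")
# <hexadecimal integer> : <digit> <digit or hexdigit>* 'h'
HEXADECIMAL = re.compile(r"[0-9][0-9a-fA-F]*h")


def parse_integer(text: str) -> int:
    """Return the value of an ELENA-style integer literal."""
    if HEXADECIMAL.fullmatch(text):
        return int(text[:-1], 16)   # drop the trailing 'h'
    if DECIMAL.fullmatch(text):
        return int(text, 10)
    raise ValueError(f"not an integer literal: {text!r}")


print(parse_integer("-42"))   # prints "-42"
print(parse_integer("0FFh"))  # prints "255"
```

Note that "FFh" is rejected by this sketch: without the leading digit it would be indistinguishable from an identifier.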

Floating-point literals

A floating-point literal has the following parts: a whole-number part, a decimal point, a fractional part and an exponent. The exponent, if present, is indicated by the Unicode letter 'e' or 'E' followed by an optionally signed integer.

At least one digit, in either the whole number or the fraction part, and a decimal point or an exponent are required. All other parts are optional.

	<float> :
		{ <digit> }* '.' { <digit> }* [ <exponent> ] 'r'
		{ <digit> }+ <exponent> 'r'
		
	<digit> :
		digit 0-9

	<exponent> :
		<exponent sign> <integer>
		
	<exponent sign> :
		either 'E' or 'e'
		
	<integer> :
		<sign>? <digit>+
		
	<sign> :
		"+"
		"-"

Real literals are represented with 64-bit double-precision binary floating-point formats.
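As a sketch, the two <float> productions can be combined into one regular expression. The Python code below is illustrative (parse_real is a hypothetical name); it also applies the rule from the prose that at least one digit is required, so ".r" is rejected even though the bare grammar would allow it.

```python
import re

EXPONENT = r"[eE][+-]?[0-9]+"
FLOAT = re.compile(
    r"(?:(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:" + EXPONENT + r")?"  # digits around '.', optional exponent
    r"|[0-9]+" + EXPONENT + r")r"                            # digits with a mandatory exponent
)


def parse_real(text: str) -> float:
    """Return the value of an ELENA-style real literal such as 20.5r."""
    if not FLOAT.fullmatch(text):
        raise ValueError(f"not a real literal: {text!r}")
    return float(text[:-1])  # drop the trailing 'r'; Python floats are 64-bit doubles


print(parse_real("20.5r"))  # prints "20.5"
print(parse_real("1e3r"))   # prints "1000.0"
```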

String literal

A string literal consists of zero or more characters enclosed in double quotes. Characters may be represented by escape sequences.

	<string> : 
		'"' <string tail> '"'
		
	<string tail> :
		<string character> { <string tail> }*
		<escape sequence>  { <string tail> }*
		'%' '%' { <string tail> }*
		'"' '"' { <string tail> }*
		
	<string character> :
		any character except CR or LF or '"'

String literal escape sequences

The string literal escape sequences allow for the representation of some non-graphic characters as well as the double quote and percent characters.

	<escape sequence> :
		'%' <decimal escape>
		
	<decimal escape> :
		{ <digit> }+
		<alert>
		<backspace>
		<horizontal tab>
		<carriage return>
		<new line>
		
	<digit> :
		digit 0-9

	<alert> :
		'a'

	<backspace> :
		'b'

	<horizontal tab> :
		't'

	<carriage return> :
		'r'

	<new line> :
		'n'
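A decoder for these escapes could look roughly like the Python sketch below. Two things here are assumptions rather than facts from the grammar: that the digits after '%' denote a decimal character code, and that the letter escapes map to the usual control characters (alert, backspace, tab, carriage return, new line). The function name is hypothetical.

```python
import re

NAMED = {"a": "\a", "b": "\b", "t": "\t", "r": "\r", "n": "\n"}
# '%%', '%<digits>', '%<letter>' or a doubled double quote
ESCAPE = re.compile(r'%(%|[0-9]+|[abtrn])|""')


def decode_string_body(body: str) -> str:
    """Decode the text between the quotes of an ELENA-style string literal."""
    def replace(match):
        token = match.group(1)
        if token is None:            # the '""' alternative matched
            return '"'
        if token == "%":
            return "%"
        if token.isdigit():
            return chr(int(token))   # assumption: decimal character code
        return NAMED[token]
    return ESCAPE.sub(replace, body)


print(decode_string_body('say ""hi""'))  # prints: say "hi"
```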

Operators and punctuators

There are several kinds of operators and punctuators. Operators are a shortcut form of messages taking one operand. Punctuators are used for grouping and separating.

	<operator-or-punctuator> : one of
		'(', ')', '[', ']', '<', '>', '{', '}',
                '.', ',', '|', ':', '::', '=', '=>', 
		 '+', '-', '*', '/', '+=', '-=', '*=', '/=', 
                 '||', '&&', '^^', '<<', '>>', ':='

Sunday, June 1, 2014

The work in progress : Linux porting

In the current development cycle I started long-awaited porting to Linux.

After some adventures I managed to compile ELC (without a linker and VM client) with GCC 4.8 (I hope to use 4.9, but it is not part of Ubuntu APT yet, so I will wait) on the Ubuntu 14.04 x32 platform (when I tried to use 64-bit Ubuntu I got a lot of stupid errors, so I decided, at least for the moment, to use the 32-bit version).

I've learned the hard way that Linux is not Windows, and I have to rearrange my file locations. While in the Windows version all files are located in the installed compiler folder, on Linux I will distribute my files across the system:

API : /usr/share/elena/lib30
API source code : /usr/share/elena/src30
Config files : /etc/elena
Data files : /usr/share/elena

ELENA will be a UTF-16 system on Linux as well, which creates a lot of additional problems, but I do hope I will overcome them.

My plan is as follows:
1.9.16 : generate executable files / compiled modules
1.9.17 : port console samples, ASM2BINX, ECV, SG, OG - first alpha version will be available
1.9.18 : port IDE (without debugger)
1.9.19 : support debugger (second alpha version)

Apart from Linux, I'm continuing the language development - I've added support for generic methods.

To be continued...

Saturday, May 31, 2014

Conditional branching

ELENA, like Smalltalk, does not support any special language constructs to implement conditional branching. Instead, special Boolean symbols (system'true and system'false) are used. All conditional operations should return these symbols as a result.

There are three branching methods :

then[1] , then&else[2], else[1] 

(m == 0) then:
[
   n + 1
]
&else: [
   m + n
].

Note that the code blocks in square brackets are in fact nested action classes (an action class is a class supporting the evaluate message). So this code can also be written in this form:

(m == 0) then: 
{
   eval
   [
      ^ n + 1.
   ]
}
&else: 
{
   eval
   [
      ^ m + n.
   ]
}.

This expression can be written using special operators as well

(m == 0) 
  ? [ n + 1 ]
  ! [ m + n ].

Note: the main difference between using explicit messages and conditional operators is that the compiler may optimize the resulting code in the latter case.

We could omit either the true part or the else part

(m == 0) 
  ! [ m / n ].

Boolean symbols support basic logical operations (AND, OR, XOR and NOT), so several conditions can be checked

(aChar >= 48) and:(aChar < 58)
? [
   theToken += aChar.
]
! [
   #throw Exception new:"Invalid expression".
]

Note that in this case both conditions will be evaluated even if the first one is false. If we want short-circuit evaluation, the second condition should be enclosed in square brackets, so that it is passed as a code block

(x >= 0)and:[ array@x != 0] ?
[
    ...
]
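The difference between the two forms can be modelled in a few lines of Python. This is an analogy only, not ELENA code or the real Boolean implementation: the Bool class and the condition helper are hypothetical, and a lambda plays the role of a [ ... ] block.

```python
class Bool:
    """Stand-in for the ELENA Boolean symbols; not the real implementation."""

    def __init__(self, value):
        self.value = value

    def and_(self, other):
        # A callable models a [ ... ] block: it is run only when needed.
        if callable(other):
            return Bool(self.value and other().value)
        return Bool(self.value and other.value)


evaluated = []


def condition(label, result):
    evaluated.append(label)  # record that this condition was evaluated
    return Bool(result)


# Plain argument: evaluated before and_ is even called.
Bool(False).and_(condition("eager", True))
# Block argument: never evaluated because the receiver is already false.
Bool(False).and_(lambda: condition("lazy", True))

print(evaluated)  # prints "['eager']"
```

Since and: is an ordinary message, its argument is computed before the message is sent; only wrapping it in a block defers the computation to the receiver.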

A switch statement can be implemented using => operator

^ aBulls =>
   -1 ? [ consoleEx writeLine:"Not a valid guess.". ^ true. ]
    4 ? [ consoleEx writeLine:"Congratulations! You have won!". ^ false. ]
    ! [
         theAttempt += 1.
                 
         consoleEx 
            writeLine:"Your Score is " : aBulls : " bulls and " : aCows : " cows".
                 
         ^ true.
     ].

Saturday, May 17, 2014

ELENA 2.0:Code blocks

An ELENA code block consists of a sequence of statements. The block is enclosed in square brackets and may contain nested code blocks (which are in fact inline action classes). The statement terminator is a dot.

#method printAckermann &n:n &m:m
[
    control forrange &int:0 &int:n &do: (&int:i)
    [
        control forrange &int:0 &int:m &do: (&int:j)
        [
            ...
            
            console writeLine.
        ].
    ].
]

When a method should return a result (other than self), a return statement is used. It should be the last statement in the block.

[
    ...

    ^ aRetVal / anArray length.
]

If the code block contains only a return statement, the simplified syntax can be used:

#method Number = convertor toReal:theValue.    

or there is an alternative block expression

[ convertor toReal:theValue ]

Note: it should not end with the terminator symbol

It is possible to declare a block variable and assign a value to it. The variable name must be unique within the code block scope.

#var aRetVal := Integer new:0.

Tuesday, May 6, 2014

ELENA 2.0:Classes, Roles and Symbols

ELENA is an object-oriented language, so to create a program we have to declare new classes.

A class encapsulates data (fields) with the code (methods) that accesses it. In most cases it is not possible to access the class content directly (this makes sense for dynamic languages, where in most cases code is generic and can be applied to different "types"). Usually the fields refer to other classes and so on, until we reach the "primitive" ones whose content is considered raw data (e.g. numeric or literal values).

To work with a class we have to create its instance with the help of special methods called constructors. A constructor is used mostly to initialize the class fields. There is a special type of class that has no fields or constructors and can be used directly (roles).

Classes form an inheritance tree with a common super class, system'Object. ELENA does not support multiple inheritance, though it is possible to inherit code using a redirect handler (so-called "horizontal inheritance"). When the parent is not provided, the class inherits directly from system'Object (the super class).

#class BaseClass
{
  #field theField1.
  #field theField2.
  
  #method field1 = theField1.

  #method field2 = theField2.

}

#class DerivedClass : BaseClass
{
  #constructor new &field1:aField1 &field2:aField2
  [  
     theField1 := aField1.
     theField2 := aField2.
  ]

  #method add &field1:aField1 &field2:aField2
     = DerivedClass new &field1:(theField1 + aField1) 
                        &field2:(theField2 + aField2).
}

To create a class instance we have to send a message (usually new) to its symbol (a class symbol is declared implicitly for every class and can be used as a normal one)

#var anObject := DerivedClass new &field1:1 
                   &field2:1. // DerivedClass is a symbol

Roles cannot have constructors and their symbols can be used directly

#class(role)ClassHelper
{
   #method sumOf:anObject1:anObject2
      = anObject1 add &field1::anObject2 &field2::anObject1.
}

...

#var aSum := ClassHelper sumOf:anObject1:anObject2.

In general, a symbol is a named expression and can be used to declare initialized objects, constants, reusable expressions and so on.

#symbol ZeroClass = DerivedClass new &field1:0 &field2:0.

A static symbol is a class instance whose state is preserved. There can be only one instance of a static symbol.

#static SingletonClass = DerivedClass new &field1:0 &field2:0.