Dailydave mailing list archives

An introduction to the ERESI language for program analysis

From: Julien Vanegue <jv274 () cl cam ac uk>
Date: Mon, 13 Aug 2007 12:39:27 +0100

Hi dailydavers,

It was suggested that I present the ERESI language a bit, so here it is.

ERESI stands for ERESI Reverse Engineering Software Interface;
its web page is at www.eresi-project.org.

ERESI is a command-based, domain-specific (scripting) language.
It has a core language that provides the basic features expected of
a modern programming language (foreach, regex, hash tables, lists,
recursive functions, etc). This core language can be extended with
language fragments (read: sets of additional commands). Each fragment
is specialized for a given task. The common language is not specific
to an architecture or operating system (even though we have been working
mostly on UNIX operating systems for INTEL and SPARC architectures).

ERESI is inspired by languages such as OCaml (for the
"match" command), LISP (for reflection), and Python (for the overall
syntax). ERESI is -not- object oriented, but features the concept of
"record subtyping" (e.g. inheritance of fields across structures).

Existing fragments of the ERESI language are:

- ERESI+ : a fragment for type-based decompilation and reflection-based
analysis. It is mainly composed of one command called "inform", which takes
an address and the name of a type, and registers this address as being
the base of a structure of the given type.
Basically, this feature makes structures of the analyzed program
directly and naturally accessible in the ERESI language interpreter,
just as if they were variables declared in the ERESI language. You can then
do : print $var.field.subfield etc directly in the ERESI language, without
needing any debug format.

- ERESI-PT: a fragment mainly composed of a command called "match"
that allows you to "rewrite" programs. Programs may be rewritten from ASM
instruction to ASM instruction, but not only that. Using the type system
provided in the ERESI core language (the same one used by ERESI+), you can
declare new types that constitute intermediate forms suitable for your
analysis requirements. Then you can define a mapping from the assembler
instructions into the intermediate forms. Afterwards you can define a
higher-level intermediate representation that you translate from the
lower-level one using the same "rewrite" command. For example, you might
declare a special type of expression "PotentiallyDangerousMemAccess()" that
will be translated from instructions/expressions that write to memory and
for which the bound has to be checked. It then suffices to walk the
intermediate representation and perform a special analysis when those types
are encountered. The idea is that of staged analysis, where each pass is
very simple, but where the complete analysis is made of multiple
transformation passes.

- ERESI-DF: Data-flow analysis: Using the "def", "use", and "reachdef"
commands, the ERESI language can compute various data-flow analyses
such as liveness, reaching definitions, or pointer analysis.

- Control-flow analysis: The ERESI environment provides, as a base feature,
the construction of the control flow graph and the call graph. Those graphs
can be accessed from the language. You can then program your own graph
structuring algorithm in ERESI. There is no special command for doing that,
only pre-defined data structures that can be used during program
transformation, or simply for walking the program's graph representations.
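
To make the staged rewriting idea described for ERESI-PT more concrete,
here is a minimal sketch in Python. This is -not- ERESI syntax and not how
the ERESI interpreter works internally: the Instr, Assign and MemWrite
types, the toy instruction set, and the rule that only writes through
esp/ebp count as "safe" are all assumptions made up for the illustration.
Only the name PotentiallyDangerousMemAccess comes from the text above.

```python
import re
from dataclasses import dataclass

# Stage 0: a toy assembler-level instruction (hypothetical type).
@dataclass(frozen=True)
class Instr:
    op: str       # mnemonic, e.g. "mov"
    args: tuple   # operands as strings, e.g. ("[eax+8]", "ecx")

# Stage 1: low-level intermediate forms (hypothetical types).
@dataclass(frozen=True)
class Assign:
    dst: str
    src: str

@dataclass(frozen=True)
class MemWrite:
    addr: str
    value: str

# Stage 2: higher-level form that a later analysis will look for.
@dataclass(frozen=True)
class PotentiallyDangerousMemAccess:
    addr: str
    value: str

def is_mem(operand):
    """An operand written as [expr] denotes a memory location."""
    return operand.startswith("[") and operand.endswith("]")

def lift(instr):
    """Pass 1: rewrite one instruction into a list of stage-1 forms."""
    if instr.op == "mov":
        dst, src = instr.args
        if is_mem(dst):
            return [MemWrite(dst[1:-1], src)]
        return [Assign(dst, src)]
    if instr.op == "push":
        (src,) = instr.args
        # A 1 -> N rewrite: one instruction becomes two forms.
        return [Assign("esp", "esp-4"), MemWrite("esp", src)]
    return []  # instructions unknown to this sketch are dropped

def flag(form):
    """Pass 2: rewrite memory writes that are not based on esp/ebp."""
    if isinstance(form, MemWrite):
        base = re.split(r"[+-]", form.addr)[0]
        if base not in ("esp", "ebp"):
            return [PotentiallyDangerousMemAccess(form.addr, form.value)]
    return [form]

def run_pass(rule, items):
    """Apply a rewrite rule to every item and concatenate the results."""
    out = []
    for item in items:
        out.extend(rule(item))
    return out

program = [
    Instr("mov", ("eax", "ebx")),
    Instr("mov", ("[eax+8]", "ecx")),
    Instr("push", ("edx",)),
]
stage1 = run_pass(lift, program)
stage2 = run_pass(flag, stage1)

# The final analysis only has to walk stage2 and react to one type.
dangerous = [f for f in stage2
             if isinstance(f, PotentiallyDangerousMemAccess)]
print(dangerous)
```

Each pass here is a few lines long and knows nothing about the others;
the analysis is the composition of the passes, which is the point of
the staged approach.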

The advantages of ERESI are:

- It has an easy syntax for complex analysis operations.

- It has its own type system that can handle pointers, (mutually recursive)
structures, arrays of unbounded dimensions, and -partial types-. Partial
types are useful in type-based decompilation, when program types are
recovered step by step. It is thus possible to define structures "with
holes" where some fields are unknown, but for which the global structure
type will be refined by further analysis (thus eliminating partial types
as the analysis goes on).

- It makes program transformation (a known technique for a wide range of
different program analyses) very easy to specify. The syntax of ERESI is
simple, and it can deal with 1 -> 1 and 1 -> N transformations (e.g.
micro-asm generation). The "hidden" concept behind program transformation
in ERESI is "record subtyping". It lets you tell whether or not two
structures match. Matching is -not- equality. Think of regular expressions
where the base object is a structure, and you'll have a good approximation
of the idea of "matching". We are also investigating how to use the
program transformation system of ERESI to perform shape analysis
(approximating the shape of structures in a program. For instance: is
an object a tree, a directed acyclic graph, or a cyclic graph?).

- Its semantics are formally defined. An upcoming article focused
on the ERESI language will make this public some time soon. As such,
implementors of ERESI language interpreters are left a lot of freedom. It
would for instance be possible to write an ERESI interpreter in Python. The
idea is that program analyzers are created independently of the
implementation language, and can be reused across analysis frameworks, even
if these are implemented in different programming languages (provided they
have an ERESI interpreter). Our prototype ERESI interpreter is implemented
in the C language.

- Writing analyzers in ERESI is faster than in any other language, since
its commands are dedicated to analysis. ERESI is not a general purpose
programming language, but it remains Turing complete : you can write
analyzers that never stop (which can be useful for analyzers embedded in a
debugger, for instance : e2dbg ;) but also Immdbg, IDA-dbg, etc).
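
As a rough illustration of "matching is -not- equality" and of structures
"with holes", here is a small Python sketch. It is one possible reading of
the idea, not the ERESI semantics: records are modelled as plain dicts, the
ANY wildcard and the matches() helper are invented for this example, and
real ERESI types are certainly richer than this.

```python
# Hypothetical wildcard standing for a field that is not known yet.
ANY = object()

def matches(pattern, record):
    """Return True if record has every field the pattern asks for.

    Extra fields in record are ignored, so a record with more fields
    still matches a smaller pattern. This is weaker than equality.
    """
    if pattern is ANY:
        return True
    if isinstance(pattern, dict):
        return isinstance(record, dict) and all(
            key in record and matches(sub, record[key])
            for key, sub in pattern.items()
        )
    return pattern == record

# A pattern: "any mov whose destination is a memory operand".
store_to_memory = {"op": "mov", "dst": {"kind": "mem"}}

# A concrete record with more fields than the pattern mentions.
instr = {
    "op": "mov",
    "dst": {"kind": "mem", "base": "eax", "offset": 8},
    "src": {"kind": "reg", "name": "ecx"},
}

print(matches(store_to_memory, instr))   # the pattern matches the record
print(store_to_memory == instr)          # but the two are not equal
print(matches(instr, store_to_memory))   # and matching is not symmetric

# A partial type: the "next" field is a hole to be refined later.
partial_node = {"next": ANY, "len": "int"}
refined_node = {"next": "ptr(node)", "len": "int"}
print(matches(partial_node, refined_node))
```

In this reading, a partial type is simply a pattern that constrains fewer
fields, and refining it means replacing holes with concrete field types
while still matching the same records.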

There are no examples of ERESI programs in this email, as we are keeping
the primer for our upcoming articles. If you are really interested,
you can look at the .esh files in:

http://cvs.eresi-project.org/cvsweb.cgi/eresi/evarista/
http://cvs.eresi-project.org/cvsweb.cgi/eresi/testsuite/testscripts/

The bibliography related to ERESI (which you might need in order to
understand this email correctly) is located at :

http://www.cl.cam.ac.uk/~jv274/eresi-bib.html

I would like to recall that ERESI is a perpetual work-in-progress
analysis environment that includes :

- ELFsh: static instrumentation of ELF binary programs
- E2dbg: embedded debugging of ELF binary programs
- Etrace: embedded tracing of ELF binary programs
- Kernsh: instrumentation at the kernel level from userland (NEW)
- Evarista: static analysis of binary programs (NEW)

Articles about those components can be read from the ERESI website.

ERESI is a free-software project. If you wish to join us, you
can reach the team at: team at eresi-project dot org

Julien Vanegue, for the ERESI team