
Dailydave mailing list archives
An introduction to the ERESI language for program analysis
From: Julien Vanegue <jv274 () cl cam ac uk>
Date: Mon, 13 Aug 2007 12:39:27 +0100
Hi dailydavers,

It was suggested that I present the ERESI language a bit, so here it is. ERESI stands for ERESI Reverse Engineering Software Interface; its web page is at www.eresi-project.org.

ERESI is a command-based, domain-specific (scripting) language. Its core language provides the basic features expected of a modern programming language (foreach, regex, hash tables, lists, recursive functions, etc.). This core language can be extended by language fragments (read: sets of additional commands), each specialized for a given task. The common language is not specific to any architecture or operating system, even though we have mostly worked on UNIX operating systems for the Intel and SPARC architectures.

ERESI is inspired by languages such as OCaml (for the "match" command), LISP (for reflection), and Python (for the overall syntax). ERESI is -not- object oriented, but it features the concept of "record subtyping" (i.e. inheritance of fields across structures).

The existing fragments of the ERESI language are:

- ERESI+: a fragment for type-based decompilation and reflection-based analysis. It consists mainly of one command, "inform", which takes an address and the name of a type. It registers this address as the base of a structure of the given type. This makes the structures of the analyzed program directly accessible from the ERESI interpreter, just as if they were variables declared in the ERESI language. You can then do: print $var.field.subfield etc. directly in the ERESI language, without needing any debug format.
- ERESI-PT: a fragment consisting mainly of a command called "match" that allows programs to be "rewritten". Programs may be rewritten from ASM instruction to ASM instruction, but not only.
  Using the type system provided by the ERESI core language (the same one used by ERESI+), you can declare new types that constitute intermediate forms suited to your analysis requirements. You can then define a mapping from the assembler instructions into these intermediate forms. Afterwards, you can define a higher-level intermediate representation and translate the lower-level one into it, using the same "rewrite" command. For example, you might declare a special type of expression, "PotentiallyDangerousMemAccess()", into which instructions or expressions that write to memory, and whose bounds must be checked, are translated. It then suffices to walk the intermediate representation and perform a special analysis whenever those types are encountered. The idea is that of staged analysis: each pass is very simple, but the complete analysis is made of multiple transformation passes.
- ERESI-DF: data-flow analysis. Using the "def", "use", and "reachdef" commands, the ERESI language can compute various data-flow analyses such as liveness, reaching definitions, or pointer analysis.
- Control-flow analysis: as a base feature, the ERESI environment constructs the control flow graph and the call graph. Those graphs can be accessed from the language, so you can program your own graph structuring algorithm in ERESI. There is no special command for doing that, only predefined data structures. These can be used during program transformation, or simply to walk the program's graph representations.

The advantages of ERESI are:

- It has an easy syntax for complex analysis operations.
- It has its own type system that can handle pointers, (mutually recursive) structures, arrays of unbounded dimensions, and -partial types-. Partial types are useful in type-based decompilation, when program types are recovered step by step.
  Thus it is possible to define structures "with holes", where some fields are unknown, but whose overall type will be refined by further analysis (eliminating partial types as the analysis goes on).
- It makes program transformation (a known technique for a wide range of program analyses) very easy to specify. The syntax of ERESI is simple, and it can deal with 1 -> 1 and 1 -> N transformations (e.g. micro-asm generation). The "hidden" concept behind program transformation in ERESI is record subtyping, which allows one to tell whether or not two structures match. Matching is -not- equality: think of regular expressions where the base object is a structure, and you will have a good approximation of the idea of "matching". We are also investigating how to use the program transformation system of ERESI to perform shape analysis. This means approximating the shape of the data structures in a program: for instance, is an object a tree, a directed acyclic graph, or a cyclic graph?
- Its semantics is formally defined. An upcoming article focused on the ERESI language will make it public soon. As a consequence, implementors of an ERESI interpreter are left a lot of freedom; it would, for instance, be possible to write an ERESI interpreter in Python. The idea is that program analyzers are written independently of the implementation language. They can then be reused across analysis frameworks, even if those frameworks are implemented in a different programming language (provided they have an ERESI interpreter). Our prototype ERESI interpreter is implemented in C.
- Writing analyzers in ERESI is faster than in any other language, since its commands are dedicated to analysis.
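The effect of "inform" on a partial type can be approximated outside ERESI. The Python sketch below is an analogy only, not ERESI's implementation:

- a bytearray stands in for the analyzed program's memory;
- structure layouts are written by hand as little-endian field tables;
- the names `TypedView`, `inform` and `Layout` are hypothetical.

Bytes not covered by any field are the "holes" of a partial type, and refining the type simply means adding fields.

```python
import struct
from typing import Dict, Tuple

# Hypothetical layout format: field name -> (byte offset, spec), where spec is
# a struct format character for a scalar or another layout for a nested
# structure. Bytes covered by no field are holes of unknown type.
Layout = Dict[str, Tuple[int, object]]


class TypedView:
    """Read-only view of a structure with a given layout at `base` in `mem`."""

    def __init__(self, mem: bytearray, base: int, layout: Layout) -> None:
        self._mem = mem
        self._base = base
        self._layout = layout

    def __getattr__(self, name: str):
        # Only called for names not found normally, i.e. structure fields.
        if name not in self._layout:
            raise AttributeError(f"field {name!r} unknown (hole in a partial type?)")
        offset, spec = self._layout[name]
        addr = self._base + offset
        if isinstance(spec, dict):
            return TypedView(self._mem, addr, spec)  # nested structure
        return struct.unpack_from("<" + spec, self._mem, addr)[0]


def inform(mem: bytearray, addr: int, layout: Layout) -> TypedView:
    """Hypothetical analogue of "inform": treat `addr` as a structure base."""
    return TypedView(mem, addr, layout)


# Partial type: bytes 4..7 of `node` are a hole we have not typed yet.
point: Layout = {"x": (0, "i"), "y": (4, "i")}
node: Layout = {"id": (0, "I"), "pos": (8, point)}

mem = bytearray(64)
struct.pack_into("<I", mem, 16, 7)
struct.pack_into("<ii", mem, 24, -3, 5)

var = inform(mem, 16, node)
print(var.id, var.pos.x, var.pos.y)  # prints 7 -3 5, cf. print $var.field.subfield

# A later analysis pass refines the type by filling part of the hole.
refined = inform(mem, 16, {**node, "flags": (4, "H")})
print(refined.flags)  # prints 0
```

Reading `var.flags` on the unrefined type raises `AttributeError`, which mirrors the idea that a hole has no known type until analysis fills it in.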
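Matching by record subtyping, combined with staged 1 -> 1 and 1 -> N rewriting, can be illustrated with a small Python sketch. It is neither ERESI syntax nor the exact semantics of its "match" and "rewrite" commands. Instructions are modeled as nested dicts, and a pattern matches any record that has at least the pattern's fields, so matching is weaker than equality. The instruction encoding, the rule format, and the helper names `matches` and `rewrite` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Rule = Tuple[Record, Callable[[Record], List[Record]]]


def matches(pattern: Record, value: Record) -> bool:
    """True if `value` has every field of `pattern` with matching contents.

    Extra fields in `value` are ignored (width subtyping), and nested
    records are compared recursively, so matching is not equality.
    """
    for field, expected in pattern.items():
        if field not in value:
            return False
        actual = value[field]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual != expected:
            return False
    return True


def rewrite(program: List[Record], rules: List[Rule]) -> List[Record]:
    """One transformation pass: each record is replaced by the output of the
    first rule it matches (1 -> 1 or 1 -> N); others are kept unchanged."""
    out: List[Record] = []
    for insn in program:
        for pattern, action in rules:
            if matches(pattern, insn):
                out.extend(action(insn))
                break
        else:
            out.append(insn)
    return out


program: List[Record] = [
    {"op": "mov", "dst": {"kind": "mem", "base": "eax", "off": 4},
     "src": {"kind": "reg", "name": "ebx"}},
    {"op": "mov", "dst": {"kind": "reg", "name": "ecx"},
     "src": {"kind": "imm", "value": 1}},
    {"op": "push", "src": {"kind": "reg", "name": "ecx"}},
]

# Pass 1 (hypothetical rule): 1 -> N split of push into micro-operations,
# assuming x86-like semantics.
lower: List[Rule] = [
    ({"op": "push"}, lambda i: [
        {"op": "sub", "dst": {"kind": "reg", "name": "esp"},
         "src": {"kind": "imm", "value": 4}},
        {"op": "mov", "dst": {"kind": "mem", "base": "esp", "off": 0},
         "src": i["src"]},
    ]),
]

# Pass 2 (hypothetical rule): every memory write becomes an intermediate
# PotentiallyDangerousMemAccess node for a later bounds-checking analysis.
tag: List[Rule] = [
    ({"dst": {"kind": "mem"}}, lambda i: [
        {"op": "PotentiallyDangerousMemAccess",
         "base": i["dst"]["base"], "off": i["dst"]["off"]},
    ]),
]

ir = rewrite(rewrite(program, lower), tag)
flagged = [n for n in ir if matches({"op": "PotentiallyDangerousMemAccess"}, n)]
print([n["op"] for n in ir])
# ['PotentiallyDangerousMemAccess', 'mov', 'sub', 'PotentiallyDangerousMemAccess']
print([(n["base"], n["off"]) for n in flagged])
# [('eax', 4), ('esp', 0)]
```

Each pass stays trivial, but the memory write hidden inside `push` is only exposed to the second pass because the first one lowered it. This is the staged-analysis idea in miniature. Note also that `matches` is not symmetric: `{"op": "mov"}` matches the second instruction, but not the other way round.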
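Reaching definitions is one of the analyses the "def", "use" and "reachdef" commands are meant to support. As a refresher on what it computes, here is the classic iterative gen/kill algorithm in Python. It is the textbook formulation, not ERESI's implementation. The block encoding (lists of (label, variable) definitions) and the sample control flow graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Def = Tuple[str, str]  # (definition label, variable it defines)


def reaching_definitions(blocks: Dict[str, List[Def]],
                         succs: Dict[str, List[str]]) -> Dict[str, Set[Def]]:
    """Classic iterative gen/kill analysis; returns the definitions
    reaching the entry of each basic block."""
    preds: Dict[str, List[str]] = {b: [] for b in blocks}
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)

    all_defs = {d for defs in blocks.values() for d in defs}
    gen: Dict[str, Set[Def]] = {}
    kill: Dict[str, Set[Def]] = {}
    for b, defs in blocks.items():
        last: Dict[str, Def] = {}
        for d in defs:  # a later def of a variable shadows earlier ones
            last[d[1]] = d
        gen[b] = set(last.values())
        kill[b] = {d for d in all_defs if d[1] in last and d not in gen[b]}

    reach_in: Dict[str, Set[Def]] = {b: set() for b in blocks}
    reach_out: Dict[str, Set[Def]] = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for b in blocks:
            new_in = set().union(*(reach_out[p] for p in preds[b]))
            new_out = gen[b] | (new_in - kill[b])
            if new_in != reach_in[b] or new_out != reach_out[b]:
                reach_in[b], reach_out[b] = new_in, new_out
                changed = True
    return reach_in


# Hypothetical CFG: B1 -> B2, B2 -> {B3, B4}, B3 -> B2 (a loop).
blocks = {
    "B1": [("d1", "i"), ("d2", "j")],
    "B2": [("d3", "i")],
    "B3": [("d4", "j")],
    "B4": [],
}
succs = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}

reach = reaching_definitions(blocks, succs)
print(sorted(reach["B4"]))  # [('d2', 'j'), ('d3', 'i'), ('d4', 'j')]
```

At the entry of B4, `i` can only come from d3. `j` may come either from d2 or, through the loop, from d4.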

ERESI is not a general-purpose programming language, but it remains Turing complete: you can write analyzers that never stop. This can be useful for analyzers embedded in a debugger: e2dbg, for instance ;) but also Immdbg, IDA-dbg, etc.

There are no examples of ERESI programs in this email, as we are reserving them for our upcoming articles. If you are really interested, you can look at the .esh files in:

http://cvs.eresi-project.org/cvsweb.cgi/eresi/evarista/
http://cvs.eresi-project.org/cvsweb.cgi/eresi/testsuite/testscripts/

The bibliography related to ERESI (which you might need to fully understand this email) is located at:

http://www.cl.cam.ac.uk/~jv274/eresi-bib.html

As a reminder, ERESI is a perpetual work-in-progress analysis environment that includes:

- ELFsh: static instrumentation of ELF binary programs
- E2dbg: embedded debugging of ELF binary programs
- Etrace: embedded tracing of ELF binary programs
- Kernsh: kernel-level instrumentation from userland (NEW)
- Evarista: static analysis of binary programs (NEW)

Articles about those components can be read on the ERESI website.

ERESI is a free-software project. If you wish to join us, you can contact the team at: team at eresi-project dot org

Julien Vanegue, for the ERESI team

_______________________________________________
Dailydave mailing list
Dailydave () lists immunitysec com
http://lists.immunitysec.com/mailman/listinfo/dailydave