Adaptive allocators

Computer Engineering - Sapienza University of Rome

Student: Emilio Coppa - Supervisor: Camil Demetrescu

Experimental comparison of allocators BSA (Bad segregated allocator), developed by Emilio Coppa during the Algorithm Engineering course held in 2009-2010 at the School of Engineering of the Sapienza University of Rome, and TLSF (Two level segregated fit), a general-purpose dynamic memory allocator specifically designed to meet real-time requirements designed by Real Time Research Group of the University of Valencia. The main goal of this work was to study some adaptive allocation techniques to improve BSA(++)'s capabilities based on the analysis of typical allocation traces of applications.

Allocator descriptions:

BSA
TLSF (used version 2.4.6)
BSA++

Enviroment

Benchmarks have been produced with the following working environment: a MacBook 5.1 (Intel Core 2 Duo [email protected], 2GB DDR3, NVIDIA GeForce 9400M). The software configuration used is the following:

Gentoo Linux i686 (32bit)
GlibC 2.10.1 (ebuild gentoo revision 1)
Kernel Linux 2.6.33 (gentoo sources)
Gcc 4.4.3 (ebuild gentoo rivision 2)
Coreutils 8.4

Benchmarks

Our benchmarks files were created based on the file format of the MallocLab framework developed by R. Bryant and D. O'Hallaron. This platform provides a simple tool to measure performance of an allocator in time and space (only single-process load).

We collected our traces using a combination of different techniques:

Development of a tool based on PIN (a dynamic binary instrumentation software by Intel).
- Pros: Portability (all Linux platforms supported by PIN)
- Cons: Unreliable due to the optimizations -O2 introduced by GCC (and for example absolutely needed by GlibC).
Modified GlibC (added instruction tracing).
- Pros: Conceptually simple.
- Cons: Very complicated code, lack of compatibility with other operating systems, difficult to maintain/compile.
Library interpositioning at runtime based on the dlsym() call.
- Pros: Simple code, easy to compile, good portability
- Cons: Loop between dlsym() and calloc() (not always fixable)

In addition, we created tools to convert the collected traces in the format required by MallocLab and their analysis. This software has been published in the package MallocLabTools.

Results

The benchmarks reproduced were inspired by experiments carried out by the authors of TLSF in order to test their allocator. We ran MallocLab with the following program traces:

CFRAC: free implementation of continued fraction factorization method
Espresso: logic minimizer program
GhostScript: viewer PDF, PostScript, etc
Perl: a high-level, general-purpose, interpreted, dynamic programming language
ls: unix comand-line program
Inkscape: a vector graphics editor application
Deluge: a BitTorrent client created using Python and GTK+
VLC: a free and open source media player
GIMP: a free software raster graphics editor
Audacity: a free software, cross-platform digital audio editor and recording application

In conclusion, these are the average behaviour of tested allocators for all programs:

We summarize below the lessons we learned from our experiments:

TLSF is faster than BSA, but sometimes this boost is paid in terms of space; probably we can explain this with the good-fit policy of TLSF (BSA uses a first-fit policy);
BSA-- is very fast, but exhibits very bad space usage;
BSA++ without page allocation is faster and more efficient in space (than BSA++ with page allocation). However, in real systems (not MallocLab) frequent calls to sbrk() may take longer than in MallocLab;
BSA++ VAR AS with a fair threshold never impacts in worst-case space scenarios (compare for example BSA++'s bad space usage and BSA++ VAR with Espresso) without losing benefits of optimations (with BSA++ based on popularity index criteria or average popular index criteria, in order to avoid these scenarios, we have to choose a very high (too restrictive) threshold);
BSA++ VAR do better in time (almost 1.5x-2x speedup) and in space (0-1% of util space) than BSA or TLSF;
Wrong threshold's calibration can exhibit very negative results for BSA++; so we can say that the pattern's informations (extracted during trace's analyse) are effectively useful;