Adaptive allocators
Computer Engineering - Sapienza University of Rome
Student: Emilio Coppa - Supervisor: Camil Demetrescu
Experimental comparison of allocators BSA (Bad
segregated allocator), developed by Emilio Coppa during the
Algorithm Engineering course held in 2009-2010 at the
School
of Engineering of the Sapienza University of Rome,
and TLSF (Two level segregated fit), a
general-purpose dynamic memory allocator specifically
designed to meet real-time requirements designed by
Real Time Research Group
of the University of Valencia.
The main goal of this work was to study some adaptive allocation
techniques to improve BSA(++)'s capabilities based on the
analysis of typical allocation traces of applications.
Allocator descriptions:
Enviroment
Benchmarks have been produced with the following working environment:
a MacBook 5.1 (Intel Core 2 Duo [email protected], 2GB DDR3, NVIDIA GeForce
9400M). The software configuration used is the following:
- Gentoo Linux i686 (32bit)
- GlibC 2.10.1 (ebuild gentoo revision 1)
- Kernel Linux 2.6.33 (gentoo sources)
- Gcc 4.4.3 (ebuild gentoo rivision 2)
- Coreutils 8.4
Benchmarks
Our benchmarks files were created based on the file format of
the MallocLab framework developed by
R. Bryant and D. O'Hallaron.
This platform provides a simple tool to measure performance of
an allocator in time and space (only single-process load).
We collected our traces using a combination of different techniques:
-
Development of a tool based on PIN (a dynamic binary
instrumentation software by Intel).
-
Pros: Portability (all Linux platforms
supported by PIN)
-
Cons: Unreliable due to the optimizations
-O2 introduced by GCC (and for example
absolutely needed by GlibC).
-
Modified GlibC (added instruction tracing).
- Pros: Conceptually simple.
-
Cons: Very complicated code,
lack of compatibility with other
operating systems, difficult to
maintain/compile.
-
Library interpositioning at runtime based on the
dlsym() call.
- Pros: Simple code, easy to compile, good portability
-
Cons: Loop between dlsym() and
calloc() (not always fixable)
In addition, we created tools to convert the collected traces
in the format required by MallocLab and their analysis.
This software has been published in the package
MallocLabTools.
Results
The benchmarks reproduced were inspired by experiments
carried out by the authors of TLSF in order to test their
allocator. We ran MallocLab with the following program
traces:
- CFRAC: free implementation of continued fraction factorization method
- Espresso: logic minimizer program
- GhostScript: viewer PDF, PostScript, etc
- Perl: a high-level, general-purpose, interpreted, dynamic programming language
- ls: unix comand-line program
- Inkscape: a vector graphics editor application
- Deluge: a BitTorrent client created using Python and GTK+
- VLC: a free and open source media player
- GIMP: a free software raster graphics editor
- Audacity: a free software, cross-platform digital audio editor and recording application
In conclusion, these are the average behaviour of tested
allocators for all programs:
We summarize below the lessons we learned from our experiments:
-
TLSF is faster than BSA, but sometimes this boost
is paid in terms of space;
probably we can explain this with the good-fit
policy of TLSF (BSA uses a first-fit policy);
- BSA-- is very fast, but exhibits very bad space usage;
-
BSA++ without page allocation is faster and more efficient in space
(than BSA++ with page allocation). However, in real systems
(not MallocLab) frequent calls to sbrk() may take longer than in MallocLab;
-
BSA++ VAR AS with a fair threshold
never impacts in
worst-case space scenarios (compare for example
BSA++'s bad space usage
and
BSA++ VAR with Espresso)
without losing benefits of optimations (with BSA++
based on popularity index criteria or
average popular index criteria, in order to
avoid these scenarios, we have to choose a
very high (too restrictive) threshold);
-
BSA++ VAR do better in time (almost 1.5x-2x speedup)
and in space (0-1% of util space) than BSA or TLSF;
-
Wrong threshold's calibration can exhibit very
negative results for BSA++; so we can say
that the pattern's informations (extracted
during trace's analyse) are effectively useful;
References