Gaël Thomas
Professor at Telecom SudParis
Samovar Laboratory, HP2 group of the computer science department
Département d'informatique, Bureau D306
Telecom SudParis
9, rue Charles Fourier, F-91011 Evry, France
Born on 1976/10/20, Paris
French Citizen, 2 children
Phone: +33 6 10 39 31 17
Email: gael.thomas At telecom-sudparis.eu
Full CV Short CV

Since 2014, I have been a full professor at Telecom SudParis. My research focuses on virtualization, operating systems, concurrency, and language runtimes. I am particularly interested in improving the performance, design, and safety of systems. I am a member of the Samovar laboratory and of the HP2 team of the computer science department, which investigates high-performance computing and systems. After chairing the French chapter of ACM SIGOPS from 2011 to 2014, I served as its treasurer from 2014 to 2016. I received my PhD from UPMC Sorbonne Université in 2005 and my "Habilitation à diriger les recherches", also from UPMC Sorbonne Université, in 2012. From 2006 to 2014, I was an associate professor at UPMC Sorbonne Université, in the LIP6 laboratory, and in 2005 I was a postdoctoral researcher at the Université Joseph Fourier in Grenoble.

Education and experience

2014 – today Professor at Telecom SudParis
2006 – 2014 Associate Professor at UPMC Sorbonne Université, Regal Team – INRIA/LIP6
Habilitation à diriger les recherches (HDR) in 2012
Recipient of the scientific award of UPMC from 2010 to 2014
2005 – 2006 PostDoc at Université Joseph Fourier (Grenoble, France). Adele Team – LSR (today LIG)
2001 – 2005 Ph.D. thesis under the direction of Prof. B. Folliot, Université Pierre et Marie Curie (UPMC). SRC Team – LIP6
2004 – 2005 Assistant Professor – UPMC
2001 – 2004 Teaching Assistant – UPMC
1999 – 2001 Master's degree in Computer Science at UPMC
Magistère d'Informatique Appliquée d'Ile de France (MIAIF)
Maîtrise d'Informatique / DEA Systèmes Informatiques Répartis
1997 – 1999 Bachelor's / first year of master's degree (M1) in Mathematical Sciences at UPMC
1994 – 1997 Bachelor's degree in Physical Sciences at UPMC

Research

Selected publications

Publications: All DBLP Google Scholar
  • Continuously Measuring Critical Section Pressure with the Free-Lunch Profiler. Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. In Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'14, 14 pages, 2014. [Abstract] [BibTeX] [.pdf]
    Today, Java is regularly used to implement large multi-threaded server-class applications that use locks to protect access to shared data. However, understanding the impact of locks on the performance of a system is complex, and thus the use of locks can impede the progress of threads on configurations that were not anticipated by the developer, during specific phases of the execution. In this paper, we propose Free Lunch, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock. Free Lunch is designed around a new metric, critical section pressure (CSP), which directly correlates the progress of the threads to each of the locks. Using Free Lunch, we have identified phases of high CSP, which were hidden with other lock profilers, in the distributed Cassandra NoSQL database and in several applications from the DaCapo 9.12, the SPECjvm2008 and the SPECjbb2005 benchmark suites. Our evaluation of Free Lunch shows that its overhead is never greater than 6%, making it suitable for in-vivo use.
    @inproceedings{oopsla:14:david:free-lunch,
      author = {David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Continuously Measuring Critical Section Pressure with the Free-Lunch Profiler},
      booktitle = {Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'14},
      publisher = {ACM},
      year = {2014},
      pages = {14}
    }
  • A study of the scalability of stop-the-world garbage collectors on multicores. Lokesh Gidra, Gaël Thomas, Julien Sopena and Marc Shapiro. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13, pages 229-240, 2013. [Abstract] [BibTeX] [.pdf]
    Large-scale multicore architectures are problematic for garbage collection (GC). In particular, throughput-oriented stop-the-world algorithms demonstrate excellent performance with a small number of cores, but have been shown to degrade badly beyond approximately 20 cores on OpenJDK 7. This negative result raises the question whether the stop-the-world design has intrinsic limitations that would require a radically different approach. Our study suggests that the answer is no, and that there is no compelling scalability reason to discard the existing highly-optimised throughput-oriented GC code on contemporary hardware. This paper studies the default throughput-oriented garbage collector of OpenJDK 7, called Parallel Scavenge. We identify its bottlenecks, and show how to eliminate them using well-established parallel programming techniques. On the SPECjbb2005, SPECjvm2008 and DaCapo 9.12 benchmarks, the improved GC matches the performance of Parallel Scavenge at low core count, but scales well, up to 48 cores.
    @inproceedings{asplos:13:gidra:naps,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc},
      title = {A study of the scalability of stop-the-world garbage collectors on multicores},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13},
      publisher = {ACM},
      year = {2013},
      pages = {229--240}
    }
  • NumaGiC: a garbage collector for big data on big NUMA machines. Lokesh Gidra, Gaël Thomas, Julien Sopena, Marc Shapiro and Nhan Nguyen. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'15, 14 pages, 2015. [Abstract] [BibTeX] [.pdf]
    On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. We address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism. We compare NumaGiC with Parallel Scavenge and NAPS on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160GB to 350GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves overall performance by up to 45% over NAPS (up to 94% over Parallel Scavenge), and increases the performance of the collector itself by up to 3.6x over NAPS (up to 5.4x over Parallel Scavenge).
    @inproceedings{asplos:15:gidra:numagic,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc and Nguyen, Nhan},
      title = {NumaGiC: a garbage collector for big data on big NUMA machines},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'15},
      publisher = {ACM},
      year = {2015},
      pages = {14}
    }
  • Fast and Portable Locking for Multicore Architectures. Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. ACM Transactions on Computer Systems (TOCS), vol. 33(4), pages 13:1-13:62, 2016. [Abstract] [BibTeX] [.pdf]
    The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this article is a new locking technique, Remote Core Locking (RCL), that aims to accelerate the execution of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server hardware thread. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the hardware thread acquiring the lock, because such data can typically remain in the server’s cache. Other contributions presented in this article include a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX lock acquisitions into RCL locks.

    Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. By using RCL instead of Linux POSIX locks, performance is improved by up to 2.5 times on Memcached, and up to 11.6 times on Berkeley DB with the TPC-C client. On a SPARC machine with two Sun Ultrasparc T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance is improved by up to 1.3 times with respect to Solaris POSIX locks on Memcached, and up to 7.9 times on Berkeley DB with the TPC-C client.
    @article{tocs:16:lozi:rcl,
      author = {Lozi, Jean-Pierre and David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Fast and Portable Locking for Multicore Architectures},
      journal = {ACM Transactions on Computer Systems (TOCS)},
      publisher = {ACM},
      year = {2016},
      volume = {33},
      number = {4},
      pages = {13:1--13:62}
    }
  • Remote Core Locking: migrating critical-section execution to improve the performance of multithreaded applications. Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. In Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12, pages 65-76, 2012. [Abstract] [BibTeX] [.pdf]
    The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. In this paper, we propose a new lock algorithm, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server core. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the core acquiring the lock because such data can typically remain in the server core's cache.

    We have developed a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX locks into RCL locks. We have evaluated our approach on 18 applications: Memcached, Berkeley DB, the 9 applications of the SPLASH-2 benchmark suite and the 7 applications of the Phoenix2 benchmark suite. 10 of these applications, including Memcached and Berkeley DB, are unable to scale because of locks, and benefit from RCL. Using RCL locks, we get performance improvements of up to 2.6 times with respect to POSIX locks on Memcached, and up to 14 times with respect to Berkeley DB.
    @inproceedings{usenix-atc:12:lozi:rcl,
      author = {Lozi, Jean-Pierre and David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Remote Core Locking: migrating critical-section execution to improve the performance of multithreaded applications},
      booktitle = {Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12},
      publisher = {USENIX Association},
      year = {2012},
      pages = {65--76}
    }
  • Faults in Linux: ten years later. Nicolas Palix, Gaël Thomas, Suman Saha, Christophe Calvès, Julia Lawall and Gilles Muller. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11, pages 305-318, 2011. [Abstract] [BibTeX] [.pdf]
    In 2001, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired a number of development and research efforts on improving the reliability of driver code. Today Linux is used in a much wider range of environments, provides a much wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality? Are drivers still a major problem?

    To answer these questions, we have transported the experiments of Chou et al. to Linux versions 2.6.0 to 2.6.33, released between late 2003 and early 2010. We find that Linux has more than doubled in size during this period, but that the number of faults per line of code has been decreasing. And, even though drivers still accounts for a large part of the kernel code and contains the most faults, its fault rate is now below that of other directories, such as arch (HAL) and fs (file systems). These results can guide further development and research efforts. To enable others to continually update these results as Linux evolves, we define our experimental protocol and make our checkers and results available in a public archive.
    @inproceedings{asplos:11:palix:faults,
      author = {Palix, Nicolas and Thomas, Gaël and Saha, Suman and Calvès, Christophe and Lawall, Julia and Muller, Gilles},
      title = {Faults in Linux: ten years later},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11},
      publisher = {ACM},
      year = {2011},
      pages = {305--318}
    }
  • Faults in Linux 2.6. Nicolas Palix, Gaël Thomas, Suman Saha, Christophe Calvès, Gilles Muller and Julia Lawall. ACM Transactions on Computer Systems (TOCS), vol. 32(2), pages 4:1-4:40, 2014. [Abstract] [BibTeX] [.pdf]
    In August 2011, Linux entered its third decade. Ten years before, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired numerous efforts on improving the reliability of driver code. Today, Linux is used in a wider range of environments, provides a wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality?

    To answer this question, we have transported Chou et al.'s experiments to all versions of Linux 2.6, released between 2003 and 2011. We find that Linux has more than doubled in size during this period, but the number of faults per line of code has been decreasing. Moreover, the fault rate of drivers is now below that of other directories, such as arch. These results can guide further development and research efforts for the decade to come. To allow updating these results as Linux evolves, we define our experimental protocol and make our checkers available.
    @article{tocs:14:palix:faults,
      author = {Palix, Nicolas and Thomas, Gaël and Saha, Suman and Calvès, Christophe and Muller, Gilles and Lawall, Julia},
      title = {Faults in Linux 2.6},
      journal = {ACM Transactions on Computer Systems (TOCS)},
      publisher = {ACM},
      year = {2014},
      volume = {32},
      number = {2},
      pages = {4:1--4:40}
    }
  • An interface to implement NUMA policies in the Xen hypervisor. Gauthier Voron, Gaël Thomas, Vivien Quéma and Pierre Sens. In Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'17, 14 pages, 2017. [Abstract] [BibTeX] [.pdf]
    While virtualization introduces only a small overhead on machines with few cores, this is not the case on larger ones. Most of the overhead on the latter machines is caused by their Non-Uniform Memory Access (NUMA) architecture. In order to reduce this overhead, this paper shows how NUMA placement heuristics can be implemented inside Xen. With an evaluation of 29 applications on a 48-core machine, we show that the NUMA placement heuristics can multiply the performance of 9 applications by more than 2.
    @inproceedings{eurosys:17:voron:xen-numa,
      author = {Voron, Gauthier and Thomas, Gaël and Quéma, Vivien and Sens, Pierre},
      title = {An interface to implement NUMA policies in the Xen hypervisor},
      booktitle = {Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'17},
      publisher = {ACM},
      year = {2017},
      pages = {14}
    }

Habilitation à Diriger les Recherches and PhD thesis

HDR thesis, defended in 2012: Improving the design and the performance of managed runtime environments.

PhD thesis, defended in 2005 under the direction of Prof. Bertil Folliot: Applications actives : Construction dynamique d'environnements flexibles homogènes.

Student Supervisions

If you are interested in a PhD, send me a note. You will need to demonstrate genuine scientific curiosity and an interest in research topics such as concurrent programming, virtualization, operating systems, and language runtimes.

Ongoing

  • Remi Dulong, since Oct. 2018, co-advised with Pascal Felber at 50%. Topic: Leveraging NVRAM to analyze large data sets.
  • Anatole Lefort, since Oct. 2018, co-advised with Pierre Sutra at 50%. Topic: Persistent data types for NVRAM.
  • Anton Daumen, since Oct. 2018, co-advised with Patrick Carribault and François Trahay at 20%. Topic: Performance analysis of HPC applications.
  • Subashiny Tanigassalame, since Sept. 2018. Topic: A language to simplify the development of privacy-preserving applications.
  • Alexis Lescouet, since Sept. 2017. Topic: Systems for NUMA architectures.

Defended

  • Gauthier Voron, co-advised with Pierre Sens at 70% (2014-2018). Virtualisation efficace d'architectures NUMA. Currently postdoc at University of Sydney (Australia). [Abstract]

    Virtualization technology and the NUMA architecture evolved independently to tackle different issues: reducing the cost of hardware usage for the former, and building more powerful hardware for the latter. Nowadays, however, the hardware used in cloud data centers has a NUMA architecture, and virtual machines therefore execute atop such hardware. Virtualization software was not designed for NUMA architectures, and because of this poor integration, applications executed inside a virtual machine running atop a NUMA architecture may perform poorly. As the combined use of NUMA architectures and virtualization is relatively recent, driven by the emergence of cloud computing, only a few works address this performance issue.

    My PhD thesis addresses the challenge of efficiently virtualizing a NUMA architecture in a cloud infrastructure. My research is twofold. First, it measures how virtualization behaves on a NUMA architecture, and how and why a NUMA architecture changes the performance of virtualized applications.

  • Mohamed Said Mosli Bouksiaa, co-advised with François Trahay at 30% (2014-2018). Performance variation considered helpful. Currently engineer at Applidium (France). [Abstract]

    Understanding the performance of a multi-threaded application is difficult. Threads interfere when they access the same hardware resource or the same synchronization primitive, which slows down their execution. Unfortunately, current profiling tools report the hardware components or synchronization primitives that saturate, but they cannot tell whether the saturation is the cause of a performance bottleneck.

    In this PhD thesis, I propose a holistic metric able to pinpoint the blocks of code that suffer the most from interference, regardless of the cause of the interference. The metric relies on differential execution, but instead of comparing previously identified inefficient runs with efficient ones, I consider performance variation as a universal indicator of interference problems. With an evaluation of 27 applications, I show that the metric can identify interference problems caused by 6 different kinds of interactions in 9 applications.

  • Lokesh Gidra, co-advised with Marc Shapiro at 70% and Julien Sopena (2012-2015). Garbage Collector for memory intensive applications on NUMA architectures. Currently engineer at Google (US). [Abstract]

    Large-scale multicore architectures create new challenges for garbage collectors (GCs). On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. In this thesis, we address this problem with NumaGiC, a GC with a mostly-distributed design.

    In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism and increase memory access imbalance, by allowing threads to steal from other nodes when they are idle. NumaGiC strives to find a perfect balance between local access, memory access balance, and parallelism.

    In this work, we compare NumaGiC with Parallel Scavenge and some of its incrementally improved variants on two different ccNUMA architectures running the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160 GB to 350 GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves overall performance by up to 94% over Parallel Scavenge, and increases the performance of the collector itself by up to 5.4× over Parallel Scavenge. In terms of scalability of GC throughput with an increasing number of NUMA nodes, NumaGiC scales substantially better than Parallel Scavenge for all the applications. In fact, in the case of SPECjbb2005, where inter-node object references are the fewest, NumaGiC scales almost linearly.

  • Florian David, co-advised with Gilles Muller at 50% (2011-2015). Continuous and Efficient Lock Profiling for Java on Multicore Architectures. Currently research engineer at Criteo (France). [Abstract]

    Today, Java is regularly used to implement large multithreaded server-class applications that use locks to protect access to shared data. However, understanding the impact of locks on the performance of a system is complex, and thus the use of locks can impede the progress of threads on configurations that were not anticipated by the developer, during specific phases of the execution. In this paper, we propose Free Lunch, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock. Free Lunch is designed around a new metric, critical section pressure (CSP), which directly correlates the progress of the threads to each of the locks. Using Free Lunch, we have identified phases of high CSP, which were hidden with other lock profilers, in the distributed Cassandra NoSQL database and in several applications from the DaCapo 9.12, the SPECjvm2008 and the SPECjbb2005 benchmark suites. Our evaluation of Free Lunch shows that its overhead is never greater than 6%, making it suitable for in-vivo use.

  • Jean Pierre Lozi, co-advised with Gilles Muller at 50% (2010–2014). Towards more scalable mutual exclusion for multicore architectures. Currently research engineer at Oracle Lab Zürich (Switzerland). [Abstract]

    The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this thesis is a new lock algorithm, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated hardware thread, which is referred to as the server. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the hardware thread acquiring the lock because such data can typically remain in the server’s cache.

    Other contributions presented in this thesis include a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool developed with Julia Lawall that transforms POSIX locks into RCL locks. Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. Using RCL locks, performance is improved by up to 2.5 times with respect to POSIX locks on Memcached, and up to 11.6 times with respect to Berkeley DB with the TPC-C client. On a SPARC machine with two Sun Ultrasparc T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance is improved by up to 1.3 times with respect to POSIX locks on Memcached, and up to 7.9 times with respect to Berkeley DB with the TPC-C client.

  • Koutheir Attouchi, co-advised with Gilles Muller at 50% (2011-2014). Managing resource sharing conflicts in an open embedded software environment. Currently engineer at MicroDoc (Germany). [Abstract]

    Our homes are becoming smart thanks to the numerous devices, sensors and actuators available in them, providing services, e.g., entertainment, home security, energy efficiency and health care. Various service providers want to take advantage of the smart home opportunity by rapidly developing services to be hosted by an embedded smart home gateway. The gateway is open to applications developed by untrusted service providers, controlling numerous devices, and possibly containing bugs or malicious code. Thus, the gateway should be highly available and robust enough to handle software problems without restarting abruptly. Sharing the constrained resources of the gateway between service providers allows them to provide richer services. However, resource sharing conflicts happen when an application uses resources “unreasonably” or abusively. This thesis addresses the problem of resource sharing conflicts in the smart home gateway, investigating prevention approaches when possible, and considering detection and resolution approaches when prevention is out of reach.

    Our first contribution, called Jasmin, aims at preventing resource sharing conflicts by isolating applications. Jasmin is a middleware for the development, deployment and isolation of native, component-based and service-oriented applications targeted at embedded systems. Jasmin enables fast and easy cross-application communication, and uses Linux containers for lightweight isolation. Our second contribution, called Incinerator, is a subsystem in the Java Virtual Machine (JVM) aiming to resolve the problem of Java stale references, i.e., references to objects that should no longer be used. Stale references can cause significant memory leaks in an OSGi-based smart home gateway, hence decreasing the amount of available memory, which increases the risk of memory sharing conflicts. With less than 4% overhead, Incinerator not only detects stale references, making them visible to developers, but also eliminates them, hence lowering the risk of resource sharing conflicts. Even in Java, memory sharing conflicts happen. Thus, in order to detect them, we propose our third contribution: a memory monitoring subsystem integrated into the JVM. Our subsystem is mostly transparent to application developers and also aware of the component model composing smart home applications. The system accurately accounts for resources consumed during cross-application interactions, and provides on-demand snapshots of memory usage statistics for the different service providers sharing the gateway.

  • Thomas Preud’Homme, co-advised with Bertil Folliot and Julien Sopena at 30% (2008–2013). Optimized inter-core communication protocol for stream-oriented parallelism. Currently engineer at ARM (United Kingdom).
  • Nicolas Geoffray, co-advised with Bertil Folliot at 80% (2005-2009). Fostering System Research with VMKit. Currently engineer at Google (United Kingdom). [Abstract]

    Many systems research projects now target managed runtime environments (MREs) because they provide better productivity and safety compared to native environments. Still, developing and optimizing an MRE is a tedious task that requires many years of development. Although MREs share some common functionalities, such as a just-in-time compiler or a garbage collector, this opportunity for sharing has not yet been exploited in implementing MREs. This thesis describes and evaluates VMKit, a first attempt to build a common substrate that eases the development and experimentation of high-level MREs and system mechanisms. VMKit has been successfully used to build two MREs, a Java Virtual Machine and a Common Language Runtime, as well as a new system mechanism that provides better security in the context of service-oriented architectures.

    We describe the lessons learnt in implementing such a common infrastructure from a performance and an ease-of-development standpoint. The performance of VMKit is reasonable compared to industrial MREs, and the high-level MREs are only 20,000 lines of code. Our new system mechanism only requires the addition of 600 lines of code in VMKit, and is a significant step towards more dependable systems.

  • Charles Clément, co-advised with Bertil Folliot at 40% (2004-2009). Isolation of operating system extensions with a managed runtime environment. Currently engineer at Amazon (US).

Engineers

  • Harris Bakiras (2011 – 2013): engineer on the VMKit project (INRIA ADT project). Currently engineer at Microsoft (France).

Grants and Contracts

International projects

  • 2018 – 2021: H2020 CloudButton, member.
  • 2010 – 2012: Collaboration CNPq/INRIA: Dependable Mechanisms for Dynamic Networks, contributor.
  • 2005 – 2006: ITEA S4ALL: Services for All, scientific coordinator for LSR.
  • 2002 – 2004: IST COACH: Component Based Open Source Architecture for Distributed Telecom Applications, contributor.

French projects

  • 2019 – 2022: Scalevisor (ANR PCRE), scientific leader for Telecom SudParis.
  • 2019 – 2022: Pythia (ANR JCJC), member.
  • 2018 – 2021: Primate (ANR PCRI), principal investigator for the French side.
  • 2013 – 2015: Richelieu (FUI), scientific leader for UPMC.
  • 2012 – 2015: Infra-JVM (ANR Infra), principal investigator.
  • 2011 – 2014: CIFRE PhD funding with Orange Labs (Koutheir Attouchi).
  • 2009 – 2012: ABL (ANR Blanc, 275 k\euro): A Bug Life, coordinator of Task 1.

Talks ()

  • Scalevisor: a CPU/memory driver for large multicore architectures (07/2018, IRCICA, France)
  • A study of Garbage Collector Scalability on Multicore Hardware (02/2014, LIAFA, France – 09/2013, EPFL, Switzerland – 07/2013, Verimag, France – 03/2013, IRISA, France – 01/2013, Epita, France)
  • Systems, Virtualization, Cloud: a short overview of systems research in France (May 2012, CNRS, France – Nov. 2011, UPMC, France)
  • VMKit: a substrate for Managed Runtime Environment (Feb. 2012, IRILL, France – Sep. 2011, Purdue University, US – Apr. 2011, Journées Compilation, Dinard, France – Mar. 2011, University of Utah, Salt-Lake City, US)
  • VMKit: a virtual machine substrate (Oct. 2010, LaBRI, Bordeaux, France – Apr. 2010, Groupe de Travail Programmation, LIP6, Paris, France)
  • AutoVM: pushing the boundaries of genericity, Séminaire Performance et Généricité (May 2009, Epita, Paris, France)
  • Applying an exokernel approach to the construction of a Java virtual machine: the JnJVM (Apr. 2006, LIFL, Lille, France – Mar. 2006, IRISA, Rennes, France)
  • Active Applications: dynamic construction of homogeneous, flexible execution environments (Jul. 2005, LIG, Grenoble, France)

PhD and HDR thesis committees ()

  • Reviewers (rapporteur) of theses: Maxime France-Pillois (09/2018), Mohamad Jaafar Nehme (12/2017), Boris Teabe (10/2017), Clément Béra (09/2017), Bo Zhang (12/2016), Julien Pagès (12/2016), Fabien André (11/2016), Nassim Halli (10/2016), François Serman (09/2016), Etienne Brodu (06/2016), Thomas Calmant (10/2015), José Simão (03/2015), Joaquim Perchat (01/2015), Victor Lomüller (11/2014), Camillo Bruni (05/2014), Yufang Dan (05/2014), François Goichon (12/2013), Quentin Sabah (12/2013), Konstantinos Kloudas (03/2013), Geoffroy Cogniaux (12/2012).
  • Examiner (examinateur) of theses: Mickaël Salaün (03/2018), Alain Tchana (HDR, 12/2017), Antoine Capra (12/2015), Inti Gonzalez Herrera (12/2015), Aurèle Maheo (09/2015), Marion Guthmuller (04/2015), Pierre Olivier (12/2014), Baptiste Lepers (01/2014), Sylvain Cotard (12/2013), Jean-Yves Vet (11/2013), Preston Francisco Rodrigues (05/2013), Sarni Toufik (10/2012), Rémy Pottier (09/2012), Kiev Santos de Gama (10/2011), Christophe Deleray (10/2006).

Software

Teaching

Since 2001, I have taught around 2,600 hours. During the 2011 – 2012 academic year, I did not teach because I had a CRCT (Congé pour Recherche ou Conversion Thématique), i.e., a year dedicated to research. Before my associate professor and professor positions, I taught 10 hours at Polytech'Grenoble in 2005 during my postdoctoral position, 172 hours at UPMC in 2004 as an assistant professor, and 288 hours as a teaching assistant during my PhD. I teach mainly in the areas of systems, languages and software engineering. Between 2010 and 2014, I was responsible for the middleware track of the Systems Masters at UPMC. Since 2016, I have been the coordinator of the computer science courses at Telecom SudParis (∼ 40 courses, each around 60 hours).

You can find a portal with the courses given by my department here. I'm currently involved in the following courses:

Course conceptions ()

  • 2017: Initiation to the Java programming language (Telecom SudParis, Bachelor 3, around 200 students)
  • 2016: Multicore programming (University Paris-Saclay, Master 2, around 20 students)
  • 2016: System programming (University Paris-Saclay, Master 1, around 30 students)
  • 2015: Initiation to systems with bash (Telecom SudParis, Bachelor 3, around 200 students)
  • 2009: Initiation to managed runtime environments (UPMC, Bachelor 2, around 100 students)
  • 2009: Multicore Systems and Virtualization (UPMC, Master 2, around 20 students)
  • 2009: Multicore Systems (Polytech'Paris, Master 2, around 20 students)
  • 2008: Advanced system frameworks (UPMC, Master 2, around 40 students)
  • 2007: Component oriented middlewares (UPMC, Master 2, around 100 students)

Course responsibilities ()

  • 2017 – today: Initiation to the Java programming language (Telecom SudParis, Bachelor 3, around 200 students)
  • 2016 – today: Multicore programming (University Paris-Saclay, Master 2, around 20 students)
  • 2015 – today: Initiation to systems with bash (Telecom SudParis, Bachelor 3, around 200 students)
  • 2016 – 2018: System programming (University Paris-Saclay, Master 1, around 30 students)
  • 2009 – 2014: Multicore Systems and Virtualization (UPMC, Master 2, around 30 students)
  • 2013 – 2014: Initiation to operating system (UPMC, Bachelor 2, around 200 students)
  • 2010 – 2014: Research group in systems (UPMC, Master 2, around 30 students)
  • 2009 – 2010: Multicore Systems (Polytech'Paris, Master 2, around 20 students)
  • 2006 – 2010: Client/Server oriented Distributed Systems (UPMC, Master 1, around 70 students)
  • 2006 – 2010: Distributed systems and client/server (UPMC, Master 2, around 10 students)
  • 2006 – 2009: Component oriented middlewares (UPMC, Master 2, around 100 students)

Teaching summary

Degree Teaching unit name Years Hours University
Bachelors 1 Functional programming with Scheme 2001 – 2004 192 UPMC
Bachelors 1 Initiation to the C language 2012 – 2013 60 UPMC
Bachelors 2 Initiation to managed runtime environments 2009 – 2014 107 UPMC
Bachelors 2 Initiation to operating system 2012 – 2014 100 UPMC
Bachelors 2 Architecture of microprocessors 2012 – 2013 40 UPMC
Bachelors 3 Operating system principles 2004 – 2011 180 UPMC
Bachelors 3 Introduction to architectures and systems 2014 – 2015 33 Telecom SudParis
Bachelors 3 Initiation to the Java programming language 2015 – 2018 90 Telecom SudParis
Bachelors 3 Initiation to operating systems 2015 – 2018 104 Telecom SudParis
Masters 1 Operating system kernels 2002 – 2014 594 UPMC
Masters 1 Parallel programming 2004 – 2005 20 UPMC
Masters 1 System Projects 2004 – 2014 27 UPMC
Masters 1 Client/server oriented distributed systems 2006 – 2011 172 UPMC
Masters 1 Components 2007 – 2008 2 UPMC
Masters 1 Operating systems principle 2008 – 2010 16 Polytech'Paris
Masters 1 Object oriented programming 2014 – 2016 54 Telecom SudParis
Masters 1 Design and implementation of centralized systems 2014 – 2017 75 Telecom SudParis
Masters 1 System programming 2016 – 2018 90 Paris-Saclay
Masters 1 System programming 2016 – 2018 60 Telecom SudParis
Masters 2 Distributed applications and systems 2005 – 2006 10 Polytech'Grenoble
Masters 2 Distributed systems and client/server 2006 – 2010 112 UPMC
Masters 2 Component oriented middlewares 2006 – 2009 84 UPMC
Masters 2 Advanced system frameworks 2008 – 2014 62 UPMC
Masters 2 Multicore systems 2009 – 2011 40 Polytech'Paris
Masters 2 Multicore systems and virtualization 2009 – 2015 92 UPMC
Masters 2 Research group in system 2009 – 2014 116 UPMC
Masters 2 Cloud computing 2015 – 2018 9 Paris-Saclay
Masters 2 High performance systems 2015 – 2018 15 Telecom SudParis
Masters 2 Multicore programming 2016 – 2018 60 Paris-Saclay
Masters 2 Cloud infrastructure 2017 – 2018 6 Telecom SudParis

Professional activities

Member of program committees and organizations of events

I have been a member of program committees of rank A conferences.

  • Member of the Middleware 2018 program committee ()
  • Member of the SRDS 2018 program committee ()
  • Member of the Eurosys 2018 program committee ()
  • Member of the MoreVM 2018 program committee (Workshop)
  • Member of the Compas 2018 program committee (French)
  • PC Chair of Compas 2017 (French)
  • Member of the Eurosys 2016 program committee ()
  • Member of the Compas 2016 program committee (French)
  • Member of the VEE 2015 program committee ()
  • Member of the ICOOOLPS 2015 program committee (Workshop)
  • Poster co-chair of the Eurosys 2015 conference
  • Member of the ComPAS 2015 program committee (French)
  • Treasurer and sponsorship co-chair of the Middleware 2014 conference
  • Member of the ComPAS/CFSE 2014 program committee (French)
  • Organizer of the informal "Managed Runtimes" workshop at LIP6 (∼ 30 persons)
  • Member of the PLOS 2013 program committee (Workshop)
  • Member of the ComPAS/CFSE 2013 program committee (French)
  • Member of the DAIS 2012 program committee ()
  • Member of the DAIS 2011 program committee ()
  • Program chair of the NOTERE 2011 Workshops (French)
  • Member of the CFSE 2011 program committees (French)
  • Co-organizer of the OSGi Users' Group France (OUGF) workshop (∼ 20 persons)

Other responsibilities

Publications

For the ranking of venues, I use the 2014 Australian Ranking of ICT Conferences (http://www.core.edu.au), which ranks conferences and journals in computer science with A* (top 4%), A (top 14%), B and C. If the rank is not given, it means that the venue is not ranked.

Major conferences in computer science have low acceptance rates, comparable to those of high-quality journals. They often carry higher status, greater impact and better visibility (cf. https://homes.cs.washington.edu/~mernst/advice/conferences-vs-journals.html).

Ranked Publications

International conferences ()

  1. Partition participant detector with dynamic paths in mobile networks. Luciana Arantes, Pierre Sens, Gaël Thomas, Denis Conan and Leon Lim. In Proceedings of the international symposium on Network Computing and Applications, NCA'10, pages 224-228.  2010. Short paper. . [Abstract] [BibTeX] [.pdf]
    Mobile ad-hoc networks, MANETs, are self organized and very dynamic systems where processes have no global knowledge of the system. In this paper, we propose a model that characterizes the dynamics of MANETs in the sense that it considers that paths between nodes are dynamically built and the system can have infinitely many processes but the network may present finite stable partitions. We also propose an algorithm that implements an eventually perfect partition participant detector PD which eventually detects the participant nodes of stable partitions.
    @inproceedings{nca:10:arantes:manet,
      author = {Arantes, Luciana and Sens, Pierre and Thomas, Gaël and Conan, Denis and Lim, Leon},
      title = {Partition participant detector with dynamic paths in mobile networks},
      booktitle = {Proceedings of the international symposium on Network Computing and Applications, NCA'10},
      publisher = {IEEE Computer Society},
      year = {2010},
      pages = {224--228}
    }
  2. Memory Monitoring on a multi-tenant OSGi execution environment. Koutheir Attouchi, Gaël Thomas, André Bottaro and Gilles Muller. In Proceedings of the international symposium on Component-Based Software Engineering, CBSE'14, pages 107-116.  2014. . [Abstract] [BibTeX] [.pdf]
    Smart Home market players aim to deploy component-based and service-oriented applications from untrusted third party providers on a single OSGi execution environment. This creates the risk of resource abuse by buggy and malicious applications, which raises the need for resource monitoring mechanisms. Existing resource monitoring solutions either are too intrusive or fail to identify the relevant resource consumer in numerous multi-tenant situations. This paper proposes a system to monitor the memory consumed by each tenant, while allowing them to continue communicating directly to render services. We propose a solution based on a list of configurable resource accounting rules between tenants, which is far less intrusive than existing OSGi monitoring systems. We modified an experimental Java Virtual Machine in order to provide the memory monitoring features for the multi-tenant OSGi environment. Our evaluation of the memory monitoring mechanism on the DaCapo benchmarks shows an overhead below 46%.
    @inproceedings{cbse:14:attouchi:monitoring,
      author = {Attouchi, Koutheir and Thomas, Gaël and Bottaro, André and Muller, Gilles},
      title = {Memory Monitoring on a multi-tenant OSGi execution environment},
      booktitle = {Proceedings of the international symposium on Component-Based Software Engineering, CBSE'14},
      publisher = {ACM},
      year = {2014},
      pages = {107--116}
    }
  3. Incinerator - Eliminating Stale References in Dynamic OSGi Applications. Koutheir Attouchi, Gaël Thomas, Gilles Muller, Julia Lawall and André Bottaro. In Proceedings of the international conference on Dependable Systems and Networks, DSN'15, pages 11.  2015. . [Abstract] [BibTeX] [.pdf]
    Java class loaders are commonly used in application servers to load, unload and update a set of classes as a unit. However, unloading or updating a class loader can introduce stale references to the objects of the outdated class loader. A stale reference leads to a memory leak and, for an update, to an inconsistency between the outdated classes and their replacements. To detect and eliminate stale references, we propose Incinerator, a Java virtual machine extension that introduces the notion of an outdated class loader. Incinerator detects stale references and sets them to null during a garbage collection cycle. We evaluate Incinerator in the context of the OSGi framework and show that Incinerator correctly detects and eliminates stale references, including a bug in Knopflerfish. We also evaluate the performance of Incinerator with the DaCapo benchmark on VMKit and show that Incinerator has an overhead of at most 3.3%.
    @inproceedings{dsn:15:attouchi:incinerator,
      author = {Attouchi, Koutheir and Thomas, Gaël and Muller, Gilles and Lawall, Julia and Bottaro, André},
      title = {Incinerator - Eliminating Stale References in Dynamic OSGi Applications},
      booktitle = {Proceedings of the international conference on Dependable Systems and Networks, DSN'15},
      publisher = {IEEE Computer Society},
      year = {2015},
      pages = {11}
    }
  4. EZ: towards efficient asynchronous protocol gateway construction. Yérom-David Bromberg, Morandat Floréal, Réveillère Laurent and Gaël Thomas. In Proceedings of the conference on Distributed Applications and Interoperable Systems, DAIS'13, pages 169-174.  2013. Short paper. . [Abstract] [BibTeX] [.pdf]
    Over the past decade, we have witnessed the emergence of a bulk set of devices, from very different application domains interconnected via Internet to form what is commonly named Internet of Things (IoT). The IoT vision is grounded in the belief that all devices are able to interact seamlessly with each other anytime, anyplace, anywhere. However, devices communicate via a multitude of incompatible protocols, and consequently drastically slow down the IoT vision adoption. Gateways, that are able to translate one protocol to another, appear to be a key enabler of the future of IoT but present a cumbersome challenge for many developers. In this paper, we are providing a framework called EZ that enables to generate gateways for either C or Java platform without requiring from developers any substantial understanding of either relevant protocols or low-level network programming.
    @inproceedings{dais:13:bromberg:ez,
      author = {Bromberg, Yérom-David and Morandat Floréal and Réveillère Laurent and Thomas, Gaël},
      title = {EZ: towards efficient asynchronous protocol gateway construction},
      booktitle = {Proceedings of the conference on Distributed Applications and Interoperable Systems, DAIS'13},
      publisher = {Springer-Verlag},
      year = {2013},
      pages = {169--174}
    }
  5. Evaluating HTM for pauseless garbage collectors in Java. Maria Carpen-Amarie, Dave Dice, Patrick Marlier, Gaël Thomas and Pascal Felber. In Proceedings of the International Symposium on Parallel and Distributed Processing with Applications, ISPA'15, pages 8.  2015. . [Abstract] [BibTeX] [.pdf]
    While garbage collectors (GCs) significantly simplify programmers’ tasks by transparently handling memory management, they also introduce various overheads and sources of unpredictability. Most importantly, GCs typically block the application while reclaiming free memory, which makes them unfit for environments where responsiveness is crucial, such as real-time systems. There have been several approaches for developing concurrent GCs that can exploit the processing capabilities of multi-core architectures, but at the expense of a synchronization overhead between the application and the collector. In this paper, we investigate a novel approach to implementing pauseless moving garbage collection using hardware transactional memory (HTM). We describe the design of a moving GC algorithm that can operate concurrently with the application threads. We study the overheads resulting from using transactional barriers in the Java virtual machine (JVM) and discuss various optimizations. Our findings show that, while the cost of these barriers can be minimized by carefully restricting them to volatile accesses when executing within the interpreter, the actual performance degradation becomes unacceptably high with the just-in-time compiler. The results tend to indicate that current HTM mechanisms cannot be readily used to implement a pauseless GC in Java that can compete with state-of-the-art concurrent GCs.
    @inproceedings{ispa:15:carpen-amarie:stmgc,
      author = {Carpen-Amarie, Maria and Dice, Dave and Marlier, Patrick and Thomas, Gaël and Felber, Pascal},
      title = {Evaluating HTM for pauseless garbage collectors in Java},
      booktitle = {Proceedings of the International Symposium on Parallel and Distributed Processing with Applications, ISPA'15},
      year = {2015},
      pages = {8}
    }
  6. Transactional Pointers: Experiences with HTM-Based Reference Counting in C++. Maria Carpen-Amarie, Dave Dice, Gaël Thomas and Pascal Felber. In Proceedings of the International Conference on Networked Systems, NETYS'16, pages 15.  2016. [Abstract] [BibTeX] [.pdf]
    The most popular programming languages, such as C++ or Java, have libraries and data structures designed to automatically address concurrency hazards in order to run on multiple threads. In particular, this trend has also been adopted in the memory management domain. However, automatic concurrent memory management also comes at a price, leading sometimes to noticeable overhead. In this paper, we experiment with C++ smart pointers and their automatic memory-management technique based on reference counting. More precisely, we study how we can use hardware transactional memory (HTM) to avoid costly and sometimes unnecessary atomic operations. Our results suggest that replacing the systematic counting strategy with HTM could improve application performance in certain scenarios, such as concurrent linked-list traversal.
    @inproceedings{netys:16:carpen-amarie:transactional-pointers,
      author = {Carpen-Amarie, Maria and Dice, Dave and Thomas, Gaël and Felber, Pascal},
      title = {Transactional Pointers: Experiences with HTM-Based Reference Counting in C++},
      booktitle = {Proceedings of the International Conference on Networked Systems, NETYS'16},
      publisher = {Springer-Verlag},
      year = {2016},
      pages = {15}
    }
  7. Towards an Efficient Pauseless Java GC with Selective HTM-Based Access Barriers. Maria Carpen-Amarie, Yaroslav Hayduk, Pascal Felber, Christof Fetzer, Gaël Thomas and David Dice. In Proceedings of the international conference on Managed Languages and Runtimes (formerly PPPJ), ManLang'17, pages 7.  2017. . [Abstract] [BibTeX] [.pdf]
    The garbage collector (GC) is a critical component of any managed runtime environment (MRE), such as the Java virtual machine. While the main goal of the GC is to simplify and automate memory management, it may have a negative impact on the application performance, especially on multi-core systems. This is typically due to stop-the-world pauses, i.e., intervals for which the application threads are blocked during the collection. Existing approaches to concurrent GCs allow the application threads to perform at the same time as the GC at the expense of throughput and simplicity. In this paper we build upon an existing pauseless transactional GC algorithm and design an important optimization that would significantly increase its throughput. More precisely, we devise selective access barriers, that define multiple paths based on the state of the garbage collector. Preliminary evaluation of the selective barriers shows up to 93% improvement over the initial transactional barriers in the worst case scenario. We estimate the performance of a pauseless GC having selective transactional barriers and find it to be on par with Java’s concurrent collector.
    @inproceedings{manlang:17:carpen-amarie:gc-htm,
      author = {Carpen-Amarie, Maria and Hayduk, Yaroslav and Felber, Pascal and Fetzer, Christof and Thomas, Gaël and Dice, David},
      title = {Towards an Efficient Pauseless Java GC with Selective HTM-Based Access Barriers},
      booktitle = {Proceedings of the international conference on Managed Languages and Runtimes (formerly PPPJ), ManLang'17},
      publisher = {ACM},
      year = {2017},
      pages = {7}
    }
  8. Continuously Measuring Critical Section Pressure with the Free-Lunch Profiler. Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. In Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'14, pages 14.  2014. . [Abstract] [BibTeX] [.pdf]
    Today, Java is regularly used to implement large multi-threaded server-class applications that use locks to protect access to shared data. However, understanding the impact of locks on the performance of a system is complex, and thus the use of locks can impede the progress of threads on configurations that were not anticipated by the developer, during specific phases of the execution. In this paper, we propose Free Lunch, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock. Free Lunch is designed around a new metric, critical section pressure (CSP), which directly correlates the progress of the threads to each of the locks. Using Free Lunch, we have identified phases of high CSP, which were hidden with other lock profilers, in the distributed Cassandra NoSQL database and in several applications from the DaCapo 9.12, the SPECjvm2008 and the SPECjbb2005 benchmark suites. Our evaluation of Free Lunch shows that its overhead is never greater than 6%, making it suitable for in-vivo use.
    @inproceedings{oopsla:14:david:free-lunch,
      author = {David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Continuously Measuring Critical Section Pressure with the Free-Lunch Profiler},
      booktitle = {Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'14},
      publisher = {ACM},
      year = {2014},
      pages = {14}
    }
  9. A lazy developer approach: building a JVM with third party software. Nicolas Geoffray, Gaël Thomas, Charles Clément and Bertil Folliot. In Proceedings of the international symposium on Principles and Practice of Programming in Java, PPPJ'08, pages 73-82.  2008. . [Abstract] [BibTeX] [.pdf]
    The development of a complete Java Virtual Machine (JVM) implementation is a tedious process which involves knowledge in different areas: garbage collection, just in time compilation, interpretation, file parsing, data structures, etc. The result is that developing its own virtual machine requires a considerable amount of man/year. In this paper we show that one can implement a JVM with third party software and with performance comparable to industrial and top open-source JVMs. Our proof-of-concept implementation uses existing versions of a garbage collector, a just in time compiler, and the base library, and is robust enough to execute complex Java applications such as the OSGi Felix implementation and the Tomcat servlet container.
    @inproceedings{pppj:08:geoffray:ladyvm,
      author = {Geoffray, Nicolas and Thomas, Gaël and Clément, Charles and Folliot, Bertil},
      title = {A lazy developer approach: building a JVM with third party software},
      booktitle = {Proceedings of the international symposium on Principles and Practice of Programming in Java, PPPJ'08},
      publisher = {ACM},
      year = {2008},
      pages = {73--82}
    }
  10. Transparent and dynamic code offloading for Java Application. Nicolas Geoffray, Gaël Thomas and Bertil Folliot. In Proceedings of the international conference on Distributed Objects and Applications, DOA'06, pages 1790-1806.  2006. [Abstract] [BibTeX] [.pdf]
    Code offloading is a promising effort for embedded systems and load-balancing. Embedded systems will be able to offload computation to nearby computers and large-scale applications will be able to load-balance computation during high load. This paper presents a runtime infrastructure that transparently distributes computation between interconnected workstations. Application source code is not modified: instead, dynamic aspect weaving within an extended virtual machine allows to monitor and distribute entities dynamically. Runtime policies for distribution can be dynamically adapted depending on the environment. A first evaluation of the system shows that our technique increases the transaction rate of a Web server during high load by 73%.
    @inproceedings{doa:06:geoffray:offloading,
      author = {Geoffray, Nicolas and Thomas, Gaël and Folliot, Bertil},
      title = {Transparent and dynamic code offloading for Java Application},
      booktitle = {Proceedings of the international conference on Distributed Objects and Applications, DOA'06},
      publisher = {LNCS},
      year = {2006},
      pages = {1790--1806}
    }
  11. VMKit: a substrate for managed runtime environments. Nicolas Geoffray, Gaël Thomas, Julia Lawall, Gilles Muller and Bertil Folliot. In Proceedings of the international conference on Virtual Execution Environments, VEE'10, pages 51-62.  2010. . [Abstract] [BibTeX] [.pdf]
    Managed Runtime Environments (MREs), such as the JVM and the CLI, form an attractive environment for program execution, by providing portability and safety, via the use of a bytecode language and automatic memory management, as well as good performance, via just-in-time (JIT) compilation. Nevertheless, developing a fully featured MRE, including e.g. a garbage collector and JIT compiler, is a herculean task. As a result, new languages cannot easily take advantage of the benefits of MREs, and it is difficult to experiment with extensions of existing MRE based languages.

    This paper describes and evaluates VMKit, a first attempt to build a common substrate that eases the development of high-level MREs. We have successfully used VMKit to build two MREs: a Java Virtual Machine and a Common Language Runtime. We provide an extensive study of the lessons learned in developing this infrastructure, and assess the ease of implementing new MREs or MRE extensions and the resulting performance. In particular, it took one of the authors only one month to develop a Common Language Runtime using VMKit. VMKit furthermore has performance comparable to the well established open source MREs Cacao, Apache Harmony and Mono, and is 1.2 to 3 times slower than JikesRVM on most of the DaCapo benchmarks.
    @inproceedings{vee:10:geoffray:vmkit,
      author = {Geoffray, Nicolas and Thomas, Gaël and Lawall, Julia and Muller, Gilles and Folliot, Bertil},
      title = {VMKit: a substrate for managed runtime environments},
      booktitle = {Proceedings of the international conference on Virtual Execution Environments, VEE'10},
      publisher = {ACM},
      year = {2010},
      pages = {51--62}
    }
  12. I-JVM: a Java virtual machine for component isolation in OSGi. Nicolas Geoffray, Gaël Thomas, Gilles Muller, Pierre Parrend, Stéphane Frénot and Bertil Folliot. In Proceedings of the international conference on Dependable Systems and Networks, DSN'09, pages 544-553.  2009. . [Abstract] [BibTeX] [.pdf]
    The OSGi framework is a Java-based, centralized, component oriented platform. It is being widely adopted as an execution environment for the development of extensible applications. However, current Java Virtual Machines are unable to isolate components from each other. For instance, a malicious component can freeze the complete platform by allocating too much memory or alter the behavior of other components by modifying shared variables. This paper presents I-JVM, a Java Virtual Machine that provides a lightweight approach to isolation while preserving compatibility with legacy OSGi applications. Our evaluation of I-JVM shows that it solves the 8 known OSGi vulnerabilities that are due to the Java Virtual Machine and that the overhead of I-JVM compared to the JVM on which it is based is below 20%.
    @inproceedings{dsn:09:geoffray:ijvm,
      author = {Geoffray, Nicolas and Thomas, Gaël and Muller, Gilles and Parrend, Pierre and Frénot, Stéphane and Folliot, Bertil},
      title = {I-JVM: a Java virtual machine for component isolation in OSGi},
      booktitle = {Proceedings of the international conference on Dependable Systems and Networks, DSN'09},
      publisher = {IEEE Computer Society},
      year = {2009},
      pages = {544--553}
    }
  13. A study of the scalability of stop-the-world garbage collectors on multicores. Lokesh Gidra, Gaël Thomas, Julien Sopena and Marc Shapiro. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13, pages 229-240.  2013. . [Abstract] [BibTeX] [.pdf]
    Large-scale multicore architectures are problematic for garbage collection (GC). In particular, throughput-oriented stop-the-world algorithms demonstrate excellent performance with a small number of cores, but have been shown to degrade badly beyond approximately 20 cores on OpenJDK 7. This negative result raises the question whether the stop-the-world design has intrinsic limitations that would require a radically different approach. Our study suggests that the answer is no, and that there is no compelling scalability reason to discard the existing highly-optimised throughput-oriented GC code on contemporary hardware. This paper studies the default throughput-oriented garbage collector of OpenJDK 7, called Parallel Scavenge. We identify its bottlenecks, and show how to eliminate them using well-established parallel programming techniques. On the SPECjbb2005, SPECjvm2008 and DaCapo 9.12 benchmarks, the improved GC matches the performance of Parallel Scavenge at low core count, but scales well, up to 48 cores.
    @inproceedings{asplos:13:gidra:naps,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc},
      title = {A study of the scalability of stop-the-world garbage collectors on multicores},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13},
      publisher = {ACM},
      year = {2013},
      pages = {229--240}
    }
  14. NumaGiC: a garbage collector for big data on big NUMA machines. Lokesh Gidra, Gaël Thomas, Julien Sopena, Marc Shapiro and Nhan Nguyen. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'15, pages 14.  2015. . [Abstract] [BibTeX] [.pdf]
    On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. We address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism. We compare NumaGiC with Parallel Scavenge and NAPS on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160GB to 350GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves overall performance by up to 45% over NAPS (up to 94% over Parallel Scavenge), and increases the performance of the collector itself by up to 3.6x over NAPS (up to 5.4x over Parallel Scavenge).
    @inproceedings{asplos:15:gidra:numagic,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc and Nguyen, Nhan},
      title = {NumaGiC: a garbage collector for big data on big NUMA machines},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'15},
      publisher = {ACM},
      year = {2015},
      pages = {14}
    }
  15. A generic language for dynamic adaptation. Assia Hachichi, Gaël Thomas, Cyril Martin, Simon Patarin and Bertil Folliot. In Proceedings of the European conference on Parallel processing, EuroPar'05, pages 40-49. 2005. [Abstract] [BibTeX] [.pdf]
    Today, component-oriented middleware is used to easily design, develop and deploy distributed applications. It handles the heterogeneity and interoperability of software modules and fosters their reuse. Several standards address this issue: CCM (CORBA Component Model), EJB (Enterprise Java Beans) and .Net. However, they offer a limited and fixed number of system services, and their deployment and configuration mechanisms cannot be driven dynamically through a language or an API. As a solution, we present a generic high-level language to adapt system services dynamically in existing middleware. This solution is based on a highly adaptable platform which enforces adaptive behaviours and offers a means to specify and adapt system services dynamically. A first prototype was implemented for the OpenCCM platform and achieved good performance.
    @inproceedings{europar:05:hachichi:cvm,
      author = {Hachichi, Assia and Thomas, Gaël and Martin, Cyril and Patarin, Simon and Folliot, Bertil},
      title = {A generic language for dynamic adaptation},
      booktitle = {Proceedings of the European conference on Parallel processing, EuroPar'05},
      publisher = {LNCS},
      year = {2005},
      pages = {40--49}
    }
  16. A distributed service-oriented mediation tool. Colombe Herault, Gaël Thomas and Philippe Lalanda. In Proceedings of the international Conference on Services Computing, SCC'07, pages 403-409. 2007. [Abstract] [BibTeX] [.pdf]
    Integration of heterogeneous information has again become a requirement with the emergence of large-scale distributed applications such as Web-Service-based applications. Enterprise Service Buses (ESBs) deal with distribution and communication, but they still do not solve all the mediation issues, such as the design, deployment and administration of mediators. It turns out, however, that current solutions are technology-oriented and beyond the reach of most programmers. In this paper, we present an approach that clearly separates the specification of the mediation operations, based on a service component model, from their execution on a distributed ESB. Both the model and the ESB are independent of the target middleware used by the applications. This work was carried out within the European-funded S4ALL project (Services For All).
    @inproceedings{scc:07:herault:mediation,
      author = {Herault, Colombe and Thomas, Gaël and Lalanda, Philippe},
      title = {A distributed service-oriented mediation tool},
      booktitle = {Proceedings of the international Conference on Services Computing, SCC'07},
      publisher = {IEEE Computer Society},
      year = {2007},
      pages = {403--409}
    }
  17. Blue banana: resilience to avatar mobility in distributed MMOGs. Sergey Legtchenko, Sébastien Monnet and Gaël Thomas. In Proceedings of the international conference on Dependable Systems and Networks, DSN'10, pages 171-180. 2010. [Abstract] [BibTeX] [.pdf]
    Massively Multiplayer Online Games (MMOGs) recently emerged as a popular class of applications with millions of users. To offer an acceptable gaming experience, such applications need to render the virtual world surrounding the player with a very low latency. However, current state-of-the-art MMOGs based on peer-to-peer overlays fail to satisfy these requirements. This happens because avatar mobility implies many data exchanges through the overlay. As state-of-the-art overlays do not anticipate this mobility, the needed data is not delivered on time, which leads to transient failures at the application level. To solve this problem, we propose Blue Banana, a mechanism that models and predicts avatar movement, allowing the overlay to adapt itself by anticipation to the MMOG's needs. Our evaluation is based on large-scale traces derived from Second Life. It shows that our anticipation mechanism decreases the number of transient failures by 20% with a network overhead of only 2%.
    @inproceedings{dsn:10:legtchenko:bluebanana,
      author = {Legtchenko, Sergey and Monnet, Sébastien and Thomas, Gaël},
      title = {Blue banana: resilience to avatar mobility in distributed MMOGs},
      booktitle = {Proceedings of the international conference on Dependable Systems and Networks, DSN'10},
      publisher = {IEEE Computer Society},
      year = {2010},
      pages = {171--180}
    }
  18. Automatic OpenCL code generation for multi-device heterogeneous architectures. Pei Li, Elisabeth Brunet, François Trahay, Christian Parrot, Gaël Thomas and Raymond Namyst. In Proceedings of the International Conference on Parallel Processing, ICPP'15, pages 10. 2015. [Abstract] [BibTeX] [.pdf]
    Using multiple accelerators, such as GPUs or Xeon Phis, is attractive to improve the performance of large data-parallel applications and to increase the size of their workloads. However, writing an application for multiple accelerators remains challenging today, because going from a single accelerator to multiple ones requires dealing with potentially non-uniform domain decomposition, inter-accelerator data movements, and dynamic load balancing. Writing such code manually is time-consuming and error-prone. In this paper, we propose a new programming tool called STEPOCL, along with a new domain-specific language designed to simplify the development of an application for multiple accelerators. We evaluate both the performance and the usefulness of STEPOCL with three applications and show that: (i) the performance of an application written with STEPOCL scales linearly with the number of accelerators, (ii) the performance of an application written using STEPOCL competes with a handwritten version, (iii) larger workloads that do not fit in the memory of a single device can run on multiple devices, (iv) thanks to STEPOCL, the number of lines of code required to write an application for multiple accelerators is roughly divided by ten.
    @inproceedings{icpp:15:li:stepocl,
      author = {Li, Pei and Brunet, Elisabeth and Trahay, François and Parrot, Christian and Thomas, Gaël and Namyst, Raymond},
      title = {Automatic OpenCL code generation for multi-device heterogeneous architectures},
      booktitle = {Proceedings of the International Conference on Parallel Processing, ICPP'15},
      year = {2015},
      pages = {10}
    }
  19. Remote Core Locking: migrating critical-section execution to improve the performance of multithreaded applications. Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. In Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12, pages 65-76. 2012. [Abstract] [BibTeX] [.pdf]
    The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. In this paper, we propose a new lock algorithm, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server core. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the core acquiring the lock because such data can typically remain in the server core's cache.

    We have developed a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX locks into RCL locks. We have evaluated our approach on 18 applications: Memcached, Berkeley DB, the 9 applications of the SPLASH-2 benchmark suite and the 7 applications of the Phoenix2 benchmark suite. 10 of these applications, including Memcached and Berkeley DB, are unable to scale because of locks, and benefit from RCL. Using RCL locks, we get performance improvements of up to 2.6 times with respect to POSIX locks on Memcached, and up to 14 times with respect to Berkeley DB.
    @inproceedings{usenix-atc:12:lozi:rcl,
      author = {Lozi, Jean-Pierre and David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Remote Core Locking: migrating critical-section execution to improve the performance of multithreaded applications},
      booktitle = {Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12},
      publisher = {USENIX Association},
      year = {2012},
      pages = {65--76}
    }
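    The core mechanism of RCL, replacing lock acquisition by a remote procedure call to a dedicated server core, can be sketched in a few lines of C11. This is an illustrative toy, not the paper's implementation: the names (`rcl_execute`, `mailbox`), the per-client slot layout and the counter workload are all assumptions made for the example.

    ```c
    /* Toy RCL handshake: each client posts its critical section as a
     * function pointer in a per-client mailbox slot; a dedicated server
     * thread executes pending requests sequentially, so the protected
     * data stays in the server's cache and no lock is ever taken. */
    #include <assert.h>
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <stdint.h>

    #define NCLIENTS 4
    #define NREQS    100

    struct request {
        _Atomic(void (*)(void *)) fn;   /* NULL means "slot empty" */
        void *arg;
    };

    static struct request mailbox[NCLIENTS];
    static long shared_counter;         /* data "protected" by the server */
    static atomic_int done_clients;

    /* Client side: publish the critical section, wait until it has run. */
    static void rcl_execute(int id, void (*fn)(void *), void *arg)
    {
        mailbox[id].arg = arg;
        atomic_store(&mailbox[id].fn, fn);      /* publish the request */
        while (atomic_load(&mailbox[id].fn) != NULL)
            sched_yield();                      /* server clears fn when done */
    }

    /* Server side: scan the mailboxes and run requests one at a time. */
    static void *server(void *unused)
    {
        (void)unused;
        while (atomic_load(&done_clients) < NCLIENTS) {
            for (int i = 0; i < NCLIENTS; i++) {
                void (*fn)(void *) = atomic_load(&mailbox[i].fn);
                if (fn != NULL) {
                    fn(mailbox[i].arg);                 /* critical section */
                    atomic_store(&mailbox[i].fn, NULL); /* signal completion */
                }
            }
        }
        return NULL;
    }

    static void increment(void *arg) { shared_counter += (intptr_t)arg; }

    static void *client(void *idp)
    {
        int id = (int)(intptr_t)idp;
        for (int i = 0; i < NREQS; i++)
            rcl_execute(id, increment, (void *)(intptr_t)1);
        atomic_fetch_add(&done_clients, 1);
        return NULL;
    }

    int main(void)
    {
        pthread_t srv, cl[NCLIENTS];
        pthread_create(&srv, NULL, server, NULL);
        for (int i = 0; i < NCLIENTS; i++)
            pthread_create(&cl[i], NULL, client, (void *)(intptr_t)i);
        for (int i = 0; i < NCLIENTS; i++)
            pthread_join(cl[i], NULL);
        pthread_join(srv, NULL);
        assert(shared_counter == (long)NCLIENTS * NREQS);
        return 0;
    }
    ```

    Compile with `-pthread`. The real RCL additionally pins the server to a dedicated hardware thread, supports blocking inside critical sections, and multiplexes several locks per server; the sketch only shows the request/serve handshake.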
  20. Support efficient dynamic aspects through reflection and dynamic compilation. Frédéric Ogel, Gaël Thomas and Bertil Folliot. In Proceedings of the Symposium on Applied Computing, SAC'05, pages 1351-1356. 2005. [Abstract] [BibTeX] [.pdf]
    As systems grow more and more complex, raising severe evolution and management difficulties, computational reflection and aspect-orientation have proven to enforce separation-of-concerns principles and thus to address those issues. However, most of the existing solutions rely either on static source code manipulation or on the introduction of extra code (and overhead) to support dynamic adaptation. Whereas those approaches represent the extremes of a spectrum, developers are left with a rigid trade-off between performance and dynamism. A first step toward a solution was the introduction of specialized virtual machines that support dynamic aspects in the core of the execution engine. However, using such dedicated runtimes limits applications' portability and interoperability. In order to reconcile dynamism and performance without introducing portability and interoperability issues, we propose a dynamic reflective runtime that uses reflection and dynamic compilation to allow application-specific dynamic weaving strategies, without introducing extra overhead compared to static monolithic weavers.
    @inproceedings{sac:05:ogel:efficientaspect,
      author = {Ogel, Frédéric and Thomas, Gaël and Folliot, Bertil},
      title = {Support efficient dynamic aspects through reflection and dynamic compilation},
      booktitle = {Proceedings of the Symposium on Applied Computing, SAC'05},
      publisher = {ACM},
      year = {2005},
      pages = {1351--1356}
    }
  21. Faults in Linux: ten years later. Nicolas Palix, Gaël Thomas, Suman Saha, Christophe Calvès, Julia Lawall and Gilles Muller. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11, pages 305-318. 2011. [Abstract] [BibTeX] [.pdf]
    In 2001, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired a number of development and research efforts on improving the reliability of driver code. Today Linux is used in a much wider range of environments, provides a much wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality? Are drivers still a major problem?

    To answer these questions, we have transported the experiments of Chou et al. to Linux versions 2.6.0 to 2.6.33, released between late 2003 and early 2010. We find that Linux has more than doubled in size during this period, but that the number of faults per line of code has been decreasing. And, even though the drivers directory still accounts for a large part of the kernel code and contains the most faults, its fault rate is now below that of other directories, such as arch (HAL) and fs (file systems). These results can guide further development and research efforts. To enable others to continually update these results as Linux evolves, we define our experimental protocol and make our checkers and results available in a public archive.
    @inproceedings{asplos:11:palix:faults,
      author = {Palix, Nicolas and Thomas, Gaël and Saha, Suman and Calvès, Christophe and Lawall, Julia and Muller, Gilles},
      title = {Faults in Linux: ten years later},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11},
      publisher = {ACM},
      year = {2011},
      pages = {305--318}
    }
  22. An improvement of OpenMP pipeline parallelism with the BatchQueue algorithm. Thomas Preud'homme, Julien Sopena, Gaël Thomas and Bertil Folliot. In Proceedings of the International Conference on Parallel and Distributed Systems, ICPADS'12, pages 8. 2012. [Abstract] [BibTeX] [.pdf]
    In the context of multicore programming, pipeline parallelism is a solution to easily transform a sequential program into a parallel one without requiring a complete rewriting of the code. The OpenMP stream-computing extension presented by Pop and Cohen proposes an extension of OpenMP to handle pipeline parallelism. However, their communication algorithm relies on multiple-producer multiple-consumer queues, while pipelined applications mostly deal with linear chains of communication, i.e., with only a single producer and a single consumer.

    To improve the communication performance of the OpenMP stream extension, we propose to use, when possible, a more specialized single-producer single-consumer communication algorithm called BatchQueue. Our evaluation shows that BatchQueue is able to improve the throughput by up to 30% for real applications and by up to 200% for a fully parallelizable, communication-intensive microbenchmark. Our study therefore shows that using specialized and efficient communication algorithms can have a significant impact on the overall performance of pipelined applications.
    @inproceedings{icpads:12:preudhomme:batchqueue,
      author = {Preud'homme, Thomas and Sopena, Julien and Thomas, Gaël and Folliot, Bertil},
      title = {An improvement of OpenMP pipeline parallelism with the BatchQueue algorithm},
      booktitle = {Proceedings of the International Conference on Parallel and Distributed Systems, ICPADS'12},
      publisher = {IEEE Computer Society},
      year = {2012},
      pages = {8}
    }
  23. BatchQueue: fast and memory-thrifty core to core communication. Thomas Preud'Homme, Julien Sopena, Gaël Thomas and Bertil Folliot. In Proceedings of the international Symposium on Computer Architecture and High Performance Computing, SBAC-PAD'10, pages 215-222. 2010. [Abstract] [BibTeX] [.pdf]
    Sequential applications can take advantage of multi-core systems by way of pipeline parallelism to improve their performance. In such parallelism, core-to-core communication overhead is the main limit on speedup. This paper presents BatchQueue, a fast and memory-thrifty core-to-core communication system based on batch processing of whole cache lines. BatchQueue is able to send a 32-bit word of data in just 12.5 ns on a Xeon X5472 and only needs 2 full cache lines plus 3 byte-sized variables -- each on a different cache line for optimal performance -- to work. The characteristics of BatchQueue -- high throughput and increased latency resulting from its batch processing -- make it well suited for highly communicative tasks with no real-time requirements, such as monitoring.
    @inproceedings{sbac-pad:10:preudhomme:batchqueue,
      author = {Preud'Homme, Thomas and Sopena, Julien and Thomas, Gaël and Folliot, Bertil},
      title = {BatchQueue: fast and memory-thrifty core to core communication},
      booktitle = {Proceedings of the international Symposium on Computer Architecture and High Performance Computing, SBAC-PAD'10},
      publisher = {IEEE Computer Society},
      year = {2010},
      pages = {215--222}
    }
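    The batching idea behind BatchQueue can be illustrated with a toy single-producer/single-consumer queue: the producer accumulates values in a cache-line-sized batch and publishes the whole batch with one shared flag write, so shared state is touched once per batch rather than once per element. This C11 sketch is illustrative only; the names, sizes and the two-slot ping-pong structure are assumptions for the example, not the published algorithm.

    ```c
    /* Toy batched SPSC queue: two alternating batches, each published to
     * the consumer with a single atomic flag write once it is full. */
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    #define BATCH 8   /* entries per published batch (~one cache line) */

    struct batchqueue {
        int buf[2][BATCH];     /* two batches, used alternately */
        atomic_bool full[2];   /* batch i is ready for the consumer */
        int pslot, pidx;       /* producer: current batch and fill index */
        int cslot, cidx;       /* consumer: current batch and read index */
    };

    static void bq_push(struct batchqueue *q, int v)
    {
        q->buf[q->pslot][q->pidx++] = v;       /* local write, not shared */
        if (q->pidx == BATCH) {                /* batch full: publish it */
            atomic_store(&q->full[q->pslot], true);
            q->pslot ^= 1;
            q->pidx = 0;
            while (atomic_load(&q->full[q->pslot]))
                ;                              /* wait for a free batch */
        }
    }

    static int bq_pop(struct batchqueue *q)
    {
        while (!atomic_load(&q->full[q->cslot]))
            ;                                  /* wait for a published batch */
        int v = q->buf[q->cslot][q->cidx++];
        if (q->cidx == BATCH) {                /* batch drained: recycle it */
            atomic_store(&q->full[q->cslot], false);
            q->cslot ^= 1;
            q->cidx = 0;
        }
        return v;
    }

    int main(void)
    {
        static struct batchqueue q;            /* zero-initialised */
        /* Single-threaded demo: push one full batch, then pop it back. */
        for (int i = 0; i < BATCH; i++)
            bq_push(&q, i);
        for (int i = 0; i < BATCH; i++)
            assert(bq_pop(&q) == i);
        return 0;
    }
    ```

    With one producer thread and one consumer thread, the same code works unchanged; the point of the design is that the only cross-core traffic is one flag write and one cache-line transfer per BATCH elements.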
  24. Hector: Detecting Resource-Release Omission Faults in Error-Handling Code for Systems Software. Suman Saha, Jean-Pierre Lozi, Gaël Thomas, Julia Lawall and Gilles Muller. In Proceedings of the international conference on Dependable Systems and Networks, DSN'13, pages 12. 2013. Best paper award. [Abstract] [BibTeX] [.pdf]
    Omitting resource-release operations in systems software error-handling code can lead to memory leaks, crashes, and deadlocks. Finding omission faults is challenging due to the difficulty of reproducing system errors, the diversity of system resources, and the lack of appropriate abstractions in the C language. To address these issues, numerous approaches have been proposed that globally scan a code base for common resource-release operations. Such macroscopic approaches are notorious for their many false positives, while also leaving many faults undetected.

    We propose a novel microscopic approach to finding resource-release omission faults in systems software. Rather than generalizing from the entire source code, our approach focuses on the error-handling code of each function. Using our tool, Hector, we have found over 370 faults in six systems software projects, including Linux, with a 23% false positive rate. Some of these faults allow an unprivileged malicious user to crash the entire system.
    @inproceedings{dsn:13:saha:ehctor,
      author = {Saha, Suman and Lozi, Jean-Pierre and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Hector: Detecting Resource-Release Omission Faults in Error-Handling Code for Systems Software},
      booktitle = {Proceedings of the international conference on Dependable Systems and Networks, DSN'13},
      publisher = {IEEE Computer Society},
      year = {2013},
      pages = {12}
    }
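    The class of fault that Hector targets can be illustrated with a small C example: a function acquires several resources, and one error-handling path returns without releasing what was acquired earlier. The `open_session_*` functions and the allocation counter below are hypothetical, written only to show the fault pattern and its fix.

    ```c
    /* A resource-release omission fault and its fix. live_allocs counts
     * allocations that have not been released, to make the leak visible. */
    #include <assert.h>
    #include <stdlib.h>

    static int live_allocs;                  /* unreleased allocations */
    static void *xmalloc(size_t n) { live_allocs++; return malloc(n); }
    static void xfree(void *p)     { live_allocs--; free(p); }

    /* Buggy: the error-handling path forgets to release buf. */
    static int open_session_buggy(int fail)
    {
        void *buf = xmalloc(64);
        void *ctx = fail ? NULL : xmalloc(32);
        if (ctx == NULL)
            return -1;                       /* fault: buf leaks here */
        xfree(ctx);
        xfree(buf);
        return 0;
    }

    /* Fixed: each error path releases everything acquired before it. */
    static int open_session_fixed(int fail)
    {
        void *buf = xmalloc(64);
        void *ctx = fail ? NULL : xmalloc(32);
        if (ctx == NULL) {
            xfree(buf);                      /* release before bailing out */
            return -1;
        }
        xfree(ctx);
        xfree(buf);
        return 0;
    }

    int main(void)
    {
        assert(open_session_buggy(1) == -1 && live_allocs == 1);
        live_allocs = 0;
        assert(open_session_fixed(1) == -1 && live_allocs == 0);
        return 0;
    }
    ```

    Hector's microscopic approach compares the releases performed on each error path of a function against the resources acquired before that path, which is how omissions like the one in `open_session_buggy` are flagged without a global scan of the code base.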
  25. EActors: Fast and flexible trusted computing using SGX. Vasily A. Sartakov, Stefan Brenner, Sonia Ben Mokhtar, Sara Bouchenak, Gaël Thomas and Rüdiger Kapitza. In Proceedings of the International Conference on Middleware, Middleware'18, pages 12. 2018. Accepted for publication. [Abstract] [BibTeX] [.pdf]
    Novel trusted execution support, as offered by Intel's Software Guard eXtensions (SGX), embeds seamlessly into user space applications by establishing regions of encrypted memory, called enclaves. Enclaves comprise code and data that is executed under special protection of the CPU and can only be accessed via an enclave-defined interface. To facilitate the usability of this new system abstraction, Intel offers a software development kit (SGX SDK). While the SDK eases the use of SGX, it lacks appropriate programming support for inter-enclave interaction, and requires hardcoding the exact use of trusted execution into applications, which restricts flexibility.

    This paper proposes EActors, an actor framework that is tailored to SGX and offers a more seamless, flexible and efficient use of trusted execution -- especially for applications demanding multiple enclaves. EActors disentangles the interaction with, and among, enclaves from costly execution mode transitions. It features lightweight fine-grained parallelism based on the concept of actors, thereby avoiding the costly synchronisation constructs provided by the SGX SDK. Finally, EActors offers a high degree of freedom to execute actors, either untrusted or trusted, depending on security requirements and performance demands. We implemented two use cases on top of EActors: (i) a secure instant messaging service, and (ii) a secure multi-party computation service. Both illustrate the ability of EActors to seamlessly and effectively build secure applications. Furthermore, our performance evaluation results show that securing the messaging service with EActors improves performance by up to 40× compared to the vanilla versions of JabberD2 and ejabberd.
    @inproceedings{middleware:18:saratov:eactors,
      author = {Sartakov, Vasily A. and Brenner, Stefan and Ben Mokhtar, Sonia and Bouchenak, Sara and Thomas, Gaël and Kapitza, Rüdiger},
      title = {EActors: Fast and flexible trusted computing using SGX},
      booktitle = {Proceedings of the International Conference on Middleware, Middleware'18},
      publisher = {ACM},
      year = {2018},
      pages = {12}
    }
  26. An interface to implement NUMA policies in the Xen hypervisor. Gauthier Voron, Gaël Thomas, Vivien Quéma and Pierre Sens. In Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'17, pages 14. 2017. [Abstract] [BibTeX] [.pdf]
    While virtualization only introduces a small overhead on machines with few cores, this is not the case on larger ones. Most of the overhead on the latter machines is caused by the Non-Uniform Memory Access (NUMA) architecture they use. In order to reduce this overhead, this paper shows how NUMA placement heuristics can be implemented inside Xen. With an evaluation of 29 applications on a 48-core machine, we show that these NUMA placement heuristics can multiply the performance of 9 applications by a factor of more than 2.
    @inproceedings{eurosys:17:voron:xen-numa,
      author = {Voron, Gauthier and Thomas, Gaël and Quéma, Vivien and Sens, Pierre},
      title = {An interface to implement NUMA policies in the Xen hypervisor},
      booktitle = {Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'17},
      publisher = {ACM},
      year = {2017},
      pages = {14}
    }

International journals

  1. Fast and Portable Locking for Multicore Architectures. Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. ACM Transactions on Computer Systems (TOCS). Vol. 33(4), pages 13:1-13:62. 2016. [Abstract] [BibTeX] [.pdf]
    The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this article is a new locking technique, Remote Core Locking (RCL), that aims to accelerate the execution of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server hardware thread. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the hardware thread acquiring the lock, because such data can typically remain in the server’s cache. Other contributions presented in this article include a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX lock acquisitions into RCL locks.

    Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. By using RCL instead of Linux POSIX locks, performance is improved by up to 2.5 times on Memcached, and up to 11.6 times on Berkeley DB with the TPC-C client. On a SPARC machine with two Sun Ultrasparc T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance is improved by up to 1.3 times with respect to Solaris POSIX locks on Memcached, and up to 7.9 times on Berkeley DB with the TPC-C client.
    @article{tocs:16:lozi:rcl,
      author = {Lozi, Jean-Pierre and David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Fast and Portable Locking for Multicore Architectures},
      journal = {ACM Transactions on Computer Systems (TOCS)},
      publisher = {ACM},
      year = {2016},
      volume = {33},
      number = {4},
      pages = {13:1--13:62}
    }
  2. Faults in Linux 2.6. Nicolas Palix, Gaël Thomas, Suman Saha, Christophe Calvès, Gilles Muller and Julia Lawall. ACM Transactions on Computer Systems (TOCS). Vol. 32(2), pages 4:1-4:40. 2014. [Abstract] [BibTeX] [.pdf]
    In August 2011, Linux entered its third decade. Ten years before, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired numerous efforts on improving the reliability of driver code. Today, Linux is used in a wider range of environments, provides a wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality?

    To answer this question, we have transported Chou et al.'s experiments to all versions of Linux 2.6, released between 2003 and 2011. We find that Linux has more than doubled in size during this period, but the number of faults per line of code has been decreasing. Moreover, the fault rate of drivers is now below that of other directories, such as arch. These results can guide further development and research efforts for the decade to come. To allow updating these results as Linux evolves, we define our experimental protocol and make our checkers available.
    @article{tocs:14:palix:faults,
      author = {Palix, Nicolas and Thomas, Gaël and Saha, Suman and Calvès, Christophe and Muller, Gilles and Lawall, Julia},
      title = {Faults in Linux 2.6},
      journal = {ACM Transactions on Computer Systems (TOCS)},
      publisher = {ACM},
      year = {2014},
      volume = {32},
      number = {2},
      pages = {4:1--4:40}
    }
  3. Designing highly flexible virtual machines: the JnJVM experience. Gaël Thomas, Nicolas Geoffray, Charles Clément and Bertil Folliot. Software - Practice & Experience (SP&E). Vol. 38(15), pages 1643-1675. 2008. [Abstract] [BibTeX] [.pdf]
    Dynamic flexibility is a major challenge in modern system design to react to evolutions of the context or of the application requirements. Adapting behaviors may impose substantial code modification across the whole system, in the field, without service interruption and without state loss. This paper presents the JnJVM, a full Java virtual machine (JVM) that satisfies these needs by using dynamic aspect weaving techniques and a component architecture. It supports adding or replacing its own code while it is running, with no overhead on the execution of unmodified code. Our measurements reveal similar performance when compared to the monolithic JVM Kaffe. Three illustrative examples show different extension scenarios: (i) modifying the JVM's behavior; (ii) adding capabilities to the JVM; and (iii) modifying the applications' behavior.
    @article{spe:08:thomas:jnjvm,
      author = {Thomas, Gaël and Geoffray, Nicolas and Clément, Charles and Folliot, Bertil},
      title = {Designing highly flexible virtual machines: the JnJVM experience},
      journal = {Software - Practice & Experience (SP&E)},
      publisher = {John Wiley & Sons, Ltd.},
      year = {2008},
      volume = {38},
      number = {15},
      pages = {1643--1675}
    }

International workshops

  1. A Performance Study of Java Garbage Collectors on Multicore Architectures. Maria Carpen-Amarie, Patrick Marlier, Pascal Felber and Gaël Thomas. In Proceedings of the International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM'15, pages 10.  2015. [Abstract] [BibTeX] [.pdf]
    In the last few years, managed runtime environments such as the Java Virtual Machine (JVM) have been increasingly used on large-scale multicore servers. The garbage collector (GC) represents a critical component of the JVM and has a significant influence on the overall performance and efficiency of the running application. We perform a study of all available Java GCs, both in an academic environment (a set of benchmarks) and in a simulated real-life situation (a client-server application). We mainly focus on the three most widely used collectors: ParallelOld, ConcurrentMarkSweep and G1. We find that they exhibit different behaviours in the two tested environments. In particular, the default Java GC, ParallelOld, proves to be stable and adequate in the first situation, while in the real-life scenario its use results in unacceptable pauses for the application threads. We believe that this is partly due to the memory requirements of the multicore server. The G1 GC performs notably badly on the benchmarks when forced to run a full collection between the iterations of the application. Moreover, even though the G1 and ConcurrentMarkSweep GCs introduce significantly lower pauses than ParallelOld in the client-server environment, they can still seriously impact the response time on the client. Pauses of around 3 seconds can make a real-time system unusable and may disrupt the communication between nodes in the case of large-scale distributed systems.
    @inproceedings{pmam:15:carpen-amarie:gcanalysis,
      author = {Carpen-Amarie, Maria and Marlier, Patrick and Felber, Pascal and Thomas, Gaël},
      title = {A Performance Study of Java Garbage Collectors on Multicore Architectures},
      booktitle = {Proceedings of the International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM'15},
      publisher = {ACM},
      year = {2015},
      pages = {10}
    }
  2. Towards a new isolation abstraction for OSGi. Nicolas Geoffray, Gaël Thomas, Charles Clément and Bertil Folliot. In Proceedings of the workshop on Isolation and Integration in Embedded Systems, IIES'08, pages 41-45.  2008. [Abstract] [BibTeX] [.pdf]
    The OSGi specification defines a dynamic Java-based service-oriented architecture for networked environments such as home service gateways. To provide isolation between different services, it relies on the Java class loading mechanism. While class loaders have many advantages besides isolation, they are poor at protecting the system against malicious or buggy services. In this paper, we propose a new approach to service isolation. It is based on the Java isolate technology, without a task-oriented architecture. Our approach is more tailored to service-oriented architectures and, in particular, offers a complete isolation abstraction to the OSGi platform.
    @inproceedings{iies:08:geoffray:ijvm,
      author = {Geoffray, Nicolas and Thomas, Gaël and Clément, Charles and Folliot, Bertil},
      title = {Towards a new isolation abstraction for OSGi},
      booktitle = {Proceedings of the workshop on Isolation and Integration in Embedded Systems, IIES'08},
      year = {2008},
      pages = {41--45}
    }
  3. Live and heterogeneous migration of execution environments. Nicolas Geoffray, Gaël Thomas and Bertil Folliot. In Proceedings of the international workshop on Pervasive Systems, PerSys'06, pages 1254-1263.  2006. [Abstract] [BibTeX] [.pdf]
    Application migration and heterogeneity are inherent issues of pervasive systems. Each implementation of a pervasive system must provide its own migration framework, which hides the heterogeneity of the different resources. This leads to the development of many frameworks that re-implement the same functionality. We propose a minimal execution environment, the micro virtual machine, that factors out the implementation of process migration and offers heterogeneity, transparency and performance. Systems implemented on top of this micro virtual machine, such as our own Java virtual machine, therefore automatically inherit process migration capabilities.
    @inproceedings{persys:06:geoffray:migration,
      author = {Geoffray, Nicolas and Thomas, Gaël and Folliot, Bertil},
      title = {Live and heterogeneous migration of execution environments},
      booktitle = {Proceedings of the international workshop on Pervasive Systems, PerSys'06},
      year = {2006},
      pages = {1254--1263}
    }
  4. Assessing the scalability of garbage collectors on many cores. Lokesh Gidra, Gaël Thomas, Julien Sopena and Marc Shapiro. In Proceedings of the SOSP Workshop on Programming Languages and Operating Systems, PLOS'11, pages 1-5.  2011. Best paper award. [Abstract] [BibTeX] [.pdf]
    Managed Runtime Environments (MRE) are increasingly used for application servers that use large multi-core hardware. We find that the garbage collector is critical for overall performance in this setting. We explore the costs and scalability of the garbage collectors on a contemporary 48-core multiprocessor machine. We present experimental evaluation of the parallel and concurrent garbage collectors present in OpenJDK, a widely-used Java virtual machine. We show that garbage collection represents a substantial amount of an application's execution time, and does not scale well as the number of cores increases. We attempt to identify some critical scalability bottlenecks for garbage collectors.
    @inproceedings{plos:11:gidra:gc,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc},
      title = {Assessing the scalability of garbage collectors on many cores},
      booktitle = {Proceedings of the SOSP Workshop on Programming Languages and Operating Systems, PLOS'11},
      publisher = {ACM},
      year = {2011},
      pages = {1--5}
    }
  5. Mediation and enterprise service bus -- A position paper. Colombe Hérault, Gaël Thomas and Philippe Lalanda. In Proceedings of the international workshop on Mediation in Semantic Web Services, Mediate'05, pages 1-13.  2005. [Abstract] [BibTeX] [.pdf]
    Enterprise Service Buses (ESB) are becoming standard to allow communication between Web Services. Different techniques and tools have been proposed to implement and to deploy mediators within ESBs. It turns out however that current solutions are very technology-oriented and beyond the scope of most programmers. In this position paper, we present an approach that clearly separates the specification of the mediation operations and their execution on an ESB. This work is made within the European-funded S4ALL project (Services For All).
    @inproceedings{mediate:05:herault:mediation,
      author = {Hérault, Colombe and Thomas, Gaël and Lalanda, Philippe},
      title = {Mediation and enterprise service bus -- A position paper},
      booktitle = {Proceedings of the international workshop on Mediation in Semantic Web Services, Mediate'05},
      year = {2005},
      pages = {1--13}
    }
  6. A step toward ubiquitous computing: an efficient flexible micro-ORB. Frédéric Ogel, Bertil Folliot and Gaël Thomas. In Proceedings of the 2004 ACM SIGOPS European Workshop, pages 176-181.  2004. [Abstract] [BibTeX] [.pdf]
    Smart devices, such as personal assistants, mobile phones or smart cards, are continuously spreading and thus touch every aspect of our lives. However, such environments exhibit specific constraints, such as mobility, a high level of dynamism and, most often, restricted resources. Traditional middleware was not designed for such constraints and, because of its monolithic, static and rigid architecture, is unlikely to fit them.

    In response, we propose a flexible micro-ORB, called FlexORB, that supports on demand export of services as well as their dynamic deployment and reconfiguration. FlexORB supports mobile code through an intermediate code representation. It is built on top of Nevermind, a flexible minimal execution environment, which uses a reflexive dynamic compiler as a central common language substrate upon which to achieve interoperability.

    Preliminary performance measurements show that, while being relatively small (120 KB) and dynamically adaptable, FlexORB outperforms traditional middlewares such as RPC, CORBA and Java RMI.
    @inproceedings{sigopsew:04:ogel:micro-orb,
      author = {Ogel, Frédéric and Folliot, Bertil and Thomas, Gaël},
      title = {A step toward ubiquitous computing: an efficient flexible micro-ORB},
      booktitle = {Proceedings of the 2004 ACM SIGOPS European Workshop},
      publisher = {ACM},
      year = {2004},
      pages = {176--181}
    }
  7. Application-level concurrency management. Frédéric Ogel, Gaël Thomas, Bertil Folliot and Ian Piumarta. In Proceedings of the NATO workshop on Concurrent Information Processing and Computing, CIPC'03, pages 1-13.  2003. [Abstract] [BibTeX] [.pdf]
    Traditionally, an execution environment faces a trade-off between providing high-level or low-level concurrency mechanisms. The former trades flexibility for ease of use, while the latter results in concurrency management closer to the application's needs at the cost of increased complexity in the application code. Thus, one way or another, an application programmer has to match his application's semantics to the set of abstractions exported by the target execution environment.

    Most execution environments, such as Java or CORBA, are still rigid and closed, and thus export high-level, general-purpose abstractions that prevent application programmers from having any control over, or knowledge of, the way their applications behave. Because concurrency concerns are closely related to application semantics, a "one-size-fits-all" approach hardly works.

    Hence, we propose a flexible and minimal execution environment that allows dynamic construction of dedicated execution environments and dynamic reconfiguration at both the execution environment and the application level. We present its architecture and its use to construct a dynamically adaptable Java runtime that exploits this flexibility to overcome some limitations of traditional Java environments.
    @inproceedings{cipc:03:ogel:applicationlevel,
      author = {Ogel, Frédéric and Thomas, Gaël and Folliot, Bertil and Piumarta, Ian},
      title = {Application-level concurrency management},
      booktitle = {Proceedings of the NATO workshop on Concurrent Information Processing and Computing, CIPC'03},
      year = {2003},
      pages = {1--13}
    }
  8. How often do experts make mistakes? Nicolas Palix, Julia Lawall, Gaël Thomas and Gilles Muller. In Proceedings of the workshop on Aspects, Components, and Patterns for Infrastructure Software, ACP4IS'10, pages 9-16.  2010. [Abstract] [BibTeX] [.pdf]
    Large open-source software projects involve developers with a wide variety of backgrounds and expertise. Such software projects furthermore include many internal APIs that developers must understand and use properly. According to the intended purpose of these APIs, they are more or less frequently used, and used by developers with more or less expertise. In this paper, we study the impact of usage patterns and developer expertise on the rate of defects occurring in the use of internal APIs. For this preliminary study, we focus on memory management APIs in the Linux kernel, as the use of these has been shown to be highly error prone in previous work. We study defect rates and developer expertise, to consider e.g., whether widely used APIs are more defect prone because they are used by less experienced developers, or whether defects in widely used APIs are more likely to be fixed.
    @inproceedings{acp4is:10:palix:bugs,
      author = {Palix, Nicolas and Lawall, Julia and Thomas, Gaël and Muller, Gilles},
      title = {How often do experts make mistakes?},
      booktitle = {Proceedings of the workshop on Aspects, Components, and Patterns for Infrastructure Software, ACP4IS'10},
      year = {2010},
      pages = {9--16}
    }

Book chapters

  1. Virtualisation logicielle : de la machine réelle à la machine virtuelle abstraite. Bertil Folliot and Gaël Thomas. In Techniques de l'Ingénieur, pages 1-15.  2009. [Abstract] [BibTeX] [.pdf]
    Masking heterogeneity is one of the great challenges of modern computing: the number of hardware configurations is colossal, and it is impossible to develop an application for each of these specific configurations. Software virtualization answers this problem by making access to the hardware uniform, whether access to peripherals or to the central processor. Two fields of computer science deal with virtualization: the operating systems field masks only the heterogeneity of peripherals, while the virtual machines field masks the heterogeneity of central processors.
    @incollection{book:09:thomas:virtualization,
      author = {Folliot, Bertil and Thomas, Gaël},
      title = {Virtualisation logicielle : de la machine réelle à la machine virtuelle abstraite},
      booktitle = {Techniques de l'Ingénieur},
      publisher = {Hermes},
      year = {2009},
      pages = {1--15}
    }
  2. Peer-to-Peer storage. Olivier Marin, Sébastien Monnet and Gaël Thomas. In Distributed Systems: Design and Algorithms, pages 59-80.  2011. [Abstract] [BibTeX] [.pdf]
    Peer-to-peer storage applications are currently the main deployed implementations of large-scale distributed software. A peer-to-peer storage application offers five main operations: a lookup operation to find a file, a read operation to read a file, a write operation to modify a file, an add operation to inject a new file and a remove operation to delete a file. However, most current peer-to-peer storage applications are limited to file sharing: they do not implement the write and delete operations.
    @incollection{book:11:marin:peer-to-peer-storage,
      author = {Marin, Olivier and Monnet, Sébastien and Thomas, Gaël},
      title = {Peer-to-Peer storage},
      booktitle = {Distributed Systems: Design and Algorithms},
      publisher = {John Wiley & Sons, Ltd.},
      year = {2011},
      pages = {59--80}
    }
  3. Large-Scale peer-to-peer game applications. Sébastien Monnet and Gaël Thomas. In Distributed Systems: Design and Algorithms, pages 81-103.  2011. [Abstract] [BibTeX] [.pdf]
    Massively multiplayer online games (MMOG) recently emerged as a popular class of applications with up to millions of users, spread over the world, connected through the Internet to play together. Most of these games provide a virtual environment in which players evolve, and interact with each other. When a player moves, moves an object, or performs any operation that has an impact on the virtual environment, players around him can see his actions.
    @incollection{book:11:monnet:large-scale-game,
      author = {Monnet, Sébastien and Thomas, Gaël},
      title = {Large-Scale peer-to-peer game applications},
      booktitle = {Distributed Systems: Design and Algorithms},
      publisher = {John Wiley & Sons, Ltd.},
      year = {2011},
      pages = {81--103}
    }
  4. Towards active applications: the virtual virtual machine approach. Frédéric Ogel, Gaël Thomas, Ian Piumarta, Antoine Galland, Bertil Folliot and Carine Baillarguet. In New Trends in Computer Science and Engineering, pages 1-21.  2003. [Abstract] [BibTeX] [.pdf]
    With the wide acceptance of distributed computing, a rapidly growing number of application domains are emerging, leading to a growing number of ad hoc solutions that are rigid and poorly interoperable. Our response to this situation is a platform for building flexible and interoperable execution environments called the Virtual Virtual Machine. This article presents our approach, the architecture of the VVM and some of its primary applications.
    @incollection{book:03:ogel:vvm,
      author = {Ogel, Frédéric and Thomas, Gaël and Piumarta, Ian and Galland, Antoine and Folliot, Bertil and Baillarguet, Carine},
      title = {Towards active applications: the virtual virtual machine approach},
      booktitle = {New Trends in Computer Science and Engineering},
      publisher = {A92 Publishing House},
      year = {2003},
      pages = {1--21},
      edition = {POLIROM Press}
    }
  5. Applications pair-à-pair de partage de données. Emmanuel Saint-James and Gaël Thomas. In Systèmes répartis en action : de l'embarqué aux systèmes à large échelle, pages 223-256.  2008. [Abstract] [BibTeX] [.pdf]
    Very-large-scale distributed applications have found their main success in file-sharing applications. A file-sharing application is fundamentally a minimal file system with a single directory, in which data durability is not a goal, i.e., files may be lost. A file-sharing application can therefore be reduced to two functions: a read function and an add (or write) function.
    @incollection{book:08:thomas:p2p,
      author = {Saint-James, Emmanuel and Thomas, Gaël},
      title = {Applications pair-à-pair de partage de données},
      booktitle = {Systèmes répartis en action~: de l'embarqué aux systèmes à large échelle},
      publisher = {Hermes},
      year = {2008},
      pages = {223--256}
    }
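The five operations enumerated in the "Peer-to-Peer storage" chapter above (lookup, read, write, add, remove) can be sketched as an interface. This is a hypothetical illustration: the names, signatures, and the trivial single-node in-memory stand-in are assumptions, not taken from the chapter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the five operations of a peer-to-peer storage
// application, as enumerated in the chapter's abstract. Names and
// signatures are illustrative assumptions.
interface P2PStorage {
    Optional<String> lookup(String name); // find the peer holding a file
    byte[] read(String name);             // read a file's content
    void write(String name, byte[] data); // modify an existing file
    void add(String name, byte[] data);   // inject a new file
    void remove(String name);             // delete a file
}

// Trivial single-node, in-memory stand-in (a real system would route
// each call through a distributed lookup structure such as a DHT).
final class InMemoryStore implements P2PStorage {
    private final Map<String, byte[]> files = new HashMap<>();
    public Optional<String> lookup(String name) {
        return files.containsKey(name) ? Optional.of("local") : Optional.empty();
    }
    public byte[] read(String name)             { return files.get(name); }
    public void write(String name, byte[] data) { files.put(name, data); }
    public void add(String name, byte[] data)   { files.put(name, data); }
    public void remove(String name)             { files.remove(name); }
}
```

As the abstract notes, most file-sharing applications implement only the lookup, read and add operations of such an interface.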

French publications

  1. Propagation d'événements entre passerelles OSGi. Didier Donsez and Gaël Thomas. In Proceedings of the 2006 Atelier de travail OSGi, pages 1-5.  2006. [Abstract] [BibTeX] [.pdf]
    The OSGi event communication service provides a standard framework for making co-located services communicate. We propose to propagate these events outside the gateway using bridges that require no modification either to the senders/receivers or to the event service. Several bridges can then coexist within the same gateway, which hides middleware heterogeneity from the application and makes these middlewares interoperate. Our infrastructure enables the construction of new applications that rely on message-oriented middleware and require the flexibility brought by OSGi. Three bridge implementations are presented in this article and validate our approach.
    @inproceedings{osgiw:06:donsez:event,
      author = {Donsez, Didier and Thomas, Gaël},
      title = {Propagation d'événements entre passerelles OSGi},
      booktitle = {Proceedings of the 2006 Atelier de travail OSGi},
      year = {2006},
      pages = {1--5}
    }
  2. Protocole de membership hautement extensible : conception et expérimentations. Bertil Folliot and Gaël Thomas. In Proceedings of the Conférence Francaise en Systèmes d'Exploitation, CFSE'01, pages 25-36.  2001. [Abstract] [BibTeX] [.pdf]
    Membership management (knowledge of the active machines and detection of the faulty ones) is an essential component of most cluster-based servers. Most existing membership protocols scale poorly with the number of nodes, thereby limiting the scalability of the service. This article presents the design of a membership protocol based on a two-level multicast ring structure, scalable to more than 1000 nodes. Experiments on 70 machines simulating clusters of up to 1024 nodes show the efficiency of this protocol in many scenarios (cold start, faulty nodes, network partitioning).
    @inproceedings{cfse:01:folliot:membership,
      author = {Folliot, Bertil and Thomas, Gaël},
      title = {Protocole de membership hautement extensible : conception et expérimentations},
      booktitle = {Proceedings of the Conférence Francaise en Systèmes d'Exploitation, CFSE'01},
      year = {2001},
      pages = {25--36}
    }
  3. Distribution transparente et dynamique de code pour applications Java. Nicolas Geoffray, Gaël Thomas and Bertil Folliot. In Proceedings of the Conférence Francaise en Systèmes d'Exploitation, CFSE'06, pages 85-96.  2006. [Abstract] [BibTeX] [.pdf]
    Delegating code execution is a promising mechanism for mobile systems and load balancing. Mobile systems could take advantage of neighboring resources by sending them code to execute, and Web servers could spread their load during a load peak. This paper presents an infrastructure able to distribute an existing Java application across several machines transparently. The applications' source code requires no modification: we use dynamic aspect weaving to analyze (monitor) and distribute the application at runtime. The infrastructure consists of an extensible and adaptable execution environment and of JnJVM, a flexible Java virtual machine extended with a dynamic aspect weaver. Our system is loaded at runtime when the application requires it, without restarting the application. A first evaluation shows that our system increases the number of transactions per second of a Web server by 73%.
    @inproceedings{cfse:06:geoffray:distribution,
      author = {Geoffray, Nicolas and Thomas, Gaël and Folliot, Bertil},
      title = {Distribution transparente et dynamique de code pour applications Java},
      booktitle = {Proceedings of the Conférence Francaise en Systèmes d'Exploitation, CFSE'06},
      year = {2006},
      pages = {85--96}
    }
  4. I-JVM: une machine virtuelle Java pour l'isolation de composants dans OSGi. Nicolas Geoffray, Gaël Thomas, Gilles Muller, Pierre Parrend, Stéphane Frénot and Bertil Folliot. In Proceedings of the Conférence Francaise en Systèmes d'Exploitation, CFSE'09, pages 1-12.  2009. [Abstract] [BibTeX] [.pdf]
    OSGi is a component-oriented platform implemented in Java that is increasingly used for developing extensible applications. However, existing Java virtual machines are unable to isolate components from one another. For instance, a malicious component can block the execution of the platform by allocating too much memory, or alter the behavior of other components by modifying global variables. We present I-JVM, a Java virtual machine that provides lightweight isolation between components while preserving compatibility with existing OSGi applications. I-JVM resolves the 8 known vulnerabilities of the OSGi platform that are related to the virtual machine, and decreases application performance by only 20% compared with the virtual machine on which it is implemented.
    @inproceedings{cfse:09:geoffray:ijvm,
      author = {Geoffray, Nicolas and Thomas, Gaël and Muller, Gilles and Parrend, Pierre and Frénot, Stéphane and Folliot, Bertil},
      title = {I-JVM: une machine virtuelle Java pour l'isolation de composants dans OSGi},
      booktitle = {Proceedings of the Conférence Francaise en Systèmes d'Exploitation, CFSE'09},
      year = {2009},
      pages = {1--12}
    }
  5. Reconfigurations dynamiques de services dans un intergiciel à composants CORBA CCM. Assia Hachichi, Cyril Martin, Gaël Thomas, Simon Patarin and Bertil Folliot. In Proceedings of the conférence francophone sur le Déploiement et la (Re)configuration de logiciels, DECOR'04, pages 159-170.  2004. [Abstract] [BibTeX] [.pdf]
    Nowadays, component middleware is used to easily design, develop and deploy distributed applications, and to ensure heterogeneity, interoperability and reuse of software modules, as well as the separation between the business code encapsulated in components and the system code managed by containers. Many standards fit this definition, such as CCM (CORBA Component Model), EJB (Enterprise Java Beans) and .NET. However, these standards offer a limited and fixed set of system services, thereby ruling out any possibility of adding system services or dynamically reconfiguring the middleware. Our work proposes mechanisms for dynamically adding and adapting system services, based on a reconfiguration language that is dynamically adaptable to the needs of the reconfiguration and on a dynamic reconfiguration tool. A prototype has been built for the OpenCCM platform, an implementation of the OMG's CCM specification.
    @inproceedings{decor:04:hachichi:reconf,
      author = {Hachichi, Assia and Martin, Cyril and Thomas, Gaël and Patarin, Simon and Folliot, Bertil},
      title = {Reconfigurations dynamiques de services dans un intergiciel à composants CORBA CCM},
      booktitle = {Proceedings of the conférence francophone sur le Déploiement et la (Re)configuration de logiciels, DECOR'04},
      year = {2004},
      pages = {159--170}
    }
  6. Scalevisor : un pilote CPU et mémoire pour les gros multicœurs. Alexis Lescouet, Nicolas Derumigny and Gaël Thomas. In Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'18, pages 7.  2018. [Abstract] [BibTeX] [.pdf]
    In recent years, the need for computing power has led to the emergence of new, complex architectures that use parallelism to gain power. However, these machines yield only mediocre performance when the management of their resources (memory, CPUs) fails to exploit parallelism. Unfortunately, introducing new memory-management heuristics into existing kernels is a complex task that requires modifying many parts of the code. Rather than deeply modifying the kernel, we propose to implement a device driver dedicated to managing these resources, and to use virtualization techniques to make this driver transparent to the kernel. This driver will enable new heuristics that can be adapted to the specifics of the hardware and of the applications.
    @inproceedings{compas:18:lescouet:scalevisor,
      author = {Lescouet, Alexis and Derumigny, Nicolas and Thomas, Gaël},
      title = {Scalevisor : un pilote CPU et mémoire pour les gros multicœurs},
      booktitle = {Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'18},
      year = {2018},
      pages = {7}
    }
  7. Détection automatique d'interférences entre threads. Mohamed Said Mosli Bouksiaa, François Trahay and Gaël Thomas. In Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'16, pages 7.  2016. [Abstract] [BibTeX] [.pdf]
    Understanding the performance of multi-threaded applications can prove difficult because of interference between threads. While some interference is predictable (e.g., acquiring a lock), other interference is more subtle (e.g., false sharing) and complex to detect. In this article, we propose a methodology and a metric to detect interference between threads and to quantify its impact on the overall performance of the application. The methodology consists in studying the variation of the execution time of the code. We applied this methodology to a set of micro-benchmarks and applications. The results show that it effectively detects interference between the threads of an application.
    @inproceedings{compas:16:mosli:rdam,
      author = {Mosli Bouksiaa, Mohamed Said and Trahay, François and Thomas, Gaël},
      title = {Détection automatique d'interférences entre threads},
      booktitle = {Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'16},
      year = {2016},
      pages = {7}
    }
  8. Détection automatique d'anomalies de performance. Mohamed Said Mosli Bouksiaa, François Trahay and Gaël Thomas. In Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'15, pages 10.  2015. [Abstract] [BibTeX] [.pdf]
    Debugging large-scale distributed applications or HPC applications is hard. The task is even more complicated when it comes to performance anomalies. The tools widely used to detect these anomalies do not make it possible to find their causes.

    In this article, we present an approach based on the analysis of execution traces of distributed programs. Our approach detects recurring patterns in execution traces and exploits them to isolate performance anomalies. The anomalies are then used to find their causes. Preliminary results show that our algorithms automatically detect many anomalies and associate them with their causes.
    @inproceedings{compas:15:mosli:perf-analysis,
      author = {Mosli Bouksiaa, Mohamed Said and Trahay, François and Thomas, Gaël},
      title = {Détection automatique d'anomalies de performance},
      booktitle = {Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'15},
      year = {2015},
      pages = {10}
    }
  9. MVV : une plate-forme à composants dynamiquement reconfigurables -- La machine virtuelle virtuelle. Frédéric Ogel, Gaël Thomas, Antoine Galland and Bertil Folliot. Technique et Science Informatiques (TSI). Vol. 23(10/2004), pages 1269-1299.  2004. [Abstract] [BibTeX] [.pdf]
    The ever-growing number of emerging application domains leads to a growing number of ad hoc solutions that are rigid and poorly interoperable. Our response to this situation is a platform for building flexible and interoperable applications and execution environments, called the virtual virtual machine. This article presents our approach, the architecture of the platform, and its first applications.
    @article{book:04:ogel:tsi,
      author = {Ogel, Frédéric and Thomas, Gaël and Galland, Antoine and Folliot, Bertil},
      title = {MVV : une plate-forme à composants dynamiquement reconfigurables -- La machine virtuelle virtuelle},
      journal = {Technique et Science Informatiques (TSI)},
      publisher = {Hermes},
      year = {2004},
      volume = {23},
      number = {10/2004},
      pages = {1269--1299}
    }
  10. BatchQueue : file producteur / consommateur optimisée pour les multi-coeurs. Thomas Preud'Homme, Julien Sopena, Gaël Thomas and Bertil Folliot. In Proceedings of the Conférence Francaise en Systèmes d'Exploitation, CFSE'11, pages 1-12.  2011. [Abstract] [BibTeX] [.pdf]
    Sequential applications can take advantage of multi-core systems by using pipeline parallelism to increase their performance. In such a parallelism scheme, the achievable speedup is limited by the overhead of core-to-core communication. This paper presents the BatchQueue algorithm, a fast communication system designed to optimize hardware cache usage, in particular with respect to prefetching. BatchQueue improves performance by a factor of 2: it can send a word of data in 3.5 nanoseconds on a 64-bit system, corresponding to a throughput of 2 GiB/s.
    @inproceedings{cfse:11:preudhomme:batchqueue,
      author = {Preud'Homme, Thomas and Sopena, Julien and Thomas, Gaël and Folliot, Bertil},
      title = {BatchQueue : file producteur / consommateur optimisée pour les multi-coeurs},
      booktitle = {Proceedings of the Conférence Francaise en Systèmes d'Exploitation, CFSE'11},
      year = {2011},
      pages = {1--12}
    }
  11. Jnjvm : une plateforme Java adaptable pour applications actives. Gaël Thomas, Bertil Folliot and Frédéric Ogel. In Proceedings of the Conférence Francaise en Systèmes d'Exploitation, CFSE'03, pages 1-12.  2003. [Abstract] [BibTeX] [.pdf]
    The number of Java virtual machines dedicated to specific application domains keeps increasing. Each of these virtual machines modifies or enriches the semantics of Sun's standard virtual machine to implement dedicated mechanisms. These mechanisms do not fundamentally change the structure of the standard virtual machine (garbage collector, JIT, class loader, etc.).

    In this article, we propose an alternative solution: an open Java virtual machine that lets an application precisely specify its execution environment. The functional part of the application remains written in Java, and the non-functional mechanisms build, on the fly, a Java virtual machine tailored to the application.
    @inproceedings{cfse:03:thomas:jnjvm,
      author = {Thomas, Gaël and Folliot, Bertil and Ogel, Frédéric},
      title = {Jnjvm : une plateforme Java adaptable pour applications actives},
      booktitle = {Proceedings of the Conférence Francaise en Systèmes d'Exploitation, CFSE'03},
      year = {2003},
      pages = {1--12}
    }
  12. Les Documents actifs basés sur une machine virtuelle. Gaël Thomas, Bertil Folliot and Ian Piumarta. In Proceedings of the 2002 Atelier journées des Jeunes Chercheurs en Systèmes, chapitre francais de l'ACM-SIGOPS, pages 441-447.  2002. [Abstract] [BibTeX] [.pdf]
    Digital documents and networks improve the quality and dissemination of documents, but this progress raises new problems: multimedia, consistency between replicas, QoS, copyright, and so on. These difficulties can be solved by standards and protocols for generic cases, but not for specific ones. In this article, we present active documents, which embed executable code in documents so that each document can choose the solutions suited to its needs.
    @inproceedings{asf:02:thomas:docactif,
      author = {Thomas, Gaël and Folliot, Bertil and Piumarta, Ian},
      title = {Les Documents actifs basés sur une machine virtuelle},
      booktitle = {Proceedings of the 2002 Atelier journées des Jeunes Chercheurs en Systèmes, chapitre francais de l'ACM-SIGOPS},
      year = {2002},
      pages = {441--447}
    }
  13. Optimisation mémoire dans une architecture NUMA : comparaison des gains entre natif et virtualisé. Gauthier Voron, Gaël Thomas, Pierre Sens and Vivien Quema. In Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'15, pages 10.  2015. Best paper award. [Abstract] [BibTeX] [.pdf]
    Running applications on a NUMA architecture requires suitable policies to use the available hardware resources efficiently. Various techniques that allow an operating system to ensure good memory latency on such machines have already been studied. In the cloud, however, these operating systems run inside virtual machines under the responsibility of a hypervisor, which is subject to constraints of its own. In this article, we examine these constraints and how they affect existing NUMA policies. To that end, we study the effects of a well-known memory optimization technique in a virtualized system and compare them with those obtained in an operating system.
    @inproceedings{compas:15:voron:xen-numa,
      author = {Voron, Gauthier and Thomas, Gaël and Sens, Pierre and Quema, Vivien},
      title = {Optimisation mémoire dans une architecture NUMA~: comparaison des gains entre natif et virtualisé},
      booktitle = {Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'15},
      year = {2015},
      pages = {10}
    }
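The core idea behind the BatchQueue entry above, amortizing core-to-core synchronization by transferring data one batch at a time instead of word by word, can be illustrated with a minimal single-producer/single-consumer sketch. This is an assumption-laden illustration of the batching principle, not the published BatchQueue algorithm.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch (not the published BatchQueue algorithm): the
// producer buffers words locally and publishes them a whole batch at a
// time, so the shared synchronization flag is written once per batch
// instead of once per word, reducing cache-line traffic between cores.
final class BatchChannel {
    static final int BATCH = 8;                    // words per batch
    private final long[] shared = new long[BATCH]; // published batch
    private final long[] local = new long[BATCH];  // producer-private batch
    private int fill = 0;                          // producer-side fill level
    private int count = 0;                         // size of published batch
    private final AtomicBoolean full = new AtomicBoolean(false);

    // Producer side: accumulate locally, publish when the batch fills up.
    void send(long v) {
        local[fill++] = v;
        if (fill == BATCH) flush();
    }

    void flush() {
        if (fill == 0) return;
        while (full.get()) Thread.onSpinWait();    // wait for the consumer
        System.arraycopy(local, 0, shared, 0, fill);
        count = fill;
        fill = 0;
        full.set(true);                            // volatile write publishes the batch
    }

    // Consumer side: drain a whole batch at once.
    long[] receive() {
        while (!full.get()) Thread.onSpinWait();   // wait for a batch
        long[] out = Arrays.copyOf(shared, count);
        full.set(false);                           // hand the buffer back
        return out;
    }
}
```

The volatile semantics of the AtomicBoolean order the array copy before the flag update, so the consumer never observes a partially written batch.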

PhD and HDR

  1. Improving the design and the performance of managed runtime environments. Gaël Thomas. School: UPMC Sorbonne Université.  2012. [Abstract] [BibTeX] [.pdf]
    With the advent of the Web and the need to protect users against malicious applications, Managed Runtime Environments (MREs), such as Java or .Net virtual machines, have become the norm to execute programs. Over the last years, my research contributions have targeted three aspects of MREs: their design, their safety, and their performance on multicore hardware. My first contribution is VMKit, a library that eases the development of new efficient MREs by hiding their complexity in a set of reusable components. My second contribution is I-JVM, a Java virtual machine that eliminates the eight known vulnerabilities that a component of the OSGi framework was able to exploit. My third contribution targets the improvement of the performance of MREs on multicore hardware, focusing on the efficiency of locks and garbage collectors: with a new locking mechanism that outperforms all other known locking mechanisms when the number of cores increases, and with a study of the bottlenecks incurred by garbage collectors on multicore hardware. My research has been carried out in collaboration with seven PhD students, two of whom have already defended. Building on these contributions, in future work I propose to explore the design of the next generation of MREs, which will have to adapt the application at runtime to the actual multicore hardware on which it is executed.
    @phdthesis{hdr:12:thomas,
      author = {Thomas, Gaël},
      title = {Improving the design and the performance of managed runtime environments},
      school = {UPMC Sorbonne Université},
      year = {2012}
    }
  2. Applications actives : construction dynamique d'environnements d'exécution flexibles homogène. Gaël Thomas. School: Université Pierre et Marie Curie.  2005. [Abstract] [BibTeX] [.pdf]
    The emergence of new application domains creates new needs in terms of system mechanisms that traditional execution environments do not cover. Currently, there is no way to introduce these mechanisms without introducing heterogeneity among execution platforms. To solve this problem, we propose placing the specialized code inside the application and running the application, which thereby becomes active, in a generic, standard environment.

    This architecture relies on a highly adaptable platform developed during this work, the micro virtual machine. It has been tested with a reflective and adaptable Java virtual machine called the JnJVM. To validate our approach, three specializations of the JnJVM were implemented; they build JVMs dedicated to aspect weaving, thread migration, and escape analysis.
    @phdthesis{thesis:05:thomas,
      author = {Thomas, Gaël},
      title = {Applications actives : construction dynamique d'environnements d'exécution flexibles homogène},
      school = {Université Pierre et Marie Curie},
      year = {2005}
    }

Other publications

  1. VMKit: a substrate for virtual machines. Nicolas Geoffray, Gaël Thomas, Charles Clément, Bertil Folliot and Gilles Muller. Poster at the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '09.  2009. [BibTeX] [.pdf]
    @misc{poster-asplos:09:geoffray:vmkit,
      author = {Geoffray, Nicolas and Thomas, Gaël and Clément, Charles and Folliot, Bertil and Muller, Gilles},
      title = {VMKit: a substrate for virtual machines},
      year = {2009}
    }
  2. Assessing the scalability of garbage collectors on many cores. Lokesh Gidra, Gaël Thomas, Julien Sopena and Marc Shapiro. Best papers from PLOS '11, ACM SIGOPS Operating System Review (OSR). Vol. 45(3), pages 15-19.  2011. [Abstract] [BibTeX] [.pdf]
    Managed Runtime Environments (MRE) are increasingly used for application servers that run on large multi-core hardware. We find that the garbage collector is critical for overall performance in this setting. We explore the costs and scalability of garbage collectors on a contemporary 48-core multiprocessor machine. We present an experimental evaluation of the parallel and concurrent garbage collectors present in OpenJDK, a widely-used Java virtual machine. We show that garbage collection represents a substantial fraction of an application's execution time, and does not scale well as the number of cores increases. We attempt to identify some critical scalability bottlenecks for garbage collectors.
    @article{osr:11:gidra:gc,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc},
      title = {Assessing the scalability of garbage collectors on many cores},
      journal = {Best papers from PLOS~'11, ACM SIGOPS Operating System Review (OSR)},
      publisher = {ACM},
      year = {2011},
      volume = {45},
      number = {3},
      pages = {15--19}
    }
  3. Remote Core Locking (RCL): migration of critical section execution to improve performance. Jean-Pierre Lozi, Gaël Thomas, Julia Lawall and Gilles Muller. Poster at the EuroSys European Conference on Computer Systems, EuroSys '11.  2011. [BibTeX] [.pdf]
    @misc{poster-eurosys:11:lozi:rcl,
      author = {Lozi, Jean-Pierre and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Remote Core Locking (RCL): migration of critical section execution to improve performance},
      year = {2011}
    }
  4. Remote Core Locking: Migrating critical section execution to improve the performance of multithreaded applications. Jean-Pierre Lozi, Gaël Thomas, Julia Lawall and Gilles Muller. Work in progress at the Symposium on Operating Systems Principles, SOSP '11.  2011. [BibTeX]
    @misc{wip-sosp:11:lozi:rcl,
      author = {Lozi, Jean-Pierre and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Remote Core Locking: Migrating critical section execution to improve the performance of multithreaded applications},
      year = {2011}
    }
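    The core idea behind Remote Core Locking can be illustrated with a short sketch: instead of each thread acquiring a lock before entering a critical section, client threads post the critical section (a function pointer plus a context argument) to a per-client mailbox, and a single dedicated server thread executes the pending requests sequentially, keeping the shared data hot in that core's cache. The sketch below is a minimal illustration of this idea with plain C11 atomics; all names and the mailbox layout are illustrative assumptions, not the authors' implementation.

    ```c
    /* Minimal sketch of the remote-core-locking idea (illustrative only). */
    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NCLIENTS 4
    #define NREQS    1000

    typedef struct {
        _Atomic(void (*)(void *)) func; /* NULL means the slot is empty */
        void *arg;                      /* context for the critical section */
    } mailbox_t;

    static mailbox_t mailbox[NCLIENTS];
    static atomic_int served = 0;  /* total requests executed by the server */
    static long counter = 0;       /* shared state, touched only by the server */

    static void increment(void *arg) { (void)arg; counter++; }

    /* Server thread: scan the mailboxes and run pending critical sections. */
    static void *server(void *unused) {
        (void)unused;
        while (atomic_load(&served) < NCLIENTS * NREQS) {
            for (int i = 0; i < NCLIENTS; i++) {
                void (*f)(void *) = atomic_load(&mailbox[i].func);
                if (f) {
                    f(mailbox[i].arg);                    /* critical section */
                    atomic_store(&mailbox[i].func, NULL); /* ack to the client */
                    atomic_fetch_add(&served, 1);
                }
            }
        }
        return NULL;
    }

    /* Client thread: post a request, then wait until the server has run it. */
    static void *client(void *id) {
        int i = (int)(long)id;
        for (int n = 0; n < NREQS; n++) {
            mailbox[i].arg = NULL;
            atomic_store(&mailbox[i].func, increment);
            while (atomic_load(&mailbox[i].func) != NULL)
                ; /* spin until the server acknowledges completion */
        }
        return NULL;
    }

    int main(void) {
        pthread_t srv, cl[NCLIENTS];
        pthread_create(&srv, NULL, server, NULL);
        for (long i = 0; i < NCLIENTS; i++)
            pthread_create(&cl[i], NULL, client, (void *)i);
        for (int i = 0; i < NCLIENTS; i++)
            pthread_join(cl[i], NULL);
        pthread_join(srv, NULL);
        assert(counter == NCLIENTS * NREQS);
        printf("counter = %ld\n", counter);
        return 0;
    }
    ```

    Because only the server thread ever touches `counter`, no lock protects it: serialization comes from the single server loop, which is the design point the papers evaluate against conventional lock implementations.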
  5. Building a flexible Java runtime upon a flexible compiler. Gaël Thomas, Frédéric Ogel, Antoine Galland, Bertil Folliot and Ian Piumarta. Special Issue on `System & Networking for Smart Objects' of IASTED International Journal on Computers and Applications. Vol. 27, pages 28-47.  2005. [Abstract] [BibTeX] [.pdf]
    While Java has become a de facto standard for mobile code and distributed programming, it is still a rigid and closed execution environment. Not only does this lack of flexibility severely limit the deployment of innovations, but it imposes artificial constraints on application developers. Therefore, many extensions to the JVM have been proposed, each of them dealing with specific limitations, such as emerging devices (mobile phones, smart cards) or constraints (real-time, fault tolerance). This leads to a proliferation of ad hoc solutions requiring the design of new virtual machines. Furthermore, those solutions are still rigid, closed and poorly interoperable. In response to this problem, we propose a flexible Java execution environment, called the JnJVM, that can be dynamically adapted to applications' needs as well as to available resources.
    @article{ijca:05:thomas:javaflexible,
      author = {Thomas, Gaël and Ogel, Frédéric and Galland, Antoine and Folliot, Bertil and Piumarta, Ian},
      title = {Building a flexible Java runtime upon a flexible compiler},
      journal = {Special Issue on `System & Networking for Smart Objects' of IASTED International Journal on Computers and Applications},
      publisher = {ACTA Press},
      year = {2005},
      volume = {27},
      pages = {28--47}
    }