Brice Goglin's Blog - MMU notifiers bring into Linux what we've been wanting for HPC for a while

Jul. 29th, 2008

19:06 - MMU notifiers bring into Linux what we've been wanting for HPC for a while

After the addition of ioremap_wc() in 2.6.26, MMU notifiers have now been merged into 2.6.27-rc1. This means that everything we have wanted in the past to improve HPC support is finally available upstream. We thought that merging InfiniBand (IB) support back in 2.6.11 would make things move fast, but it looks like these important features were not that obvious to people who had not worked on HPC for a long time.

Back in 2004, I was trying to get a safe registration cache working in the kernel for distributed storage over Myrinet. User-space registration caches (regcaches) are known to be fragile because they need to intercept malloc/free/munmap to invalidate cached segments; this works sometimes, but it is often a mess. In the kernel, there is simply nothing you can intercept. So I wrote a patch called VMASpy, which allowed other subsystems to be notified when part of a "registered" VMA was unmapped or forked. I never submitted it, since it could not be accepted unless somebody in the kernel (i.e. IB) used it. Posts like this one show that IB people were not aware of the problem back then (nowadays they are interested, but something in the IB specs apparently prevents them from using this).
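As an illustration of why invalidation is the crux, here is a minimal Python model of a registration cache. It is a sketch under stated assumptions, not the VMASpy code: the class and method names are hypothetical, and "registering" a range just hands out an integer handle where a real cache would pin pages and register them with the NIC. The important part is `invalidate()`, which whatever intercepts free()/munmap() (or, in the kernel, an unmap notification) must call before the memory is reused; otherwise a later lookup returns a stale registration.

```python
from dataclasses import dataclass, field


@dataclass
class RegCache:
    """Toy registration cache mapping registered [start, end) ranges to handles.

    Hypothetical model: a real cache would pin pages and register them with
    the NIC on a miss, and deregister them when they are invalidated.
    """
    entries: dict = field(default_factory=dict)  # (start, end) -> handle
    next_handle: int = 1

    def lookup(self, addr, length):
        """Return the handle of a cached range covering [addr, addr+length), else None."""
        for (start, end), handle in self.entries.items():
            if start <= addr and addr + length <= end:
                return handle
        return None

    def register(self, addr, length):
        """Reuse a cached registration if possible, otherwise create and cache one."""
        handle = self.lookup(addr, length)
        if handle is None:
            handle = self.next_handle
            self.next_handle += 1
            self.entries[(addr, addr + length)] = handle
        return handle

    def invalidate(self, addr, length):
        """Drop every cached range overlapping [addr, addr+length).

        This is what a free()/munmap() hook, or a kernel unmap notification,
        has to trigger. Returns the number of entries dropped.
        """
        stale = [r for r in self.entries if r[0] < addr + length and addr < r[1]]
        for r in stale:
            del self.entries[r]
        return len(stale)


# Usage: a hit reuses the registration; a partial unmap kills the whole entry.
cache = RegCache()
h = cache.register(0x10000, 0x4000)
assert cache.register(0x11000, 0x1000) == h
cache.invalidate(0x12000, 0x1000)
assert cache.lookup(0x11000, 0x1000) is None
```

Dropping the whole overlapping entry, rather than trimming it, keeps the model simple; a real cache could re-register the surviving parts lazily on the next miss.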

KVM needed some kernel support for its shadow pages, so MMU notifiers were written by Andrea Arcangeli (many thanks to him for continuing to work on this despite many people not liking it). After a couple of months of trolling, here we go with 2.6.27-rc1: we can now register a notifier per mm_struct and get a callback when part of the address space is unmapped. The implementation is very different from my VMASpy and of course much better :) But the final API provides similar features, so it should be great news for people working on registration caches and similar mechanisms.
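The notifier idea itself can be modeled in a few lines. The Python sketch below is a user-space toy, not kernel code: `AddressSpace` stands in for an mm_struct, and the hypothetical `invalidate_range()` method loosely mirrors callbacks such as `invalidate_range_start` in the kernel's `struct mmu_notifier_ops`, whose exact signatures have changed across kernel versions. What it shows is the ordering a registration cache needs: every registered listener hears about an unmap before the range disappears.

```python
class AddressSpace:
    """Toy stand-in for an mm_struct: mapped ranges plus a notifier list."""

    def __init__(self):
        self.mappings = []   # list of (start, end) tuples, end exclusive
        self.notifiers = []  # objects with an invalidate_range(start, end) method

    def register_notifier(self, notifier):
        # One notifier list per address space, in the spirit of
        # mmu_notifier_register(); the name here is hypothetical.
        self.notifiers.append(notifier)

    def mmap(self, start, length):
        self.mappings.append((start, start + length))

    def munmap(self, start, length):
        end = start + length
        # Tell every listener first, so nobody keeps using a translation
        # for memory that is about to go away.
        for n in self.notifiers:
            n.invalidate_range(start, end)
        remaining = []
        for s, e in self.mappings:
            if e <= start or s >= end:
                remaining.append((s, e))       # no overlap: keep as is
            else:
                if s < start:
                    remaining.append((s, start))  # part below the hole
                if e > end:
                    remaining.append((end, e))    # part above the hole
        self.mappings = remaining


class RecordingNotifier:
    """Hypothetical client (e.g. a registration cache) that logs invalidations."""

    def __init__(self):
        self.calls = []

    def invalidate_range(self, start, end):
        self.calls.append((start, end))
```

Registering the notifier once per address space, instead of hooking every allocator call, is what removes the need for the malloc/free interception tricks described above.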


Comments:

From:(Anonymous)
Date:August 5th, 2008 14:05 (UTC)
I came across your site and was wondering if we're related. My grandfather was August Goglin from Germany; he came to NY around 1900. I didn't think it was a very common name. Donna Goglin Gustaitis grgus@aol.com
From:(Anonymous)
Date:June 25th, 2010 19:26 (UTC)

Re: Do MX, another interconnect API, or an MPI implementation take advantage of this?

Oops, I forgot to type the message...

Since memory registration and deregistration have long been cited by many as an important source of overhead for message-passing applications, do you know whether any message-passing middleware now takes advantage of this on newer Linux kernels?

In particular:

Does the current version of MX take advantage of this?

Have the InfiniBand folks found out how to take advantage of it?

Is there any MPI implementation that takes advantage of this convenient feature?

Given all the dirty tricks developers have used to cope with this problem (registration caches, special malloc()/free() functions, and kernel patches), I guess they are eager to use this cleaner Linux feature.

Thanks,

Martin Audet
From:bgoglin
Date:July 2nd, 2010 07:40 (UTC)

Re: Do MX, another interconnect API, or an MPI implementation take advantage of this?

Open MPI will use MMU notifiers once the ummunotify interface is accepted into the kernel. Open MPI is implemented in user space, so it needs something to expose MMU notifiers to user space, which is what ummunotify does.

Neither MX nor IB uses it, but both will be able to benefit from it through Open MPI, for instance.
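Since an MPI library runs in user space, the kernel-side notifications have to be forwarded to it somehow. The Python sketch below models one plausible shape for such a channel: the kernel side queues unmap ranges from its notifier callback and bumps a counter, and the user-space library compares that counter with the last value it saw before trusting its cache, draining the queue only when something changed. All names are hypothetical, and this is not ummunotify's actual interface.

```python
from collections import deque


class InvalidationChannel:
    """Toy kernel-to-user channel for unmap events (hypothetical design)."""

    def __init__(self):
        self.events = deque()  # pending (start, end) ranges
        self.counter = 0       # bumped on every event, cheap for user space to poll

    def post(self, start, end):
        # "Kernel side": would be called from an MMU notifier callback.
        self.events.append((start, end))
        self.counter += 1

    def drain(self):
        # "User side": fetch and clear all pending events.
        pending = list(self.events)
        self.events.clear()
        return pending


class UserRegCache:
    """User-space cache of registered (start, end) ranges fed by the channel."""

    def __init__(self, channel):
        self.channel = channel
        self.seen = 0
        self.ranges = set()

    def sync(self):
        """Apply pending invalidations; return how many cached ranges were dropped."""
        if self.channel.counter == self.seen:
            return 0  # fast path: nothing was unmapped since the last check
        self.seen = self.channel.counter
        dropped = 0
        for start, end in self.channel.drain():
            stale = {r for r in self.ranges if r[0] < end and start < r[1]}
            self.ranges -= stale
            dropped += len(stale)
        return dropped
```

The counter check is meant to keep the common case (no unmaps since the last communication call) down to a single comparison; this single-threaded model ignores the races a real implementation would have to handle.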