Suppose you are interviewing a candidate–let’s call him Ben–for a programming job and you ask the well-worn question:
What’s the difference between a mutex and a semaphore?
Ben:
The difference between a mutex and a semaphore? Well, a mutex allows only one thread at a time, while a semaphore can allow several threads to run at the same time.
Now, you’d like to probe a little deeper, so you follow up with:
OK. Can you give me an example of when you would use a semaphore instead of a mutex?
Ben thinks for a moment. This one’s a little harder to answer, but he remembers something he read once on the Internets:
Yeah, sure. You would use a semaphore when you need to control access to more than one resource. For example, if you’ve got a gas station with one restroom, you would only need one key, since only one person at a time can use the bathroom. You could use a mutex for that, which is basically just a binary semaphore.
Now, if your gas station had more than one restroom, you would need to use a semaphore so that more than one person could take care of business at the same time.
Not bad. But there’s a better answer.
The differences between a semaphore and a mutex are more subtle than most programmers realize. And, as we all know, Subtle Things have a way of turning themselves into Bad Things when nobody happens to be paying attention to them. Fortunately, you can (mostly) avoid the whole mess by following two simple rules of thumb:
Thow shalt use the mutex to protect all thy possessions that thou hast in common among you. This is the first and greatest commandment, and the second is like unto it. Thou shalt use the mutex and the condition variable to signal all thy workers in the field.
On these two commandments hang all the threads and the programs.
Semaphores have two traits that are particularly problematic. First, semaphores–as opposed to mutexes–have no notion of lock ownership. In other words, one thread can unlock a semaphore that was previously locked by a different thread. This is bad mojo.
In my house I have a very active 2-year old son. Now, as you may or may not know, 2-year olds are (a) too old to sleep in cribs (they just climb out), and (b) too young to stay put in a regular bed. This poses some unfortunate challenges at bed time. The solution? Turn out the lights and lock the bedroom door! Hooray! Eventually the kid will get bored and go to sleep! (This is what sleep-deprived parents everywhere desperately pray, as they lie silently in their beds.)
Unfortunately, 2-year olds have one more trait I didn’t mention: they are, without a doubt, too smart for their own good. This situation is compounded by the fact that the toddler in question happens to share his room with a big-brother (6 years old), who understandably finds it less-than-ideal to be locked in his own room, and who is also too smart for his own good.
It doesn’t take long for Big Brother to hack his way out of the room using a clip-on tie (hey, they’re stylish and practical!).
Semaphores, as it turns out, are surprisingly similar to locks on bedroom doors of ambitious children. It is really hard to ensure that once one thread (Dad) locks a semaphore, you don’t “accidentally” get another thread (Big Brother) unlocking it.
But wait! There’s more! Semaphores have yet one more issue, related to the first. It’s called Priority Inversion™. There’s actually a great little Windows CE article (yes, Windows CE of all things) over at MSDN that does a decent job explaining the problem:
Priority inversion occurs when a mutex, critical section, or semaphore resource held by a lower-priority thread delays the execution of a higher-priority thread when both are contending for the same resource.
For mutexes and critical sections, to correct this situation and free the higher-priority thread, with priority inheritance, Windows CE enables the lower-priority thread to inherit the more critical thread’s priority and run at the higher priority until it releases its use of the resource. This mechanism does not work for semaphores, which do not have a specific owner associated with them. Therefore, priority inversion may occur when using semaphores.
Because an unbounded amount of time is needed to relinquish the inverted thread and it is out of the control of the kernel, the OEM loses control of the scheduling process. To guarantee real-time performance, OEMs should ensure that a priority inversion condition does not occur.
Note that last line: “…[you] should ensure that a priority inversion condition does not occur.” Right. That sounds like a blast.
The bottom line is this: semaphores can be a royal pain. Fortunately, there are ways to avoid ever having to employ the little beasties:
- For managing access to a shared resource where only ONE thread should be working with the resource at any one time, simply use a mutex. Note here that a mutex != a semaphore with a count of 1.
- If you are implementing a classic producer/consumer model with a thread pool, strongly consider using a combination of a condition variable and mutex. Chances are that the threading library you are using supports this natively (e.g., pthread, Boost, Win32, .NET).
- Do something wild and crazy, like switch to Erlang.
Bonus! You can choose from several specialized locks that can be more performant in certain situations, such as critical sections and multi-reader/single-writer locks. Any threading library worth its salt will support at least one or two specialized locks.
So, now that we’ve covered when not to use a semaphore, the question becomes: when should you use a semaphore? Honestly, I have no idea.
Here is a good question for you. Imagine you are writing code in C using POSIX threads. Let’s say you have a single thread responsible for writing to a variable representing a boolean flag. You have a couple of other threads reading this variable and using this boolean flag to determine their behavior.
Should read-only access to this boolean flag be protected by a mutex? Or is it kosher to read the value of the variable w/o any protection?