Unfair lock optimisation

If you experience the following in your multithreaded application, you might be a victim of unfair lock optimisation:

threads wait for a long time to obtain a lock, as if there was a contention problem.
looks like a deadlock, but is not, because the every thread eventually gets the lock.
when a lock is obtained by a certain thread, it keeps getting it for an unusually long time, starving other threads.
increasing the number of threads makes problem worse, decreasing palliates it.
Visual Studio built-in performance test (C# only) runs the application just fine and shows no problem.
can exibit hard to explain behaviour, for instance: thread starvation at the begining, then when it "warms up", continue to work fine.

Fair locking usually means that when a thread wants to obtain a lock, it goes to the back of the queue and waits for its turn. If all threads in front of it correctly release the lock, it will get it next. That way no thread is starved and all get equal opportunity to entering a critical section. To switch to the next thread, if there are more of them than available processors, a context switch is needed and they are expensive. When deciding which thread to let in next, mutex will give bigger priority to a thread that already had the lock, if that thread tries to obtain the same lock again in the same time slice, after it had just released it. That way it avoids an obviously uneccessary context switch. I think this was first introduced in Vista, I don't think Linux uses this optimisation. It makes sense, but might surprise you if you ever do something I did.

I designed application in this way: one shared container, one inserter thread, and plenty of taker threads. Each taker thread was trying to pop an item from the container, do some time consuming stuff with it, and return for more when it's finished. The problem was that when the container was empty, the thread that currently owned the mutex would unlock it, do nothing because there was nothing to work on, and returned at the start. That created a really tight loop around the mutex protecting the container, and usually the same thread would try to obtain the same lock again after just a short while. Additional confusion arose because the inserter thread was sufficiently faster than taker threads to keep container full from the time it first came round to it, so it looked like an application just needs time to "warm up", which is ridiculous.

This is not a glitch how mutex works, of course. This is a bad design. Threads should generally not spend majority of their time obtaining and releasing mutexes. In dumbest case that means it will work fine if you simply added 10ms wait after releasing the mutex, just enough to miss the end of time slice before returning for another go at the lock. That fixed my problem.

Previous: An ulta-paranoid tinfoil hat implementation of reasonably safe storage for numerical types
Next: Strict type-safety framework