Before we start, let me ask you a question:
Have you ever needed a Lock-Free algorithm to solve some problem? Why did it have to be Lock-Free?
This is a loaded question, in the sense that we're trying to separate the problems that need or benefit from Lock-Free guarantees, from the problems for which a Blocking solution is enough. So what kind of problems can take advantage of a Lock-Free algorithm?
Scalable under high-contention, you say? Yes, that does sound like an appropriate candidate for a Lock-Free algorithm, but in practice it depends a lot on certain characteristics of the algorithm.
For example, if there are many CAS operations in loops, as you increase the number of threads, there will be a lot more retries of those CAS, and remember that CAS are expensive and most likely the bottleneck of the algorithm. What this means in practice is, that the more threads compete on CAS operations the slower your code will run overall, regardless of how many threads you add.
If you don't have too many threads to begin with, then why not just use a blocking approach? It's usually much simpler and faster to implement, with locks for example.
When the problem allows it, you can even use read-write locks which, unlike most Lock-Free algorithms, have been show to be scalable with the number of threads doing Reads.
Low Latency under high-contention, I hear some of you say? Well, Lock-Free only gives you the guarantee that some thread is doing progress, not necessarily that the current thread is doing progress on its own task, but let us assume that you don't have a high-priority thread, or a thread with more priority that the others (in that case you would need a Wait-Free algorithm to achieve decent low latency guarantees).
For this one, Lock-Free does give you better assurances than Blocking, but don't expect too much out of it. First, you will most likely lose some performance (but not much), and second, a certain task may take infinite time to complete if another thread keeps jumping in and causing the current operation to restart. This implies that Lock-Free algorithms that have many CAS in loops are impractically slow due to the number of retries, not to mention complexity.
Now that we know why to use Lock-Free, let me remind you that there are many Lock-Free algorithms that implicitly allocate or release memory during some of its steps.
That I'm aware of, there is no system with a Lock-Free memory manager, and in fact, in most systems, memory allocation can on occasion be an extremely slow process. Imagine for example you allocate a new object and the system needs to get a new memory page for you, of even worse, has ran out of RAM and needs to swap some of its old contents to the hard drive.
Having an occasional blocking and slow memory allocation call isn't too bad if you're concerned with scalability, but you're problem requires low-latency then, using Lock-Free with memory allocation/release in the middle takes away whatever guarantees the Lock-Free algorithm itself might provide. This is problematic regardless of using a managed or unmanaged memory system, i.e. with or without Garbage Collection.
I believe this has very strong implications on the applicability of Lock-Free algorithms, so strong that we should have two distinct classes of naming for Lock-Free algorithms:
Lock-Free-Memory-Unbounded: The algorithm is Lock-Free but some or all of its functions require the allocation of new objects in memory, or release of same objects.
Lock-Free-Memory-Bounded: The algorithm is Lock-Free and none of its functions require allocation of new objects, or the usage is limited and can be pre-allocated once, and then used with a pool pattern that is itself also implemented in a Lock-Free way.
If we're fundamentalist about it, we can even say that Lock-Free-Memory-Unbounded algorithms are in fact not Lock-Free at all, because in the middle of it there will be operations that will be Blocking, and can cause the task to block.
I'm not going to be so fundamentalist. After all, the memory allocation is usually fast, and the garbage collector only jumps in once in a while.
One thing is for sure, given a problem that needs low-latency with a choice between two solutions, one being a Lock-Free-Memory-Unbounded and the other a Lock-Free-Memory-Bounded, I would go for the Memory-Bounded one, even if it has lower performance.
By the way, replace in all the sentences above the words Lock-Free with Wait-Free, and it still holds true!
Having said that, I would like to restate my order of preference for concurrent algorithms, with the topmost one being the best possible kind of algorithms you can achieve: