STM (Software Transactional Memory) appears to be a good way to make use of multicore processors. For certain types of problems it makes writing complex concurrent software very easy. However, the overhead of implementing STM means that it may actually run slower on an 8-processor system (say, a hyperthreaded quad-core i7) than a dedicated single-threaded process.
The advantage of a single-threaded solution is that there is no need for blocking and hence no task switching; there is no need to synchronize the local cache with main memory; and the JIT optimizer can optimize as aggressively as possible, because there are no volatile variables that can't be cached and nothing prevents it from reordering instructions optimally.
The problem with STM is that memory is shared among many threads, so values in main memory can differ from those in a core's local cache, and these need to be synchronized. There is also extra code to track reads, and even more complex code for writes. Writing to a variable that other transactions have read causes problems: it normally forces a transaction to retry, which is yet more overhead. As you add more processors, you increase the chance of contention for a variable and hence of retries. Since useful transactions eventually write a value, the chance of a retry increases with the number of concurrent transactions.
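To make the read-tracking and retry costs concrete, here is a minimal sketch of an optimistic STM in Java. It is not the author's implementation or any library's API; `TRef`, `MiniStm`, `Txn`, `atomic` and `transferDemo` are hypothetical names. Each transaction records the version of every variable it reads and buffers its writes. At commit it takes a single global lock (a simplification that real STMs avoid), checks that none of the versions it read have changed, and only then publishes its writes. If validation fails, all of the transaction's work is thrown away and it runs again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Hypothetical transactional variable; its fields only change inside MiniStm's commit lock. */
final class TRef {
    volatile long value;
    volatile long version; // bumped on every committed write

    TRef(long initial) { this.value = initial; }
}

/** Hypothetical minimal STM: read tracking, buffered writes, validate-then-publish, retry on conflict. */
final class MiniStm {
    private static final ReentrantLock COMMIT_LOCK = new ReentrantLock();
    static final AtomicInteger RETRIES = new AtomicInteger();

    /** Per-attempt state: versions we have read (the read tracking) and writes not yet published. */
    static final class Txn {
        private final Map<TRef, Long> readVersions = new HashMap<>();
        private final Map<TRef, Long> writes = new HashMap<>();

        long read(TRef ref) {
            Long buffered = writes.get(ref);
            if (buffered != null) return buffered;   // see our own uncommitted write
            long version = ref.version;             // read the version before the value
            long value = ref.value;
            readVersions.putIfAbsent(ref, version);
            return value;
        }

        void write(TRef ref, long value) { writes.put(ref, value); }

        private boolean commit() {
            COMMIT_LOCK.lock();
            try {
                // Validate: fail if any variable we read has been written since we read it.
                for (Map.Entry<TRef, Long> e : readVersions.entrySet()) {
                    if (e.getKey().version != e.getValue()) return false;
                }
                // Publish value before version, so a reader that sees the new version also sees the new value.
                for (Map.Entry<TRef, Long> e : writes.entrySet()) {
                    e.getKey().value = e.getValue();
                    e.getKey().version++;
                }
                return true;
            } finally {
                COMMIT_LOCK.unlock();
            }
        }
    }

    /** Runs body until it commits; every failed validation repeats all of the body's work. */
    static <R> R atomic(Function<Txn, R> body) {
        while (true) {
            Txn txn = new Txn();
            R result = body.apply(txn);
            if (txn.commit()) return result;
            RETRIES.incrementAndGet();
        }
    }

    /** Each thread moves 1 unit from a to b, perThread times; returns the final {a, b}. */
    static long[] transferDemo(int threads, int perThread) {
        TRef a = new TRef(1000), b = new TRef(1000);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    atomic(txn -> {
                        txn.write(a, txn.read(a) - 1);
                        txn.write(b, txn.read(b) + 1);
                        return null;
                    });
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {a.value, b.value};
    }
}
```

Calling `MiniStm.transferDemo(4, 10_000)` should leave the two variables at -39000 and 41000, because every transfer commits exactly once. `MiniStm.RETRIES` counts the attempts that were discarded along the way; that number varies from run to run, and since every thread here reads and writes the same two variables, it is a worst case for contention.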
After writing a read-tracking STM and an MVCC-based STM to experiment with these concepts, it seems that contention will be the biggest issue overall. The overhead of running the STM is pretty much constant for each thread: it makes each thread slower, but adding more processors still makes the system as a whole faster. So a 1024-processor 3 GHz chip should run these solutions much faster. But as you increase the number of processors, you increase the contention for shared variables, so more transactions have to retry. A transaction wanting to write to a specific variable is more likely to find that one of the other 1023 transactions has read that value, and the writing transaction will have to retry.
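The scaling argument can be put into numbers with a crude back-of-the-envelope model. Assume (this is an assumption, not a measurement) that while one transaction runs, each of the other N - 1 concurrent transactions independently touches the same variable with probability p. The chance of at least one conflict is then 1 - (1 - p)^(N - 1), and if each attempt conflicts independently with probability q, the number of attempts is geometric with mean 1 / (1 - q). `ContentionModel` and the value p = 0.001 are hypothetical, chosen only for illustration.

```java
/** Hypothetical back-of-the-envelope model of STM conflicts; assumes independent transactions. */
final class ContentionModel {
    /** P(at least one of the other processors - 1 transactions touches the variable). */
    static double conflictProbability(int processors, double p) {
        return 1.0 - Math.pow(1.0 - p, processors - 1);
    }

    /** Mean attempts per transaction if each attempt conflicts independently with probability q. */
    static double expectedAttempts(double q) {
        return 1.0 / (1.0 - q);
    }

    public static void main(String[] args) {
        double p = 0.001; // illustrative per-transaction chance of touching the same variable
        for (int n : new int[] {1, 8, 64, 1024}) {
            double q = conflictProbability(n, p);
            System.out.printf("%5d processors: P(conflict)=%.4f, expected attempts=%.2f%n",
                    n, q, expectedAttempts(q));
        }
    }
}
```

With p = 0.001 the conflict probability stays below 1% at 8 processors but rises to roughly 64% at 1024, which means close to three attempts per committed transaction. The model ignores the extra load that retries themselves create, so if anything it understates the problem.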
Designing systems where contention for a variable is highly unlikely seems to be the most effective approach when using STM. If you expect even low or moderate contention for a value, STM is probably not a good solution.
This depends on the type of problem you are trying to solve. I'm looking at making a mathematical algorithm run faster on a multicore processor. STM does not seem suitable for this because of the overhead and the retry issue. It won't be common for different threads to use the same variable, but it will happen often enough to matter. Writing the algorithm as atomic sections for STM is very appealing, but overall the STM solution will be slow to start with and is unlikely to scale well, because collisions cause transactions to retry.
It would seem that mathematical algorithms may be more suited for traditional concurrency with mostly-persistent objects.
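One way to read "traditional concurrency with mostly-persistent objects" is to partition the work so that threads share only data that never changes, keep all mutable state confined to a single thread, and synchronize only once, when the partial results are combined. The sketch below assumes a simple data-parallel computation (a sum of squares stands in for the unspecified mathematical algorithm), and `PartitionedSum` / `sumOfSquares` are hypothetical names; it uses only the standard `java.util.concurrent` executor and future APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example: split a computation over an immutable input into thread-local partial results. */
final class PartitionedSum {
    /** Sum of squares of data, computed in one slice per worker thread. */
    static long sumOfSquares(long[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (data.length + threads - 1) / threads;
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, data.length);
                // Each task only reads its slice of the input (never modified after submission)
                // and keeps its running total in a local variable: no locks, volatiles or retries.
                parts.add(pool.submit(() -> {
                    long local = 0;
                    for (int i = from; i < to; i++) {
                        local += data[i] * data[i];
                    }
                    return local;
                }));
            }
            // The only coordination step: combine the partial results in a fixed order.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because no task ever writes to memory another task reads, there is nothing to contend for and nothing to retry; the cost of coordination is one `Future.get()` per slice, independent of how often the algorithm's inner loop runs.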





