Revision 3771e48...
Optimization in Development Tools
Introduce a thread-local memory pool cache and refactor the code.
This improves the speed of our allocator, especially for the case
of multiple repeated constructions of memory pools, as is done in
the ExpressionParser. Funnily enough, I also managed to increase
the performance of the 'normal' operation for a single pool with
many allocations, as is done during full file parsing.
The way it works is that we keep a per-thread list of allocated
but currently unused blocks. Right now we keep up to 32 blocks of
64K around, i.e. up to a total of 2MB per thread. This memory is
only reclaimed once the thread finishes. In KDevelop's case this
means: never, until the app is closed. Still, I think up to 2MB
per thread is not a big deal, and we could potentially decrease this.
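To illustrate the idea, here is a minimal C++ sketch of such a
per-thread block cache feeding a bump-pointer pool. This is not the
actual rxx_allocator code; the names (BlockCache, Pool), the use of
C++11 thread_local (the real code may well use QThreadStorage or
similar) and the single-block Pool are assumptions for illustration
only:

    // Minimal sketch of a per-thread cache of allocator blocks.
    // Not the actual rxx_allocator implementation; illustrative only.
    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    namespace {

    const std::size_t BLOCK_SIZE = 64 * 1024;  // 64K per block
    const std::size_t MAX_CACHED_BLOCKS = 32;  // i.e. up to 2MB kept per thread

    struct BlockCache {
        std::vector<char*> freeBlocks;

        // Hand out a cached block if one is available, otherwise allocate a new one.
        char* acquire() {
            if (!freeBlocks.empty()) {
                char* block = freeBlocks.back();
                freeBlocks.pop_back();
                return block;
            }
            return static_cast<char*>(std::malloc(BLOCK_SIZE));
        }

        // Keep up to MAX_CACHED_BLOCKS blocks around for reuse, free the rest.
        void release(char* block) {
            if (freeBlocks.size() < MAX_CACHED_BLOCKS) {
                freeBlocks.push_back(block);
            } else {
                std::free(block);
            }
        }

        // Cached memory is only reclaimed when the thread exits.
        ~BlockCache() {
            for (std::size_t i = 0; i < freeBlocks.size(); ++i) {
                std::free(freeBlocks[i]);
            }
        }
    };

    thread_local BlockCache blockCache;

    } // namespace

    // A simple bump-pointer pool that takes its block from the per-thread cache.
    class Pool {
    public:
        Pool() : m_block(blockCache.acquire()), m_used(0) {}
        ~Pool() { blockCache.release(m_block); }

        void* allocate(std::size_t size) {
            if (m_used + size > BLOCK_SIZE) {
                return 0;  // a real implementation would chain additional blocks
            }
            void* p = m_block + m_used;
            m_used += size;
            return p;
        }

    private:
        char* m_block;
        std::size_t m_used;
    };

Repeatedly constructing and destroying a Pool then reuses the same 64K
block instead of hitting malloc every time, which is where the speedup
for the repeated-pool case comes from.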
Furthermore, we should try to share these memory blocks between the
different language plugins. Right now most other language plugins use
KDevelop-PG-Qt's allocator. I'll investigate how to change that.
New numbers are:
RESULT : TestPool::benchManyPools():
0.0043 msecs per iteration (total: 71, iterations: 16384)
(approx 9x as fast as before)
RESULT : TestPool::benchManyAllocations():
0.000038 msecs per iteration (total: 80, iterations: 2097152)
(approx 1.8x as fast as before)
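For reference, a pool benchmark of roughly this shape (a hypothetical
reconstruction using QTest's QBENCHMARK, not the actual test_pool.cpp,
and assuming the Pool sketch from above) measures exactly the repeated
pool construction that the cache speeds up:

    // Hypothetical sketch of such benchmarks with QTest; not the real test_pool.cpp.
    #include <QtTest>

    class TestPool : public QObject
    {
        Q_OBJECT
    private slots:
        // Construct and destroy a pool over and over; with the thread-local
        // block cache the 64K block is reused instead of reallocated each time.
        void benchManyPools()
        {
            QBENCHMARK {
                Pool pool;
                pool.allocate(16);
            }
        }

        // One pool, many small allocations, as in full file parsing. (With the
        // single-block Pool sketch above this is purely illustrative; a real
        // pool chains further blocks once the first one is full.)
        void benchManyAllocations()
        {
            Pool pool;
            QBENCHMARK {
                pool.allocate(16);
            }
        }
    };

    QTEST_APPLESS_MAIN(TestPool)
    #include "test_pool.moc"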
For the expression parser benchmark the numbers are now:
RESULT : TestExpressionParser::benchEvaluateType():"global-int-number":
- 0.055 msecs per iteration (total: 57, iterations: 1024)
+ 0.017 msecs per iteration (total: 70, iterations: 4096)
RESULT : TestExpressionParser::benchEvaluateType():"global-long-number":
- 0.056 msecs per iteration (total: 58, iterations: 1024)
+ 0.018 msecs per iteration (total: 74, iterations: 4096)
RESULT : TestExpressionParser::benchEvaluateType():"global-long-long-number":
- 0.057 msecs per iteration (total: 59, iterations: 1024)
+ 0.019 msecs per iteration (total: 78, iterations: 4096)
RESULT : TestExpressionParser::benchEvaluateType():"main-a.b":
- 0.074 msecs per iteration (total: 76, iterations: 1024)
+ 0.035 msecs per iteration (total: 72, iterations: 2048)
The other benchmarks stay the same, mostly because they manage to
circumvent the need for full text parsing and thus the allocator.
Also note: in both callgrind and perf the rxx_allocator::allocator
was a hotspot when looking at "duchainify path/to/kdevelop", and that
was mostly due to the repeated calls to the ExpressionParser for
simple one-word strings, which still allocated a full block of memory
each time. Now this block can be reused, which makes these calls
faster. Sadly, timing the run of duchainify does not show any
noticeable speedup, though.
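As a rough illustration of that access pattern (again building on the
Pool sketch above; the parse() helper is made up and not the KDevelop
ExpressionParser API), each short evaluation constructs its own pool,
and only the first one has to malloc a block:

    #include <string>
    #include <vector>

    // Made-up stand-in for parsing a short expression into pool-allocated nodes.
    static void parse(const std::string& expression, Pool& pool)
    {
        for (std::size_t i = 0; i < expression.size(); ++i) {
            pool.allocate(8);  // e.g. one small node per token/character
        }
    }

    void evaluateMany(const std::vector<std::string>& expressions)
    {
        for (std::size_t i = 0; i < expressions.size(); ++i) {
            Pool pool;  // first iteration mallocs a 64K block; later iterations
                        // reuse it from the thread-local cache
            parse(expressions[i], pool);
        }
    }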
File Changes
- /languages/cpp/parser/rxx_allocator.h
- /languages/cpp/parser/tests/test_pool.cpp