vector<bool>
vector<bool> has taken a lot of heat over the past decade, and not without reason. However, I believe it is long past time to draw back some of the criticism and explore this area with a dispassionate scrutiny of detail.
There are really two issues here:

1. Is an array of bits a good data structure?
2. Should that data structure be named vector<bool>?
I have strong opinions on both of these questions. And to get this out of the way up front:
The array of bits data structure is a wonderful data structure. It is often both a space and speed optimization over the array of bools data structure if properly implemented. However it does not behave exactly as an array of bools, and so should not pretend to be one.
What is wrong with vector<bool>?
Because vector<bool>
holds bits instead of bools, it can't
return a bool&
from its indexing operator or iterator dereference.
This can play havoc with quite innocent-looking generic code. For example:
template <class T>
void process(T& t)
{
    // do something with t
}

template <class T, class A>
void test(std::vector<T, A>& v)
{
    for (auto& t : v)
        process(t);
}
The above code works for all T
except bool
.
When instantiated with bool
, you will receive a compile time
error along the lines of:
error: non-const lvalue reference to type 'std::__bit_reference<std::vector<bool, std::allocator<bool>>, true>' cannot bind to a temporary of type 'reference' (aka 'std::__bit_reference<std::vector<bool, std::allocator<bool>>, true>')
        for (auto& t : v)
                   ^ ~
note: in instantiation of function template specialization 'test<bool, std::allocator<bool>>' requested here
    test(v);
    ^
vector:2124:14: note: selected 'begin' function with iterator type 'iterator' (aka '__bit_iterator<std::vector<bool, std::allocator<bool>>, false>')
    iterator begin()
             ^
1 error generated.
This is not a great error message. But it is about the best the compiler can do. The user is confronted with implementation details of vector, and the message in a nutshell says that the vector is not working with a perfectly valid range-based for statement. The conclusion the client comes to here is that the implementation of vector is broken. And he would be at least partially correct.
But consider if, instead of vector<bool> being a specialization, there existed a separate class template std::bit_vector<A = std::allocator<bool>>, and the coder had written:
template <class A>
void test(bit_vector<A>& v)
{
    for (auto& t : v)
        process(t);
}
Now one gets a similar error message:
error: non-const lvalue reference to type 'std::__bit_reference<std::bit_vector<std::allocator<bool>>, true>' cannot bind to a temporary of type 'reference' (aka 'std::__bit_reference<std::bit_vector<std::allocator<bool>>, true>')
        for (auto& t : v)
                   ^ ~
note: in instantiation of function template specialization 'test<std::allocator<bool>>' requested here
    test(v);
    ^
bit_vector:2124:14: note: selected 'begin' function with iterator type 'iterator' (aka '__bit_iterator<std::bit_vector<std::allocator<bool>>, false>')
    iterator begin()
             ^
1 error generated.
And although the error message is similar, the coder is far more likely to see that he is using a dynamic array of bits data structure, and it is understandable that one can't form a reference to a bit.
I.e. names are important. And creating a specialization that has different behavior than the primary, when the primary template would have worked, is poor practice.
Is an array of bits a good data structure?

For the rest of this article, assume that the dynamic array of bits data structure is named std::bit_vector<A = std::allocator<bool>> and that vector was not specialized on bool.
bit_vector<> can be much more than simply a space optimization over vector<bool>; it can also be a very significant performance optimization. But to achieve this higher performance, your vendor has to adapt many of the std::algorithms to have specialized code (optimizations) when processing sequences defined by bit_vector<>::iterators.
find
For example consider this code:
template <class C>
typename C::iterator
test()
{
    C c(100000);
    c[95000] = true;
    return std::find(c.begin(), c.end(), true);
}
How long does std::find take in the above example for:

A. vector<bool>?
B. bit_vector<> using an optimized find?
C. bit_vector<> using the unoptimized generic find?

I'm testing on an Intel Core i5 in 64 bit mode. I am normalizing all answers such that the speed of A is 1 (smaller is faster):
An array of bits can be a very fast data structure for a sequential
search! The optimized find
is inspecting 64 bits at a time. And
due to the space optimization, it is much less likely to cause a cache miss.
However if the implementation fails to do this, and naively checks one bit at a
time, then this giant 75X optimization turns into a significant pessimization.
count
std::count
can be optimized much like std::find
to
process a word of bits at a time:
template <class C>
typename C::difference_type
test()
{
    C c(100000);
    c[95000] = true;
    return std::count(c.begin(), c.end(), true);
}
My results are:
Here the results are not quite as dramatic as for the std::find case. However, any time you can speed up your code by a factor of 20, you should do so!
fill
std::fill
is yet another example:
template <class C>
void test()
{
    C c(100000);
    std::fill(c.begin(), c.end(), true);
}
My results are:
The optimized fill is over twice as fast as the non-specialized
vector<bool>
. But if the vendor neglects to specialize
fill
for bit-iterators the results are disastrous! Naturally
the results are identical for the closely related fill_n
.
copy
std::copy
is yet another example:
template <class C>
void test()
{
    C c1(100000);
    C c2(100000);
    std::copy(c1.begin(), c1.end(), c2.begin());
}
My results are:
The optimized copy approaches three times the speed of the non-specialized vector<bool>. But if the vendor neglects to specialize copy for bit-iterators, the results are not good. And if the copy is not aligned on word boundaries (unlike the above example, which is aligned), then the optimized copy slows down to the same speed as the copy for A. Results for copy_backward, move and move_backward are similar.
swap_ranges
std::swap_ranges
is yet another example:
template <class C>
void test()
{
    C c1(100000);
    C c2(100000);
    std::swap_ranges(c1.begin(), c1.end(), c2.begin());
}
My results are:
Yet another example of really good results with an optimized algorithm and really poor results without this extra attention.
rotate
std::rotate
is yet another example:
template <class C>
void test()
{
    C c(100000);
    std::rotate(c.begin(), c.begin() + c.size()/4, c.end());
}
My results are:
Yet another example of good results with an optimized algorithm and very poor results without this extra attention.
equal
std::equal
is yet another example:
template <class C>
bool test()
{
    C c1(100000);
    C c2(100000);
    return std::equal(c1.begin(), c1.end(), c2.begin());
}
My results are:
Yet another example of excellent results with an optimized algorithm and very poor results without this extra attention.
The dynamic array of bits is a very good data structure if attention is paid to optimizing algorithms that can process up to a word of bits at a time. In this case it becomes not only a space optimization but a very significant speed optimization. If such attention to detail is not given, then the space optimization leads to a very significant speed pessimization.
But it is a shame that the C++ committee gave this excellent data structure
the name vector<bool>
and that it gives no guidance nor
encouragement on the critical generic algorithms that need to be optimized for
this data structure. Consequently, few std::lib implementations go to this
trouble.