On std::list::size

There is considerable debate on whether std::list<T>::size() should be O(1) or O(N). The usual argument notes that it is a tradeoff with:

splice(iterator position, list& x, iterator first, iterator last);

If size() is O(1) and this != &x, then this method must perform a linear operation so that it can adjust the size member in each list:

std::distance(first, last);

Note that often this argument is stated as: "O(1) size leads to O(N) splice." This is an exaggeration. There are 5 possible valid splice situations: splicing from self or splicing from another list, crossed with splicing 1 element, some elements, or all elements. The O(1) size leads to an O(some) splice in one of these five situations.

	splice from self	splice from other
one	O(1)	O(1)
some	O(1)	O(some) - distance(first, last)
all	Not valid	O(1)

With on O(1) size, the splice method that splices "some" from "other" must count the number of elements specified by "some".

However, having a O(1) size() allows for optimizations in the following list methods:

list& operator=(const list&);
iterator assign(iterator, iterator);  // forward or better
void assign(size_type, const T&);

For the case where itertor is forward or better, and using a O(1) size, one can increase the exception safety gaurantee from basic to strong for the case that T's assignment has a nothrow gaurantee.

void resize(size_type, T);

For the case where size is shrinking, using O(1) size one can choose whether it is better to iterate from begin to the start of the erase range, or from end to the start of the erase range.

bool operator==(list, list);
bool operator!=(list, list);

Using an O(1) size one can check the size first, befor proceeding with an element by element check.

Summary:

O(1) size()

Pro

Con

Faster size()

Slower splice some from other

Faster resize()

One more word of overhead

Faster == and !=

Increased exception safety on op= and assign for some special but common cases.

splice Proposal

void splice(iterator position, list& x, iterator first, iterator last, size_type n);

Effects: Inserts elements in the range [first, last) before position and removes the elements from x.

Requires: [first, last) is a valid range in x. The result is undefined if position is an iterator in the range [first, last). Invalidates only the iterators and references to the spliced elements. n == distance(first, last).

Throws: Nothing.

Complexity: Constant time.

This new splice signature allows the client to pass the distance of the input range in. This information is often available at the call site. If it is passed in, then the operation is constant time, even with an O(1) size. Debugging builds can easily double check this precondition.

boost survey

I conducted a survey of a snap shot of boost, just after the release of version 1.33, on Nov. 23, 2005. I looked for places list::size() was used, and for places list::splice (some from other) was used. This is not meant to be a criticism of boost. It is meant to show the dangers of having an O(N) size, and contrast that with the ease with which splice can usually be made O(1). The survey of list::size was found by compiling the boost libs with list::size commented out to trigger errors. So this survey misses uses which are not instantiated as the boost libs are compiled. Finding those uninstanted uses of list::size is a much more difficult task. The search for list::splice was done with a simple "find" in my editor (compiling with splice commented out turned up 0 use cases). Below is what I found:

--------------------------------------------
boost/graph/detail/adjacency_list.hpp

    // O(1)
    template <class Config>
    inline typename Config::edges_size_type
    num_edges(const bidirectional_graph_helper_with_property<Config>& g_) 
    {
      typedef typename Config::graph_type graph_type;
      const graph_type& g = static_cast<const graph_type&>(g_);
      return g.m_edges.size();  // list::size()
    }

Used from:

boost/libs/python/build/../src/object/inheritance.cpp

    for (cast_graph*const* p = g + (is_downcast ? 1 : 0); p < g + 2; ++p)
    {
        edge_t e;
        bool added;

        tie(e, added) = add_edge(src, dst, **p);
        assert(added);

        put(get(edge_cast, **p), e, cast);
        put(get(edge_index, **p), e, num_edges(full_graph().topology()) - 1);
    }

Loop appears to be O(N²).
--------------------------------------------
boost/regex/pending/object_cache.hpp

template <class Key, class Object>
boost::shared_ptr<Object> object_cache<Key, Object>::do_get(const Key& k, size_type max_cache_size)
{
   ...
   list_size_type s = s_data.cont.size();  // list::size()
   ...
}

Used from:

boost/regex/pending/object_cache.hpp

   boost::static_mutex::scoped_lock l(mut);
   if(l)
   {
      return do_get(k, max_cache_size);
   }

O(N) operation under lock.
--------------------------------------------
boost/libs/serialization/build/../src/basic_iarchive.cpp

    while(created_pointers.size() > 0){  // list::size()

Loop appears to be O(N²).  This one could easily be fixed with empty().
--------------------------------------------
/Volumes/Data/Development/boost-dev/boost/libs/thread/build/../src/thread.cpp

int thread_group::size()
{
        return m_threads.size();  // list::size()
}

There is no thread_group::empty.  Clients have no choices.

--------------------------------------------
splice-some-from-other is much easier to search for:
--------------------------------------------

boost/multi_index/detail/seq_index_ops.hpp

template <typename SequencedIndex,typename Compare>
void sequenced_index_merge(SequencedIndex& x,SequencedIndex& y,Compare comp)
{
  typedef typename SequencedIndex::iterator iterator;
  if(x!=y){
    iterator first0=x.begin(),last0=x.end();
    iterator first1=y.begin(),last1=y.end();
    while(first0!=last0&&first1!=last1){
      if(comp(*first1,*first0))x.splice(first0,y,first1++);
      else ++first0;
    }
    x.splice(last0,y,first1,last1);  // distance(first1,last1) can easily be 
                                     // computed by client if list::size is O(1).
  }
}

The function could be rewritten like:

template <typename SequencedIndex,typename Compare>
void sequenced_index_merge(SequencedIndex& x,SequencedIndex& y,Compare comp)
{
  typedef typename SequencedIndex::iterator iterator;
  typedef typename SequencedIndex::size_type size_type;
  if(x!=y){
    iterator first0=x.begin(),last0=x.end();
    iterator first1=y.begin(),last1=y.end();
    size_type n = y.size();           // assume O(1) size
    while(first0!=last0&&first1!=last1){
      if(comp(*first1,*first0))
      {
        x.splice(first0,y,first1++);
        --n;
      }
      else ++first0;
    }
    x.splice(last0,y,first1,last1,n);  // using proposed O(1) splice
  }
}
--------------------------------------------
boost/wave/util/cpp_macromap.hpp

	// splice the last token back into the input queue
	typename ContainerT::reverse_iterator rit = pending.rbegin();

		first.get_unput_queue().splice(
			first.get_unput_queue().begin(), pending,
			(++rit).base(), pending.end());

The splice range is known to be of length 1.
--------------------------------------------

If you use std::list at all in your code, I encourage you to edit your standard headers, commenting out list::size, and recompile your code. Closely inspect each use of list::size that the compiler highlights. Would you still use it if list::size is implemented with distance(begin(), end())?

As the standard is written today, none of the containers are required to have an O(1) size(). I believe that this should be corrected in C++0X. If a container has a size() member, it should be required to have O(1) complexity. If a container can not manage this, it should not have a size() member at all. Clients can always compute distance(begin(),end()).

For backwards compatibility reasons, all existing standard containers should have an O(1) size(). New containers which are introduced (slist?) may not have a size() member.

Pro	Con
Faster size()	Slower splice some from other
Faster resize()	One more word of overhead
Faster == and !=
Increased exception safety on op= and assign for some special but common cases.