Pseudo Associative

Random thoughts on software development

Posts tagged STL

4 notes

When pointers are better than iterators

The STL is a very well designed library, and iterators is a powerful concept. Unfortunately you can not fully utilize the power of iterators unless you write generic code:

template<typename Iterator>
myfunc(Iterator begin, Iterator end) {
  // Do stuff
}

Now the function does not depend on the type of the iterator being used in the call. However C++, being a multi-paradigm language means you might be writing OO code and not generic code. E.g. you might have defined an interface on one of your classes:

class MyInterface {
  virtual void GetStuff(int count, vector<int>::iterator out) const = 0;
};

I can’t define interface methods to be template based. The problem with the interface above is that I force the receiver to use a particular kind of iterator. E.g. I have to write code like this:

vector<int> buffer(count);
myvar->GetStuff(count, buffer.begin());

But what if I need buffer to be of any of these alternative types?

 int * 
 QVector<int> 
 vector<int, myallocator>  

Then I would be screwed, because they all have a different concrete iterator type. However if we change the interface to:

class MyInterface {
  virtual void GetStuff(int count, int *out) const = 0;
};

Then we can use any kind of vector type:

vector<int> buffer(count);
myvar->GetStuff(count, &buffer[0]);

This code does not care which type buffer is as long as it represents an array of data in contiguous memory.

These observations give a few rules of thumb for when to use iterators and when to use pointers. * Never use iterators to receive data. Use pointers. * Let your internal data be accessed only through STL style iterators, but only ever use these iterators with generic functions. This will allow you to change internal container type without breaking code. If you use non-template functions you will break code.
* Never return pointers to your internal data. Instead push or pull data to/from external buffers through pointers. Non-generic functions should interface with pointer based methods.

So say I have a polygon class I should be able to do stuff like:

// generic function so use iterators
std::sort(poly.beginVertices(), poly.endVertices());
std::vector<Vertex> buffer(10);

// setting and getting values, so use pointers
poly.getVertices(&buffer[0], poly.size());
poly.setVertices(&buffer[0], poly.size());

// Non-template based class, so use pointers
vertexProcessor->doStuff(&buffer[0], poly.size());

// If class was template based, then we could to it like this
vertexProcessor.doStuff(poly.beginVertices(), poly.endVertices());

I know the first objection you probably have is that the pointer solution leads to a lot of unnecessary copying of data. That is potentially a performance bottleneck, right? Not necessarily. Typically you don’t need to try to optimize this. If it proves to be a problem the solution is fairly simple. Copy out only small chunks at a time. E.g. fetch 10 elements, do some processing, fetch the next 10 elements. Rinse repeat.

If you have a small fixed size buffer you keep pull data into, it will most likely stay in cache and it will be very fast to copy the data into it.

Filed under C++ iterator pointers STL

5 notes

Performance of STL vector vs plain C arrays

Inserting and retrieving elements in an STL vector is just as fast as doing it in a regular plain C array. This of course requires that you have turn on optimization.

However a STL vector has a number of attributes which potentially makes it slower than a C array:

  • A vector does not store its data in a continuous chunk of memory. Pointers to the start and end of the vector will be stored in a different location from the data itself.

  • An STL vector always allocates memory for its data on the heap. Heap allocation is slower than stack allocation.

To test how this would work in practice I created a program that would iterate over a large array, do some calculations on it and return small parts of it into either a plain C array or a STL vector.

  for (int k = 0; k < rep; ++k) {
    for (int i = 0; i < count; i += inc) {
      double buffer[inc];      
      base->GetData(Range(i, i + inc - 1), buffer);
    } 
  }

base is an object containing and array of data. We access parts of this array by specifying a range. The data in the range is returned in buffer. To test the STL vector we use this code in the inner loop instead>

  vector<double> buffer(inc);
  base->GetData(Range(i, i + inc - 1), buffer.begin());

A simple calculation is performed on each element

void Data::GetData(Range r, double *buffer) {
  double *p = d + r.min;
  double *q = d + r.max + 1;
  while (p != q) {
    double a = ((*p)+3) * (*p) + 4; // simple calc
    *buffer++ = a;
    ++p;     
  }
}

The rep variable is used to perform everything several times over so that the cache will be warmed up. inc defines the size of the buffer used to store results from GetData. count is the total number of element. In all the tests we have kept count to a constant of 1 million double values. By varying the size inc of the buffer we get these results:

cache-study.png

The scale is logarithmic along X and Y axis. At low buffer size we get an overhead in the virtual function call for the C array buffer. That is why larger buffers give better performance. However only buffers of 5-10 in size is needed for maximum performance. For STL vector we need buffers with size 1000 before we get comparable performance to plain C array. The reason for this is the overhead of the memory allocation and deallocation on the heap. This creates a very high overhead for small buffer sizes.

As the buffers get above 150 000 elements we see performance degrading again. This is likely because at that size the buffers can not fit into the cache. With smaller buffer sizes both the array and vector version will typically reuse the same memory already present on the stack. I tried printing out the address of the arrays allocated for STL vector and for small arrays we get the same address each time. This means that at each loop iteration we can reuse memory already in cache to write to.

Bottom line is:

  • At very small buffer sizes (1-5) overhead of function call (virtual) is noticeable.

  • At small buffers sizes (1 - 1000) we have overhead of memory allocation and deallocation for STL vector.

  • For large buffers we get overhead of cache misses

Filed under STL C C++ Cache Performance vector