Pseudo Associative

Random thoughts on software development


Performance of STL vector vs plain C arrays

Inserting and retrieving elements in an STL vector is just as fast as doing it in a plain C array. This of course requires that you have turned on optimization.

However, an STL vector has a number of attributes which can potentially make it slower than a C array:

  • A vector does not keep its bookkeeping and its data together in one contiguous chunk of memory: the pointers to the start and end of the storage live in a different location from the elements themselves, adding a level of indirection (see the sketch after this list).

  • An STL vector always allocates memory for its data on the heap. Heap allocation is slower than stack allocation.
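
A quick standalone sketch (not from the original post) that makes this separation visible: it prints the size of the vector object, which covers only its bookkeeping pointers, and shows that the elements live at a different address.

  #include <cstdio>
  #include <vector>

  int main() {
    double stack_buf[8];              // element storage is the array itself
    std::vector<double> heap_buf(8);  // object on the stack, elements on the heap

    // sizeof the vector covers only its bookkeeping (typically three
    // pointers); the elements live at a separate heap address.
    std::printf("sizeof(stack_buf) = %u\n", (unsigned)sizeof stack_buf);
    std::printf("sizeof(heap_buf)  = %u\n", (unsigned)sizeof heap_buf);
    std::printf("vector object at %p, its elements at %p\n",
                (void*)&heap_buf, (void*)&heap_buf[0]);
    return 0;
  }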

To test how this works in practice I created a program that iterates over a large array, does some calculations on it and returns small parts of it into either a plain C array or an STL vector:

  for (int k = 0; k < rep; ++k) {
    for (int i = 0; i < count; i += inc) {
      double buffer[inc];  // stack buffer; a variable-length array like
                           // this is a GCC extension in C++
      base->GetData(Range(i, i + inc - 1), buffer);
    }
  }

base is an object containing an array of data. We access parts of this array by specifying a range; the data in the range is returned in buffer. To test the STL vector we use this code in the inner loop instead:

  vector<double> buffer(inc);
  base->GetData(Range(i, i + inc - 1), &buffer[0]);

(GetData takes a double *, so we pass &buffer[0] rather than buffer.begin(); the iterator returned by begin() is not guaranteed to be a raw pointer.)

A simple calculation is performed on each element:

void Data::GetData(Range r, double *buffer) {
  double *p = d + r.min;      // d is the object's underlying data array
  double *q = d + r.max + 1;  // one past the last element of the range
  while (p != q) {
    double a = ((*p) + 3) * (*p) + 4; // simple calculation
    *buffer++ = a;
    ++p;
  }
}
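
The post does not show the declarations behind these snippets, so here is a minimal sketch of what Range and Data might look like. The names min, max and d come from the code above; everything else, including GetData being virtual (suggested by the mention of virtual call overhead below), is an assumption:

  struct Range {
    int min, max;  // inclusive bounds, as used by GetData above
    Range(int lo, int hi) : min(lo), max(hi) {}
  };

  class Data {
  public:
    explicit Data(int n) : d(new double[n]()) {}  // zero-initialized test data
    virtual ~Data() { delete[] d; }
    // Assumed virtual, since the post attributes part of the overhead
    // to a virtual function call.
    virtual void GetData(Range r, double *buffer);  // defined above
  private:
    double *d;  // the underlying data array
  };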

The rep variable runs everything several times over so that the cache gets warmed up. inc defines the size of the buffer used to store the results from GetData, and count is the total number of elements; in all the tests count was kept constant at 1 million double values.
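
Tying the pieces together, a driver for the vector case might look roughly like this. This is a sketch assuming the Range and Data declarations above; the timing via clock() and the particular values of rep and inc are illustrative, not the post's actual harness:

  #include <cstdio>
  #include <ctime>
  #include <vector>
  using std::vector;

  // Range, Data and Data::GetData as sketched above.

  int main() {
    const int count = 1000000;  // 1 million doubles, constant in all tests
    const int inc   = 100;      // buffer size, varied between runs
    const int rep   = 10;       // repetitions, so the cache is warmed up

    Data data(count);
    Data *base = &data;         // virtual call through a pointer

    std::clock_t start = std::clock();
    for (int k = 0; k < rep; ++k) {
      for (int i = 0; i + inc <= count; i += inc) {  // stay in bounds
        vector<double> buffer(inc);  // allocated and freed every iteration
        base->GetData(Range(i, i + inc - 1), &buffer[0]);
      }
    }
    double secs = double(std::clock() - start) / CLOCKS_PER_SEC;
    std::printf("inc = %d: %.3f s\n", inc, secs);
    return 0;
  }

By varying the buffer size inc we get these results: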

[Figure: cache-study.png, running time vs. buffer size inc, log-log scale]

The scale is logarithmic along both the X and Y axes. At low buffer sizes there is overhead from the virtual function call for the C array buffer, which is why larger buffers give better performance; buffers of only 5-10 elements are enough to reach maximum performance. For the STL vector we need buffers of around 1000 elements before performance is comparable to the plain C array. The reason is the overhead of allocating and deallocating memory on the heap, which is very high relative to the work done when buffers are small.

As the buffers grow above 150 000 elements we see performance degrading again, most likely because at that size the buffers no longer fit in the cache. With smaller buffer sizes both the array and the vector version typically reuse the same memory on every iteration: the C array because it sits at the same spot on the stack, and the vector because the allocator hands back the same heap block each time. I tried printing out the address of the arrays allocated for the STL vector, and for small arrays we get the same address every time. This means that at each loop iteration we write to memory that is already in the cache.
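
The address check described above amounts to something like this inner loop (a sketch; getting the same address back is typical allocator behavior for this pattern, not a guarantee):

  for (int i = 0; i + inc <= count; i += inc) {
    vector<double> buffer(inc);
    // For small inc this prints the same address every iteration: the block
    // freed at the end of one pass is handed back at the start of the next.
    std::printf("elements at %p\n", (void*)&buffer[0]);
    base->GetData(Range(i, i + inc - 1), &buffer[0]);
  }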

Bottom line is:

  • At very small buffer sizes (1-5) the overhead of the (virtual) function call is noticeable.

  • At small buffer sizes (1-1000) the STL vector pays the overhead of memory allocation and deallocation.

  • For large buffers (above roughly 150 000 elements) we get the overhead of cache misses.

Filed under: STL, C, C++, Cache, Performance, vector
