Home » Questions » Computers [ Ask a new question ]

What is the best way to create a sparse array in C++?

What is the best way to create a sparse array in C++?

"I am working on a project that requires the manipulation of enormous matrices, specifically pyramidal summation for a copula calculation.

In short, I need to keep track of a relatively small number of values (usually a value of 1, and in rare cases more than 1) in a sea of zeros in the matrix (multidimensional array).

A sparse array allows the user to store a small number of values, and assume all undefined records to be a preset value. Since it is not physically possibly to store all values in memory, I need to store only the few non-zero elements. This could be several million entries.

Speed is a huge priority, and I would also like to dynamically choose the number of variables in the class at runtime.

I currently work on a system that uses a binary search tree (b-tree) to store entries. Does anyone know of a better system?"

Asked by: Guest | Views: 42
Total answers/comments: 4
Guest [Entry]

"For C++, a map works well. Several million objects won't be a problem. 10 million items took about 4.4 seconds and about 57 meg on my computer.

My test application is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <map>

class triple {
public:
int x;
int y;
int z;
bool operator<(const triple &other) const {
if (x < other.x) return true;
if (other.x < x) return false;
if (y < other.y) return true;
if (other.y < y) return false;
return z < other.z;
}
};

int main(int, char**)
{
std::map<triple,int> data;
triple point;
int i;

for (i = 0; i < 10000000; ++i) {
point.x = rand();
point.y = rand();
point.z = rand();
//printf(""%d %d %d %d\n"", i, point.x, point.y, point.z);
data[point] = i;
}
return 0;
}

Now to dynamically choose the number of variables, the easiest solution is to represent index as a string, and then use string as a key for the map. For instance, an item located at [23][55] can be represented via ""23,55"" string. We can also extend this solution for higher dimensions; such as for three dimensions an arbitrary index will look like ""34,45,56"". A simple implementation of this technique is as follows:

std::map data<string,int> data;
char ix[100];

sprintf(ix, ""%d,%d"", x, y); // 2 vars
data[ix] = i;

sprintf(ix, ""%d,%d,%d"", x, y, z); // 3 vars
data[ix] = i;"
Guest [Entry]

"The accepted answer recommends using strings to represent multi-dimensional indices.

However, constructing strings is needlessly wasteful for this. If the size isn’t known at compile time (and thus std::tuple doesn’t work), std::vector works well as an index, both with hash maps and ordered trees. For std::map, this is almost trivial:

#include <vector>
#include <map>

using index_type = std::vector<int>;

template <typename T>
using sparse_array = std::map<index_type, T>;

For std::unordered_map (or similar hash table-based dictionaries) it’s slightly more work, since std::vector does not specialise std::hash:

#include <vector>
#include <unordered_map>
#include <numeric>

using index_type = std::vector<int>;

struct index_hash {
std::size_t operator()(index_type const& i) const noexcept {
// Like boost::hash_combine; there might be some caveats, see
// <https://stackoverflow.com/a/50978188/1968>
auto const hash_combine = [](auto seed, auto x) {
return std::hash<int>()(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
};
return std::accumulate(i.begin() + 1, i.end(), i[0], hash_combine);
}
};

template <typename T>
using sparse_array = std::unordered_map<index_type, T, index_hash>;

Either way, the usage is the same:

int main() {
using i = index_type;

auto x = sparse_array<int>();
x[i{1, 2, 3}] = 42;
x[i{4, 3, 2}] = 23;

std::cout << x[i{1, 2, 3}] + x[i{4, 3, 2}] << '\n'; // 65
}"
Guest [Entry]

"Boost has a templated implementation of BLAS called uBLAS that contains a sparse matrix.

https://www.boost.org/doc/libs/release/libs/numeric/ublas/doc/index.htm"
Guest [Entry]

Eigen is a C++ linear algebra library that has an implementation of a sparse matrix. It even supports matrix operations and solvers (LU factorization etc) that are optimized for sparse matrices.