
Fast Sparse Coding Features

Sergey Kosov edited this page Mar 19, 2017 · 4 revisions

Introduction

The sparse coding approach is based on the estimation of a set of dictionary words (bases) $\varphi_j$, such that an arbitrary data vector (sample) $\vec{x}$ can be represented via a linear combination:

$$\vec{x} \approx \sum_{j=1}^{nWords} w_j \varphi_j,$$

where $D = \{\varphi_j\}$ is the sparse dictionary and $w_j$ are the weighting coefficients (features) for every word and sample.

The core of the sparse coding approach is the estimation of the sparse dictionary $D$, such that a linear combination of its words can represent any data vector $\vec{x}$. To better capture the structures and patterns inherent in the data vectors, we use an over-complete dictionary; moreover, we want the linear combination to be sparse, which means that only a very small fraction of its weighting coefficients are non-zero.

We turn the problem of Sparse Dictionary Learning into the following minimization problem:

$$\min_{D, W} J(D, W) = \|X - W D\|_2^2 + \lambda \sum_{i,j} |w_{ij}| + \gamma \|D\|_2^2$$

Here, for convenience, we designate through $W$ the set of all weighting coefficients and through $X$ the set of all data vectors. The first term of our objective function is the error in reconstructing the data from the features using the basis; the second term is a sparsity penalty term to encourage the learned features to be sparse; and the third term is the weight decay term, designed to keep the entries of $D$ small.

Please note that the set of samples $X$, the set of coefficients $W$ and the dictionary $D$ are represented as matrices with the following dimensions (one sample of length sampleLen per row):

- $X$: nSamples × sampleLen
- $W$: nSamples × nWords
- $D$: nWords × sampleLen

Method

As the minimization problem described above can be solved with respect to either the dictionary $D$ or the features $W$ while the other of the two is kept fixed, most of the algorithms are based on the idea of iteratively updating one and then the other:

1. fix the dictionary $D$ and minimize the objective with respect to the features $W$;
2. fix the features $W$ and minimize the objective with respect to the dictionary $D$.

A number of algorithms have been developed to solve this problem, but let us concentrate on the gradient descent algorithm, implemented in the DGM C++ library:

Gradient Descent
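A sketch of the update rules, assuming the form of $J(D, W)$ given in the Introduction (in practice the non-differentiable term $|w_{ij}|$ is usually smoothed, e.g. as $\sqrt{w_{ij}^2 + \varepsilon}$):

$$\frac{\partial J}{\partial W} = 2\,(W D - X)\,D^\top + \lambda\,\mathrm{sign}(W), \qquad W \leftarrow W - \alpha\,\frac{\partial J}{\partial W},$$

$$\frac{\partial J}{\partial D} = 2\,W^\top (W D - X) + 2\gamma D, \qquad D \leftarrow D - \alpha\,\frac{\partial J}{\partial D},$$

where $\alpha$ is the learning rate. The two updates are applied in alternation until convergence.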

Code

Dictionary training

using namespace DirectGraphicalModels;
using namespace DirectGraphicalModels::fex;

	...
	Mat img = imread("D:\\Data\\SparseCoding\\test_img_org_00.jpg", 0);
	CSparseCoding sc(img);

	Mat X;
	char  fileName[256];

	for (int i = 0; i < 100 /*438*/; i++) {
		printf("%d\n", i);
		sprintf(fileName, "D:\\Data\\SparseCoding\\test_img_org_%02d.jpg", i);
		Mat _img = imread(fileName, 0);				// read as greyscale
		if (_img.empty()) continue;
		Mat _X = CSparseDictionary::img2data(_img, 15);		// extract 15 x 15 patches as samples
		if (!_X.empty()) {
			if (X.empty()) _X.copyTo(X);			// first batch of samples
			else X.push_back(_X);				// append further samples to X
		}
	} // i

	//X = CSparseDictionary::img2data(img, 7);
	printf("nSamples = %d\n", X.rows);
	parallel::shuffleRows(X);				// randomize the order of the samples
	sc.train(X, 1024, 2000, 1000);				// learn a dictionary of 1024 words
	sc.save("D:\\Dictionaries\\dictionary_1024.dic");
	...

Feature Extraction

	char str[256];

	CSparseDictionary sDictionary;
	sDictionary.load("D:\\Projects\\DGM\\etc\\scdictionaries\\dictionary_256_7.dic");
	Mat dictionary = sDictionary.getDictionary();
	int r = (sDictionary.getBlockSize() - 1) / 2;		// radius of the square neighbourhood

	for (int m = 20; m <= 21; m++) {
		for (int i = 1; i <= 10; i++) {
			for (int a = 1; a < 12; a++) {
				int angle = a * 30;

				sprintf(str, "D:\\Data\\EMDS4\\Original_EM_Images\\t4-g%02d-%02d-%d.png", m, i, angle);
				printf("%s\n", str);
				Mat img = imread(str);

				// One feature map per dictionary word; pad the list with empty
				// channels so that the maps may be grouped in triples below
				vec_mat_t vFeatures = CSparseCoding::get_v(img, dictionary, sqNeighbourhood(r));

				Mat empty(img.size(), CV_8UC1, Scalar(0));
				while (vFeatures.size() % 3 != 0) vFeatures.push_back(empty);

				sprintf(str, "D:\\Data\\EMDS4\\Original_EM_Images\\t4-g%02d-%02d-%d_256", m, i, angle);
				_mkdir(str);					// Windows-specific, declared in <direct.h>

				// Merge every 3 consecutive feature maps into one 3-channel
				// image and store it to disk
				int j = 0;
				for (size_t f = 0; f < vFeatures.size(); f += 3) {
					vec_mat_t::const_iterator first = vFeatures.begin() + f;
					vec_mat_t::const_iterator last  = vFeatures.begin() + f + 3;
					vec_mat_t tmpVec(first, last);
					Mat tmpImg;
					merge(tmpVec, tmpImg);

					sprintf(str, "D:\\Data\\EMDS4\\Original_EM_Images\\t4-g%02d-%02d-%d_256\\t4-g%02d-%02d-fv%04d.bmp", m, i, angle, m, i, j);
					imwrite(str, tmpImg);
					j++;
				}
			} // a
		} // i
	} // m