10 July 2012

GNU C++ hash_set vs STL std::set: my notebook

A set is a C++ container that stores unique elements. The C++ Standard Template library  (STL) defines a C++ template set<T> that is typically implemented as a binary search tree.

#include<set>

But the GNU C++ library also provides a (non-standard) hash-based set:

#include<ext/hash_set>

In the following code I've created some random #rs numbers and I print the time needed to insert/remove them from a (GNU/hash_set or STL ) set:
#include <string>
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <sstream>
#if defined(__GNUC__) && defined(WITH_HASHSET)
#include <ext/hash_set>
namespace __gnu_cxx
{
/* hack from http://www.moosechips.com/2008/10/using-gcc-c-hash-classes-wit
h-strings/ */
template<> struct hash< std::string >
{
size_t operator()( const std::string& x ) const
{
return hash< const char* >()( x.c_str() );
}
};
}
typedef __gnu_cxx::hash_set<std::string, __gnu_cxx::hash<std::string> > set_of_string;
#else
#include <set>
typedef std::set<std::string> set_of_string;
#endif
using namespace std;
int main(int argc,char** argv)
{
set_of_string rs_list;
for(int t=0;t<2;++t)
{
srand(0U);
for(int i=0;i< 10000000;++i)
{
ostringstream os;
os << "rs" << rand();
string rs(os.str());
if(t==0)
{
rs_list.insert(rs);
}
else
{
set_of_string::iterator r=rs_list.find(rs);
if(r!=rs_list.end()) rs_list.erase(r);
}
}
}
cout << "Time: "<< (((float)std::clock())/CLOCKS_PER_SEC) << "seconds.\n";
return 0;
}
view raw testset.cpp hosted with ❤ by GitHub

STL version

$ g++  -Wall -O3 testset.cpp
$ ./a.out 
Time: 109.27seconds.

GNU hash_set version

$  g++ -DWITH_HASHSET=1 -Wall -O3 testset.cpp
In file included from /usr/include/c++/4.6/ext/hash_set:61:0,
                 from jeter.cpp:7:
/usr/include/c++/4.6/backward/backward_warning.h:33:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed (...)
$ ./a.out 
Time: 49.69seconds.

That's it,


Pierre




1 comment:

Rob said...

g++ also now includes the (standard) unordered_set, from the header , that provides a tr1 (and c++11) compatible interface to a random-access unordered set. This should have the same (or very similar) performance characteristics to the hash set with the added benefit of being standard.