comp.lang.lisp

Making hash table in CL fast

Malice

2/8/2016 7:04:00 PM

So I am toying with hash table performance in CL. I am curious how I can make
it as fast as possible (and about optimization in CL in general).

I created a few scripts in different languages to compare their speed:
Each program puts 10 million numbers, from 0 to 10,000,000 - 1, into a hash
table; the value at key i is i. Afterwards, I print the value stored at key
150000.
I run each test 1000 times to get the average execution time.

Common Lisp:

(defun test ()
  (declare (optimize (speed 3) (safety 0)))
  (let* ((size 10000000)
         (x (make-hash-table :test #'eq :size size)))
    (loop for i from 0 to size do
          (setf (gethash i x) i))
    (print (gethash 150000 x))))

(save-lisp-and-die "test.elf" :toplevel #'test :executable t :compression 9)

Command used to test: time for i in {1..1000}; do ./test.elf ; done
Result: ~1.059 seconds
SBCL version: 1.3.1

Python:

#!/usr/bin/python

if __name__ == "__main__":
    x = {}
    size = 10000000
    for i in xrange(size):
        x[i] = i
    print(x[150000])

Command used to test: time for i in {1..1000}; do ./hash.py ; done
Result: ~1.333 seconds
Interpreter version: 2.7.10

C++11:

#include <iostream>
#include <unordered_map>

int main(int argc, char* argv[])
{
    unsigned long size = 10000000L;
    std::unordered_map<unsigned long, unsigned long> x;
    x.reserve(size);
    for (unsigned long i = 0; i < size; ++i)
    {
        x[i] = i;
    }
    std::cout << x[150000] << std::endl;
    return 0;
}

Compiled with: g++ hash.cpp -o hashcpp.elf -std=c++11 -O3
Compiler version: gcc 4.8.5
Command used to test: time for i in {1..1000}; do ./hashcpp.elf ; done
Result: 0.775 seconds

I tried Ruby, but the obvious approaches were too slow (~7 seconds).


So, as you can see, CL is faster than Python, but C++ is still faster than CL. How can I make the CL version even faster, so that the difference is minimal?

I didn't compare with other languages, but you're free to take this code, run it on your machine, and post your benchmarks; results for functional languages are especially welcome.

What could I do to speed CL up?
11 Answers

Barry Margolin

2/8/2016 7:14:00 PM

In article <ad8a4aeb-830b-42cd-a912-0e0b758114c0@googlegroups.com>,
Malice <gwmaniak@gmail.com> wrote:

> So I am toying with hash table performance in CL. I am curious how I can make
> it as fast as possible (and about optimization in CL in general).
[...]
> What could I do to speed CL up?

Check the disassembly of the function, to see if it's doing hardware or
generic arithmetic on I. If it's doing generic arithmetic, declare the
variables to be FIXNUM.
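
For instance (my own sketch, not part of the post above; the exact output is
implementation-specific), you can look at the compiled code with:

(disassemble #'test)
;; In SBCL, generic arithmetic typically shows up as full calls to routines
;; such as GENERIC-+, whereas hardware (fixnum) arithmetic compiles down to
;; inline ADD instructions on tagged fixnums.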

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Malice

2/8/2016 7:33:00 PM


I'm sorry, but how can I check this? I know that I can look at the disassembly of the function by calling #'disassemble, but I don't know how to tell whether the arithmetic is generic, or whether I am declaring the variables correctly. This approach:

(defun test ()
  (declare (optimize (speed 3) (safety 0) (debug 0) (compilation-speed 0)))
  (let* ((size 10000000)
         (x (make-hash-table :test #'eq :size size)))
    (declare (type fixnum size))
    (do ((i 0 (1+ i)))
        ((> i size))
      (declare (type fixnum i))
      (setf (gethash i x) i))
    (print (gethash 150000 x))))

yields the same timing as the function in the OP.

Paul Rubin

2/8/2016 7:43:00 PM


Malice <gwmaniak@gmail.com> writes:
> (save-lisp-and-die "test.elf" :toplevel #'test :executable t :compression 9)
> command used to test: time for i in {1.1000}; do ./test.elf ; done
> Result: ~1.059 seconds
> SBCL version: 1.3.1

1.059 seconds is not just the computation time in the program, but also
the time to launch and load 1000 processes that run the program. That
load time will partly depend on the binary size including the language
runtime. That will be much larger for Lisp than C++, and probably
larger than for Python. I suggest changing the benchmark to run in just
one process. You might also have to add some type annotations to the
Lisp code: I don't know if SBCL will infer fixnums automatically.
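
A minimal sketch of such a single-process benchmark (my own illustration, not
Paul's code), assuming SBCL and the same sizes as the original:

(defun fill-table ()
  (declare (optimize (speed 3) (safety 0)))
  (let ((x (make-hash-table :test #'eql :size 10000000)))
    (dotimes (i 10000000)
      (setf (gethash i x) i))
    (gethash 150000 x)))

;; Time only the table construction, excluding image startup:
(time (fill-table))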

Malice

2/8/2016 10:07:00 PM

On Monday, 8 February 2016 20:43:05 UTC+1, Paul Rubin wrote:
> 1.059 seconds is not just the computation time in the program, but also
> the time to launch and load 1000 processes that run the program. [...]
> I suggest changing the benchmark to run in just one process.

I agree that the measured time includes the time needed to load everything; however, that time gets divided by the number of runs, so it doesn't really matter whether I perform one test or 1000 - (load time)/1 vs. 1000*(load time)/1000 is the same either way. Besides, the numbers are similar, and I believe that averaging over 1000 runs is fairer.

Antsan

2/8/2016 11:05:00 PM

On Monday, 8 February 2016 20:04:18 UTC+1, Malice wrote:
> So I am toying with hash table performance in CL. I am curious how I can make
> it as fast as possible (and about optimization in CL in general).
[...]
> What could I do to speed CL up?

Your test is non-conforming. Two numbers must not be considered equal under EQ.
The hash-table needs to use EQL as a test predicate.
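
A minimal fix (my sketch, not from the post) is to build the table with EQL,
which is also the default test and is well-defined for integer keys:

(make-hash-table :test #'eql :size size)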

There is a print expression, but did you actually check that it returns what you
expect?

William James

2/8/2016 11:32:00 PM

Antsan wrote:

> Your test is non-conforming. Two numbers must not be considered equal under EQ.
> The hash-table needs to use EQL as a test predicate.
>
> There is a print expression, but did you actually check that it returns what
> you expect?

Armed Bear Common Lisp 0.12.0

CL-USER(1): (defun test ()
              (declare (optimize (speed 3) (safety 0)))
              (let* ((size 99888)
                     (x (make-hash-table :test #'eq :size size)))
                (loop for i from 0 to size do
                      (setf (gethash i x) i))
                (print (gethash 999 x))))
TEST
CL-USER(2): (test)

NIL


Malice

2/8/2016 11:58:00 PM

On Tuesday, 9 February 2016 00:05:18 UTC+1, Antsan wrote:
> Your test is non-conforming. Two numbers must not be considered equal under EQ.
> The hash-table needs to use EQL as a test predicate.
>
> There is a print expression, but did you actually check that it returns what
> you expect?

My bad then. However, in my implementation EQ works fine, and all the programs
have the same observable behaviour - they print 150000 and a newline.

But am I missing something, or is it that numbers *might* (sometimes) be
considered equal under EQ, but that's implementation-dependent?

taruss

2/9/2016 12:33:00 AM


On Monday, February 8, 2016 at 3:57:54 PM UTC-8, Malice wrote:

> But am I missing something, or is it that numbers *might* (sometimes) be
> considered equal under EQ, but that's implementation-dependent?

That is exactly correct. And sometimes even in the same implementation, some
integers could be equal under EQ and others not. To guarantee numeric
equivalence you need to use EQL.

Numeric equivalence in this case also means the numbers must be of the same
type, so that (eql 1 1.0) => NIL. There is also = for numbers, which will do
type coercion but it will fail on non-numbers.

Background: The idea was to allow implementations the freedom to use immediate
data values for (generally) integers so as to be more efficient. That means
that you might have integers with the same value physically stored in more than
one place, typically directly in the machine instruction. EQ in Lisp does
exact object equality, sort of like the Python "is" operator. EQL tests for
object or same-type numeric equality, so that is what you need to use to be
sure that you get a hash table that uses the correct comparison.
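
As a quick illustration (a sketch, not output from the thread; the EQ result is
implementation-dependent):

(eql 10000000 10000000)  ; => T   (same value, same type: guaranteed)
(eql 1 1.0)              ; => NIL (different numeric types)
(= 1 1.0)                ; => T   (numeric comparison with coercion)
(eq 10000000 10000000)   ; => T or NIL, depending on the implementation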


taruss

2/9/2016 12:35:00 AM

On Monday, February 8, 2016 at 2:06:45 PM UTC-8, Malice wrote:

> I agree that the measured time includes the time needed to load everything;
> however, that time gets divided by the number of runs, so it doesn't really
> matter whether I perform one test or 1000 [...] I believe that averaging over
> 1000 runs is fairer.

As part of this, you might also want to run tests in the different languages
with no-op test functions so you can get a sense of what that overhead is
likely to be.

But for more precise Lisp tuning, you would also want to use the built-in
(TIME ...) function. And make sure that any functions you use are compiled.
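
For example (my sketch, reusing the build flags from the original post), a
do-nothing image gives a rough estimate of the per-run startup cost:

(defun noop () nil)
(save-lisp-and-die "noop.elf" :toplevel #'noop :executable t :compression 9)

;; shell: time for i in {1..1000}; do ./noop.elf ; done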

Malice

2/9/2016 10:14:00 AM

On Tuesday, 9 February 2016 01:32:56 UTC+1, tar...@google.com wrote:

> Background: The idea was to allow implementations the freedom to use immediate
> data values for (generally) integers so as to be more efficient. [...] EQ in
> Lisp does exact object equality, sort of like the Python "is" operator.

I thought that #'eq was just comparing the pointers.
The function is compiled, and measured with (time) in CL it indeed gets closer to C++ (~0.85s); is loading the binary really causing that much overhead (for such a trivial program)?