Asp Forum - pySQLite Insert speed

mdboldin

2/29/2008 3:35:00 AM

I hav read on this forum that SQL coding (A) below is preferred over
(B), but I find (B) is much faster (20-40% faster)

(A)

sqla= 'INSERT INTO DTABLE1 VALUES (%d, %d, %d, %f)' % values
curs.execute(sqla)

(B)
pf= '?, ?, ?, ?'
sqlxb= 'INSERT INTO DTABLE2 VALUES ( %s ) ' % pf
curs.execute( sqlxb, values )

Any intution on why (A) is slower?

25 Answers

Carsten Haese

2/29/2008 4:18:00 AM

On Thu, 28 Feb 2008 19:35:03 -0800 (PST), mdboldin wrote
> I hav read on this forum that SQL coding (A) below is preferred over
> (B), but I find (B) is much faster (20-40% faster)
>
> (A)
>
> sqla= 'INSERT INTO DTABLE1 VALUES (%d, %d, %d, %f)' % values
> curs.execute(sqla)
>
> (B)
> pf= '?, ?, ?, ?'
> sqlxb= 'INSERT INTO DTABLE2 VALUES ( %s ) ' % pf
> curs.execute( sqlxb, values )
>
> Any intution on why (A) is slower?

My only problem with (B) is that it should really be this:

sqlxb= 'INSERT INTO DTABLE2 VALUES (?, ?, ?, ?)'
curs.execute( sqlxb, values )

Apart from that, (B) is better than (A). The parameter binding employed in (B)
is not only faster on many databases, but more secure. See, for example,
http://informixdb.blogspot.com/2007/07/filling-in-b... for some
in-depth explanations of why parameter binding is better than string
formatting for performing SQL queries with variable values.

HTH,

--
Carsten Haese
http://informixdb.sourc...

mdboldin

2/29/2008 1:45:00 PM

> (B) is better than (A). The parameter binding employed in (B)
> is not only faster on many databases, but more secure.
See, for example,http://informixdb.blogspot.com/2007/07/f...
blanks.html

Thx. The link was helpful, and I think I have read similar things
before-- that B is faster.
So ... I just rewrote the test code from scratch and B is faster. I
must have had something wrong in my original timing.

Steve Holden

2/29/2008 2:38:00 PM

mdboldin@gmail.com wrote:
>> (B) is better than (A). The parameter binding employed in (B)
>> is not only faster on many databases, but more secure.
> See, for example,http://informixdb.blogspot.com/2007/07/f...
> blanks.html
>
> Thx. The link was helpful, and I think I have read similar things
> before-- that B is faster.
> So ... I just rewrote the test code from scratch and B is faster. I
> must have had something wrong in my original timing.

Don't forget, by the way, that your original (B) code was performing the
string substitution of the parameter markers into the SQL statement each
time the statement was executed. As Carsten pointed out, this overhead
should be performed at most once, and only then if you want your code to
be portable over database backends with different paramstyles. Forget (A).

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.hold...

Tim Roberts

3/1/2008 7:03:00 AM

mdboldin@gmail.com wrote:
>
>I hav read on this forum that SQL coding (A) below is preferred over
>(B), but I find (B) is much faster (20-40% faster)
>
>(A)
>
> sqla= 'INSERT INTO DTABLE1 VALUES (%d, %d, %d, %f)' % values
> curs.execute(sqla)
>
>(B)
> pf= '?, ?, ?, ?'
> sqlxb= 'INSERT INTO DTABLE2 VALUES ( %s ) ' % pf
> curs.execute( sqlxb, values )
>
>Any intution on why (A) is slower?

I think you misunderstood. (B) is *ALWAYS* the proper way of doing
parameterized SQL queries. Unconditionally. The (A) style is way too
vulnerable to SQL injection attacks.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

mdboldin

3/1/2008 3:00:00 PM

Steve, I want to make sure I understand. My test code is below, where
ph serves as a placeholder. I am preparing for a case where the number
of ? will be driven by the length of the insert record (dx)

dtable= 'DTABLE3'
print 'Insert data into table %s, version #3' % dtable
ph= '?, ?, ?, ?'
sqlx= 'INSERT INTO %s VALUES ( %s ) ' % (dtable,ph)
t0a=time.time()
for dx in d1:
curs1.execute(sqlx,dx)
print (time.time()-t0a)
print curs1.lastrowid
conn1.commit()

I think you are saying that sqlx is re-evaluated in each loop, i.e.
not the same as pure hard coding of
sqlx= 'INSERT INTO DTABLE3 VALUES ( ?, ?, ?, ? ) '
Is that right? Hence (if I understand python convention), this can be
solved by adding
sqlx= copy.copy(sqlx)
before the looping. And in tests adding this step saved about 5-10% in
time.

And yes, I can see why (B) is always better from a security
standpoint. The python solutions for problems such as there are a
great help for people like me, in the sense that the most secure way
does not have a speed penalty (and in this case is 3-4x faster).

Steve Holden

3/1/2008 5:48:00 PM

mdboldin@gmail.com wrote:
> Steve, I want to make sure I understand. My test code is below, where
> ph serves as a placeholder. I am preparing for a case where the number
> of ? will be driven by the length of the insert record (dx)
>
> dtable= 'DTABLE3'
> print 'Insert data into table %s, version #3' % dtable
> ph= '?, ?, ?, ?'
> sqlx= 'INSERT INTO %s VALUES ( %s ) ' % (dtable,ph)
> t0a=time.time()
> for dx in d1:
> curs1.execute(sqlx,dx)
> print (time.time()-t0a)
> print curs1.lastrowid
> conn1.commit()
>
> I think you are saying that sqlx is re-evaluated in each loop, i.e.
> not the same as pure hard coding of
> sqlx= 'INSERT INTO DTABLE3 VALUES ( ?, ?, ?, ? ) '
> Is that right?

Yes. If the sql is constant then you would be performing an unnecessary
computation inside the loop. Not a biggie, but it all takes time. Is the
loop above your original code? If so I was wrong about the loop.

> Hence (if I understand python convention), this can be
> solved by adding
> sqlx= copy.copy(sqlx)
> before the looping. And in tests adding this step saved about 5-10% in
> time.
>
Now this I don;t really understand at all. What's the point of trying to
replace sqlx with a copy of itself? Perhaps if you explained what you
hope this will achieve I could comment more intelligently.

> And yes, I can see why (B) is always better from a security
> standpoint. The python solutions for problems such as there are a
> great help for people like me, in the sense that the most secure way
> does not have a speed penalty (and in this case is 3-4x faster).

Yes, it's a real win-win. Since both the table and the number of
arguments appear to be variable one possible solution is to build a dict
that would allow you to look up the right SQL using the table name. So,
suppose you have the following tables and number of arguments:

tables = (("table1", 3),
("table2", 5),
("table3", 2)
)

you could create a suitable dict as (untested):

tdict = {}
for tbl, ct in tables:
tdict[tbl] = "INSERT INTO %s VALUES (%s)" % (tbl, ", ".join(["?"] * ct))

Then you can use the table to look up the right SQL, quite a fast
operation compared with actually building it.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.hold...

Diez B. Roggisch

3/1/2008 7:44:00 PM

mdboldin@gmail.com schrieb:
> I hav read on this forum that SQL coding (A) below is preferred over
> (B), but I find (B) is much faster (20-40% faster)
>
> (A)
>
> sqla= 'INSERT INTO DTABLE1 VALUES (%d, %d, %d, %f)' % values
> curs.execute(sqla)
>
> (B)
> pf= '?, ?, ?, ?'
> sqlxb= 'INSERT INTO DTABLE2 VALUES ( %s ) ' % pf
> curs.execute( sqlxb, values )
>
> Any intution on why (A) is slower?

You most certainly have not found that A is the preferred over B - the
opposite is true. Using A will make you vulnerable against
SQL-injection-attacks. B OTOH will ensure that the parameters are
properly escaped or otherwise dealt with.

Regarding the intuition - that depends on what actually happens inside
B. If B works in a way that it

- converts arguments to strings

- escapes these where necessary

- builts one SQL-statement out of it

- excutes the SQL

then B is slower than A because A is just string-interpolation, whereas
B is sanitizing + string-interpolation. So it must be slower.

But a "sane" DB will instead directly use the SQL passed in B, and
transmit the parameter as binary into the backend, resulting in more
compact representation + lesser or now marshalling overhead plus
possible query parsing overhead reduction due to cached execution plans.
Which could explain B being more performant.

Diez

mdboldin

3/4/2008 3:42:00 AM

> > Hence (if I understand python convention), this can be
> > solved by adding
> > sqlx= copy.copy(sqlx)
> > before the looping. And in tests adding this step saved about 5-10% in
> > time.
>
> Now this I don;t really understand at all. What's the point of trying to
> replace sqlx with a copy of itself? Perhaps if you explained what you
> hope this will achieve I could comment more intelligently.
>

I am/was attempting to convert

sqlx= 'INSERT INTO %s VALUES ( %s ) ' % (dtable,ph)

to code that did to need to be re-evaluated. i.e. to insert the
dtable and ph values as if they were hard coded.

copy.copy --> A shallow copy constructs a new compound object and
then (to the extent possible) inserts references into it to the
objects found in the original.

Steve Holden

3/4/2008 4:16:00 AM

mmm wrote:
>>> Hence (if I understand python convention), this can be
>>> solved by adding
>>> sqlx= copy.copy(sqlx)
>>> before the looping. And in tests adding this step saved about 5-10% in
>>> time.
>> Now this I don;t really understand at all. What's the point of trying to
>> replace sqlx with a copy of itself? Perhaps if you explained what you
>> hope this will achieve I could comment more intelligently.
>>
>
> I am/was attempting to convert
>
> sqlx= 'INSERT INTO %s VALUES ( %s ) ' % (dtable,ph)
>
> to code that did to need to be re-evaluated. i.e. to insert the
> dtable and ph values as if they were hard coded.
>
> copy.copy --> A shallow copy constructs a new compound object and
> then (to the extent possible) inserts references into it to the
> objects found in the original.

Unfortunately you weren't dealing with a compound object object here, so
all you are doing is creating a copy of the string you've just created
and replacing the original with it. Copy.copy() is meant for creating
(say) lists, tuples and dicts where the elements are references to the
same objects that the elements of the original structure referred to.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.hold...

mdboldin

3/4/2008 12:54:00 PM

Steve, I think you were right the first time is saying

> it should really be this:
> sqlxb= 'INSERT INTO DTABLE2 VALUES (?, ?, ?, ?)'

my copy.copy() has the equivalent effect.

Running this test code produces the output below

import copy

print 'Test 1'
pf= '?,?,?,?'
sqlx1= 'INSERT INTO DTABLE2 VALUES ( %s ) ' % pf
print sqlx1

print
print 'Test 2'
sqlx2= copy.copy(sqlx1)
sqlx3= sqlx1
pf= '?,?,?, ****'
sqlx1= 'INSERT INTO DTABLE2 VALUES ( %s ) ' % pf
print 'sqlx1= ', sqlx1
print 'sqlx2= ', sqlx2
print 'sqlx3= ', sqlx2

== output
Test group 1
INSERT INTO DTABLE2 VALUES ( ?,?,?,? )

Test group 2
sqlx1= INSERT INTO DTABLE2 VALUES ( ?,?,?, **** )
sqlx2= INSERT INTO DTABLE2 VALUES ( ?,?,?,? )
sqlx3= INSERT INTO DTABLE2 VALUES ( ?,?,?,? )

I interpret this to mean that sqlx1 is not a simple string

comp.lang.python

pySQLite Insert speed

mdboldin

Carsten Haese

mdboldin

Steve Holden

Tim Roberts

mdboldin

Steve Holden

Diez B. Roggisch

mdboldin

Steve Holden

mdboldin

x Login to ForumsZone