[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

microsoft.public.vb.general.discussion

Which procedure is faster?

(Mike Mitchell)

6/7/2012 7:19:00 PM

Here's a function to replace certain characters in a string with a
space:

Function ReplaceSpecialCharacters2(ByVal s As String) As String

Dim j As Long
Dim c As String
Dim ln As Long
Dim p As String

ln = Len(s)

For j = 1 To ln
c = Mid$(s, j, 1)
If InStr(StopChars, c) Then
p = p & " " ' Append a space to replace any stop character
Else
p = p & c ' else append the character
End If
Next

ReplaceSpecialCharacters2 = p

End Function

For the Stop characters, see below.

And here is the original sub:

Sub ReplaceSpecialCharacters(s As String)

' Remove/change certain characters
s = Replace(s, "*", " ")
s = Replace(s, "<", " ")
s = Replace(s, ">", " ")
s = Replace(s, "&", " ")
s = Replace(s, "{", " ")
s = Replace(s, "}", " ")
s = Replace(s, "_", " ")
s = Replace(s, "?", " ")
s = Replace(s, ".", " ")
s = Replace(s, "!", " ")
s = Replace(s, ":", " ")
s = Replace(s, ";", " ")
s = Replace(s, ",", " ")
s = Replace(s, "(", " ")
s = Replace(s, ")", " ")
s = Replace(s, "[", " ")
s = Replace(s, "]", " ")
s = Replace(s, "-", " ")
s = Replace(s, Chr$(34), " ")
s = Replace(s, "'", " ")

End Sub

I've done a few speed comparisons and the function is always slower by
between 7 and 16 seconds.

The speed test consists of reading in the Subject line of 640,000
Usenet messages and passing each subject text through either the
function or the sub, depending on a temporary checkbox setting on the
form.

The reason I rewrote the Sub as a Function is because I assumed that
using Replace umpteen times would definitely be slower, but it isn't,
it's faster.

Why?

MM
46 Answers

Karl E. Peterson

6/7/2012 7:55:00 PM

0

The concatenation is killing you! Compare this:

For j = 1 To ln
If InStr(StopChars, Mid$(s, j, 1)) Then
Mid$(s, j, 1) = " "
End If
Next





After serious thinking MM wrote :
> Here's a function to replace certain characters in a string with a
> space:
>
> Function ReplaceSpecialCharacters2(ByVal s As String) As String
>
> Dim j As Long
> Dim c As String
> Dim ln As Long
> Dim p As String
>
> ln = Len(s)
>
> For j = 1 To ln
> c = Mid$(s, j, 1)
> If InStr(StopChars, c) Then
> p = p & " " ' Append a space to replace any stop character
> Else
> p = p & c ' else append the character
> End If
> Next
>
> ReplaceSpecialCharacters2 = p
>
> End Function
>
> For the Stop characters, see below.
>
> And here is the original sub:
>
> Sub ReplaceSpecialCharacters(s As String)
>
> ' Remove/change certain characters
> s = Replace(s, "*", " ")
> s = Replace(s, "<", " ")
> s = Replace(s, ">", " ")
> s = Replace(s, "&", " ")
> s = Replace(s, "{", " ")
> s = Replace(s, "}", " ")
> s = Replace(s, "_", " ")
> s = Replace(s, "?", " ")
> s = Replace(s, ".", " ")
> s = Replace(s, "!", " ")
> s = Replace(s, ":", " ")
> s = Replace(s, ";", " ")
> s = Replace(s, ",", " ")
> s = Replace(s, "(", " ")
> s = Replace(s, ")", " ")
> s = Replace(s, "[", " ")
> s = Replace(s, "]", " ")
> s = Replace(s, "-", " ")
> s = Replace(s, Chr$(34), " ")
> s = Replace(s, "'", " ")
>
> End Sub
>
> I've done a few speed comparisons and the function is always slower by
> between 7 and 16 seconds.
>
> The speed test consists of reading in the Subject line of 640,000
> Usenet messages and passing each subject text through either the
> function or the sub, depending on a temporary checkbox setting on the
> form.
>
> The reason I rewrote the Sub as a Function is because I assumed that
> using Replace umpteen times would definitely be slower, but it isn't,
> it's faster.
>
> Why?
>
> MM

--
..NET: It's About Trust!
http://vfre...


(Mike Mitchell)

6/7/2012 8:36:00 PM

0

On Thu, 07 Jun 2012 12:54:35 -0700, Karl E. Peterson <karl@exmvps.org>
wrote:

>The concatenation is killing you! Compare this:
>
> For j = 1 To ln
> If InStr(StopChars, Mid$(s, j, 1)) Then
> Mid$(s, j, 1) = " "
> End If
> Next

Ah!

MM

Schmidt

6/7/2012 10:06:00 PM

0

Am 07.06.2012 21:18, schrieb MM:

> The reason I rewrote the Sub as a Function is because I assumed that
> using Replace umpteen times would definitely be slower, but it isn't,
> it's faster.

Karl gave you a reason for the slowness of your first function...

And as said already in a posting not that long ago,
also the Replace-Function will have to loop through
the whole string-chars in *each* call...

So better to replace the replace with something that
can do the whole thing in one single pass - and it's
especially easy in your case, since you do only lookups
on single chars - easy to use a LookUpTable-approach then:

'***into a TestForm
Option Explicit

Private Sub Form_Load()
Debug.Print ReplaceSpecialCharacters("A,B:C?D")
End Sub

'only working in ANSI-Mode, but not that difficult to enhance to 16Bit
Function ReplaceSpecialCharacters(S As String) As String
Dim i As Long, B() As Byte: Static LUT(0 To 255) As Byte

If LUT(255) = 0 Then 'fill the LookUpTable once
For i = 0 To 255
LUT(i) = IIf(InStr("*<>&{}_?.!:;,()[]-'""", Chr(i)), 32, i)
Next i
End If

If Len(S) Then B = StrConv(S, vbFromUnicode) Else Exit Function

For i = 0 To UBound(B) 'now that's the whole replace-functionality
B(i) = LUT(B(i))
Next i

ReplaceSpecialCharacters = StrConv(B, vbUnicode)
End Function

Olaf

Dee Earley

6/8/2012 8:02:00 AM

0

On 07/06/2012 23:06, Schmidt wrote:
> So better to replace the replace with something that
> can do the whole thing in one single pass - and it's
> especially easy in your case, since you do only lookups
> on single chars - easy to use a LookUpTable-approach then:
>
> '***into a TestForm
> Option Explicit
>
> Private Sub Form_Load()
> Debug.Print ReplaceSpecialCharacters("A,B:C?D")
> End Sub
>
> 'only working in ANSI-Mode, but not that difficult to enhance to 16Bit
> Function ReplaceSpecialCharacters(S As String) As String
> Dim i As Long, B() As Byte: Static LUT(0 To 255) As Byte
>
> If LUT(255) = 0 Then 'fill the LookUpTable once
> For i = 0 To 255
> LUT(i) = IIf(InStr("*<>&{}_?.!:;,()[]-'""", Chr(i)), 32, i)
> Next i
> End If
>
> If Len(S) Then B = StrConv(S, vbFromUnicode) Else Exit Function
>
> For i = 0 To UBound(B) 'now that's the whole replace-functionality
> B(i) = LUT(B(i))
> Next i
>
> ReplaceSpecialCharacters = StrConv(B, vbUnicode)
> End Function

Just be aware that this isn't unicode safe as it converts to MBCS using
the local code page which is lossy, and it may break up DBCS characters.

--
Deanna Earley (dee.earley@icode.co.uk)
i-Catcher Development Team
http://www.icode.co.uk...

iCode Systems

(Replies direct to my email address will be ignored.
Please reply to the group.)

mm

6/8/2012 11:04:00 AM

0

Schmidt

6/8/2012 5:08:00 PM

0

Am 08.06.2012 10:01, schrieb Deanna Earley:

> Just be aware that this isn't unicode safe as it converts to MBCS using
> the local code page which is lossy, and it may break up DBCS characters.

Yep, that's what I tried to hint at in my comment on top
of the routine:
'only working in ANSI-Mode, but not that difficult to enhance to 16Bit
Function ReplaceSpecialCharacters(S As String) As String

Given the context MM is using that (mail- or usenet-message parsing),
the very first "read-in" of such a message-content should already
test, what to convert the usual "8bit-content" of such a
message-"ByteArray" to... (according to the Codepage, which
was given in the message-header).

In case these message-contents are read from the FileSystem
with VBs Open- or Line-Input functionality directly into
VBs WideChar-Strings, then the (perhaps wrong) AutoANSI-
conversion already took place.

So that's what MM has basically to ensure already beforehand
(a correct MessageBytes-To-VB-WChar-String conversion, according
to the specified Charset of the message).

Either that (and then my Routine would need to be enhanced to 16Bit-
Wide-String-Parsing) ... or MM will remain entirely in "ByteArray-
Space", passing only the *original* 8Bit-message-content (a
ByteArray-Parameter) around into the different Parsing-Routines
(ByRef B() As Byte, BOffsStart As Long, BLen As Long)
.... with the usual accompanying Offset-Params into this
original ByteArray.

In this case the ByteArray-parsing (the Replace-Loop) in
my routine would not need to be changed basically -
only the StrConv-Calls could be completely omitted.

But in case MM wants to operate on BStrings, and has
done the needed message-conversion already correctly
(using the MultiByteToWideChar-API)... here's the even
faster "W-Version", which is implemented as a Sub,
changing the contents of the passed ByRef-String
"in place", without any allocation-ops, because the
Integer-Array II is spanned virtually over the
passed ByRef String-Params content:

Sub ReplaceSpecialCharsW(S As String)
Dim i As Long: Static LUT(1 To 127) As Integer
Static II() As Integer, saII As SAFEARRAY1D

If saII.cDims = 0 Then 'this block of code is entered only once

saII.cDims = 1 'set the Array-Dimension to 1
saII.cbElements = 2 '2 Bytes per Element, matching a 16Bit WChar

BindArray II, VarPtr(saII)

For i = 1 To 127 'init the 16Bit WChar-LUT
LUT(i) = IIf(InStr("*<>&{}_?.!:;,()[]-'""", Chr(i)), 32, i)
Next i
End If

saII.cElements = Len(S) 'set the ElementCount of the Array
If saII.cElements Then saII.pvData = StrPtr(S) Else Exit Sub

For i = 0 To UBound(II) 'now that's the whole replace-functionality
If i > 0 And i < 128 Then II(i) = LUT(II(i))
Next i

saII.pvData = 0 'reset the Data-Pointer
End Sub

Here come the needed API-Declares:

Private Type SAFEARRAY1D
cDims As Integer
fFeatures As Integer
cbElements As Long
cLocks As Long
pvData As Long
cElements As Long
lLbound As Long
End Type

Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _
(PArr() As Any, PSrc&, Optional ByVal cb& = 4)


Olaf

(Mike Mitchell)

6/8/2012 8:20:00 PM

0

On Fri, 08 Jun 2012 00:06:25 +0200, Schmidt <sss@online.de> wrote:

>Am 07.06.2012 21:18, schrieb MM:
>
>> The reason I rewrote the Sub as a Function is because I assumed that
>> using Replace umpteen times would definitely be slower, but it isn't,
>> it's faster.
>
>Karl gave you a reason for the slowness of your first function...
>
>And as said already in a posting not that long ago,
>also the Replace-Function will have to loop through
>the whole string-chars in *each* call...
>
>So better to replace the replace with something that
>can do the whole thing in one single pass - and it's
>especially easy in your case, since you do only lookups
>on single chars - easy to use a LookUpTable-approach then:
>
>'***into a TestForm
>Option Explicit
>
>Private Sub Form_Load()
> Debug.Print ReplaceSpecialCharacters("A,B:C?D")
>End Sub
>
>'only working in ANSI-Mode, but not that difficult to enhance to 16Bit
>Function ReplaceSpecialCharacters(S As String) As String
>Dim i As Long, B() As Byte: Static LUT(0 To 255) As Byte
>
> If LUT(255) = 0 Then 'fill the LookUpTable once
> For i = 0 To 255
> LUT(i) = IIf(InStr("*<>&{}_?.!:;,()[]-'""", Chr(i)), 32, i)
> Next i
> End If
>
> If Len(S) Then B = StrConv(S, vbFromUnicode) Else Exit Function
>
> For i = 0 To UBound(B) 'now that's the whole replace-functionality
> B(i) = LUT(B(i))
> Next i
>
> ReplaceSpecialCharacters = StrConv(B, vbUnicode)
>End Function
>
>Olaf

I've now done some comparison timings. There's not a lot of
difference.

alt.binaries.bbc.comedy, 44371 articles
Routine Secs
Mine* 71
Karl 69
Olaf 68


alt.binaries.automobiles, 80371 articles
Routine Secs
Mine* 45
Karl 44
Olaf 45


alt.binaries.sounds.radio.bbc, 1361250 articles
Routine Secs
Mine* 2195
Karl 2191
Olaf 2186

* original, using intrinisic Replace function

MM

(Mike Mitchell)

6/8/2012 8:28:00 PM

0

On Fri, 08 Jun 2012 19:07:44 +0200, Schmidt <sss@online.de> wrote:

>Am 08.06.2012 10:01, schrieb Deanna Earley:
>
>> Just be aware that this isn't unicode safe as it converts to MBCS using
>> the local code page which is lossy, and it may break up DBCS characters.
>
>Yep, that's what I tried to hint at in my comment on top
>of the routine:
> 'only working in ANSI-Mode, but not that difficult to enhance to 16Bit
> Function ReplaceSpecialCharacters(S As String) As String
>
>Given the context MM is using that (mail- or usenet-message parsing),
>the very first "read-in" of such a message-content should already
>test, what to convert the usual "8bit-content" of such a
>message-"ByteArray" to... (according to the Codepage, which
>was given in the message-header).
>
>In case these message-contents are read from the FileSystem
>with VBs Open- or Line-Input functionality directly into
>VBs WideChar-Strings, then the (perhaps wrong) AutoANSI-
>conversion already took place.
>
>So that's what MM has basically to ensure already beforehand
>(a correct MessageBytes-To-VB-WChar-String conversion, according
> to the specified Charset of the message).
>
>Either that (and then my Routine would need to be enhanced to 16Bit-
>Wide-String-Parsing) ... or MM will remain entirely in "ByteArray-
>Space", passing only the *original* 8Bit-message-content (a
>ByteArray-Parameter) around into the different Parsing-Routines
>(ByRef B() As Byte, BOffsStart As Long, BLen As Long)
>... with the usual accompanying Offset-Params into this
>original ByteArray.
>
>In this case the ByteArray-parsing (the Replace-Loop) in
>my routine would not need to be changed basically -
>only the StrConv-Calls could be completely omitted.
>
>But in case MM wants to operate on BStrings, and has
>done the needed message-conversion already correctly
>(using the MultiByteToWideChar-API)... here's the even
>faster "W-Version", which is implemented as a Sub,
>changing the contents of the passed ByRef-String
>"in place", without any allocation-ops, because the
>Integer-Array II is spanned virtually over the
>passed ByRef String-Params content:
>
>Sub ReplaceSpecialCharsW(S As String)
>Dim i As Long: Static LUT(1 To 127) As Integer
>Static II() As Integer, saII As SAFEARRAY1D
>
> If saII.cDims = 0 Then 'this block of code is entered only once
>
> saII.cDims = 1 'set the Array-Dimension to 1
> saII.cbElements = 2 '2 Bytes per Element, matching a 16Bit WChar
>
> BindArray II, VarPtr(saII)
>
> For i = 1 To 127 'init the 16Bit WChar-LUT
> LUT(i) = IIf(InStr("*<>&{}_?.!:;,()[]-'""", Chr(i)), 32, i)
> Next i
> End If
>
> saII.cElements = Len(S) 'set the ElementCount of the Array
> If saII.cElements Then saII.pvData = StrPtr(S) Else Exit Sub
>
> For i = 0 To UBound(II) 'now that's the whole replace-functionality
> If i > 0 And i < 128 Then II(i) = LUT(II(i))
> Next i
>
> saII.pvData = 0 'reset the Data-Pointer
>End Sub
>
>Here come the needed API-Declares:
>
>Private Type SAFEARRAY1D
> cDims As Integer
> fFeatures As Integer
> cbElements As Long
> cLocks As Long
> pvData As Long
> cElements As Long
> lLbound As Long
>End Type
>
>Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _
> (PArr() As Any, PSrc&, Optional ByVal cb& = 4)
>
>
>Olaf

I haven't looked at the above alternative yet. Tomorrow!

MM

Karl E. Peterson

6/8/2012 8:32:00 PM

0

It happens that MM formulated :
> On Fri, 08 Jun 2012 00:06:25 +0200, Schmidt <sss@online.de> wrote:
>
>> Am 07.06.2012 21:18, schrieb MM:
>>
>>> The reason I rewrote the Sub as a Function is because I assumed that
>>> using Replace umpteen times would definitely be slower, but it isn't,
>>> it's faster.
>>
>> Karl gave you a reason for the slowness of your first function...
>>
>> And as said already in a posting not that long ago,
>> also the Replace-Function will have to loop through
>> the whole string-chars in *each* call...
>>
>> So better to replace the replace with something that
>> can do the whole thing in one single pass - and it's
>> especially easy in your case, since you do only lookups
>> on single chars - easy to use a LookUpTable-approach then:
>>
>> '***into a TestForm
>> Option Explicit
>>
>> Private Sub Form_Load()
>> Debug.Print ReplaceSpecialCharacters("A,B:C?D")
>> End Sub
>>
>> 'only working in ANSI-Mode, but not that difficult to enhance to 16Bit
>> Function ReplaceSpecialCharacters(S As String) As String
>> Dim i As Long, B() As Byte: Static LUT(0 To 255) As Byte
>>
>> If LUT(255) = 0 Then 'fill the LookUpTable once
>> For i = 0 To 255
>> LUT(i) = IIf(InStr("*<>&{}_?.!:;,()[]-'""", Chr(i)), 32, i)
>> Next i
>> End If
>>
>> If Len(S) Then B = StrConv(S, vbFromUnicode) Else Exit Function
>>
>> For i = 0 To UBound(B) 'now that's the whole replace-functionality
>> B(i) = LUT(B(i))
>> Next i
>>
>> ReplaceSpecialCharacters = StrConv(B, vbUnicode)
>> End Function
>>
>> Olaf
>
> I've now done some comparison timings. There's not a lot of
> difference.
>
> alt.binaries.bbc.comedy, 44371 articles
> Routine Secs
> Mine* 71
> Karl 69
> Olaf 68
>
>
> alt.binaries.automobiles, 80371 articles
> Routine Secs
> Mine* 45
> Karl 44
> Olaf 45
>
>
> alt.binaries.sounds.radio.bbc, 1361250 articles
> Routine Secs
> Mine* 2195
> Karl 2191
> Olaf 2186
>
> * original, using intrinisic Replace function

Crazy. There should be far more difference, or so it would seem.
These are compiled timings, right?

There must be something else in play, if so. The way you're doing file
i/o, or something. Those numbers couldn't be the case if they were
truly isolating just the replace calls. Or, "so it would seem." :-)

--
..NET: It's About Trust!
http://vfre...


Schmidt

6/8/2012 10:29:00 PM

0

Am 08.06.2012 22:27, schrieb MM:
[Unicode-capable routine, working on VBs 16Bit WideStrings)
> I haven't looked at the above alternative yet. Tomorrow!

When you are interested in the performance of a certain
routine, then you should test it isolated (separate
it from File-IO and anything else).

The Performance-Testcode below gives the following output:

IDE:
Mid$-Statement: 387,94ms
ByteArr+StrConv: 210,08ms
IntArr+SafeArray: 103,03ms

Native (all extended options checked in):
Mid$-Statement: 248,15ms
ByteArr+StrConv: 85,96ms
IntArr+SafeArray: 11,43ms

Meaning, native compiled the Unicode-Routine is
about 20 times as fast as the Mid$-approach,
and about 8 times as fast as ByteArray + StrConv.

If you're interested in more "over-all-speed", then
you should check your whole "queue of instructions",
beginning with the File-Read, which could be done
asynchronously for example, reading only into ByteArrays
and not into Strings, *all* the Parsing then directly
on Bytes etc.

Here the TestCode:

'***the three different routines into a *.bas module
Option Explicit

'*** here the used Declares and the SafeArray-Type ***
Private Type SAFEARRAY1D
cDims As Integer
fFeatures As Integer
cbElements As Long
cLocks As Long
pvData As Long
cElements As Long
lLbound As Long
End Type

Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _
(PArr() As Any, PSrc&, Optional ByVal cb& = 4)

Private Const StopChars$ = "*<>&{}_?.!:;,()[]-'"""

'using Karls suggestion, to use Mid$ as the Replacing-Instruction
Sub ReplaceSpecialCharacters(S As String)
Dim j As Long
For j = 1 To Len(S)
If InStr(StopChars, Mid$(S, j, 1)) Then Mid$(S, j, 1) = " "
Next j
End Sub

'only working in ANSI-Mode, but not that difficult to enhance to 16Bit
Sub ReplaceSpecialCharsA(S As String)
Dim i As Long, B() As Byte: Static LUT(0 To 255) As Byte

If LUT(255) = 0 Then 'fill the LookUpTable once
For i = 0 To 255
LUT(i) = IIf(InStr(StopChars, Chr(i)), 32, i)
Next i
End If

If Len(S) Then B = StrConv(S, vbFromUnicode) Else Exit Sub

For i = 0 To UBound(B) 'now that's the whole replace-functionality
B(i) = LUT(B(i))
Next i

S = StrConv(B, vbUnicode)
End Sub

Sub ReplaceSpecialCharsW(S As String)
Dim i As Long: Static LUT(1 To 127) As Integer
Static II() As Integer, saII As SAFEARRAY1D

If saII.cDims = 0 Then 'this block of code is entered only once

saII.cDims = 1 'set the Array-Dimension to 1
saII.cbElements = 2 '2 Bytes per Element, matching a 16Bit WChar

BindArray II, VarPtr(saII)

For i = 1 To 127 'init the 16Bit WChar-LUT
LUT(i) = IIf(InStr(StopChars, Chr(i)), 32, i)
Next i
End If

saII.cElements = Len(S) 'set the ElementCount of the Array
If saII.cElements Then saII.pvData = StrPtr(S) Else Exit Sub

For i = 0 To UBound(II) 'now that's the whole replace-functionality
If i > 0 And i < 128 Then II(i) = LUT(II(i))
Next i

saII.pvData = 0 'reset the Data-Pointer
End Sub


'***and this into a Form
Option Explicit

Private Const TestCount& = 50000
Private TestArr() As String

Private Sub Form_Load()
AutoRedraw = True
ReDim TestArr(1 To TestCount)
End Sub

Private Sub Form_Click()
Dim i As Long, T!, S As String
Cls

'Testrun 1
For i = 1 To TestCount
TestArr(i) = "A-Test_String(with.special'chars):" & i
Next i
DoEvents
T = Timer
For i = 1 To TestCount 'replace directly on the Array-Position
ReplaceSpecialCharacters TestArr(i)
Next i
Print "Mid$-Statement:", Format$(1000 * (Timer - T), "0.00ms")


'Testrun 2
For i = 1 To TestCount
TestArr(i) = "A-Test_String(with.special'chars):" & i
Next i
DoEvents
T = Timer
For i = 1 To TestCount 'replace directly on the Array-Position
ReplaceSpecialCharsA TestArr(i)
Next i
Print "ByteArr+StrConv:", Format$(1000 * (Timer - T), "0.00ms")


'Testrun 3
For i = 1 To TestCount
TestArr(i) = "A-Test_String(with.special'chars):" & i
Next i
DoEvents
T = Timer
For i = 1 To TestCount 'replace directly on the Array-Position
ReplaceSpecialCharsW TestArr(i)
Next i
Print "IntArr+SafeArray:", Format$(1000 * (Timer - T), "0.00ms")
End Sub


Olaf