[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

microsoft.public.vb.general.discussion

Fast String Store and Exists

BeeJ

8/25/2011 3:39:00 PM

I need fastest String Store and Exists Check.
Need Remove but speed not as critical.
Qty of strings would be in many thousands.
Just a single string gets stored (i.e. no key, item pairs)

So far the list of candidates is:

ListBox(not visible), SendMessage
ListBox Sorted(not visible), SendMessage
Dictionary (but may want to avoid FileSystemObject)
Collection
Sorted String Array Insert, Binary Search
Fast String Append, Instr

I can write code for any of these but was wondering if anyone has
already done testing.

Also are there other methods to consider.


19 Answers

BeeJ

8/26/2011 12:52:00 AM

0

Dictionary is by far the fastest.
Any other candidates?

Only the "Instr and Mid$" method improves slightly when compiled vs
IDE.
Using Exists with Dictionary to see if string is in dictionary.

System Test: Query Performance Counter minimum resolution: 1/3,579,545
sec. Or 0.279 uSec.
COMPILED
Aug 25, 2011 Thu 05:39:35p
=== COLLECTION ===
20,000 Strings To Add

0.232 Fill Secs.
0.055 Find Secs; 363,636 Items/Sec; 2.750 uSec/Item
0.048 Remove Secs; 416,667 Items/Sec; 2.40 uSec/Item
=== COLLECTION FINI ===

=== DICTIONARY ===
20,000 Strings To Add
0.041 Fill Secs.
0.021 Find Secs; 952,381 Items/Sec; 1.050 uSec/Item
0.038 Remove Secs; 526,316 Items/Sec; 1.90 uSec/Item
=== DICTIONARY FINI ===

=== LISTBOX ===
20,000 Strings To Add
0.310 Fill Secs.
25.219 Find Secs; 793 Items/Sec; 1,260.950 uSec/Item
26.561 Remove Secs; 753 Items/Sec; 1,328.05 uSec/Item
=== LISTBOX FINI ===

=== Instr and Mid$ ===
20,000 Strings To Add
1,000 Buffer Allocation Size
Remove Method Using Mid$()
0.017 Fill Secs.
15.243 Find Secs; 1,312 Items/Sec; 762.150 uSec/Item
2.208 Remove Secs; 9,058 Items/Sec; 110.40 uSec/Item

Delete Method Using Instr and Replace
0.017 Fill Secs.
5.167 Delete Secs; 3,871 Items/Sec; 258.350 uSec/Item

=== Instr and Mid$ Fini ===

*** RUN DONE ***


Schmidt

8/26/2011 1:15:00 PM

0

Am 26.08.2011 02:52, schrieb BeeJ:

> Dictionary is by far the fastest.
> Any other candidates?

There's my cSortedDictionary, which you can find
either in dhRichClient3:
http://www.datenhaus.de/Downloads/dhRichC...

but also in the recently released vbRichClient4:
http://www.datenhaus.de/Downloads/vbRC...
(which package I'd recommend, since all the
maintenance does happen here - you can remove
the WebKitCairo-Folder from this package,
when you don't need the WebKit-Browser).

There's also an older implementation of dhSortedDictionary
(in a smaller standalone-Dll) which can be downloaded here:
http://www.datenhaus.de/Downloads/dhSortedDict...
(not recommended though, due to some bugfixes with regards
to For-Each-Enumeration, which were not backported from
the newer RichClient-Implementations)


As for performance (long time no testing against
the Sripting.Dictionary)...

I knew, that the SortedDisctionary was faster in
*all* regards compared with the Sripting.Dictionary.

as some years ago...:
Add, Keyed-access, Exists-Checks and RemoveByKey:
in casesensitive BinaryCompare-mode
SortedDictionary about 20-30% faster
in casesinensitive TextCompare-mode:
SortedDictionary about 3-4 times as fast

now - some C/C++ compiler-generations later - MS apparently
*did* some "homework" (whilst the VB6-native compiler
remained at the level of VC++ 6):
Add, Keyed-access, Exists-Checks and RemoveByKey:
in casesensitive BinaryCompare-mode
now Scripting.Dictionary about 50% faster
in caseinsensitive TextCompare-mode
SortedDictionary still about 1.5-2 times as fast

This latest test here was done on a current Win7
(all SPs, all Updates) and on a fast CPU.

So there's yet an advantage when using the cSortedDictionary
in case you need case-insensitive StringCompares...
But in case of exact (binary) StringKey-comparisons, the
Scripting.Dictionary has the advantage now.

Below is my testcode:

.... producing the following results on Win7-64Bit
for the (often preferred) TextCompare-Mode:
ScriptD RndAdd: 43 20000
ScriptD Exists: 107 20000
ScriptD Remove: 43 0

SortedD RndAdd: 21 20000
SortedD Exists: 31 20000
SortedD Remove: 21 0


Here the results, when I've change to BinaryCompare
on both Dictionaries:
ScriptD RndAdd: 17 20000
ScriptD Exists: 36 20000
ScriptD Remove: 16 0

SortedD RndAdd: 20 20000
SortedD Exists: 30 20000
SortedD Remove: 20 0


Option Explicit

Private Declare Function timeGetTime Lib "winmm" () As Long
Private Declare Function timeBeginPeriod Lib "winmm" _
(ByVal uPer As Long) As Long
Private Declare Function timeEndPeriod Lib "winmm" _
(ByVal uPer As Long) As Long

Public Sub Test()
Dim SortedD As cSortedDictionary, ScriptD As scripting.Dictionary
Dim i As Long, SArr() As String, T As Long, bExists As Boolean, EmptyV
Const Count& = 20000

ReDim SArr(1 To 2 * Count)
Rnd -12345 'init the Rnd-Gen to produce always the same "randomness"
For i = 1 To UBound(SArr)
SArr(i) = "S_" & Rnd * 55555
Next i

'---- instantiation of the two contestants ;-) ----
Set ScriptD = New scripting.Dictionary
ScriptD.CompareMode = TextCompare

Set SortedD = New cSortedDictionary
SortedD.StringCompareMode = TextCompare
SortedD.UniqueKeys = True

timeBeginPeriod 1

'--------- Scripting Dictionary Test -------------
T = timeGetTime
For i = 1 To Count
ScriptD.Add SArr(i), EmptyV
Next i
Debug.Print "ScriptD RndAdd:", timeGetTime - T, ScriptD.Count

T = timeGetTime
For i = 1 To 2 * Count
bExists = ScriptD.Exists(SArr(i))
If bExists Then
If i > Count Then MsgBox "shouldn't happen"
Else
If i <= Count Then MsgBox "shouldn't happen"
End If
Next i
Debug.Print "ScriptD Exists:", timeGetTime - T, ScriptD.Count

T = timeGetTime
For i = 1 To Count
ScriptD.Remove SArr(i)
Next i
Debug.Print "ScriptD Remove:", timeGetTime - T, ScriptD.Count

Debug.Print

'--------- Sorted Dictionary Test -------------
T = timeGetTime
For i = 1 To Count
SortedD.Add SArr(i)
Next i
Debug.Print "SortedD RndAdd:", timeGetTime - T, SortedD.Count

T = timeGetTime
For i = 1 To 2 * Count
bExists = SortedD.Exists(SArr(i))
If bExists Then
If i > Count Then MsgBox "shouldn't happen"
Else
If i <= Count Then MsgBox "shouldn't happen"
End If
Next i
Debug.Print "SortedD Exists:", timeGetTime - T, SortedD.Count

T = timeGetTime
For i = 1 To Count
SortedD.Remove SArr(i)
Next i
Debug.Print "SortedD Remove:", timeGetTime - T, SortedD.Count

timeEndPeriod 1
End Sub


Olaf

BeeJ

8/26/2011 2:17:00 PM

0

Thanks I will try this out.


Schmidt

8/26/2011 2:20:00 PM

0

Am 26.08.2011 15:15, schrieb Schmidt:


FWIW:

When the Entry-Count increases, then the SortedDictionary
becomes much faster again (since it scales nearly linearly),
whilst the HashList-based algo behind the Scripting.Dictionary
apparently does not keep up that well with higher Word- or
String-Counts:

So, when I change the Count&-constant in the already posted
TestRoutine to:

40000 in TextCompare-mode:
ScriptD RndAdd: 159 40000
ScriptD Exists: 456 40000
ScriptD Remove: 158 0

SortedD RndAdd: 44 40000
SortedD Exists: 69 40000
SortedD Remove: 55 0

40000 in BinaryCompare-mode:
ScriptD RndAdd: 72 40000
ScriptD Exists: 200 40000
ScriptD Remove: 68 0

SortedD RndAdd: 42 40000
SortedD Exists: 66 40000
SortedD Remove: 53 0


100000 in TextCompare-mode:
ScriptD RndAdd: 1259 100000
ScriptD Exists: 4128 100000
ScriptD Remove: 1283 0

SortedD RndAdd: 134 100000
SortedD Exists: 235 100000
SortedD Remove: 259 0

100000 in BinaryCompare-mode:
ScriptD RndAdd: 746 100000
ScriptD Exists: 2572 100000
ScriptD Remove: 725 0

SortedD RndAdd: 131 100000
SortedD Exists: 223 100000
SortedD Remove: 255 0

then the results look much friendlier for cSortedDictionary
....now I'm at peace again... ;-)


Olaf

BeeJ

8/28/2011 7:06:00 PM

0

Thanks for the feedback!
Fast but I am looking for code I can get into and modify as needed.


Mike Williams

8/28/2011 8:26:00 PM

0

"BeeJ" <nospam@spamnot.com> wrote in message
news:j3e3i9$aa0$1@dont-email.me...

> Thanks for the feedback! Fast but I am looking for code
> I can get into and modify as needed.

Well, you shouldn't mess with the Germans. They're driven entirely by logic
and have no sense of humour.

Only kiddin' Olaf ;-)

Mike


BeeJ

8/29/2011 2:23:00 AM

0

P.S. the string sort VB6 code within your sample routines do not sort
strings of different lengths correctly. Would be better if it did a
normalized sort.
It only produces "correct" results if all the strings are the same
length.
Therefore cannot use those sorted results to do a binary search.
Any suggestions?


Schmidt

8/29/2011 1:59:00 PM

0

Am 29.08.2011 04:22, schrieb BeeJ:

> P.S. the string sort VB6 code within your sample routines
> do not sort strings of different lengths correctly.
> Would be better if it did a normalized sort.
> It only produces "correct" results if all the strings are the
> same length. Therefore cannot use those sorted results to do
> a binary search.
> Any suggestions?

Is this in reply to "sample routines" of Mike
or to some stuff I've posted?

Olaf

BeeJ

8/29/2011 2:17:00 PM

0

Schmidt used his keyboard to write :
> Am 29.08.2011 04:22, schrieb BeeJ:
>
>> P.S. the string sort VB6 code within your sample routines
> > do not sort strings of different lengths correctly.
> > Would be better if it did a normalized sort.
>> It only produces "correct" results if all the strings are the
>> same length. Therefore cannot use those sorted results to do
> > a binary search.
>> Any suggestions?
>
> Is this in reply to "sample routines" of Mike
> or to some stuff I've posted?
>
> Olaf

Your very fast, and it is very fast, string QSort.


Schmidt

8/29/2011 3:32:00 PM

0

Am 28.08.2011 21:05, schrieb BeeJ:

> Thanks for the feedback!
> Fast but I am looking for code ...
Ah, but since in your first post you were asking:

"I need fastest String Store and Exists Check."

I thought that the RC4s cSortedDictionary would
worth mentioning.

> ... I can get into and modify as needed.
Ok, but in this case you should perhaps also
refrain from the usage of the Scripting.Dictionary,
since you cannot modify its sourcecode either. ;-)

Just in case, these your planned modifications are
related to Sorting and a following binary-search
on the sorted contents: this already comes "for free"
in cSortedDictionary.

For example, if you want to achieve something like
the VB-IDE does with Intellisense (up-popping areas,
which give you the "nearest entry within a sorted list"
whilst typing the beginning of the searchterm).

If that's what you're after, then there's another
method of the cSortedDictionary (.IndexByKey), you
maybe should know about (before you throw it away ;-).

The following example scans "c:\windows\system32"
for all dlls, puts the results sorted into a listbox -
and then allows you in a textbox, to type the beginnings
of the entry you're searching for - the listbox is updating
its contents then appropriately (listing/filtering only the
entries, which start with the sequence you're typing.

The "binary chop" is done in this case by the
SortedDictionarys .IndexByKey-method, which offers
a second (optional) parameter, which you would need
to turn on - but look at the example...
(needs RichClient4 for the new cDirList-stuff, which
the older version 3 does not have)

'*** Into a Form
'*** with a TextBox named: txtStartsWith
'*** and a ListBox name: lstFiles

Option Explicit

Const FolderToScan$ = "c:\windows\system32"

Private New_c As New cConstructor
Private SD As cSortedDictionary, DL As cDirList

Private Sub Form_Load()
'perform a FolderScan, and store the results in our DL-Object
Dim Flt As String
Flt = "*.dll"
With New_c.FSO
Set DL = .GetDirList(FolderToScan, dlSortNone, Flt, True, True)
End With

'copy the filenames (from the already filled DL) over into
'our sorting Dictionary for faster Filtering later on
Set SD = GetNamesFromDirList(DL)

Caption = SD.Count & " (" & Flt & ") Files found in " & FolderToScan

FillListFromSD lstFiles, SD
End Sub

Function GetNamesFromDirList(DL As cDirList) As cSortedDictionary
Dim i As Long
Set GetNamesFromDirList = New_c.SortedDictionary(TextCompare, True)
For i = 0 To DL.FilesCount - 1
GetNamesFromDirList.Add DL.FileName(i)
Next i
End Function

Private Sub FillListFromSD(LB As ListBox, SD As cSortedDictionary, _
Optional ByVal StartsWith As String)
Dim i As Long, LenSW As Long, Idx As Long
StartsWith = LCase$(StartsWith)
LenSW = Len(StartsWith)

LB.Clear
If SD.Count = 0 Then Exit Sub

If LenSW = 0 Then 'copy everything over into the LB
For i = 0 To SD.Count - 1
LB.AddItem SD.KeyByIndex(i)
Next i

Else 'the SortedDictionaries "StartsWith-scan-feature", forced by...
Idx = SD.IndexByKey(StartsWith, True) '<- ..the Optional Param here
If Idx >= SD.Count Then Exit Sub

Do While LCase$(Left$(SD.KeyByIndex(Idx), LenSW)) = StartsWith
LB.AddItem SD.KeyByIndex(Idx)
Idx = Idx + 1
If Idx = SD.Count Then Exit Do
Loop
End If
End Sub

Private Sub txtStartsWith_Change()
FillListFromSD lstFiles, SD, txtStartsWith.Text
End Sub

Olaf