Asp Forum - ORDER BY a second letter of a field - microsoft.public.vb.general.discussion

mm

8/21/2011 6:51:00 PM

Hello guys, I'm using DAO 3.6 with an access 2003 database in a program.

I would like to generate recordsets ordered by a second (or third)
letter of a text field of a table. Is it possible?

Thanks.

44 Answers

Tony Toews

8/21/2011 7:47:00 PM

On Sun, 21 Aug 2011 15:50:56 -0300, Eduardo <mm@mm.com> wrote:

>Hello guys, I'm using DAO 3.6 with an access 2003 database in a program.
>
>I would like to generate recordsets ordered by a second (or third)
>letter of a text field of a table. Is it possible?

In your SQL try
ORDER BY Mid([FieldName],2,1)

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/ac...
Tony's Microsoft Access Blog - http://msmvps.com/blo...
For a convenient utility to keep your users FEs and other files
updated see http://www.autofeup...

(nobody)

8/21/2011 7:49:00 PM

"Eduardo" <mm@mm.com> wrote in message
news:j2rk3c$n7m$1@speranza.aioe.org...
> Hello guys, I'm using DAO 3.6 with an access 2003 database in a program.
>
> I would like to generate recordsets ordered by a second (or third) letter
> of a text field of a table. Is it possible?

Air code:

SELECT * FROM Customers ORDER BY Mid(FullName, 2)

Needless to say this would be slow because a temporary index will be
created.

mm

8/22/2011 4:23:00 AM

El 21/08/2011 03:50 p.m., Eduardo escribió:
> Hello guys, I'm using DAO 3.6 with an access 2003 database in a program.
>
> I would like to generate recordsets ordered by a second (or third)
> letter of a text field of a table. Is it possible?
>
> Thanks.

OK, guys. I wanted to go ahead with the work and what I did is to create
additional fields without the first letters.

It's a database with fixed data, so there is not data additions by the
program. It's a dictionary.

I need to consult 6 fields in a table that have different
transliterations of Greeks words, and I need to find a word by aproximation.

So I added additional fields for earch of the 6 fields (the
transliterations), one without the first letter, other one without 2
letters, and so on until without the 9 first letters. And I filled the
new data using a routine.

Then, for consulting, I make a number of recordsets ordered each one by
a different field. They are about 60 recordsets.

But if your suggestion work, I can delete all the additional fields and
generate the recordsets by SQL.

What do you think is better? (I have it already working)

Thanks.

Tony Toews

8/22/2011 6:52:00 AM

On Mon, 22 Aug 2011 01:22:46 -0300, Eduardo <mm@mm.com> wrote:

>It's a database with fixed data, so there is not data additions by the
>program. It's a dictionary.
>
>I need to consult 6 fields in a table that have different
>transliterations of Greeks words, and I need to find a word by aproximation.
>
>So I added additional fields for earch of the 6 fields (the
>transliterations), one without the first letter, other one without 2
>letters, and so on until without the 9 first letters. And I filled the
>new data using a routine.

Did you index each of those additional 9*6 fields?

How many records in your table?

>Then, for consulting, I make a number of recordsets ordered each one by
>a different field. They are about 60 recordsets.
>
>But if your suggestion work, I can delete all the additional fields and
>generate the recordsets by SQL.
>
>What do you think is better? (I have it already working)

It all depends on your speed requirements which depends on how many
records you have.

On the one hand if it's a few thousand records and the Access database
file is on your local hard drive then a full table scan on the fields
might be quite acceptable as this might only be 5 or 10 Mb in size.

On the other hand if it's 50K records and on a network then you're
looking as much as 100 times as long to do the same search.

On the third hand if you already have it working then don't muck with
it. <smile>

That said stuff can happen. So if the users can enter new words and
you fill in these additional fields and something happens you might
get data that is incomplete. Therefore you should have a routine that
does a bulk update of all these fields.

And if you are going to do a bulk update then you are strongly urged
to delete the indexes, do the bulk update and recreate the indexes.
Things will likely go much, much faster. There is DAO collections
code that you can use to delete/recreate the indexes.

Finally I don't care for the idea of 6 virtually identical fields in
one table. You'd be better having one field with six rows in
another table. That would simplify the searching. Maybe. Depending.

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/ac...
Tony's Microsoft Access Blog - http://msmvps.com/blo...
For a convenient utility to keep your users FEs and other files
updated see http://www.autofeup...

mm

8/22/2011 7:38:00 AM

El 22/08/2011 03:52 a.m., Tony Toews escribió:
> On Mon, 22 Aug 2011 01:22:46 -0300, Eduardo<mm@mm.com> wrote:
>
>> It's a database with fixed data, so there is not data additions by the
>> program. It's a dictionary.
>>
>> I need to consult 6 fields in a table that have different
>> transliterations of Greeks words, and I need to find a word by aproximation.
>>
>> So I added additional fields for earch of the 6 fields (the
>> transliterations), one without the first letter, other one without 2
>> letters, and so on until without the 9 first letters. And I filled the
>> new data using a routine.
>
> Did you index each of those additional 9*6 fields?

I tried, but Access told me that I can't have more than 32 indexes, so I
removed all the indexes.

>
> How many records in your table?

~~ 8600

>
>> Then, for consulting, I make a number of recordsets ordered each one by
>> a different field. They are about 60 recordsets.
>>
>> But if your suggestion work, I can delete all the additional fields and
>> generate the recordsets by SQL.
>>
>> What do you think is better? (I have it already working)
>
> It all depends on your speed requirements which depends on how many
> records you have.

I need more speed.
I realized that I needed to add also comparison without the last
letters, so I added a new loop level discarding letters from the end.
iRecordSet.FindFirst iField & " LIKE '" & Left(iWord...

And with long words it's taking too much time, about 3 minutes.

>
> On the one hand if it's a few thousand records and the Access database
> file is on your local hard drive then a full table scan on the fields
> might be quite acceptable as this might only be 5 or 10 Mb in size.
>
> On the other hand if it's 50K records and on a network then you're
> looking as much as 100 times as long to do the same search.

It's a local database on the HD.

>
> On the third hand if you already have it working then don't muck with
> it.<smile>

As I said, now I see that I need more speed.

>
> That said stuff can happen. So if the users can enter new words and
> you fill in these additional fields and something happens you might
> get data that is incomplete. Therefore you should have a routine that
> does a bulk update of all these fields.

There is not new records entries, it's a fixed dictinary with Bible words.

>
> And if you are going to do a bulk update then you are strongly urged
> to delete the indexes, do the bulk update and recreate the indexes.
> Things will likely go much, much faster. There is DAO collections
> code that you can use to delete/recreate the indexes.

I don't have much experience with "complex" operations with databases,
so I need to learn anything that could be useful to do this task better.

>
> Finally I don't care for the idea of 6 virtually identical fields in
> one table. You'd be better having one field with six rows in
> another table. That would simplify the searching. Maybe. Depending.

And Access would let me to add the indexes... (they would be less than
32 in each table).

But if I go for the SQL Mid(Filename,2) it could be faster?

>
> Tony

Thank you.

Schmidt

8/22/2011 1:01:00 PM

Am 22.08.2011 09:37, schrieb Eduardo:

> And Access would let me to add the indexes... (they would
> be less than 32 in each table).
>
> But if I go for the SQL Mid(Filename,2) it could be faster?
No - in this case (using builtin SQL-Functions) there's always
at least one fulltable-scan necessary.

But didn't we have this topic already "on the table"
(the thread, where you were asking "find a similar word")?

In short, if there are only about 8000 Word-entries,
which need "enhanced, unsharp scanning", then you
would be faster, if you handle these wordscans
not directly per "normal DB-Functions and table-scans"
(in case, the DB-engine in question does not
offer builtin "unsharp search-functions", which
are more capable than for example 'Like').

So the best approach would be, to move your
8000 words over into a String-Array and perform
your comparisons on this (or also additional,
differently ordered or shortened StringArrays)
simply by index, directly in your own App-Loops.

The problem is an old one - and your current
approach (same word-content in different
"shapes", shortened from right, shortened from
left, etc. ... and then multiple comparison-loops)
is the one, which perhaps everyone starts with.

But eventually one will find, that this "naive"
approach might be sufficient for a small amount
of words, but multiple array-scans are costly in
terms of performance and need a lot of memory
when the "unsharp requirements" grow - or the
wordcount gets higher.

And that's, were all these more advanced,
unsharp algorithms come into play (metaphone,
ratcliff & Co.)

Did you already tried one of them against your
set of words?

Or could you give a list of (maybe 10-20 should
be sufficient) words - and then a few searchterms,
for which you define the word (out of your
10-20 words-list) that should be returned - just that
we can see, how tolerant/unsharp the algorithm should be...

Olaf

Tony Toews

8/22/2011 7:32:00 PM

On Mon, 22 Aug 2011 15:00:41 +0200, Schmidt <sss@online.de> wrote:

>So the best approach would be, to move your
>8000 words over into a String-Array and perform
>your comparisons on this (or also additional,
>differently ordered or shortened StringArrays)
>simply by index, directly in your own App-Loops.

Agreed.

I almost always, 99.999% of the time, think in terms of database
because I'm locked into that mindset after 30+ years and 15 years in
Access.

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/ac...
Tony's Microsoft Access Blog - http://msmvps.com/blo...
For a convenient utility to keep your users FEs and other files
updated see http://www.autofeup...

Tony Toews

8/22/2011 7:34:00 PM

On Mon, 22 Aug 2011 04:37:59 -0300, Eduardo <mm@mm.com> wrote:

>> Did you index each of those additional 9*6 fields?
>
>I tried, but Access told me that I can't have more than 32 indexes, so I
>removed all the indexes.

Oops, I forgot about that problem.

>> How many records in your table?
>~~ 8600
>It's a local database on the HD.

>> That said stuff can happen. So if the users can enter new words and
>> you fill in these additional fields and something happens you might
>> get data that is incomplete. Therefore you should have a routine that
>> does a bulk update of all these fields.
>
>There is not new records entries, it's a fixed dictinary with Bible words.

That helps.

Schmidt has the right idea. Go with arrays in memory. For 8800
records that'll be just fine. 8 million records maybe not but then
these days with 2 Gb of available RAM even that might be doable.

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/ac...
Tony's Microsoft Access Blog - http://msmvps.com/blo...
For a convenient utility to keep your users FEs and other files
updated see http://www.autofeup...

mm

8/22/2011 11:56:00 PM

Hello Olaf.

El 22/08/2011 10:00 a.m., Schmidt escribió:
> Am 22.08.2011 09:37, schrieb Eduardo:
>
>> And Access would let me to add the indexes... (they would
> > be less than 32 in each table).
>>
>> But if I go for the SQL Mid(Filename,2) it could be faster?
> No - in this case (using builtin SQL-Functions) there's always
> at least one fulltable-scan necessary.

OK.

> But didn't we have this topic already "on the table"
> (the thread, where you were asking "find a similar word")?

Yes and no. At that time, it was to compare with a small set of words
that can range from 5 to 50 words, that I have in an array in memory.
They are also bible words, but that case was for words used in a
versicle or several versicles.
That is already working fine.

Now, I'm working again on that program, and I found that I also need to
find a word when no versicle is specified, and that's the difference
because I need to search in the whole dictionary.

>
> In short, if there are only about 8000 Word-entries,
> which need "enhanced, unsharp scanning", then you
> would be faster, if you handle these wordscans
> not directly per "normal DB-Functions and table-scans"
> (in case, the DB-engine in question does not
> offer builtin "unsharp search-functions", which
> are more capable than for example 'Like').

Yes, I remember that you suggested to go to a SQLite database...
The only database type that I have worked with so far are Access
databases, so I'm trying to stay with Access unless I see that I really
must migrate to another one (and also learn all the things related to
distribution and deployment).

>
> So the best approach would be, to move your
> 8000 words over into a String-Array and perform
> your comparisons on this (or also additional,
> differently ordered or shortened StringArrays)
> simply by index, directly in your own App-Loops.

This is something that I'm considering, but at the end they will be 8000
* 60, because all the variants.
I In fact there are two tables, ~8000 are the Hebrew words, and the
Greek words are ~5000.

This will take time to load ~13,000 x 60 registers in memory at the
start of the program.

In that case I would search on each vector with a succesive
aproximations routine.

>
> The problem is an old one - and your current
> approach (same word-content in different
> "shapes", shortened from right, shortened from
> left, etc. ... and then multiple comparison-loops)
> is the one, which perhaps everyone starts with.
>
> But eventually one will find, that this "naive"
> approach might be sufficient for a small amount
> of words, but multiple array-scans are costly in
> terms of performance and need a lot of memory
> when the "unsharp requirements" grow - or the
> wordcount gets higher.

In this case, the dictionay won't ever change.

>
> And that's, were all these more advanced,
> unsharp algorithms come into play (metaphone,
> ratcliff & Co.)
>
> Did you already tried one of them against your
> set of words?

Humm, no. What I'm doing, is shortening from right, shortening from left
(also varying, the search word shortened 2 characters, the database word
shortened one, and so), and do a "FindFirst" in the register in
question. If I find a match, then I compare that word (not the shortened
one but the complete word) with the word that I'm searching (also the
complete one) with an algorytm based on "Levenshtein Distance".
Then I move first to the previous and later to the next contiguous
records until I reach a number of words that didn't match more than 50%,
then I stop (and go to the next searching cicle in the loop).

At the end, I order the list of words taht were found by the percentage
of coincidence, and if they are more than 20, I leave only the 20 more
relevants.

>
> Or could you give a list of (maybe 10-20 should
> be sufficient) words - and then a few searchterms,
> for which you define the word (out of your
> 10-20 words-list) that should be returned - just that
> we can see, how tolerant/unsharp the algorithm should be...
>
> Olaf

OK, I'll copy some results from what I've made.
Suppose that the user wants to find the word "anupokritos", but he or
she doesn't remember how to spell it, and writes it "like it sound":

******************
word: anaupocritos (Greek)
It took 159,06 seconds

Word ID: G 505 anupokritos percentage: 82
Word ID: G 506 anupotaktos percentage: 65
Word ID: G 5273 hupokrites percentage: 58
Word ID: G 379 anapologetos percentage: 58
Word ID: G 368 anantirrhetos percentage: 58
Word ID: G 369 anantirrhetos percentage: 58
Word ID: G 178 akatakritos percentage: 58
Word ID: G 799 Asugkritos percentage: 57
Word ID: G 5580 pseudochristos percentage: 56
Word ID: G 319 anagnorizomai percentage: 53
Word ID: G 338 anaitios percentage: 50
Word ID: G 402 anachoreo percentage: 50
Word ID: G 87 adiakritos percentage: 50
Word ID: G 361 anamartetos percentage: 50
Word ID: G 526 apallotrioo percentage: 50
Word ID: G 3480 Nazoraios percentage: 50
Word ID: G 4182 polupoikilos percentage: 50
Word ID: G 125 Aiguptos percentage: 50
Word ID: G 378 anapleroo percentage: 50
Word ID: G 377 anapipto percentage: 50
******************

Some more samples:

******************
word: neumaticós (Greek)
It took 124,41 seconds

Word ID: G 4153 pneumatikos percentage: 81
Word ID: G 4152 pneumatikos percentage: 81
Word ID: G 1193 dermatinos percentage: 66
Word ID: G 3020 Leuitikos percentage: 60
Word ID: G 4984 somatikos percentage: 60
Word ID: G 4985 somatikos percentage: 60
Word ID: G 1444 Hebraikos percentage: 60
Word ID: G 2122 eukairos percentage: 57
Word ID: G 2121 eukairos percentage: 57
Word ID: G 2739 kaumatizo percentage: 57
Word ID: G 3524 nephaleos percentage: 55
Word ID: G 5538 chrematismos percentage: 54
Word ID: G 2773 kermatistes percentage: 54
Word ID: G 3566 numphios percentage: 53
Word ID: G 3512 neoterikos percentage: 50
Word ID: G 2452 Ioudaikos percentage: 50
Word ID: G 2451 Ioudaikos percentage: 50
Word ID: G 2441 himatismos percentage: 50
Word ID: G 1054 Galatikos percentage: 50
Word ID: G 4262 probatikos percentage: 50
******************
word: teofenustos (Greek)
It took 152,21 seconds

Word ID: G 2315 theopneustos percentage: 65
Word ID: G 3504 neophutos percentage: 60
Word ID: G 1675 Hellenistes percentage: 54
Word ID: G 1354 Dionusios percentage: 53
Word ID: G 5118 tosoutos percentage: 53
Word ID: G 5108 toioutos percentage: 53
Word ID: G 5082 telikoutos percentage: 51
Word ID: G 4339 proselutos percentage: 51
Word ID: G 5533 chreopheiletes percentage: 50
Word ID: G 2459 Ioustos percentage: 50
Word ID: G 2312 theodidaktos percentage: 49
Word ID: G 2180 Ephesios percentage: 48
Word ID: G 1721 emphutos percentage: 48
******************
word: alos (Greek)
It took 16,13 seconds

Word ID: G 243 allos percentage: 97
Word ID: G 247 allos percentage: 97
Word ID: G 2570 kalos percentage: 79
Word ID: G 4535 salos percentage: 79
Word ID: G 2573 kalos percentage: 77
Word ID: G 216 alalos percentage: 66
Word ID: G 5194 hualos percentage: 66
Word ID: G 358 analos percentage: 66
Word ID: G 527 apalos percentage: 65
Word ID: G 836 aulos percentage: 55
Word ID: G 259 halosis percentage: 54
Word ID: G 3171 megalos percentage: 54
Word ID: G 806 asphalos percentage: 54
Word ID: G 250 aloe percentage: 53
Word ID: G 301 Amos percentage: 50
******************
word: Jehová Nisi (Hebrew)
It took 215,4 seconds

Word ID: H 3071 Yehovah nicciy percentage: 72
Word ID: H 3070 Yehovah yireh percentage: 54
******************
word: Jehová Sitquenu (Hebrew)
It took 292,88 seconds

Word ID: H 3072 Yehovah tsidqenuw percentage: 72
******************
word: Elojim (Hebrew)
It took 59,43 seconds

Word ID: H 430 'elohiym percentage: 97
Word ID: H 440 'Elowniy percentage: 53
******************
word: nefes (Hebrew)
It took 36,44 seconds

Word ID: H 5315 nephesh percentage: 80
Word ID: H 657 'ephec percentage: 77
Word ID: H 5311 nephets percentage: 62
Word ID: H 5233 nekec percentage: 62
Word ID: H 5298 Nepheg percentage: 60
Word ID: H 5309 nephel percentage: 60
Word ID: H 5316 nepheth percentage: 60
Word ID: H 660 'eph`eh percentage: 57
Word ID: H 7516 rephesh percentage: 49
Word ID: H 2665 chephes percentage: 49
******************

In all cases the desired word was the first one in the list.

Thanks.

mm

8/23/2011 12:05:00 AM

El 22/08/2011 04:33 p.m., Tony Toews escribió:
> On Mon, 22 Aug 2011 04:37:59 -0300, Eduardo<mm@mm.com> wrote:
>
>>> Did you index each of those additional 9*6 fields?
>>
>> I tried, but Access told me that I can't have more than 32 indexes, so I
>> removed all the indexes.
>
> Oops, I forgot about that problem.
>
>>> How many records in your table?
>> ~~ 8600
>> It's a local database on the HD.
>
>>> That said stuff can happen. So if the users can enter new words and
>>> you fill in these additional fields and something happens you might
>>> get data that is incomplete. Therefore you should have a routine that
>>> does a bulk update of all these fields.
>>
>> There is not new records entries, it's a fixed dictinary with Bible words.
>
> That helps.
>
> Schmidt has the right idea. Go with arrays in memory. For 8800
> records that'll be just fine. 8 million records maybe not but then
> these days with 2 Gb of available RAM even that might be doable.
>
> Tony

OK, but they are 8000 (in fact 13000 because they are two tables) * 60,
not just 8000.

I think I should do this in the background, and if the user wants to
search a word inmediately after he/she starts the program, I'll have to
say that the database is still not ready. It has also some programming
complications because of the use of Doevents.

But this is the next way I'll explore, because it's taking too much time
now.

Another thing to try is to split the fields in several tables so I can
index every one. Do you think that it worth to try that?

microsoft.public.vb.general.discussion

ORDER BY a second letter of a field

mm

Tony Toews

(nobody)

mm

Tony Toews

mm

Schmidt

Tony Toews

Tony Toews

mm

mm

x Login to ForumsZone