comp.lang.ruby

The Gems Repository Stats

Leslie Viljoen

5/30/2008 9:19:00 PM

Wondering why the endless "bulk updating" takes so long, I looked at the
source a bit. It seems that if there are more than 50 gems missing from
the quick list, a bulk update is done:

source_index.rb:
use_incremental = missing_gems.size <= INCREMENTAL_THRESHHOLD

(INCREMENTAL_THRESHOLD is 50)

...yet it seems the quick list is always more than 50 gems out of date (since
the bulk update always seems to be done).
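The check quoted above is just a size comparison; here is a minimal, runnable sketch of that decision (the constant value comes from the quoted source, while the method name is illustrative, not from RubyGems):

```ruby
# Sketch of the incremental-vs-bulk decision in RubyGems 0.9.x.
# The threshold value is from the source quoted above; the method
# name is illustrative only.
INCREMENTAL_THRESHOLD = 50

def update_strategy(missing_gems)
  # Up to 50 missing gems: fetch them one by one (incremental).
  # More than that: download and unpack the whole bulk index.
  missing_gems.size <= INCREMENTAL_THRESHOLD ? :incremental : :bulk
end
```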

When downloading and extracting both lists, it looks like there are 13464 gems
listed in the bulk file and 13432 in the quick list.

(For the curious, they are these files, compressed with zlib:
http://gems.rubyforge....
http://gems.rubyforge.org/quic...)


This is a difference of 32 gems, but then perhaps some on the quick list are out
of date? Or perhaps I got the numbers wrong?
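A rough way to reproduce such a count, assuming the quick list is a zlib-deflated file with one name-version entry per line (an assumption about the format, not confirmed in this thread). The sample data is built in memory rather than fetched from the truncated URLs above:

```ruby
require 'zlib'

# Count entries in a zlib-deflated quick list, assuming one
# "name-version" entry per line (format assumption, see above).
def count_quick_entries(deflated)
  Zlib::Inflate.inflate(deflated).split("\n").reject(&:empty?).size
end

# Toy quick index built in memory so the sketch runs offline.
sample = Zlib::Deflate.deflate("rails-2.0.2\nrake-0.8.1\nhpricot-0.6\n")
puts count_quick_entries(sample)  # => 3
```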


So to make things faster:
--------------------------------

1. Since the bulk index is 854k and expands to 20MB, perhaps there's a way to
keep that quick index more up-to-date?

2. Only 3134 of the 13432 gems are unique gems - 10298 are older versions
of these gems. I think that people rarely search or install old gems, so perhaps
the list can be split into a file for latest versions versus old versions.

3. I often search for gems repeatedly, and the bulk index gets pulled down
repeatedly - why not save this file locally for at least a few hours?
(I'll probably try to implement this myself just now.)

4. Perhaps if the server is taking strain, a mirror or two could be set up?
I doubt many people would care about such relatively small files on their
servers - I'd be willing to ask some people if they'd do a ZA mirror.
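Suggestion 3 above could be sketched as a simple time-to-live cache; the download step is stubbed out here so the sketch runs offline (a real implementation would fetch the index over HTTP):

```ruby
require 'tmpdir'

# Sketch of a local cache for the bulk index: only re-download when
# the cached copy is older than a few hours.
CACHE_TTL = 4 * 60 * 60  # four hours, in seconds

def fetch_bulk_index(cache_path)
  if File.exist?(cache_path) && (Time.now - File.mtime(cache_path)) < CACHE_TTL
    return File.read(cache_path)  # fresh enough: reuse the cached copy
  end
  data = download_bulk_index      # stand-in for the real network fetch
  File.write(cache_path, data)
  data
end

# Stub so the sketch is runnable without a network connection.
def download_bulk_index
  "fake index data"
end

path = File.join(Dir.mktmpdir, 'bulk_index.cache')
puts fetch_bulk_index(path)  # first call "downloads" and caches
puts fetch_bulk_index(path)  # second call is served from the cache
```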



Any comments?

Les

3 Answers

Eric Hodel

6/5/2008 4:33:00 AM

On May 30, 2008, at 14:18 PM, Leslie Viljoen wrote:
> Wondering why the endless "bulk updating" takes so long I looked that
> the source a bit.
> There seems that if there are more than 50 gems missing from the quick
> list, a bulk update
> is done:
>
> source_index.rb:
> use_incremental = missing_gems.size <= INCREMENTAL_THRESHHOLD
>
> (INCREMENTAL_THRESHOLD is 50)
>
> ..yet it seems the quick list is always more than 50 gems out of
> date (for the
> bulk update seems to always be done)

You are looking at RubyGems 0.9.4 or older, so this code is at least
one year old. RubyGems 0.9.5 has a configurable threshold. The next
release of RubyGems has a new metadata fetcher (SpecFetcher) and will
not need to perform bulk updates at all (which is 2-4 weeks from
release).

> When downloading and extracting both lists, it look like there are
> 13464 gems
> listed in the bulk file and 13432 in the quick list.

I'm not sure why that is, I'll have to update my local mirror and see
if there is a bug in the latest indexer code. Can you get a list of
the missing gems and file a bug on rubyforge to remind me?

http://rubyforge.org/tracker/?atid=575&group_id=126&f...

> (For the curious, they are these files, compressed with zlib:
> http://gems.rubyforge....
> http://gems.rubyforge.org/quic...)
>
> This is a difference of 32 gems, but then perhaps some on the quick
> list are out
> of date? Or perhaps I got the numbers wrong?
>
>
> So to make things faster:
> --------------------------------
>
> 1. Since the bulk index is 854k and expands to 20MB, perhaps there's
> a way to
> keep that quick index more up-to-date?

All indexes are regenerated at the same time in a directory in /tmp
then moved into place when the index is complete. .gem files may be
missing because it takes time for the mirrors to sync up to the
master. The mailing list on the link at the bottom of this email
would be the best place to take this up.

> 2. Only 3134 of the 13432 gems are unique gems - 10298 are older
> versions
> of these gems. I think that people rarely search or install old
> gems, so perhaps
> the list can be split into a file for latest versions versus old
> versions.

This was done in 1.1.1, IIRC, but hasn't been very helpful.
SpecFetcher also makes this distinction, but it is written in less of
a band-aid fashion and works out quite cleanly. It was probably a
mistake to add a latest index to 1.1.1, but without it I would not
have arrived at the SpecFetcher so quickly.

> 3. I often search for gems repeatedly, and the bulk index gets
> pulled down
> repeatedly - why not save this file locally for at least a few hours?
> (probably try to implement this myself just now)

This is a bug that will not be fixed. SpecFetcher has a much simpler
API that doesn't have the big pile of band-aids I applied to the old
one in an attempt to get it to scale to the current size. The
original design held up surprisingly well over the years, but it's
finally time to see it go.

SpecFetcher will call back into the old bulk index code when a
repository doesn't have the indexes it needs, and will spit out a
warning that hopefully can help the affected users get the indexer
updated.
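The fallback behaviour described here might look roughly like the following. Every name in this sketch is hypothetical - it is not the actual SpecFetcher API, only the pattern of preferring the new index and falling back to the legacy bulk code with a warning:

```ruby
# Illustrative sketch only: all names are hypothetical, not the real
# SpecFetcher API. Shows the prefer-new-index / warn-and-fall-back pattern.
def fetch_specs(repo)
  if repo[:has_new_index]
    repo[:specs]
  else
    warn "#{repo[:name]} lacks the new index format; " \
         "falling back to bulk update (ask the operator to re-run the indexer)"
    repo[:legacy_specs]
  end
end

new_repo = { name: 'example', has_new_index: true,  specs: ['a-1.0'] }
old_repo = { name: 'legacy',  has_new_index: false, legacy_specs: ['b-0.9'] }
puts fetch_specs(new_repo).inspect  # => ["a-1.0"]
```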

> 4. Perhaps if the server is taking strain, a mirror or two could be
> set up? I doubt many people would care about such relatively small
> files on their servers - I'd be willing to ask some people if they'd
> do a ZA mirror.

CPU-wise there isn't any strain on the mirrors that I know of, since
they only serve up static content. If you would like to set up a
mirror, there's information here:

http://rubyforge.org/docman/view.php/5/231/mirror_...

> Any comments?

This is a better topic for the rubygems-developers list, but I've
solved the problem (hopefully for at least the next five years)
already. Compare lib/rubygems/spec_fetcher.rb in trunk to the old code.

Leslie Viljoen

6/5/2008 8:58:00 AM

On Thu, Jun 5, 2008 at 6:32 AM, Eric Hodel <drbrain@segment7.net> wrote:
> On May 30, 2008, at 14:18 PM, Leslie Viljoen wrote:
>>
>> Wondering why the endless "bulk updating" takes so long I looked that
>> the source a bit.
>> There seems that if there are more than 50 gems missing from the quick
>> list, a bulk update
>> is done:
>>
>> source_index.rb:
>> use_incremental = missing_gems.size <= INCREMENTAL_THRESHHOLD
>>
>> (INCREMENTAL_THRESHOLD is 50)
>>
>> ..yet it seems the quick list is always more than 50 gems out of date (for
>> the
>> bulk update seems to always be done)
>
> You are looking at RubyGems 0.9.4 or older, so this code is at least one
> year old. RubyGems 0.9.5 has a configurable threshold. The next release of
> RubyGems has a new metadata fetcher (SpecFetcher) and will not need to
> perform bulk updates at all (which is 2-4 weeks from release).
>
>> When downloading and extracting both lists, it look like there are 13464
>> gems
>> listed in the bulk file and 13432 in the quick list.

Probably my regular expressions letting me down - I thought this was
normal and the cause of the bulk updating, but I have misunderstood.

>> 1. Since the bulk index is 854k and expands to 20MB, perhaps there's a way
>> to
>> keep that quick index more up-to-date?
>
> All indexes are regenerated at the same time in a directory in /tmp then
> moved into place when the index is complete. .gem files may be missing
> because it takes time for the mirrors to sync up to the master. The mailing
> list on the link at the bottom of this email would be the best place to take
> this up.
>
>> 2. Only 3134 of the 13432 gems are unique gems - 10298 are older versions
>> of these gems. I think that people rarely search or install old gems, so
>> perhaps
>> the list can be split into a file for latest versions versus old versions.
>
> This was done in 1.1.1, IIRC, but hasn't been very helpful. SpecFetcher
> also makes this distinction, but it is written in less of a band-aid fashion
> and works out quite cleanly. It was probably a mistake to add a latest
> index to 1.1.1, but without it I would not have arrived at the SpecFetcher
> so quickly.
>
>> 3. I often search for gems repeatedly, and the bulk index gets pulled down
>> repeatedly - why not save this file locally for at least a few hours?
>> (probably try to implement this myself just now)
>
> This is a bug that will not be fixed. SpecFetcher has a much simpler API
> that doesn't have the big pile of band-aids I applied to the old one in an
> attempt to get it to scale to the current size. The original design held up
> surprisingly well over the years, but it's finally time to see it go.
>
> SpecFetcher will call back into the old bulk index code when a repository
> doesn't have the indexes it needs, and will spit out a warning that
> hopefully can help the affected users get the indexer updated.

Is SpecFetcher part of an upcoming version of RubyGems?

>> 4. Perhaps if the server is taking strain, a mirror or two could be
>> set up? I doubt many people would care about such relatively small files
>> on their servers - I'd be willing to ask some people if they'd do a ZA
>> mirror.
>
> CPU-wise there isn't any strain on the mirrors that I know of, since they
> only serve up static content. If you would like to set up a mirror, there's
> information here:
>
> http://rubyforge.org/docman/view.php/5/231/mirror_...

Thanks for your great work!

Les

Eric Hodel

6/5/2008 10:47:00 AM

On Jun 5, 2008, at 01:57 AM, Leslie Viljoen wrote:
>>> 3. I often search for gems repeatedly, and the bulk index gets
>>> pulled down
>>> repeatedly - why not save this file locally for at least a few
>>> hours?
>>> (probably try to implement this myself just now)
>>
>> This is a bug that will not be fixed. SpecFetcher has a much
>> simpler API
>> that doesn't have the big pile of band-aids I applied to the old
>> one in an
>> attempt to get it to scale to the current size. The original
>> design held up
>> surprisingly well over the years, but it's finally time to see it go.
>>
>> SpecFetcher will call back into the old bulk index code when a
>> repository
>> doesn't have the indexes it needs, and will spit out a warning that
>> hopefully can help the affected users get the indexer updated.
>
> Is specfetcher part of an upcoming version of gems?

Yes, 2-4 weeks out. I have to go through the rest of the tracker
fixing bugs, wait for some feedback from Apple and JRuby, and give it
a week or so to settle for testing.