comp.lang.ruby

How do I quickly search the end of a huge text file?

Brian Green

9/5/2008 1:51:00 AM

I am trying to create a Ruby script that will search a Maya ASCII file
for specific text. The problem I'm running into is that it's running too
slow for the system at work. I know that all the information I need is
in the last 5% of the text file, but I haven't been able to figure out
a way to either jump to near the end and then start searching through
lines or, even better, iterate backwards through the file until I find
what I'm looking for. Here is the code I'm currently using, which
works but slowly. Any suggestions for how to speed this up would be
greatly appreciated! :D

require "fileutils"

def FindRenderLayers(root)
  layersFile = []
  dirLocation = root.gsub(/(\\)$/, '')
  list = Dir.entries(dirLocation)

  list.each do |file|
    if file =~ /\.ma$/
      fileName = root + file
      layersFile.push file
      # Scan the whole file, line by line, from the start
      File.open(fileName) do |f|
        while line = f.gets
          if line =~ /connectAttr (\"renderLayerManager.rlmi\[[0-9]\]\")/
            if $1 != "defaultRenderLayer"
              editedLine = "-" + $1
              layersFile.push editedLine
            end
          end
        end
      end
    end
  end
  return layersFile
end

root = "C:\\Users\\Brian\\Documents\\Ruby\\"
puts FindRenderLayers(root)
--
Posted via http://www.ruby-....

8 Answers

Victor H. Goff III

9/5/2008 1:58:00 AM



IO::SEEK_END at Ruby-Doc may be the ticket...
http://www.ruby-doc.org/core/classes/IO.ht...



Brian Green

9/5/2008 2:30:00 AM


Victor Goff wrote:
> IO::SEEK_END at Ruby-Doc may be the ticket...
> http://www.ruby-doc.org/core/classes/IO.ht...

Thanks for your input... I actually tried using SEEK_END - couldn't get
it to work right...

Brian Green

9/5/2008 4:30:00 AM


Peña, Botp wrote:
> From: Brian Green [mailto:gallagherjb@gmail.com]
> # I am trying to create a ruby script that will search a maya ascii file
> # for specific text. The problem I'm running into is that it's
> # running to slow for the system at work.
>
> why do you say it is slow? what is your comparison? where is your
> benchmark?
> how many files do you have? how large are the files?
> how much disk space do you have?
> how much memory do you have?
> how fast is your cpu?

It's slow because the script is going to be integrated into the company's
online asset management software, and I was told by the IT guys that if
it's slower than a certain speed it will time out. It is currently too
slow.

As for how many files, it usually ranges between 3 and 5, and the sizes
of the files vary from about 5MB to 50MB.

Disk space is not an issue - there's tons of it. As far as memory goes,
the IT guys said it can't load the whole file into memory.

CPU is fairly fast - but again this isn't the problem - since it will be
running from a server...

>
> # I know that all the information I need is
> # in the last 5% of the text file - but I haven't been able to
>
> are you sure of the 5% ?
> where is your proof?

I've gone through many files and manually located where the text I'm
looking for appears. It never appears further than 5% from the end...


> # figure out a way to either jump to near the end, and then
> # start search through lines
>
> low level, use IO::SEEK_END

I'm not sure how to use the SEEK_END properly and it's hard finding good
examples...

> # or even better iterating backwards through the file till I find
> # what I'm looking for...
>
> arggh. but your comparison will be forward. otherwise, you'll have to
> reverse your search/regex pattern. implement a reverse readline/gets.
>
That sounds good. How do I do that?
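
A reverse gets can be sketched by reading fixed-size chunks from the end
of the file and splitting them on newlines. The helper name
each_line_reverse and the 64 KB chunk size are illustrative assumptions,
not something from this thread, and delete_suffix needs Ruby 2.5 or
later. Lines are yielded without their trailing newline, so an ordinary
forward regex still applies to each one.

```ruby
# Hypothetical helper (not from the thread): yield each line of the file
# at `path`, last line first, reading `chunk_size` bytes at a time.
def each_line_reverse(path, chunk_size = 64 * 1024)
  File.open(path, "rb") do |file|
    pos = file.size
    return if pos.zero?              # empty file: nothing to yield
    tail = ""                        # end of a line whose start is not read yet
    at_end = true
    while pos > 0
      read_len = [chunk_size, pos].min
      pos -= read_len
      file.seek(pos, IO::SEEK_SET)
      buffer = file.read(read_len)
      # Ignore the newline that terminates the last line of the file.
      buffer = buffer.delete_suffix("\n") if at_end
      at_end = false
      lines = (buffer + tail).split("\n", -1)
      # The first piece may be an incomplete line; keep it for the next chunk.
      tail = lines.shift || ""
      lines.reverse_each { |line| yield line }
    end
    yield tail                       # whatever is left is the first line
  end
end
```

Because the calling block can break as soon as it finds a match, only the
chunks that are actually needed get read.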


> # here is the code I'm currently using - which
> # works
>
> are you sure it works? see my comment below, inline of your code.
>
> # but slowly - any suggestions for how to speed this up would be
> # greatly appreciated! :D
> #
> # require "FileUtils"
> # require "ftools"
> #
> # def FindRenderLayers (root)
> # layersFile = []
> # dirLocation = root.gsub(/(\\)$/, '')
> # list = Dir.entries(dirLocation)
> #
> # list.each do |file|
> # if file =~ /\.ma$/
> # fileName = root + file
> # layersFile.push file
> # File.open(fileName) do |file|
> # while line = file.gets
> # if line =~ /connectAttr
> #(\"renderLayerManager.rlmi\[[0-9]\]\")/
> # if $1 != "defaultRenderLayer"
>
> pls forgive me at this point because i am at a loss
>
> 1. how could $1, which is patterned after
> \"renderLayerManager.rlmi\[[0-9]\]\", ever be equal to
> "defaultRenderLayer" ??
>
Sorry - yeah that's not needed - had it a while ago and forgot to erase
it.

> 2. and besides why need to compare again, if you can ask it straight
> from your regex comparison?
>
You're right...
>
> # editedLine = "-" + $1
> # layersFile.push editedLine
> # end
> # end
> # end
> # end
> # end
> # end
> # return layersFile
> # end
> #
> # root = "C:\\Users\\Brian\\Documents\\Ruby\\"
> # puts FindRenderLayers(root)
>
>
> kind regards -botp


Lex Williams

9/5/2008 5:38:00 AM


Here is a usage example:

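begin
  file = File.open(ARGV[0])
rescue
  puts "file does not exist or is not a file\n"
  exit 1   # stop here; otherwise file is nil and seek would fail
end

# Jump to 25 bytes before the end of the file, then read from there
file.seek(-25, IO::SEEK_END)
puts file.readlines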

The code will read the rest of the files from that location. Try it on
a file and see.
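
One caveat with seeking to an arbitrary byte offset is that the read
usually starts in the middle of a line, and a negative offset larger than
the file raises Errno::EINVAL. A minimal sketch that handles both,
assuming a hypothetical helper named tail_lines (the name and its
arguments are not from this thread):

```ruby
# Hypothetical helper: return the complete lines found in the last
# `bytes` bytes of the file at `path`.
def tail_lines(path, bytes)
  File.open(path, "rb") do |file|
    size = file.size
    offset = [bytes, size].min       # never seek before the start of the file
    file.seek(-offset, IO::SEEK_END)
    # Unless we are at the very start, the first read is most likely a
    # partial line, so throw it away. (If the seek happened to land exactly
    # on a line boundary, this drops one complete line.)
    file.gets if offset < size
    file.readlines
  end
end
```

For example, tail_lines("scene.ma", 2000) would return only whole lines
from the last 2000 bytes of a file named scene.ma.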

Lex Williams

9/5/2008 5:43:00 AM


Lex Williams wrote:
> Here is an usage example :
>
> begin
> file = File.open(ARGV[0])
> rescue
> puts "file does not exist or is not a file\n"
> end
>
> file.seek(-25,IO::SEEK_END)
> puts file.readlines
>
> The code will read the rest of the files from that location . Try it on
> a file and see .

I meant the rest of the lines. Sorry.

Brian Green

9/5/2008 11:32:00 AM


Thank you very much!! That's exactly what I was looking for!

I just added

file.seek(-2000,IO::SEEK_END)

right after the line

fileSize = File.size(fileName)

and it worked perfectly! It's running about 18x faster, which is a huge
improvement. I think the guys at work will be satisfied with its speed
now!

Thanks again Lex!! :D
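
A speed-up like this can be measured rather than estimated. The sketch
below is only an assumption about how one might time it: seconds_for and
compare_scans are hypothetical names, and the include? check merely
stands in for the real regex.

```ruby
# Hypothetical helper: run the block and return the elapsed seconds.
def seconds_for
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Time a full scan of `path` against a scan of only its last
# `tail_bytes` bytes. Returns [full_seconds, tail_seconds].
def compare_scans(path, tail_bytes)
  full = seconds_for do
    File.foreach(path, mode: "rb") { |line| line.include?("connectAttr") }
  end
  tail = seconds_for do
    File.open(path, "rb") do |file|
      offset = [tail_bytes, file.size].min   # clamp for small files
      file.seek(-offset, IO::SEEK_END)
      file.each_line { |line| line.include?("connectAttr") }
    end
  end
  [full, tail]
end
```

Running it a few times on a representative 50MB file gives a fairer
number than a single run, since the first read also pays for disk
caching.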


botp

9/5/2008 12:25:00 PM


On Fri, Sep 5, 2008 at 7:32 PM, Brian Green <gallagherjb@gmail.com> wrote:
> I just added
> file.seek(-2000,IO::SEEK_END)
> right after the line
> fileSize = File.size(fileName)

if i'm not mistaken, that would be

fileSize = File.size(fileName)
file.seek(-0.05*fileSize, IO::SEEK_END)

Reid Thompson

9/5/2008 12:28:00 PM


Brian Green wrote:
> Thank you very much!! That's exactly what I was looking for!
>
> I just added
>
> file.seek(-2000,IO::SEEK_END)
>
> right after the line
>
> fileSize = File.size(fileName)
>

50 megabytes = 52 428 800 bytes
5% = 52428800 * .05 = 2621440
2621440 != 2000

perhaps:
fileName = ARGV[0]
fileSize = File.size(fileName)
# negative offset covering the last 5% of the file, as an Integer
seeklen = ((0.05 * fileSize) * -1).to_i
file = File.open(fileName)
file.seek(seeklen, IO::SEEK_END)
puts file.readlines
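
Applied to the original directory scan, the percentage-based seek might
look like the sketch below. find_render_layers, LAYER_LINE and the
fraction parameter are hypothetical names. The pattern also accepts
multi-digit indices ([0-9]+), which is an assumption beyond the original
single-digit pattern. The offset is clamped so that a small file is
simply read in full, and the first (probably partial) line after the
seek is skipped.

```ruby
# Assumed pattern: same as the original, but allowing multi-digit indices.
LAYER_LINE = /connectAttr ("renderLayerManager\.rlmi\[[0-9]+\]")/

# Hypothetical rewrite of FindRenderLayers: only the last `fraction`
# of each .ma file in `dir` is scanned.
def find_render_layers(dir, fraction = 0.05)
  results = []
  Dir.entries(dir).grep(/\.ma\z/).sort.each do |name|
    results << name
    File.open(File.join(dir, name), "rb") do |file|
      size = file.size
      offset = [(size * fraction).ceil, size].min
      file.seek(-offset, IO::SEEK_END)
      file.gets if offset < size     # skip the likely partial first line
      file.each_line do |line|
        results << "-" + $1 if line =~ LAYER_LINE
      end
    end
  end
  results
end
```

Calling puts find_render_layers(root) would print each file name
followed by its matches, in the same format as the original script.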