
comp.lang.ruby

Evading the limit of a pipe's standard input

Ashley Moran

8/29/2006 7:17:00 PM

Hi

I'm trying to write a tool that generates a really long SQL script
and passes it to psql, but I've hit a problem because my script is
too long to be sent to the stdin of psql. This is the first time
I've run into the limit so it had me scratching my head for a while.
I've tried a load of tricks, even putting the lines in an array, eg:

MAX_SUCCESSFUL_TIMES = 3047

query = ["BEGIN WORK;"]
(MAX_SUCCESSFUL_TIMES + 1).times do
  query << "INSERT INTO people (name) VALUES ('Fred');"
end
query << "COMMIT;"

IO.popen("psql -U test test", "r+") do |pipe|
  query.each { |statement| pipe.puts statement }
end

but it still fails when the total length of commands exceeds the
limit (which by experiment I've found to be 128K on Mac OS X, hence
the specific number of times above).

What's the best solution to this? I would like to stick to
inter-process communication and avoid temporary files and rewriting it
to use DBD, if possible. Or are they my only options?

Thanks
Ashley


7 Answers

Gennady Bystritsky

8/29/2006 8:06:00 PM


> -----Original Message-----
> From: Ashley Moran [mailto:work@ashleymoran.me.uk]
> Sent: Tuesday, August 29, 2006 12:17 PM
> To: ruby-talk ML
> Subject: Evading the limit of a pipe's standard input
>
> [original message snipped]
>
> What's the best solution to this? I would like to stick to
> inter-process communication and avoid temporary files and rewriting
> it to use DBD, if possible. Or are they my only options?

Make sure the command you spawn in IO.popen is actually reading the
data from its stdin in parallel. The limit a pipe has at the system
level is the amount of unread data it can hold. When the data is read
out, it makes room for more data. However, if your reader is stuck
for some reason, the writer will wait too.
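
As a quick check of that system-level limit, here is a minimal sketch
(not from the original message) that measures a pipe's capacity by
writing until the kernel buffer fills:

r, w = IO.pipe
capacity = 0
begin
  # Keep writing without blocking until the kernel refuses more data.
  loop { capacity += w.write_nonblock("x" * 1024) }
rescue IO::WaitWritable
  # The buffer is full and nothing has read from `r` yet.
end
puts "pipe capacity: #{capacity} bytes"

On Linux this typically prints 65536; capacities vary by platform and
OS version.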

If psql really accepts only a limited amount of data from its stdin,
that is psql's problem rather than the pipe's. To check, try feeding
the big file to psql via redirection, like:

psql -U test test < "your big file"

Hope I am not too far off from what you meant,
Gennady.

snacktime

8/29/2006 10:51:00 PM


> If psql really accepts only a limited amount of data from its stdin,
> that is psql's problem rather than the pipe's. To check, try feeding
> the big file to psql via redirection, like:

It's not psql's problem; psql is commonly used to restore whole
databases by feeding it data on stdin.

Ken Bloom

8/30/2006 12:25:00 AM


On Wed, 30 Aug 2006 04:16:35 +0900, Ashley Moran wrote:

> [original message snipped]
>
> What's the best solution to this? I would like to stick to
> inter-process communication and avoid temporary files and rewriting
> it to use DBD, if possible. Or are they my only options?

Rewriting to use DBI is not very difficult, so unless you have a reason
beyond not wanting DBI calls to clutter your code or not wanting to
massively restructure it, the following should work:

class DBI::DatabaseHandle
  # Takes a whole SQL script in a string and
  # executes it on the database, one statement
  # at a time.
  def batch(sql)
    # String#split drops the empty field after the final ";",
    # so there is no trailing empty statement to remove.
    sql.split(";").each { |statement| self.do(statement) }
  end
end

In the sample code you gave above,
query.each { |statement| dbh.do(statement) }
should also work just fine.
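
For completeness, here is a hedged sketch of wiring that up; the DSN
"DBI:Pg:test" and the username are illustrative assumptions, not
something given in the thread:

require 'dbi'

# Hypothetical connection details, for illustration only.
DBI.connect("DBI:Pg:test", "test") do |dbh|
  dbh.batch(query.join("\n"))  # `query` is the array from the original post
end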

Is it possible with the pipe solution that you need to read back the
output data so that psql doesn't back up its pipe buffer, as Gennady
Bystritsky has suggested?

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu...

Ashley Moran

8/30/2006 7:34:00 AM


On Tuesday 29 August 2006 21:06, Gennady Bystritsky wrote:
> Make sure the command you spawn in IO.popen is actually reading the
> data from its stdin in parallel. The limit a pipe has at the system
> level is the amount of unread data it can hold. When the data is read
> out, it makes room for more data. However, if your reader is stuck
> for some reason, the writer will wait too.
> [snip]

Hi Gennady

I was not very clear in my e-mail; I didn't think psql was limited in
the size of file it can read. You were right, it was just the pipe
buffer filling up. I had actually tried reading from the pipe, but not
in parallel. This seems to work:

MAX_SUCCESSFUL_TIMES = 3047

query = ["BEGIN WORK;"]
(MAX_SUCCESSFUL_TIMES + 1).times do
  query << "INSERT INTO people (name) VALUES ('Fred');"
end
query << "COMMIT;"

IO.popen("psql -U test test", "r+") do |pipe|
  Thread.new { loop { pipe.read } }  # fixes it
  query.each { |statement| pipe.puts statement }
end

I assume pipe.read blocks, so the thread doesn't turn the CPU into a heating
element.

Cheers
Ashley

--
"If you do it the stupid way, you will have to do it again"
- Gregory Chudnovsky

Ashley Moran

8/30/2006 8:02:00 AM


On Wednesday 30 August 2006 08:34, Ashley Moran wrote:
>    IO.popen("psql -U test test","r+") do |pipe|
>      Thread.new { loop { pipe.read } }                # fixes it
>      query.each { |statement| pipe.puts statement }
>    end


Actually I lied... I need a "sleep 2" at the end of the IO.popen block for
this to work. Is there a less magical way to sync it? (I took a stab at
Process.wait but it doesn't work.)
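
(A less magical pattern, sketched here for reference rather than taken
from the thread: close the pipe's write end so psql sees end-of-file,
and keep a handle on the reader thread so it can be joined.)

IO.popen("psql -U test test", "r+") do |pipe|
  reader = Thread.new { pipe.read }   # drain psql's output until EOF
  query.each { |statement| pipe.puts statement }
  pipe.close_write                    # EOF on stdin lets psql finish and exit
  reader.join                         # wait until all output has been read
end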

Ashley

--
"If you do it the stupid way, you will have to do it again"
- Gregory Chudnovsky

Ken Bloom

8/30/2006 1:05:00 PM


On Wed, 30 Aug 2006 17:02:08 +0900, Ashley Moran wrote:

> On Wednesday 30 August 2006 08:34, Ashley Moran wrote:
>>    IO.popen("psql -U test test","r+") do |pipe|
>>      Thread.new { loop { pipe.read } }                # fixes it
>>      query.each { |statement| pipe.puts statement }
>>    end
>
>
> Actually I lied... I need a "sleep 2" at the end of the IO.popen block for
> this to work. Is there a less magical way to sync it? (I took a stab at
> Process.wait but it doesn't work.)
>
> Ashley
>

Since you're throwing away the output, instead of using a thread, why not
try one of the following:
* popen with the mode "w":
    IO.popen("psql -U test test", "w")
* redirect the output from psql to /dev/null inside the popen call:
    IO.popen("psql -U test test > /dev/null", "r+")

--Ken

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu...

Ashley Moran

8/30/2006 3:46:00 PM


On Wednesday 30 August 2006 14:10, Ken Bloom wrote:
> Since you're throwing away the output, instead of using a thread, why not
> try one of the following:
> * popen with the mode "w":
>     IO.popen("psql -U test test", "w")
> * redirect the output from psql to /dev/null inside the popen call:
>     IO.popen("psql -U test test > /dev/null", "r+")
>
> --Ken


Thanks Ken, the last one was what I was looking for. I forgot to say
that IO.popen("psql -U test test", "w") was actually one of the first
things I tried, but it dumps the (several thousand lines of) psql
results back into the shell, which renders the output of the script
useless.

Ashley

--
"If you do it the stupid way, you will have to do it again"
- Gregory Chudnovsky