Asp Forum - How to make Ruby _THE_ scripting language of choice, fold in SQLite

John Carter

9/25/2008 11:44:00 PM

Much has been written about the OOP / RDBMS impedance mismatch....

But when it comes down to it, SQL wins for certain tasks, and
ruby/perl/python wins for other tasks.

Sure we have adapters that provide access from Ruby to SQL databases,
but at some primitive level... the query is written as a string, the
string is fed to a SQL interpreter, the query is optimized by the SQL
engine and the results returned.

In some cases the results are returned as strings in other cases
various levels of conversion can be ommitted.

The fact is the function call notation is insufficiently rich to
specify an arbitary SQL statement without reducing to the (very)
lowest common denominator (string) and then invoking a (heavyweight)
SQL interpretor.

Suggestion to make Ruby the undisputed leader of the scripting languages...
1) Extend Ruby syntax to accept a useful subset of SQLite SQL.

2) incorporate the appropriate bits of SQLITE into the Ruby
interpretor to evaluate it.

3) and perhaps optionally generate string queries for
(mysql/postgres/oracle...) backends.

4) provide a means of transparently mapping certain classes to
tables so that they are automagically persisted at commit points,
on garbage collection or program exit, and automagically
revivified on reference.

5) Hide the distinction between Object "handles" and foreign
keys. ie. If instance variable @c of an object of class A is
a reference to an instance of class / table C, then
following that reference may cause that row of table C to be
looked up and revivified as an instance of class C.

# Attaches, unless told otherwise to sqlite db __FILE__.db
require 'rubysql'

class Customer
attr_primary_key :name
attr_reader :street, :city

# Map this class to a sql table called Customer with primary key :name and fields street and city.
include persistent
end

class Invoice
attr_foreign :customer, Customer

end

# It's persisted, so even if this appears a useless and can be garbage
# collected, it will at least be added to Customer table
Customer.new( "Joe Blogs", "52 hell", "Doggcity")

jane = Customer.new( "Jane Blag", "52 Sell", "Dodgcity")

a = Invoice.new

# The following three lines should be equivalent...
a.customer = "Jane Blag"
a.customer = jane
a.customer = select * from Customer where name == "Jane Blag"

# An both should transparently and automagically allow ...
assert_equal( "Dodgcity", a.customer.city)

and a.customer.city be the same as...
select city from Customer where name == a.customer;

19 Answers

ara.t.howard

9/25/2008 11:53:00 PM

On Sep 25, 2008, at 5:43 PM, John Carter wrote:

> Much has been written about the OOP / RDBMS impedance mismatch....
>
> But when it comes down to it, SQL wins for certain tasks, and
> ruby/perl/python wins for other tasks.
>
> Sure we have adapters that provide access from Ruby to SQL databases,
> but at some primitive level... the query is written as a string, the
> string is fed to a SQL interpreter, the query is optimized by the SQL
> engine and the results returned.
>
> In some cases the results are returned as strings in other cases
> various levels of conversion can be ommitted.
>
> The fact is the function call notation is insufficiently rich to
> specify an arbitary SQL statement without reducing to the (very)
> lowest common denominator (string) and then invoking a (heavyweight)
> SQL interpretor.
>
> Suggestion to make Ruby the undisputed leader of the scripting
> languages...
> 1) Extend Ruby syntax to accept a useful subset of SQLite SQL.
>
> 2) incorporate the appropriate bits of SQLITE into the Ruby
> interpretor to evaluate it.
>
> 3) and perhaps optionally generate string queries for
> (mysql/postgres/oracle...) backends.
>
> 4) provide a means of transparently mapping certain classes to
> tables so that they are automagically persisted at commit points,
> on garbage collection or program exit, and automagically
> revivified on reference.
>
> 5) Hide the distinction between Object "handles" and foreign
> keys. ie. If instance variable @c of an object of class A is
> a reference to an instance of class / table C, then
> following that reference may cause that row of table C to be
> looked up and revivified as an instance of class C.
>
> # Attaches, unless told otherwise to sqlite db __FILE__.db
> require 'rubysql'
>
> class Customer
> attr_primary_key :name
> attr_reader :street, :city
>
> # Map this class to a sql table called Customer with primary
> key :name and fields street and city.
> include persistent
> end
>
> class Invoice
> attr_foreign :customer, Customer
>
> end
>
> # It's persisted, so even if this appears a useless and can be garbage
> # collected, it will at least be added to Customer table
> Customer.new( "Joe Blogs", "52 hell", "Doggcity")
>
> jane = Customer.new( "Jane Blag", "52 Sell", "Dodgcity")
>
> a = Invoice.new
>
> # The following three lines should be equivalent...
> a.customer = "Jane Blag" a.customer = jane
> a.customer = select * from Customer where name == "Jane Blag"
>
> # An both should transparently and automagically allow ...
> assert_equal( "Dodgcity", a.customer.city)
>
> and a.customer.city be the same as...
> select city from Customer where name == a.customer;
>

i think that's looking backwards. something like couchdb takes only a
few hundred lines of code to interface with and already does a ton of
that - the rest would be trivial. imho sql is a dead-end road,
nevertheless much of what you are describing is already done by sequel
(the ruby orm).

cheers.

a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama

David Masover

9/26/2008 1:33:00 AM

On Thursday 25 September 2008 18:43:31 John Carter wrote:

> But when it comes down to it, SQL wins for certain tasks, and
> ruby/perl/python wins for other tasks.

The tasks SQL wins for, it wins mostly because it is executed inside the
database engine. I doubt very much that, given the choice between writing SQL
and writing Ruby, I would rather write SQL.

CouchDB will let me write custom queries in JavaScript.

And there was a similar, Ruby-based project (rddb?), but I haven't seen any
activity there in awhile. Still, the idea is the same -- I can write queries
in pure ruby.

> at some primitive level... the query is written as a string, the
> string is fed to a SQL interpreter, the query is optimized by the SQL
> engine and the results returned.

In other words, you're wanting to fundamentally change the language -- syntax,
parser, interpreter, top to bottom -- for a performance hack, to make it
perform better with SQL databases.

SQL databases aren't the only kind of database, nor is a database the only way
to store data. Nor is SQLite the only SQL database, or necessarily the best
to optimize for.

> Suggestion to make Ruby the undisputed leader of the scripting languages...

I don't think any of the other scripting languages are suffering greatly
because of lack of SQL.

In fact, Ruby has more often been slower than other languages, in my
experience. Often several orders of magnitude slower than Python, for
instance -- and Python does have Django.

Consider that -- Ruby sees plenty of production use _despite_ being slow.

> 1) Extend Ruby syntax to accept a useful subset of SQLite SQL.

Sequel does this, somewhat. So does ActiveRecord, for that matter.

Neither of which is bound to one database engine.

Now, yes, they currently generate strings, at some point. However, since you
don't have to directly interact with the strings, your step 2 is nothing more
than a performance hack, which only applies to SQLite.

> 4) provide a means of transparently mapping certain classes to
> tables so that they are automagically persisted at commit points,
> on garbage collection or program exit, and automagically
> revivified on reference.

In other words, an ORM.

Like ActiveRecord, DataMapper, Og, Sequel, etc.

Only difference you're proposing is to make the persistence transparent.
Programmers working with ActiveRecord, in particular, don't seem to think
about concurrency issues enough -- how many of us actually use transactions?
But you're suggesting we bury that problem even further in abstractions.

What happens when two programs try to access the same object?

> 5) Hide the distinction between Object "handles" and foreign
> keys.

Yeah, like ActiveRecord, DataMapper...

The foreign keys are there if you need them, but you don't have to deal with
them directly.

> # It's persisted, so even if this appears a useless and can be garbage
> # collected, it will at least be added to Customer table
> Customer.new( "Joe Blogs", "52 hell", "Doggcity")

Great -- what happens if, somewhere very soon after this point, an exception
is raised? What about validations?

Never mind that it's kind of ugly that you're using positional arguments
there...

In a modern ORM, I'd do something like:

cust = Customer.new( :name => 'Joe blogs')
cust.save!

Maybe an address wasn't required, and this will work. Maybe an address was
required, in which case, an exception will be raised, and the record won't be
saved. It's useful to separate this from creation for other reasons, too --
suppose I did provide an address, and now want to remove it:

cust = Customer.find_by_name 'Joe blogs'
cust.address = nil
cust.save!

In this case, again, if address was required, I'll get a validation error, and
this particular change to the record won't be saved.

Clifford Heath

9/26/2008 2:15:00 AM

David Masover wrote:
> SQL databases aren't the only kind of database, nor is a database the only way
> to store data. Nor is SQLite the only SQL database, or necessarily the best
> to optimize for.

However, most of the non-SQL things that are included in such definitions
of "database" are not actually transactional, and hence don't qualify
as databases in any true sense of the word. The purpose of a database is
to do things that filesystems do not - ACID. Any "database" that doesn't
provide strong guarantees of ACID behaviours doesn't deserve the title.

SQL sucks, but it is strongly identified with true transactional behaviour,
an identification that only the dills who wrote MyISAM seem to have been
able to ignore and break down.

> Programmers working with ActiveRecord, in particular, don't seem to think
> about concurrency issues enough -- how many of us actually use transactions?

If by "transactions" you mean ActiveRecord::Base.transaction, possibly no-one.
Attempting rollback on exceptions from a user-mode process without using
a two-phase locking protocol (as many/most of AR's adapters do) is utterly
flawed and not transactional at all.

Transactional is when you can pull the power cord at any time and your data
is guaranteed consistent, and nothing that you committed has been lost.

Sorry to butt in like this, and you (David) probably know it all anyhow, but
I wanted to reinforce the true meaning of "database" and "transaction", since
they're so widely abused, berated and misunderstood.

Clifford Heath, Data Constellation.
Agile information Management and Design.

ara.t.howard

9/26/2008 2:51:00 AM

On Sep 25, 2008, at 8:19 PM, Clifford Heath wrote:

> If by "transactions" you mean ActiveRecord::Base.transaction,
> possibly no-one.
> Attempting rollback on exceptions from a user-mode process without
> using
> a two-phase locking protocol (as many/most of AR's adapters do) is
> utterly
> flawed and not transactional at all.

using them may not even help

http://drawohara.com/post/12411960/sane-transactions-for-ac...

though i have not tested this lately....

a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama

Jeff Davis

9/26/2008 3:09:00 AM

On Fri, 2008-09-26 at 08:43 +0900, John Carter wrote:
> The fact is the function call notation is insufficiently rich to
> specify an arbitary SQL statement without reducing to the (very)
> lowest common denominator (string) and then invoking a (heavyweight)
> SQL interpretor.

I suggest you look at Sequel http://sequel.ruby...

This isn't just an ORM or DBI style layer. This is using Ruby syntax to
form relational expressions with an SQL flavor.

The python equivalent is SQLAlchemy.

These are really using the right approach, they do not try to map some
not-very-well-defined object model onto SQL, they just offer a SQL-ish
language inside of Ruby that generates real SQL, and sends it off for
evaluation.

> Suggestion to make Ruby the undisputed leader of the scripting languages...
> 1) Extend Ruby syntax to accept a useful subset of SQLite SQL.

SQLite is pretty far from the SQL standard, last time I checked. I don't
consider it the pinnacle of database semantics.

> 2) incorporate the appropriate bits of SQLITE into the Ruby
> interpretor to evaluate it.

You haven't shown why _any_ of this needs to be in the interpreter
itself. What are you trying to accomplish, and why can't you accomplish
it through Ruby's rich extensibility?

> 3) and perhaps optionally generate string queries for
> (mysql/postgres/oracle...) backends.

What is the language you're trying to use for this purpose, and how is
this simpler than current systems that can generate strings to send to
SQL DBMSs?

> 4) provide a means of transparently mapping certain classes to
> tables so that they are automagically persisted at commit points,
> on garbage collection or program exit, and automagically
> revivified on reference.

The devil is in the details. "Automagically"? Is it more automagical
than current solutions like AR?

What about constraints and concurrency? These automagical persistence
engines tend to have difficulty in a centralized environment unless your
data is 100% semantically independent.

> 5) Hide the distinction between Object "handles" and foreign
> keys. ie. If instance variable @c of an object of class A is
> a reference to an instance of class / table C, then
> following that reference may cause that row of table C to be
> looked up and revivified as an instance of class C.

You're not the first person to have thought of this. The problem is,
there _is_ a distinction between those two concepts, and for good
reason.

What about a summary object, that represents the result of some GROUP BY
query? What is the object ID of that, and what is the database key?

> # Attaches, unless told otherwise to sqlite db __FILE__.db

What about ruby code that's not stored in a file?

The code example you gave looks pretty much like every other ORM. How is
yours fundamentally different, aside from not being able to connect to a
remote server?

Regards,
Jeff Davis

Jeff Davis

9/26/2008 3:17:00 AM

On Fri, 2008-09-26 at 08:53 +0900, ara.t.howard wrote:
> i think that's looking backwards. something like couchdb takes only a
> few hundred lines of code to interface with and already does a ton of
> that - the rest would be trivial. imho sql is a dead-end road,
> nevertheless much of what you are describing is already done by sequel
> (the ruby orm).
>

Sequel is fundamentally different than a traditional ORM, and in my
opinion, much better. SQLAlchemy is a conceptually similar project for
python, and both are much more promising than any ORM.

As far as SQL being a "dead-end road", it needs to have a replacement,
first. Sequel/SQLAlchemy are a start in that direction, but the SQL
DBMSs provide a lot of things aside from just language.

Regards,
Jeff Davis

ara.t.howard

9/26/2008 3:28:00 AM

On Sep 25, 2008, at 9:16 PM, Jeff Davis wrote:

> the SQL
> DBMSs provide a lot of things aside from just language

indeed. it's sql that's the issue imho, not acid properties.

a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama

David Masover

9/26/2008 3:34:00 AM

On Thursday 25 September 2008 21:19:37 Clifford Heath wrote:
> David Masover wrote:
> > SQL databases aren't the only kind of database, nor is a database the only
way
> > to store data. Nor is SQLite the only SQL database, or necessarily the
best
> > to optimize for.
>
> However, most of the non-SQL things that are included in such definitions
> of "database" are not actually transactional, and hence don't qualify
> as databases in any true sense of the word.

However, many things that I would consider are, in fact, quite transactional.
Not as much as they could be, perhaps, but close enough.

First example: Filesystems. With ordered write mode, you can simply write to a
temporary file, then rename said temporary file into place. With unordered
writes, add an fsync between those two steps.

Any modern, journaled filesystem will make sure that the rename either
succeeds or doesn't, and you've made sure all data is successfully on disk
before you rename.

Implementation: Maildir.

Second example: Amazon's Dynamo. This powers S3 and SimpleDB, among other
things. They've got a paper on it. Some implementations can, indeed, be
considered purely-ACID.

What sets it apart from (most) SQL is, conflicts are expected as part of
normal operation, and are left to the application to sort out. That is:
Rather than preventing anyone else from modifying a record while I update it,
simply allow two versions of a record to exist, and provide common algorithms
for either choosing which version "wins", or for combining the two versions
into a third.

Thus, inconsistency is allowed, but only temporarily. It is never exposed to
the end-user, or even to (most of) the application.

Third example: Distributed version control, like Git. I mention this mostly
because it reflects the same philosophy as Dynamo above, albeit with more
human intervention -- but also because this is the one most developers are
likely to be intimately familiar with (or should be).

> If by "transactions" you mean ActiveRecord::Base.transaction, possibly
no-one.

Precisely so.

I remember that one of the first things I wrote, when developing a new Rails
app, was this method:

def double_save
transaction do
save(false)
yield
save!
end
end

This to allow the creation of circular record structures -- for example, every
domain must have an SOA record, and every record (of any kind) must have a
domain. Since I'd made these fields NOT NULL on the database, I would often
do this hack with new records of this kind:

d = Domain.new(...)
d.soa_id = -1
d.double_save do
d.soa = Soa.create!(...)
end

That -1, of course, would've killed my validations.

> Attempting rollback on exceptions from a user-mode process without using
> a two-phase locking protocol (as many/most of AR's adapters do) is utterly
> flawed and not transactional at all.

I actually don't see a lot of that -- more simple assumptions that nothing
will go wrong, or that dangling records aren't a problem.

Quite a lot of defensive programming, too, based on the idea that the data
WILL get corrupted somehow, someday, and being able to handle corrupt data is
more important than avoiding the corruption in the first place.

ara.t.howard

9/26/2008 4:16:00 AM

On Sep 25, 2008, at 9:34 PM, David Masover wrote:

not trying to be difficult but...

>
> However, many things that I would consider are, in fact, quite
> transactional.
> Not as much as they could be, perhaps, but close enough.
>
> First example: Filesystems. With ordered write mode, you can simply
> write to a
> temporary file, then rename said temporary file into place. With
> unordered
> writes, add an fsync between those two steps.
>
> Any modern, journaled filesystem will make sure that the rename either
> succeeds or doesn't, and you've made sure all data is successfully
> on disk
> before you rename.
>
> Implementation: Maildir.

rename is not atomic on NFS, nor many other network filesystems. on
NFS you have only softlink and rmdir as atomic operations.

a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama

Clifford Heath

9/26/2008 5:09:00 AM

ara.t.howard wrote:
> On Sep 25, 2008, at 9:34 PM, David Masover wrote:
>> First example: Filesystems. With ordered write mode, you can simply
>> write to a temporary file, then rename said temporary file into place.
>> With unordered writes, add an fsync between those two steps.
> rename is not atomic on NFS, nor many other network filesystems. on
> NFS you have only softlink and rmdir as atomic operations.

Yes, the Walter Tichy method, first used (to my knowledge) by him in RCS.

However, without the fsync, its still not transactional, as you can pull
the power cord after the operation and there's no guarantee that the disk
blocks that make up the file content are actually on disk. All the FS
journalling does is to ensure that the block allocation tables and inodes
are clean.

Also I don't know the defined POSIX semantics now, but the system-wide
"sync" call would return as soon as every dirty block was in the disk I/O
queues, and before they were on disk. You had to call it twice before
being sure, as the scan for dirty blocks couldn't complete until the
queues were empty. (I don't believe fsync has this flaw).

Nevertheless, though this method can solve the (D)urability problem of
ACID, it offers no solution to A, C, or I, so is still not transactional.

Neither is Amazon's approach, and neither is DVCS, despite their success
in solving *different* problems.

Clifford Heath.

comp.lang.ruby

How to make Ruby _THE_ scripting language of choice, fold in SQLite

John Carter

ara.t.howard

David Masover

Clifford Heath

ara.t.howard

Jeff Davis

Jeff Davis

ara.t.howard

David Masover

ara.t.howard

Clifford Heath

x Login to ForumsZone