Asp Forum - NoSQL Movement? - comp.lang.python

Xah Lee

3/3/2010 5:36:00 PM

recently i wrote a blog article on The NoSQL Movement
at http://x...comp/...

i'd like to post it somewhere public to solicit opinions, but in
20 min or so of looking, i couldn't find a proper newsgroup or private
list where my somewhat anti-NoSQL Movement article would fit.

So, i thought i'd post here to solicit some opinions from the programmer
community i know.

Here's the plain text version

-----------------------------
The NoSQL Movement

Xah Lee, 2010-01-26

In the past few years, there's been a new fashionable line of thinking
against relational databases, now blessed with a rhyming term: NoSQL.
Basically, it considers relational databases outdated, and not
“horizontally” scalable. I'm quite dubious of these claims.

According to the Wikipedia Scalability article, vertical scalability
means adding more resources to a single node, such as more cpu or memory.
(You can easily do this by running your db server on a more powerful
machine.) “Horizontal scalability” means adding more machines.
(And indeed, this is not simple with sql databases, but again, it is
the same situation with any software, not just databases. To add more
machines to run one single piece of software, the software must have
some sort of grid computing infrastructure built in. This is not a
problem of the software per se, it is just the way things are. It is
not a problem of databases.)

I'm quite old fashioned when it comes to computer technology. In order
to convince me of some revolutionary new-fangled technology, i must
see improvement based on math foundation. I am an expert in SQL, and
believe that the relational database is pretty much the gist of
databases with respect to math. Sure, a tight definition of the
relations in your data may not be necessary for many applications that
simply need to store, retrieve, and modify data without much concern
about the relations among them. But still, relational database
technology does that too. You just don't worry about normalizing when
you design your table schema.

The NoSQL movement is really about scaling movement, about adding more
machines, about some so-called “cloud computing” and services with
simple interfaces. (like so many fashionable movements in the
computing industry, often they are not well defined.) It is not really
about anti relation designs in your data. It's more about adding
features for practical need such as providing easy-to-use APIs (so
your users don't have to know SQL or Schemas), ability to add more
nodes, provide commercial interface services to your database, provide
parallel systems that access your data. Of course, these needs have all
been met by the big old relational database companies such as Oracle
over the years as they constantly adapt to the industry's changing needs
and cheaper computing power. If you need any relations in your data, you
can't escape relational database model. That is just the cold truth of
math.

Important data, such as that used in bank transactions, has relations.
You have to have tight relational definitions and assurance of data
integrity.

Here's a second hand quote from Microsoft's Technical Fellow David
Campbell. Source

I've been doing this database stuff for over 20 years and I
remember hearing that the object databases were going to wipe out
the SQL databases. And then a little less than 10 years ago the
XML databases were going to wipe out.... We actually ... you
know... people inside Microsoft, [have said] 'let's stop working
on SQL Server, let's go build a native XML store because in five
years it's all going....'

LOL. That's exactly my thought.

Though, i'd have to have some hands on experience with one of those
new database services to see what it's all about.

--------------------
Amazon S3 and Dynamo

Look at Structured storage. That seems to be what these nosql
databases are. Most are just a key-value pair structure, or just
storage of documents with no relations. I don't see how this differs
from a sql database using one single table as schema.

Amazon S3 is another storage service, which uses Amazon's
Dynamo (storage system), indicated by Wikipedia to be one of those
NoSQL dbs. Looking at the S3 and Dynamo articles, it appears the db is
just a distributed hash table system, with an added http access
interface. So, basically, little or no relations. Again, i don't see
how this is different from, say, MySQL with one single table of 2
columns, with distributed infrastructure added. (A distributed database
is often an integrated feature of commercial dbs, e.g. the Wikipedia
Oracle database article cites Oracle Real Application Clusters.)
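
To make the "one table of 2 columns" comparison concrete, here is a
minimal sketch. It assumes Python's built-in sqlite3 module standing in
for MySQL, and the table name kv and the helpers put and get are made up
for illustration. It only shows the key-value interface on top of a
single table; the distributed part is left out entirely.

```python
import sqlite3

# Hypothetical two-column table acting as a key-value store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB)")

def put(key, value):
    # INSERT OR REPLACE overwrites any earlier value stored under the key.
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

def get(key):
    # Return the stored value, or None when the key is absent.
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None

put("img/cat.png", b"old bytes")
put("img/cat.png", b"new bytes")
print(get("img/cat.png"))
print(get("missing"))
```

Lookups go through the primary key only, so nothing here uses joins or
any relation between rows.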

Here's an interesting quote on S3:

Bucket names and keys are chosen so that objects are addressable
using HTTP URLs:

* http://s3.amazonaws.com/...
* http://bucket.s3.amazona...
* http://bucket/key (where bucket is a DNS CNAME record
pointing to bucket.s3.amazonaws.com)

Because objects are accessible by unmodified HTTP clients, S3 can
be used to replace significant existing (static) web hosting
infrastructure.

So this means, for example, i can store all my images in S3, and in my
html document, the inline images are just normal img tags with normal
urls. This applies to any other type of file: pdf, audio, and html
too. So, S3 becomes the web host server as well as the file system.

Here's Amazon's instruction on how to use it as image server. Seems
quite simple: How to use Amazon S3 for hosting web pages and media
files? Source

--------------------
Google BigTable

Another is Google's BigTable. I can't make much comment. To make a
sensible comment, one must have some experience of actually
implementing a database. For example, a file system is a sort of
database. Suppose i created a scheme that allows me to access my data as
files in NTFS that are distributed over hundreds of PCs, communicating
thru http running Apache. This will let me access my files. To insert
or delete data, one can have cgi scripts on each machine. Would this be
considered a new fantastic NoNoSQL?

---------------------

comments can also be posted to
http://xahlee.blogspot.com/2010/01/nosql-mov...

Thanks.

Xah
http://x...

40 Answers

MRAB

3/3/2010 7:09:00 PM

Xah Lee wrote:
> recently i wrote a blog article on The NoSQL Movement
> at http://xahlee.org/comp/...
>
> i'd like to post it somewhere public to solicit opinions, but in the
> 20 min or so, i couldn't find a proper newsgroup, nor private list
> that my somewhat anti-NoSQL Movement article is fitting.
>
> So, i thought i'd post here to solicit some opinions from the programmer
> community i know.
>
[snip]
Couldn't find a relevant newsgroup, so decided to inflict it on a number
of others...

ccc31807

3/3/2010 8:55:00 PM

On Mar 3, 12:36 pm, Xah Lee <xah...@gmail.com> wrote:
> recently i wrote a blog article on The NoSQL Movement
> at http://xahlee.org/comp/...
>
> i'd like to post it somewhere public to solicit opinions, but in the
> 20 min or so, i couldn't find a proper newsgroup, nor private list
> that my somewhat anti-NoSQL Movement article is fitting.

I only read the first two paragraphs of your article, so I can't
respond to it.

I've halfway followed the NoSQL movement. My day job is a database
manager and I do SQL databases for a living, as well as Perl. I see a
lot of abuse of relational databases in the Real World, as well as a
lot of abuse of non-SQL alternatives, e.g., (mis)using Excel for a
database. The big, enterprise database we have at work is built on IBM
UniQuery, which is a non-SQL flat file database product, so I've had a
lot of experience with big non-SQL database work.

I've also developed a marked preference for plain text databases. For
a lot of applications they are simpler, easier, and better. I've also
had some experience with XML databases, and find that they are ideal
for applications with 'ragged' data.

As with anything else, you need to match the tool to the job. Yes, I
feel that relational database technology has been much used, and much
abused. However, one of my favorite applications is Postgres, and I
think it's absolutely unbeatable where you have to store data and
perform a large number of queries.

Finally, with regard to Structured Query Language itself, I find that
it's well suited to its purpose. I hand write a lot of SQL statements
for various purposes, and while, as with any language, you sometimes
find it exceedingly difficult to express concepts that you can think,
it allows the expression of most of what you want to say.

CC.

toby

3/3/2010 9:56:00 PM

On Mar 3, 3:54 pm, ccc31807 <carte...@gmail.com> wrote:
> On Mar 3, 12:36 pm, Xah Lee <xah...@gmail.com> wrote:
>
> > recently i wrote a blog article on The NoSQL Movement
> > at http://xahlee.org/comp/...
>
> > i'd like to post it somewhere public to solicit opinions, but in the
> > 20 min or so, i couldn't find a proper newsgroup, nor private list
> > that my somewhat anti-NoSQL Movement article is fitting.
>
> I only read the first two paragraphs of your article, so I can't
> respond to it.
>
> I've halfway followed the NoSQL movement. My day job is a database
> manager and I do SQL databases for a living, as well as Perl. I see a
> lot of abuse of relational databases in the Real World, as well as a
> lot of abuse of non-SQL alternatives, e.g., (mis)using Excel for a
> database. The big, enterprise database we have at work is built on IBM
> UniQuery, which is a non-SQL flat file database product, so I've had a
> lot of experience with big non-SQL database work.
>
> I've also developed a marked preference for plain text databases. For
> a lot of applications they are simpler, easier, and better. I've also
> had some experience with XML databases, and find that they are ideal
> for applications with 'ragged' data.
>
> As with anything else, you need to match the tool to the job. Yes, I
> feel that relational database technology has been much used, and much
> abused. However, one of my favorite applications is Postgres, and I
> think it's absolutely unbeatable

It is beatable outside of its sweetspot, like any system. NoSQL is not
so much about "beating" relational databases, as simply a blanket term
for useful non-relational technologies. There's not much point in
reading Xah beyond the heading of his manifesto, as it is no more
relevant to be "anti-NoSQL" than to be "anti-integers" because they
don't store fractions.

> where you have to store data and

"relational data"

> perform a large number of queries.

Why does the number matter?

>
> Finally, with regard to Structured Query Language itself, I find that
> it's well suited to its purpose. I hand write a lot of SQL statements
> for various purposes, and while like any language you find it
> exceedingly difficult to express concepts that you can think, it
> mostly allows the expression of most of what you want to say.
>
> CC.

Alex Mizrahi

3/3/2010 10:12:00 PM

XL> recently i wrote a blog article on The NoSQL Movement
XL> at http://xahlee.org/comp/...

What is your experience with SQL/NoSQL?

Note that NoSQL is mostly about scalability, that is, dealing with large
data sets and lots of queries per second.
What is your experience in this area?

Jonathan Gardner

3/3/2010 10:23:00 PM

On Wed, Mar 3, 2010 at 12:54 PM, ccc31807 <cartercc@gmail.com> wrote:
>
> As with anything else, you need to match the tool to the job. Yes, I
> feel that relational database technology has been much used, and much
> abused. However, one of my favorite applications is Postgres, and I
> think it's absolutely unbeatable where you have to store data and
> perform a large number of queries.
>

Let me elaborate on this point for those who haven't experienced this
for themselves.

When you are starting a new project and you don't have a definitive
picture of what the data is going to look like or how it is going to
be queried, SQL databases (like PostgreSQL) will help you quickly
formalize and understand what your data needs to do. In this role,
these databases are invaluable. I can see no comparable tool in the
wild, especially not OODBMS.

As you grow in scale, you may eventually reach a point where the
database can't keep up with you. Either you need to partition the data
across machines or you need more specialized and optimized query
plans. When you reach that point, there are a number of options that
don't include an SQL database. I would expect your project to move
those parts of the data away from an SQL database and towards a more
specific solution.

I see it as a sign of maturity with sufficiently scaled software that
they no longer use an SQL database to manage their data. At some point
in the project's lifetime, the data is understood well enough that the
general nature of the SQL database is unnecessary.

--
Jonathan Gardner
jgardner@jonathangardner.net

jt

3/3/2010 10:42:00 PM

Jonathan Gardner wrote:

>
> I see it as a sign of maturity with sufficiently scaled software that
> they no longer use an SQL database to manage their data. At some point
> in the project's lifetime, the data is understood well enough that the
> general nature of the SQL database is unnecessary.
>

I am really struggling to understand this concept.

Is it the normalised table structure that is in question or the query
language?

Could you give some sort of example of where SQL would not be the way to
go? The only things I can think of are simple flat file databases.

Philip Semanchuk

3/4/2010 2:53:00 AM

On Mar 3, 2010, at 5:41 PM, Avid Fan wrote:

> Jonathan Gardner wrote:
>
>> I see it as a sign of maturity with sufficiently scaled software that
>> they no longer use an SQL database to manage their data. At some point
>> in the project's lifetime, the data is understood well enough that the
>> general nature of the SQL database is unnecessary.
>
> I am really struggling to understand this concept.
>
> Is it the normalised table structure that is in question or the
> query language?
>
> Could you give some sort of example of where SQL would not be the
> way to go? The only things I can think of are simple flat file
> databases.

Well, Zope is backed by an object database rather than a relational one.

Jack Diederich

3/4/2010 3:43:00 AM

On Wed, Mar 3, 2010 at 12:36 PM, Xah Lee <xahlee@gmail.com> wrote:
[snip]

Xah Lee is a longstanding usenet troll. Don't feed the trolls.

mk

3/4/2010 12:02:00 PM

Jonathan Gardner wrote:

> When you are starting a new project and you don't have a definitive
> picture of what the data is going to look like or how it is going to
> be queried, SQL databases (like PostgreSQL) will help you quickly
> formalize and understand what your data needs to do. In this role,
> these databases are invaluable. I can see no comparable tool in the
> wild, especially not OODBMS.

FWIW, I talked to my promoting professor about the subject, and he
claimed that there are quite a number of papers on OODBMS that point to
fundamental problems with constructing capable query languages for
OODBMS. Sadly, I have not had time to get & read those sources.

Regards,
mk

Duncan Booth

3/4/2010 1:11:00 PM

Avid Fan <me@privacy.net> wrote:

> Jonathan Gardner wrote:
>
>>
>> I see it as a sign of maturity with sufficiently scaled software that
>> they no longer use an SQL database to manage their data. At some point
>> in the project's lifetime, the data is understood well enough that the
>> general nature of the SQL database is unnecessary.
>>
>
> I am really struggling to understand this concept.
>
> Is it the normalised table structure that is in question or the query
> language?
>
> Could you give some sort of example of where SQL would not be the way
> to go? The only things I can think of are simple flat file databases.

Probably one of the best known large non-sql databases is Google's
bigtable. Xah Lee of course dismissed this as he decided to write how
bad non-sql databases are without actually looking at the prime example.

If you look at some of the uses of bigtable you may begin to understand
the tradeoffs that are made with sql. When you use bigtable you have
records with fields, and you have indices, but there are limitations on
the kinds of queries you can perform: in particular you cannot do joins,
but more subtly there is no guarantee that the index is up to date (so
you might miss recent updates or even get data back from a query when
the data no longer matches the query).
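
A toy model of that stale-index behaviour, as a minimal sketch only: it
uses plain in-memory Python structures, not Bigtable's actual API, and
the names records, index_by_city, pending, put, propagate and query are
all made up for illustration. Writes hit the record at once, while the
index change is merely queued and applied later.

```python
from collections import deque

records = {}         # primary store: key -> record
index_by_city = {}   # secondary index: city -> set of keys
pending = deque()    # index updates that have not been applied yet

def put(key, record):
    # The record is written immediately; the index change is only queued.
    old = records.get(key)
    records[key] = record
    pending.append((key, old, record))

def propagate():
    # Apply queued index updates (a real system would do this in the background).
    while pending:
        key, old, new = pending.popleft()
        if old is not None:
            index_by_city[old["city"]].discard(key)
        index_by_city.setdefault(new["city"], set()).add(key)

def query(city):
    # Answers come from the index alone, so they can lag behind the records.
    return sorted(index_by_city.get(city, set()))

put("u1", {"city": "Leeds"})
propagate()
put("u1", {"city": "York"})   # record changed, index not yet updated
stale = query("Leeds")        # still lists u1, which no longer matches
missed = query("York")        # does not see the recent update
propagate()
fresh = query("York")         # index has caught up
print(stale, missed, fresh)
```

Until propagate runs, a query can both return a record that no longer
matches and miss one that does, which are the two cases mentioned above.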

By sacrificing some of SQL's power, Google get big benefits: namely
updating data is a much more localised operation. Instead of an update
having to lock the indices while they are updated, updates to different
records can happen simultaneously possibly on servers on the opposite
sides of the world. You can have many, many servers all using the same
data although they may not have identical or completely consistent views
of that data.

Bigtable affects how you store the data: for example you need to
avoid reducing data to normal form (no joins!); it's much better and
cheaper just to store all the data you need directly in each record.
Also aggregate values need to be at least partly pre-computed and stored
in the database.
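
As a rough illustration of storing everything in the record and
pre-computing aggregates, here is a minimal sketch in plain Python
dictionaries. It is not Bigtable code, and the record layout and the
helpers add_post, add_comment and show are assumptions made up for the
example.

```python
users = {1: {"name": "alice"}}
posts = {}      # post_id -> self-contained record
comments = []   # raw comment log

def add_post(post_id, author_id, body):
    posts[post_id] = {
        "body": body,
        "author_name": users[author_id]["name"],  # copied in, so no join on read
        "comment_count": 0,                       # aggregate maintained on write
    }

def add_comment(post_id, text):
    comments.append({"post": post_id, "text": text})
    posts[post_id]["comment_count"] += 1          # update the stored aggregate

def show(post_id):
    # Reading needs a single record: no join, no COUNT over comments.
    p = posts[post_id]
    return f'{p["author_name"]}: {p["body"]} ({p["comment_count"]} comments)'

add_post(10, 1, "hello")
add_comment(10, "hi")
add_comment(10, "+1")
print(show(10))
```

The cost is on the write side: if the author's name changes, every record
holding a copy of it has to be rewritten.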

Boiling this down to a concrete example, imagine you wanted to implement
a system like twitter. Think carefully about how you'd handle a
sufficiently high rate of new tweets reliably with a sql database. Now
think how you'd do the same thing with bigtable: most tweets don't
interact, so it becomes much easier to see how the load is spread across
the servers: each user has the data relevant to them stored near the
server they are using and index changes propagate gradually to the rest
of the system.
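
A minimal sketch of spreading that load, under stated assumptions: the
server names are hypothetical, and a stable hash of the user name stands
in for "the server near the user". Real systems pick placement
differently; this only shows that tweets from different users need not
touch the same machine.

```python
import hashlib

SERVERS = ["s0", "s1", "s2", "s3"]          # hypothetical server names
shards = {name: [] for name in SERVERS}     # per-server tweet storage

def server_for(user):
    # Stable hash, so a given user always maps to the same server.
    digest = hashlib.md5(user.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]

def post_tweet(user, text):
    # A write touches exactly one shard; no global lock or shared index.
    shards[server_for(user)].append((user, text))

def timeline(user):
    # A user's own tweets are read from that user's shard only.
    return [text for u, text in shards[server_for(user)] if u == user]

post_tweet("alice", "a1")
post_tweet("bob", "b1")
post_tweet("alice", "a2")
print(timeline("alice"))
print(timeline("bob"))
```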

--
Duncan Booth http://kupuguy.bl...
