comp.lang.ruby

Re: Question of reference and (sub)strings.

Daniel Sheppard

12/15/2005 5:51:00 AM


> Given that I only want to compute the offsets once, an obvious solution
> would be to construct an Array of String - each element representing a
> sub-string of the original... but this would double memory use. What
> would be the best way to avoid duplicating the character sequences and
> causing run-time bloat?

I might be wrong - but I'm pretty sure that substrings in ruby are
created with copy-on-write. That is, when you take a substring, a new
block of memory isn't allocated to the new String, it references the
same block of memory as the original string - the allocation of a new
block of memory only occurs when one of the strings is modified.
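Whether the interpreter really shares the underlying buffer is an implementation detail. The observable semantics, however, are easy to check: a substring behaves as an independent String, so modifying either one never shows through in the other. A minimal sketch (`cow_demo` is just an illustrative name):

```ruby
# Illustrative helper: take a substring, then modify each string in turn
# and return the resulting values.
def cow_demo
  original = "hello, world".dup   # dup: a plain mutable String
  sub = original[0, 5]            # "hello": a distinct String object

  sub << "!"                      # modifying the substring...
  after_sub = original.dup        # ...leaves the original untouched

  original[0, 5] = "HELLO"        # modifying the original...
  [sub, after_sub, original]      # ...leaves the substring untouched
end

sub, after_sub, original = cow_demo
puts sub        # hello!
puts after_sub  # hello, world
puts original   # HELLO, world
```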



Robert Klemme

12/15/2005 10:22:00 AM

Daniel Sheppard wrote:
>> Given that I only want to compute the offsets once, an obvious solution
>> would be to construct an Array of String - each element representing a
>> sub-string of the original... but this would double memory use. What
>> would be the best way to avoid duplicating the character sequences and
>> causing run-time bloat?
>
> I might be wrong

You're not.

> - but I'm pretty sure that substrings in ruby are
> created with copy-on-write. That is, when you take a substring, a new
> block of memory isn't allocated to the new String, it references the
> same block of memory as the original string - the allocation of a new
> block of memory only occurs when one of the strings is modified.

Exactly. It seems this would be the simplest solution.

robert

Steve [RubyTalk]

12/15/2005 3:48:00 PM

Robert Klemme wrote:
>> I might be wrong
>>
> You're not.
>
>> - but I'm pretty sure that substrings in ruby are
>> created with copy-on-write. That is, when you take a substring, a new
>> block of memory isn't allocated to the new String, it references the
>> same block of memory as the original string - the allocation of a new
>> block of memory only occurs when one of the strings is modified.
>>
> Exactly. It seems this would be the simplest solution.
>
>
That sounds like absolutely great news - a _very_ pleasant surprise. I
couldn't have hoped for anything better. (Thanks!)

I assume that if I do something like:

# Assume offsets is a pre-computed array of non-negative integer
# positions into the String @originalstr,
# with offsets[0]==0 and offsets[-1]==@originalstr.size
@fields=Array.new(offsets.size-1)
for i in 1...offsets.size do
  # I assume this next line is what is meant by a Ruby sub-string?
  # (an exclusive range, so neighbouring fields don't overlap)
  @fields[i-1]=@originalstr[offsets[i-1]...offsets[i]]
end

... and, assuming that @fields is exposed only as a read-only attribute,
that the memory it consumes is independent of the length of @originalstr
and depends only on the number of fields?
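For reference, the same loop can be written more idiomatically with `each_cons`. This is a minimal sketch, not code from the thread: `split_at_offsets` is a hypothetical helper name, and it assumes (as Steve does) that the offsets start at 0, end at the string's size and never decrease. Note the exclusive range `...`: with an inclusive `..`, each field would also pick up the first character of the next field.

```ruby
# Hypothetical helper: cut str into the fields delimited by consecutive
# offsets. Assumes offsets[0] == 0, offsets[-1] == str.size and that the
# offsets never decrease.
def split_at_offsets(str, offsets)
  unless offsets.first == 0 && offsets.last == str.size
    raise ArgumentError, "offsets must run from 0 to #{str.size}"
  end

  # each_cons(2) yields [offsets[0], offsets[1]], [offsets[1], offsets[2]], ...
  offsets.each_cons(2).map do |from, to|
    raise ArgumentError, "offsets must not decrease" if to < from
    str[from...to]   # exclusive end: the field stops just before `to`
  end
end

p split_at_offsets("abcdef", [0, 2, 3, 6])   # => ["ab", "c", "def"]
```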

While I've no reason to doubt this confirmed answer, by any chance can
someone suggest a good way to demonstrate that this is the case without
resorting to either using very large strings and looking at VM usage of
the interpreter process... or resorting to reviewing the source to
Ruby's implementation?
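One option that did not exist when this thread was written: Ruby 2.0 and later ship the `objspace` extension, whose `ObjectSpace.memsize_of` reports roughly how many bytes an object holds, including a String's own character buffer. Its documentation describes the figure as a hint rather than an exact measurement, and whether a substring shares its parent's buffer is an interpreter implementation detail that has varied between versions. The sketch below therefore only reports numbers for comparison and does not assert any particular outcome; `substring_sizes` is a hypothetical name.

```ruby
require "objspace"

# Hypothetical helper: report approximate per-object memory (bytes) for a
# large string, two kinds of substring, and a freshly built string of the
# same length as the substrings. A substring sharing its parent's buffer
# should report roughly a bare object slot; one with its own buffer should
# report at least its length.
def substring_sizes(length = 100_000)
  big = "x" * length
  {
    original: ObjectSpace.memsize_of(big),
    tail:     ObjectSpace.memsize_of(big[length / 2..-1]),          # runs to the end
    middle:   ObjectSpace.memsize_of(big[length / 4, length / 2]),  # interior slice
    fresh:    ObjectSpace.memsize_of("x" * (length / 2))            # own buffer
  }
end

substring_sizes.each { |kind, bytes| puts format("%-8s %8d", kind, bytes) }
```

Comparing `tail` and `middle` against `fresh` on the interpreter you actually deploy on is a cheap check to repeat as the implementation evolves.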





Robert Klemme

12/15/2005 4:09:00 PM

Steve [RubyTalk] wrote:
> Robert Klemme wrote:
>>> I might be wrong
>>>
>> You're not.
>>
>>> - but I'm pretty sure that substrings in ruby are
>>> created with copy-on-write. That is, when you take a substring, a
>>> new block of memory isn't allocated to the new String, it
>>> references the same block of memory as the original string - the
>>> allocation of a new block of memory only occurs when one of the
>>> strings is modified.
>>>
>> Exactly. It seems this would be the simplest solution.
>>
>>
> That sounds like absolutely great news - a _very_ pleasant surprise. I
> couldn't have hoped for anything better. (Thanks!)
>
> I assume that if I do something like:
>
> # Assume offsets is a pre-computed array of positive integer
> # positions into the String originalstr.

Care to unveil a bit of the nature of the computation that yields those
indexes?

> # with offsets[0]==0 and offsets[-1]==@originalstr.size
> @fields=Array.new (offsets.size-1)
> for i in 1..(offsets.size) do
> # I assume this next line is what is meant by a Ruby sub-string?
> @fields[i-1]=@originalstr[offsets[i-1]..offsets[i]]
> end
>
> ... and, assuming that @fields is exposed only as a read-only
> attribute, that I can assume the memory it consumes to be independent
> of the length of originalstr and dependent only upon numfields?

You can help keep this read-only by freezing all strings involved.
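A minimal sketch of that suggestion, using a hypothetical `FieldView` class: the fields are exposed through `attr_reader` only, and the original string, each field and the array holding them are all frozen, so an accidental in-place modification raises an error instead of silently changing the data. Ruby 2.5 and later raise `FrozenError`, a subclass of `RuntimeError`, so rescuing `RuntimeError` works across versions.

```ruby
# Hypothetical wrapper exposing read-only, frozen fields of a string.
class FieldView
  attr_reader :fields   # reader only; no writer is defined

  # offsets: field boundaries, running from 0 up to original.size
  def initialize(original, offsets)
    @original = original.freeze   # note: this freezes the caller's string too
    @fields = offsets.each_cons(2).map { |from, to| @original[from...to].freeze }
    @fields.freeze                # also forbid replacing or appending elements
  end
end

view = FieldView.new("abcdef".dup, [0, 2, 3, 6])
begin
  view.fields.first << "!"
rescue RuntimeError => e        # FrozenError on Ruby 2.5+
  puts "refused: #{e.class}"
end
```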

> While I've no reason to doubt this confirmed answer, by any chance can
> someone suggest a good way to demonstrate that this is the case
> without resorting to either using very large strings and looking at
> VM usage of the interpreter process... or resorting to reviewing the
> source to Ruby's implementation?

The only additional method of verification that comes to mind is to ask
Matz. :-)

Kind regards

robert

Steve [RubyTalk]

12/15/2005 5:51:00 PM

Robert Klemme wrote:
>> # Assume offsets is a pre-computed array of positive integer
>> # positions into the String originalstr.
>>
> Care to unveil a bit of the nature of the computation that yields those
> indexes?
>
It's not really relevant to the question I was asking - but I've no
problem saying more. I've a domain specific (order-preserving and
extensible) 'type-system' which is imposed over otherwise opaque data
structures. Given an instance of a 'type-signature' and a pointer, it
is possible to determine the number of bytes which represent each 'typed
value' - and (significantly) list construction drops out as being the
concatenation of the value representations and type-signatures. The
type signatures range in complexity from the simplest constant 'N bytes
interpreted as a natural number', through sentinel encodings
(null-terminated strings on steroids), to (in principle, if not frequently
in practice) arbitrary computation over named integer values occurring
'earlier' in the list.

At the moment I'm toying with the idea that I can memory-map the values
(using a C-implemented module) and do the computations on the mapped
values in Ruby - having presented opaque values and 'type-signatures' as
String objects to Ruby. I expect that typical computations may involve
matching regular expressions; doing arithmetic; computing various hashes
and summations etc. At the moment I'm concentrating on establishing if
Ruby is a suitable tool for the task at hand.
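The encodings described above (fixed-width fields and sentinel-terminated fields) are what would produce the `offsets` array in the earlier loop. This is only a rough sketch under invented assumptions: the signature format (`[:fixed, n]` / `[:sentinel, "\0"]`) and the name `compute_offsets` are hypothetical, the data is assumed to be a binary (ASCII-8BIT) String so character positions equal byte positions, and the computed-length case is left out.

```ruby
# Hypothetical signature format, one entry per field:
#   [:fixed, n]         - exactly n bytes
#   [:sentinel, "\0"]   - bytes up to and including the terminator
# Assumes data is a binary String. Returns offsets in the form used in the
# thread: 0 first, then the end position of each field.
def compute_offsets(data, signature)
  offsets = [0]
  signature.each do |kind, arg|
    pos = offsets.last
    len =
      case kind
      when :fixed
        arg
      when :sentinel
        stop = data.index(arg, pos)
        raise ArgumentError, "unterminated field at byte #{pos}" unless stop
        stop - pos + 1
      else
        raise ArgumentError, "unknown field kind #{kind.inspect}"
      end
    raise ArgumentError, "field overruns data at byte #{pos}" if pos + len > data.size
    offsets << pos + len
  end
  offsets
end

record = "\x00\x00\x00\x2Ahello\x00rest".b
p compute_offsets(record, [[:fixed, 4], [:sentinel, "\0"], [:fixed, 4]])  # => [0, 4, 10, 14]
p record[0...4].unpack("N").first   # 4-byte big-endian natural number => 42
```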
>> # with offsets[0]==0 and offsets[-1]==@originalstr.size
>> @fields=Array.new (offsets.size-1)
>> for i in 1..(offsets.size) do
>> # I assume this next line is what is meant by a Ruby sub-string?
>> @fields[i-1]=@originalstr[offsets[i-1]..offsets[i]]
>> end
>>
>> ... and, assuming that @fields is exposed only as a read-only
>> attribute, that I can assume the memory it consumes to be independent
>> of the length of originalstr and dependent only upon numfields?
>>
> You can help keep this read-only by freezing all strings involved.
>
Yes - that sounds like a good idea to me.
>> While I've no reason to doubt this confirmed answer, by any chance can
>> someone suggest a good way to demonstrate that this is the case
>> without resorting to either using very large strings and looking at
>> VM usage of the interpreter process... or resorting to reviewing the
>> source to Ruby's implementation?
>>
> The only additional method of verification that comes to mind is to ask
> Matz. :-)
>
Hmmm - a lack of profiling tools might prove something of a stumbling
block... I'll need to have a careful think about that. Rather than
wanting to check up on fellow Rubyists, I really want to periodically
check that I make no invalid assumptions as I work forwards from this
basis towards an implementation. I don't want to find out only after I
think I've finished that a resource leak or extravagant resource demands
will require a re-write before the software can be used against real data.

Steve




Ronald E Jeffries

12/15/2005 11:17:00 PM

On Thu, 15 Dec 2005 14:50:59 +0900, "Daniel Sheppard" <daniels@pronto.com.au>
wrote:

>I might be wrong - but I'm pretty sure that substrings in ruby are
>created with copy-on-write. That is, when you take a substring, a new
>block of memory isn't allocated to the new String, it references the
>same block of memory as the original string - the allocation of a new
>block of memory only occurs when one of the strings is modified.

I asked about this a week or two ago, as part of my Extended Set Theory thing,
and the impression I got from the answers then was that sub /arrays/ are copy on
write but that sub /strings/ were not.

I wonder how one might confirm this, other than by asking matz or reading the
compiler and libraries ...

--
Ron Jeffries
www.XProgramming.com
I'm giving the best advice I have. You get to decide if it's true for you.
