Asp Forum - Inverse scanf: finding format specifiers of existing fields

Bil Kleb

5/2/2007 10:48:00 AM

Hi,

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

'0.4577' -> '0.7728'

or

'-2.345e-02' -> ' 1.232e-03'

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

Thanks,
--
Bil Kleb
http://fun3d.lar...

[1] Legacy formatted-Fortran data files.

16 Answers

Xavier Noria

5/2/2007 11:03:00 AM

On May 2, 2007, at 12:50 PM, Bil Kleb wrote:

> Hi,
>
> I have files full of numbers that I need to twiddle,
> but the format of the numbers cannot change[1], e.g.,
>
> '0.4577' -> '0.7728'
>
> or
>
> '-2.345e-02' -> ' 1.232e-03'

Are there many different formats?

-- fxn

Robert Klemme

5/2/2007 11:39:00 AM

On 02.05.2007 12:47, Bil Kleb wrote:
> I have files full of numbers that I need to twiddle,
> but the format of the numbers cannot change[1], e.g.,
>
> '0.4577' -> '0.7728'
>
> or
>
> '-2.345e-02' -> ' 1.232e-03'
>
> Using scanf for the output seems to be the solution to
> the second half of the problem, but how does one derive
> the format specifier string of the input fields, which vary?

If there is a fixed number of formats you can probably use a cascade of
RX matches. Otherwise it probably becomes a bit more complex like
matching sequences of digits and measuring their lengths.

>> md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
=> #<MatchData:0x7ef61250>
>> pa="%#{md[0].size}.#{md[2].size}f"
=> "%6.4f"
>> pa % 0.4577111
=> "0.4577"

HTH

robert

dblack

5/2/2007 12:13:00 PM

Hi --

On 5/2/07, Bil Kleb <Bil.Kleb@nasa.gov> wrote:
> Hi,
>
> I have files full of numbers that I need to twiddle,
> but the format of the numbers cannot change[1], e.g.,
>
> '0.4577' -> '0.7728'
>
> or
>
> '-2.345e-02' -> ' 1.232e-03'
>
> Using scanf for the output seems to be the solution to
> the second half of the problem, but how does one derive
> the format specifier string of the input fields, which vary?

You could probably just do a gsub, like this:

require 'scanf'

re = /-?\d+\.\d+(e-\d+)?/

a = "'0.4577' -> '0.7728'"
b = "'-2.345e-02' -> ' 1.232e-03'"

as = a.gsub(re, "%f")
bs = b.gsub(re, "%f")

p a.scanf(as)
p b.scanf(bs)

Output:

[0.4577, 0.7728]
[-0.02345, 0.001232]

David

--
Upcoming Rails training by Ruby Power and Light:
Four-day Intro to Intermediate
May 8-11, 2007
Edison, NJ
http://www.rubypal.com/event...

Bil Kleb

5/2/2007 12:47:00 PM

Xavier Noria wrote:
>
> Are there many different formats?

Yes, in that the field lengths are different.

No, in that there are really only three "types":
integers, vanilla floats, and exponentials.

Regards,
--
Bil Kleb
http://fun3d.lar...

Bil Kleb

5/2/2007 12:58:00 PM

David A. Black wrote:
> Hi --

Hi.

> Output:
>
> [0.4577, 0.7728]
> [-0.02345, 0.001232]

The second output indicates that I failed to express
my predicament clearly, as the numbers are no longer
in exponential format?

A brief re-cast:

The original file has numbers of the form

5 0.4577 -2.345e-02

Something reads the numbers and spits out new numbers,
but in exactly the same format as the original file, e.g.,

8 0.7728 1.232e-03

I.e., I can't write the last number out as 0.001232 --
it has to be in exponential format with the same field
lengths.
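
To make the re-cast concrete, here is a minimal sketch of the line-level half of the job. It assumes the fields are blank-separated and right-justified, as Fortran numeric output usually is; rewrite_fields and the new_values table are made-up names for illustration, and the new text for each field is simply looked up rather than computed.

```ruby
# Hypothetical helper: rewrite every blank-separated field on a line while
# keeping each field's total width (leading blanks included) unchanged.
def rewrite_fields(line)
  line.gsub(/ *\S+/) do |field|
    replacement = yield(field.strip)
    # Right-justify so the field still ends in the same column.
    replacement.rjust(field.length)
  end
end

# Stand-in for whatever computes the new numbers (an assumption).
new_values = { '5' => '8', '0.4577' => '0.7728', '-2.345e-02' => '1.232e-03' }

old_line = "  5  0.4577 -2.345e-02\n"
new_line = rewrite_fields(old_line) { |token| new_values.fetch(token) }
puts old_line
puts new_line
```

The line length and the column where each field ends stay the same; what this does not do is derive a format specifier, so the text returned by the block must already be in the right notation.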

Regards,
--
Bil Kleb
http://fun3d.lar...

Xavier Noria

5/2/2007 12:58:00 PM

On May 2, 2007, at 2:50 PM, Bil Kleb wrote:

> Xavier Noria wrote:
>> Are there many different formats?
>
> Yes, in that the field lengths are different.
>
> No, in that there are really only three "types":
> integers, vanilla floats, and exponentials.

Then I think you could base the solution on String#index/regexps
depending on the existence of "e" and ".", since we can assume
numbers are well-formed. The idea would be:

if none
%d
elsif "e"
%e
else
%f with computed widths
end
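
A rough Ruby rendering of that outline, assuming each token is a well-formed number with no surrounding text and that the field width is simply the token's length (sign included). format_for is a made-up name, not an existing method.

```ruby
# Hypothetical helper following the if/elsif/else outline above.
def format_for(token)
  width   = token.length
  has_exp = token =~ /[eE]/
  has_dot = token.include?('.')

  if !has_exp && !has_dot
    "%#{width}d"
  elsif has_exp
    # Digits between the decimal point and the exponent marker.
    digits = token[/\.(\d*)[eE]/, 1].to_s
    "%#{width}.#{digits.length}#{token[/[eE]/]}"
  else
    # Digits after the decimal point.
    digits = token[/\.(\d*)\z/, 1].to_s
    "%#{width}.#{digits.length}f"
  end
end

puts format_for('12345')       # => %5d
puts format_for('0.4577')      # => %6.4f
puts format_for('-2.345e-02')  # => %10.3e
puts format_for('-2.345e-02') % 1.232e-03
```

Because the sign is counted in the width, a positive value written into a field that held a negative one is padded with a leading blank.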

-- fxn

Bil Kleb

5/2/2007 1:08:00 PM

Robert Klemme wrote:
>
> If there is a fixed number of formats you can probably use a cascade of
> RX matches.

Unfortunately not.

> Otherwise it probably becomes a bit more complex like
> matching sequences of digits and measuring their lengths.
>
> >> md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
> => #<MatchData:0x7ef61250>
> >> pa="%#{md[0].size}.#{md[2].size}f"

Hmmm, this looks like a viable path.

I hadn't thought of using MatchData groups, but as you say,
it may get ugly fast... I'm thinking of edge cases like
dealing with the leading space if positive numbers become
negative, or accommodating the number of digits needed for
exponentials or integers if the new number exceeds the
capacity of the existing format.
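
One way to keep those edge cases from slipping through silently is to check the width after formatting. This is only a sketch; format_in_field is a hypothetical name, and raising is just one possible policy for a value that does not fit.

```ruby
# Hypothetical guard: apply a format and insist on the original field width.
def format_in_field(fmt, value, width)
  text = fmt % value
  unless text.length == width
    raise ArgumentError, "#{text.inspect} does not fit a #{width}-character field"
  end
  text
end

puts format_in_field('%10.3e', 1.232e-03, 10)   # positive value keeps a leading blank
puts format_in_field('%10.3e', -2.345e-02, 10)  # the sign takes the blank's place

begin
  format_in_field('%5d', 1_234_567, 5)          # too many digits for the field
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```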

Thanks,
--
Bil Kleb
http://fun3d.lar...

Bil Kleb

5/2/2007 1:10:00 PM

Xavier Noria wrote:
>
> Then I think you could base the solution on String#index/regexps
> depending on the existence of "e" and ".", since we can assume numbers
> are well-formed. The idea would be:
>
> if none
> %d
> elsif "e"
> %e
> else
> %f with computed widths
> end

This, coupled with Robert's computed field lengths
is beginning to look tractable...

Thanks,
--
Bil Kleb
http://fun3d.lar...

Robert Klemme

5/2/2007 1:21:00 PM

On 02.05.2007 15:08, Bil Kleb wrote:
> Robert Klemme wrote:
>>
>> If there is a fixed number of formats you can probably use a cascade
>> of RX matches.
>
> Unfortunately not.
>
>> Otherwise it probably becomes a bit more complex like matching
>> sequences of digits and measuring their lengths.
>>
>> >> md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
>> => #<MatchData:0x7ef61250>
>> >> pa="%#{md[0].size}.#{md[2].size}f"
>
> Hmmm, this looks like a viable path.
>
> I hadn't thought of using MatchData groups, but as you say,
> it may get ugly fast... I'm thinking of edge cases like
> dealing with the leading space if positive numbers become
> negative, or accommodating the number of digits needed for
> exponentials or integers if the new number exceeds the
> capacity of the existing format.

For floating point numbers you might even get away with a single regexp
if that is crafted appropriately and group values are evaluated accordingly.
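
Something along these lines, as a sketch only: one pattern with named groups covers both plain and exponential floats, and the optional exponent group decides the conversion. FLOAT_FIELD and float_format are made-up names, and the field is assumed to contain nothing but the number and its leading blanks.

```ruby
# Hypothetical single pattern for plain and exponential floats.
FLOAT_FIELD = /\A(?<blanks> *)(?<sign>[+-]?)(?<int>\d+)\.(?<frac>\d*)(?:(?<e>[eE])(?<exp>[+-]?\d+))?\z/

def float_format(field)
  m = FLOAT_FIELD.match(field)
  return nil unless m            # not a float field (e.g. an integer)

  conversion = m[:e] || 'f'      # 'e' or 'E' if an exponent was present
  "%#{field.length}.#{m[:frac].length}#{conversion}"
end

puts float_format('0.4577')      # => %6.4f
puts float_format(' 1.232e-03')  # => %10.3e
puts float_format('-2.345E-02')  # => %10.3E
p float_format('12345')          # => nil
```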

Kind regards

robert

Rick DeNatale

5/2/2007 11:06:00 PM

On 5/2/07, Bil Kleb <Bil.Kleb@nasa.gov> wrote:
> Hi,
>
> I have files full of numbers that I need to twiddle,
> but the format of the numbers cannot change[1], e.g.,
>
> '0.4577' -> '0.7728'
>
> or
>
> '-2.345e-02' -> ' 1.232e-03'
>
> Using scanf for the output seems to be the solution to
> the second half of the problem, but how does one derive
> the format specifier string of the input fields, which vary?
>
Bil,

How's this for a start? I wrote it leaning towards clarity vs. conciseness.

rick@frodo:/public/rubyscripts$ cat number_format.rb
class String
  # Derive a printf-style format string that reproduces this numeric field.
  def to_number_format
    m = match(%r{^([ ]*)([+-]?)(.*)$})
    leading_blanks, sign, rest = m[1], m[2], m[3]
    plus_flag = sign == '+' ? sign : ''
    case rest
    when %r{^([\d]\.([\d]+)([eE])[+-][\d]+)(.*)$}
      # exponentiated float
      entirety, frac_part, e_or_E, suffix = $1, $2, $3, $4
      entirety = leading_blanks << entirety
      "%#{entirety.length}.#{frac_part.length}#{e_or_E}#{suffix}"
    when %r{^([\d]+\.([\d]*))(.*)$}
      # simple float
      entirety, frac_part, suffix = $1, $2, $3
      zero = frac_part.match(/00$/) ? '0' : ''
      "%#{zero}#{entirety.length}.#{frac_part.length}f#{suffix}"
    when %r{^(0[\d]+)([^e.]*)$}
      # zero padded integer
      digits, suffix = $1, $2
      "#{leading_blanks}%#{plus_flag}0#{digits.length}d#{suffix}"
    when %r{^([\d]+)([^e.]*)$}
      # whitespace padded integer
      digits, suffix = $1, $2
      digits = leading_blanks << digits
      "%#{digits.length}d#{suffix}"
    else
      nil
    end
  end
end

x = '0.4577'
puts x
puts x.to_number_format
puts x.to_number_format % x.to_f
puts(x.to_number_format % 0.7728)
puts (x.to_number_format % x.to_f) == x
puts

x = '-2.345e-02'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_f)
puts(x.to_number_format % 1.232e-03)
puts (x.to_number_format % x.to_f) == x
puts

x = '12345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_i) == x
puts

x = ' 00012345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_i) == x
puts

x = '  12345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_i) == x

rick@frodo:/public/rubyscripts$ ruby number_format.rb
0.4577
%6.4f
0.4577
0.7728
true

-2.345e-02
%9.3e
-2.345e-02
1.232e-03
true

12345
%5d
12345
  765
true

 00012345
 %08d
 00012345
 00000765
true

  12345
%7d
  12345
    765
true

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denh...

comp.lang.ruby
