William James
11/29/2015 7:42:00 AM
WJ wrote:
> Hrvoje Niksic wrote:
>
> > Here is an interesting, not entirely academic problem that a colleague
> > and I are "wrestling" with. Say there is a file, containing
> > entries like this:
> >
> > foo 5
> > bar 20
> > baz 4
> > foo 6
> > foobar 23
> > foobar 3
> > ...
> >
> > There are a lot of lines in the file (~10000), but many of the words
> > repeat (there are ~500 unique words). We have endeavored to write a
> > program that would sum the occurrences of each word, and display them
> > sorted alphabetically, e.g.:
> >
> > bar 20
> > baz 4
> > foo 11
> > foobar 26
> > ...

I think he means: sum the numbers associated with each word.

OCaml:
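#load "str.cma";;
(* Sum the numbers per word, reading "word number" lines from data.txt. *)
let table = Hashtbl.create 999 ;;
let chan = open_in "data.txt" in
try
  while true do
    (* Split on runs of spaces; skip lines that are not exactly two fields. *)
    match Str.split (Str.regexp " +") (input_line chan) with
    | [word; numstr] ->
        let num = int_of_string numstr in
        (* Add to the running total, starting from 0 for a new word. *)
        let prev = try Hashtbl.find table word with Not_found -> 0 in
        Hashtbl.replace table word (prev + num)
    | _ -> ()
  done
with End_of_file -> close_in chan ;;
(* Collect (word, total) pairs, sort them alphabetically, and print. *)
Hashtbl.fold (fun k v acc -> (k,v)::acc) table []
|> List.sort compare
|> List.iter (fun (word,n) -> Printf.printf "%s %d\n" word n) ;;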
Output:

bar 20
baz 4
foo 11
foobar 26
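For comparison, a minimal alternative sketch (not the code above, and not from the thread): it swaps the Hashtbl-plus-sort approach for a string-keyed Map built with the standard `Map.Make (String)` functor. A Map keeps its bindings in key order, so `SMap.iter` already visits the words alphabetically and no separate sort is needed. It parses with the standard-library `Scanf.sscanf` instead of `Str`, so no `#load` is required, and it skips lines that do not look like "word number". It assumes OCaml 4.05 or later (for `find_opt`); the names `SMap`, `parse_line`, `sum_lines` and `read_lines` are hypothetical.

```ocaml
(* Hypothetical sketch: string-keyed map whose bindings stay sorted by key. *)
module SMap = Map.Make (String)

(* Parse one line of the form: word number. Returns None if it does not match. *)
let parse_line line =
  try Scanf.sscanf line " %s %d" (fun w n -> Some (w, n))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* Add each parsed line's number to the running total for its word. *)
let sum_lines lines =
  List.fold_left
    (fun acc line ->
      match parse_line line with
      | Some (w, n) ->
          let prev = match SMap.find_opt w acc with Some v -> v | None -> 0 in
          SMap.add w (prev + n) acc
      | None -> acc)
    SMap.empty lines

(* Hypothetical helper: read every line of a file into a list. *)
let read_lines path =
  let ic = open_in path in
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> close_in ic; List.rev acc
  in
  loop []

(* Print the totals; SMap.iter visits keys in ascending order. *)
let () =
  if Array.length Sys.argv > 1 then
    sum_lines (read_lines Sys.argv.(1))
    |> SMap.iter (fun w n -> Printf.printf "%s %d\n" w n)
```

With ~500 unique words either structure is cheap; the Map version just trades the explicit List.sort for logarithmic-time updates that keep the keys ordered.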