Dennis Lee Bieber
1/12/2008 9:07:00 PM
On Sat, 12 Jan 2008 14:04:42 -0600, Landon <projecteclipsor@gmail.com>
declaimed the following in comp.lang.python:
> would take collapse all consecutive whitespace in a document into one
> space. I could just use the projects from K&R, but I imagine a Python
import sys
for ln in sys.stdin:
    # str.split() with no argument discards the trailing newline along
    # with the rest of the whitespace, so it has to be re-added
    sys.stdout.write(" ".join(ln.split()) + "\n")
You need a book to show this as a significant example? (Note: I'm
typing these off the top of my head and haven't actually run them.)
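If the goal really is to collapse *all* whitespace -- newlines
included -- into single spaces, a whole-file version (a sketch, again
untested against the OP's actual data) would be:

```python
import sys

def collapse_ws(text):
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines), so rejoining with single spaces
    # collapses it all -- line breaks included.
    return " ".join(text.split())

if __name__ == "__main__":
    sys.stdout.write(collapse_ws(sys.stdin.read()) + "\n")
```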
"wc" is probably similar -- especially if you don't mind loading the
entire thing at one go...
import sys
print(len(sys.stdin.read().split()))
Line-by-line would be:
import sys
sm = 0
for ln in sys.stdin:
    sm += len(ln.split())
print(sm)
(I'm presuming "wc" just gives the total number of words in the
file... If you want a count of occurrences for each word, it gets a
bit trickier: do you include punctuation as part of a word, and do
case differences matter?)
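A minimal normalization helper for that punctuation-and-case question
might look like this (a sketch; note that strip() only trims the ends
of each word, so internal apostrophes and hyphens survive):

```python
import string

def normalize(word):
    # Fold case and strip leading/trailing punctuation so that
    # "Trickier," and "trickier" count as the same word.
    return word.strip(string.punctuation).lower()
```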
The utilities listed in K&R required actual manipulation of the
data: manual comparison of characters, logic to determine whether one
was inside or outside a word, etc. All of that work is handled by the
Python library, making the utilities trivial to code -- so trivial
that they're probably not worth coding as stand-alone scripts: the
time to load the Python interpreter and byte-compile the program
could be a significant part of the total run-time.
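For comparison, the K&R-style approach -- tracking in-word/out-of-word
state by hand instead of leaning on str.split() -- comes out something
like this in Python (a sketch of the technique, not code from K&R):

```python
def wc_manual(text):
    # K&R-style word count: walk the characters one at a time and
    # count each transition from "outside a word" to "inside a word".
    in_word = False
    words = 0
    for ch in text:
        if ch in " \t\n":
            in_word = False
        elif not in_word:
            in_word = True
            words += 1
    return words
```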
Word frequency, first cut (keeping case and punctuation):
import sys
wf = {}
for ln in sys.stdin:
    for wd in ln.split():
        wf[wd] = wf.get(wd, 0) + 1
for ky in sorted(wf):
    print("%40s : %10d" % (ky, wf[ky]))
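In later Pythons, collections.Counter (new in 2.7) would handle the
get()/increment bookkeeping itself; a sketch of the same tally:

```python
import sys
from collections import Counter

def word_freq(text):
    # Counter maps each word to its number of occurrences; unseen
    # words implicitly count as zero.
    return Counter(text.split())

if __name__ == "__main__":
    wf = word_freq(sys.stdin.read())
    for word in sorted(wf):
        print("%40s : %10d" % (word, wf[word]))
```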
--
Wulfraed Dennis Lee Bieber KD6MOG
wlfraed@ix.netcom.com wulfraed@bestiaria.com
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: web-asst@bestiaria.com)
HTTP://www.bestiaria.com/