comp.lang.lisp

Re: massive data analysis with lisp

William James

4/13/2015 11:39:00 PM

JShrager wrote:

> (defun nfloaddirdata (file o dates?)
>   (with-open-file (i file)
>     (let ((headline (read-line i)))
>       (format o "(~a~%" (subseq headline 0 (1- (length headline)))))
>     (loop as line = (read-line i nil nil)
>           until (null line)
>           as j from 1 by 1
>           do (when (zerop (mod j 5)) (format o "~%"))
>              ;; Since everything's in the SAME FORMAT I can use very specific
>              ;; parsing instead of having to split the line. This could be made even
>              ;; more efficient by re-using string arrays, and prob. other techniques.
>              (let* ((comma-pos-1 (position #\, line))
>                     (comma-pos-2 (position #\, line :start (1+ comma-pos-1))))
>                (if dates?
>                    (format o " (~a ~a ~s) "
>                            (subseq line 0 comma-pos-1)
>                            (subseq line (1+ comma-pos-1) comma-pos-2)
>                            (subseq line (1+ comma-pos-2)))
>                    ;; Use dots to save cons cells if we don't need the date.
>                    (format o " (~a . ~a) "
>                            (subseq line 0 comma-pos-1)
>                            (subseq line (1+ comma-pos-1) comma-pos-2)
>                            )
>                    )))
>     (format o "~% )~%")))

Gauche Scheme:

(use srfi-13) ; string-drop-right
(use srfi-42) ; do-ec

(define (load-data file oport dates?)
  (call-with-input-file file
    (lambda (iport)
      (format oport "(~a~%" (string-drop-right (read-line iport) 1))
      (do-ec (:parallel (:port line iport read-line)
                        (:list j (lrange 1)))
        (begin
          (when (zero? (mod j 5)) (newline oport))
          (let1 fields (string-split line #\,)
            (if dates?
              (apply format oport " (~a ~a ~s) " fields)
              (apply format oport " (~a . ~a) " (take fields 2))))))))
  (format oport "~% )~%"))

taruss

4/14/2015 11:02:00 PM

On Monday, April 13, 2015 at 4:39:53 PM UTC-7, WJ wrote:
> JShrager wrote:
>
> > (defun nfloaddirdata (file o dates?)
> >   (with-open-file (i file)
> >     (let ((headline (read-line i)))
> >       (format o "(~a~%" (subseq headline 0 (1- (length headline)))))
> >     (loop as line = (read-line i nil nil)
> >           until (null line)
> >           as j from 1 by 1
> >           do (when (zerop (mod j 5)) (format o "~%"))
> >              ;; Since everything's in the SAME FORMAT I can use very specific
> >              ;; parsing instead of having to split the line. This could be made even
> >              ;; more efficient by re-using string arrays, and prob. other techniques.

One such technique would be to avoid creating the new strings that the SUBSEQ calls below allocate just to hand them to FORMAT. Instead, one could call WRITE-STRING with its :START and :END keyword arguments, although that would be more verbose.


> >              (let* ((comma-pos-1 (position #\, line))
> >                     (comma-pos-2 (position #\, line :start (1+ comma-pos-1))))
> >                (if dates?
> >                    (format o " (~a ~a ~s) "
> >                            (subseq line 0 comma-pos-1)
> >                            (subseq line (1+ comma-pos-1) comma-pos-2)
> >                            (subseq line (1+ comma-pos-2)))

For example:
(write-string " (" o)
(write-string line o :end comma-pos-1)
(write-string " " o)
(write-string line o :start (1+ comma-pos-1) :end comma-pos-2)
(write-string " \"" o) ; ~s prints the date in double quotes
(write-string line o :start (1+ comma-pos-2))
(write-string "\") " o)
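
Filling that fragment out into whole functions gives something one can actually time. This is only a sketch: WRITE-RECORD-CONSING and WRITE-RECORD-DIRECT are made-up names, and both assume every line holds exactly three comma-separated fields, as the original loop body does. The second one writes the quotes around the date by hand, which matches ~S only when the date contains no double quotes or backslashes.

```lisp
;; Sketch only: both function names are hypothetical, and each LINE is
;; assumed to look like "field1,field2,date" (exactly two commas).

(defun write-record-consing (line o dates?)
  "The original approach: SUBSEQ allocates a fresh string per field."
  (let* ((comma-pos-1 (position #\, line))
         (comma-pos-2 (position #\, line :start (1+ comma-pos-1))))
    (if dates?
        (format o " (~a ~a ~s) "
                (subseq line 0 comma-pos-1)
                (subseq line (1+ comma-pos-1) comma-pos-2)
                (subseq line (1+ comma-pos-2)))
        (format o " (~a . ~a) "
                (subseq line 0 comma-pos-1)
                (subseq line (1+ comma-pos-1) comma-pos-2)))))

(defun write-record-direct (line o dates?)
  "Same output, but WRITE-STRING copies straight from LINE using bounds."
  (let* ((comma-pos-1 (position #\, line))
         (comma-pos-2 (position #\, line :start (1+ comma-pos-1))))
    (write-string " (" o)
    (write-string line o :end comma-pos-1)
    (write-string (if dates? " " " . ") o)
    (write-string line o :start (1+ comma-pos-1) :end comma-pos-2)
    (when dates?
      ;; ~s would print the date string in double quotes; do it by hand.
      (write-string " \"" o)
      (write-string line o :start (1+ comma-pos-2))
      (write-char #\" o))
    (write-string ") " o)))
```

To compare the two on a given implementation, wrap a loop over each in TIME and write to a sink such as (make-broadcast-stream), so that the I/O itself does not dominate the measurement.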

> >                    ;; Use dots to save cons cells if we don't need the date.
> >                    (format o " (~a . ~a) "
> >                            (subseq line 0 comma-pos-1)
> >                            (subseq line (1+ comma-pos-1) comma-pos-2)
> >                            )
> >                    )))
> >     (format o "~% )~%")))
>
> Gauche Scheme:
>
> (use srfi-13) ; string-drop-right
> (use srfi-42) ; do-ec
>
> (define (load-data file oport dates?)
>   (call-with-input-file file
>     (lambda (iport)
>       (format oport "(~a~%" (string-drop-right (read-line iport) 1))
>       (do-ec (:parallel (:port line iport read-line)
>                         (:list j (lrange 1)))
>         (begin
>           (when (zero? (mod j 5)) (newline oport))
>           (let1 fields (string-split line #\,)
                         ^^^^^^^^^^^^^^^^^^^^^^^

Fails the requirement to not split the line, presumably out of a desire
for a more efficient parsing usage.

>             (if dates?
>               (apply format oport " (~a ~a ~s) " fields)
>               (apply format oport " (~a . ~a) " (take fields 2))))))))
>   (format oport "~% )~%"))

Madhu

4/15/2015 1:37:00 AM

* taruss@google.com <7bd724cb-a298-4033-a97a-d053980ee157@googlegroups.com> :
Wrote on Tue, 14 Apr 2015 16:01:50 -0700 (PDT):

|> (let1 fields (string-split line #\,)
|               ^^^^^^^^^^^^^^^^^^^^^^^
| Fails the requirement to not split the line, presumably out of a desire
| for a more efficient parsing usage.

Which is more expensive? Consing small strings which will be collected
with the generation, or a funcall?

I've settled on the following STRING-SPLIT API. When I do not need to
cons up new strings and can work with indices for non-empty words, I
pass a MAP-FN, which is called with the string and the BEGIN and END
indices of each word. [One question was whether it would be useful to
also pass a position argument to MAP-FN, i.e. n for the nth MAP-FN call
on a particular line, but in general it wasn't. Please post any bugs
you find in the code.]


(defun string-split (charbag line &key (start 0) (end (length line)) map-fn)
  "Split LINE between START and END at characters in CHARBAG.
With MAP-FN, call it with LINE and each word's begin and end indices
and return NIL; otherwise return a list of freshly consed words."
  (let ((p start) (q 0) ret)
    (declare (fixnum p q))
    (loop
      ;; skip delimiters (characters in CHARBAG) at left
      (loop (cond ((or (= p end) (not (find (elt line p) charbag)))
                   (return))
                  (t (incf p))))
      ;; find word end
      (setq q p)
      (loop (cond ((and (< q end) (not (find (elt line q) charbag)))
                   (incf q))
                  (t (return))))
      (cond ((< p q)
             (cond (map-fn (funcall map-fn line p q))
                   (t (push (subseq line p q) ret)))
             (setq p q))
            (t (return (nreverse ret)))))))
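
To show how an index-passing callback fits the original comma-separated records, here is a rough sketch. So that it runs on its own it does not call STRING-SPLIT; MAP-FIELDS is a hypothetical, cut-down helper with the same callback shape (string, begin, end), and WRITE-FIELDS is likewise a made-up name. Unlike STRING-SPLIT, MAP-FIELDS takes a single delimiter character and also reports empty fields.

```lisp
;; Hypothetical helper: a single-delimiter analogue of the MAP-FN
;; interface, calling FN with (string begin end) for every field.
(defun map-fields (fn line &key (delimiter #\,))
  "Call FN with LINE and the begin and end indices of each field."
  (loop with start = 0
        for end = (position delimiter line :start start)
        do (funcall fn line start (or end (length line)))
        while end
        do (setf start (1+ end))))

;; Hypothetical consumer: writes the fields space-separated inside
;; parentheses, copying directly from LINE instead of calling SUBSEQ.
(defun write-fields (line out)
  "Write LINE's comma-separated fields to OUT as \" (f1 f2 ...) \"."
  (write-string " (" out)
  (let ((first-p t))
    (map-fields (lambda (string begin end)
                  (if first-p
                      (setf first-p nil)
                      (write-char #\Space out))
                  (write-string string out :start begin :end end))
                line))
  (write-string ") " out))
```

The callback here pays one FUNCALL per field and allocates no substrings, which is the trade-off in question; whether that beats short-lived consing depends on the implementation and its garbage collector.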

---Madhu