
comp.lang.ruby

Re: [QUIZ] GEDCOM Parser (#6)

Jamis Buck

11/5/2004 4:58:00 PM

Dave Burt wrote:

> There's a GEDCOM spec at:
> http://homepages.rootsweb.com/~pmcbride/gedcom/5...
>
> It's a bit of a mess, somewhat contradictory...

Indeed. :) That's why I didn't point it out. If you're up to the
challenge, you are of course welcome to write something that actually
converts a GEDCOM's semantic content into XML, instead of just its
syntactic structure as I recommended. It's a bigger job, but certainly
more useful.
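As a starting point for the syntactic conversion, the line-level parsing might be sketched like this. It assumes the usual GEDCOM line layout (`level [@xref@] TAG [value]`) and nests each line under the nearest preceding line one level up. `GedcomLine`, `parse_line`, and `parse_tree` are illustrative names only, and character-set handling is ignored.

```ruby
# Sketch only: one parsed GEDCOM line plus its subordinate lines.
GedcomLine = Struct.new(:level, :id, :tag, :value, :children)

# level, optional @xref@ id, tag, optional value (rest of the line)
LINE_RE = /\A\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?\z/

def parse_line(text)
  m = LINE_RE.match(text.chomp) or raise ArgumentError, "bad GEDCOM line: #{text.inspect}"
  GedcomLine.new(m[1].to_i, m[2], m[3], m[4], [])
end

# Each line becomes a child of the closest earlier line with a lower level.
def parse_tree(lines)
  root = GedcomLine.new(-1, nil, "GEDCOM", nil, [])
  stack = [root]
  lines.each do |raw|
    next if raw.strip.empty?
    node = parse_line(raw)
    stack.pop while stack.last.level >= node.level
    stack.last.children << node
    stack.push(node)
  end
  root
end
```

Using a stack keyed on level keeps the parser single-pass; a semantic converter would then walk the resulting tree.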

>
> Anyway, XML.
>
> For the above, I think I prefer the following to Jamis' example XML:
> <name>Jamis Gordon /Buck/
> <surn>Buck</surn>
> ....

I considered that, too. It just didn't "look" appealing to me. Personal
preference. But then, I'm not a big fan of XML in general. :)
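For comparison, here is a rough sketch of the two layouts under discussion: the value carried in a `value` attribute, versus the value as leading text content (Dave's preference). The `XmlNode` struct, `to_xml`, and `xml_escape` are hypothetical names for this sketch, which also assumes element names are simply the lowercased tags.

```ruby
# Hypothetical node shape: a tag, an optional value, and child nodes.
XmlNode = Struct.new(:tag, :value, :children)

XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(s)
  s.gsub(/[&<>"]/, XML_ESCAPES)
end

# style :attribute -> value goes in a value="..." attribute
# style :text      -> value is emitted as text before the child elements
def to_xml(node, style)
  tag = node.tag.downcase
  attrs = (style == :attribute && node.value) ? %( value="#{xml_escape(node.value)}") : ""
  text = (style == :text && node.value) ? xml_escape(node.value) : ""
  inner = text + node.children.map { |c| to_xml(c, style) }.join
  "<#{tag}#{attrs}>#{inner}</#{tag}>"
end
```

The text style reads more naturally for leaves like SURN, while the attribute style keeps mixed content out of elements that also have children.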

>
> Interestingly, the sample GED file given doesn't have any names like this,
> nor even any SURN elements.

Interesting. Guess I didn't stop to verify that. :) For what it's worth,
I used the GED file for my own family as a reference when I wrote the
quiz, and my GED file was emitted by Personal Ancestral File (PAF). It
breaks all the names up as I described, with a NAME value plus SURN/GIVN
sub-values.

>
> "Those using the optional name pieces should assume that few systems will
> process them, and most will not provide the name pieces. "
> http://homepages.rootsweb.com/~pmcbride/gedcom/55gcch2.htm#PERSONAL_NAME...

Fascinating. :) Especially since PAF is one of the most widely used
programs around... In my experience, the optional name pieces are often
emitted by programs, but whether they are used when a GED is imported or
not, I can't say. Probably not.
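Since the optional pieces may well be missing, a converter could fall back to the NAME value itself, where GEDCOM marks the surname with slashes (as in `Jamis Gordon /Buck/`). A minimal sketch, assuming that convention; `name_pieces` and `blank_to_nil` are hypothetical helpers, and explicit SURN/GIVN values would presumably take precedence when an exporter like PAF supplies them.

```ruby
# Assumes the slash convention: "Given Names /Surname/ optional rest".
def name_pieces(name_value)
  m = %r{\A(.*?)/([^/]*)/(.*)\z}.match(name_value)
  # No slashes: treat the whole value as given names.
  return { givn: blank_to_nil(name_value), surn: nil, rest: nil } unless m
  { givn: blank_to_nil(m[1]), surn: blank_to_nil(m[2]), rest: blank_to_nil(m[3]) }
end

# Strips whitespace and turns empty strings into nil.
def blank_to_nil(s)
  t = s.strip
  t.empty? ? nil : t
end
```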

> It does have:
> 1 NOTE Line 1
> 2 CONT Line 2
> 2 CONT Lin
> 2 CONC e 3
> 2 CONT Line
> 2 CONC 4
>
> and:
> 1 SOUR @S1@
> 2 PAGE 1
> ....
> 0 @S1@ SOUR
> 1 TEXT Hello
>
> I think these are the two interesting cases converting to XML.
> The CONTinuation tag represents just a continuation of the data in the
> parent element, as lines are of limited length; it has no semantic value. I
> think these tags need to be understood by a GEDCOM->XML parser.

In a "real" converter, they definitely would be. I didn't want to
describe ad nauseam the available tags and their semantic meanings,
though. As I said, such a parser would be more generally useful than
what the quiz calls for, so if you are up to the additional research,
it's not really that much more coding.
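Folding the continuations is mostly bookkeeping: CONT starts a new line within the value, while CONC appends directly with no separator. The sketch below assumes lines have already been split into `[level, tag, value]` triples and that each CONT/CONC continues the most recent line one level above it; `fold_continuations` is an illustrative name.

```ruby
# Folds CONT/CONC lines into the value of the line they continue.
# Input: array of [level, tag, value] triples (value may be nil).
def fold_continuations(lines)
  out = []
  last_at_level = {} # most recent non-continuation entry at each level
  lines.each do |level, tag, value|
    if %w[CONT CONC].include?(tag)
      target = last_at_level[level - 1] or raise ArgumentError, "orphan #{tag} at level #{level}"
      sep = tag == "CONT" ? "\n" : "" # CONT = newline, CONC = direct join
      target[2] = (target[2] || "") + sep + (value || "")
    else
      entry = [level, tag, value]
      out << entry
      last_at_level[level] = entry # same array object as in out, so edits show up there
    end
  end
  out
end
```

Applied to the first four lines of the NOTE sample, this yields a single NOTE whose value is "Line 1\nLine 2\nLine 3".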

> The second of the two fragments shows a tag (SOURce) with a value (the link
> to xref-id @S1@) as well as a sub-tree. Same thing as Jamis' NAME example,
> also common elsewhere in the spec. The use of the id attribute for ids is
> obvious, but I'm not sure the value attribute is ideal, especially
> considering that the spec states that source description (the value of the
> SOURce tag) may be continued with CONT or CONC, thus may be multi-line.
>
> Thus:
> <sour>@S1@
> <page>1</page>
> </sour>
> ....
> <sour id="@S1@">
> <text>Hello</text>
> </sour>

Interesting point. It's been several years since I delved into the
GEDCOM spec, so I've forgotten quite a bit. As you said, though, it
appears to be contradictory in places, and, worse, every program that
exports or imports GED files has its own proprietary flavor of GEDCOM,
with extension tags and so forth. It can be quite an adventure to write
a GEDCOM parser that intelligently handles the different variants.
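For the SOUR example, a semantic converter would also need to follow pointers like @S1@ to the top-level record that defines them. Given the vendor variants, it should also cope gracefully when a target is missing or when SOUR carries an inline description instead of a pointer. A sketch under those assumptions (`Rec`, `index_records`, and `resolve` are hypothetical names):

```ruby
# Hypothetical minimal record shape: xref id, tag, value, children.
Rec = Struct.new(:id, :tag, :value, :children)

POINTER = /\A@[^@]+@\z/

# Index every record that carries an xref id, e.g. "0 @S1@ SOUR".
def index_records(records)
  records.each_with_object({}) { |r, h| h[r.id] = r if r.id }
end

# Returns the record a pointer-valued node refers to, or nil when the value
# is not a pointer (inline description) or the target record is missing.
def resolve(node, index)
  return nil unless node.value && POINTER.match?(node.value)
  index[node.value]
end
```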

- Jamis

--
Jamis Buck
jgb3@email.byu.edu
http://www.jamisbuck...