Date: Fri, 24 May 1996 16:21:45 -0400
From: "Peter N. Schweitzer"
Reply-To: pschweitzer@usgs.gov
Subject: Chew and spit version 1.0
To: mp-users@geochange.er.usgs.gov

Users of mp and xtme,

I have just completed version 1.0 of cns.  The source is bundled in
with mp and xtme in which is getting to be large (900k), so I've put
only sources and docs in which is a much more manageable 300k.

This message constitutes the documentation for cns.

cns (chew and spit) is a metadata pre-parser designed to help metadata
managers convert records that cannot be parsed by mp into records that
can be.  It takes as input a poorly formatted metadata file and,
optionally, a list of element aliases, and outputs (1) a metadata file
that can be read by mp and (2) a file listing all of the lines that it
couldn't figure out where to put.  Throwing a switch (-v) causes it to
put out comments describing its decision-making process.

Full usage is as follows:

    cns [-v] [-i info_file] [-a aliases] [-e leftovers]
        [-o output_file] input_file

where

    info_file    is where it will put the messages generated as a
                 result of using -v.  If -v is used and -i is not,
                 info will go to stdout.

    aliases      is a file relating text strings likely to be found
                 in the metadata to element names from the Standard.
                 This is plain ASCII.  Each line begins with an
                 element name as mp expects it to be (underscores
                 included), followed by one or more white-space
                 characters, followed by an arbitrary string which,
                 when found, will be recognized as representing the
                 element named.

    leftovers    is where it will put strings that it can't place.
                 If no leftovers file is specified, they will go to
                 stderr.

    output_file  should be readable by mp.  It will not generally
                 pass mp without error; we are dealing with
                 nonconforming records that are likely to need at
                 least some editing.

    input_file   is an arbitrary text file.  This has to be plain
                 ASCII, of course, but need not have any indentation.
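To make the aliases file format concrete, here is a minimal sketch in Python of how such a file could be read. It is not taken from the cns source. The function name parse_aliases is hypothetical, the "Originator  Author" line is an invented example, and folding aliases to lower case is an assumption made for illustration. The Spatial_Domain alias is the one mentioned under Features.

```python
def parse_aliases(text):
    """Map each alias string to the element name it stands for.

    Assumed format (from the description above): element name, then
    white space, then the alias, which runs to the end of the line.
    """
    aliases = {}
    for line in text.splitlines():
        # Split off the first white-space-delimited token only, so the
        # alias itself may contain spaces.
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            element, alias = parts
            # Lower-casing the key is an assumption, not documented behavior.
            aliases[alias.strip().lower()] = element
    return aliases


# Two sample lines; the second is a made-up alias for illustration.
sample = "Spatial_Domain   Geographic Extent\nOriginator  Author\n"
aliases = parse_aliases(sample)
```

A file like this would be passed with -a, for example "cns -v -a aliases.txt -e leftovers.txt -o out.met in.txt", where all four file names are placeholders.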
Features:

  * Skips over non-alphabetic characters at the beginnings of lines.
    This means that you can specify the %$*#@ section numbers before
    your element name; it will ignore the number and find the name.

  * Fills in missing container elements where only one level is
    missing.  I tried to make it go three levels down but was not
    happy with the results.  You can see what happens by adding
    -DThreeLevelLookDown in your compiler command line.

  * Allows you to use spaces in the element names, any letter case,
    and aliases (for example, you can call Spatial_Domain
    "Geographic Extent").

  * Identifies all actions by line number in the input file.  With
    -v, tells what elements it recognized and what it did with them.

  * Scalar values that cannot be placed but occur immediately
    following other scalar values are considered as text and
    included (like mp).

Bugs:

    Well, any program designed to take bad data and make it "less
    bad" in several rather subjective respects isn't likely to
    satisfy every user.  However, I don't think it will dump core on
    you.

Suggested uses:

    If you have a collection of metadata from various sources in
    various formats with varying degrees of conformance and
    structure, this tool may well save you some time by finding the
    structure of the metadata and by substituting the proper element
    names where the metadata producer has been sloppy.

    It should work particularly well in cases where a nonparseable
    template has been used well (meaning that the people who wrote
    the metadata did it with a template that mp can't read, but they
    did it consistently and didn't mess up the template's structure).

Tech support:

    You WILL have to edit the output.  My hope is that cns will make
    it easier to convert to mp-readable form than not having cns.

    Please contact me for explanations of apparent misbehavior.  I
    will need to see the original file.  Email me
    (pschweitzer@usgs.gov).  Your comments and experiences are
    welcome.

Technical contact:

    Peter N. Schweitzer
    Mail Stop 906, National Center
    U.S. Geological Survey
    Reston, VA 22092

    Tel:   (703) 648-6533
    FAX:   (703) 648-6647
    email: pschweitzer@usgs.gov
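
Appendix (illustration only): the name-matching behavior listed under Features (skipping leading non-alphabetic characters, accepting spaces and any letter case in element names, and honoring aliases) can be sketched roughly as follows in Python. This is not the cns implementation. The helper names normalize and recognize are hypothetical, and the sketch assumes that an element name is separated from its value by a colon.

```python
import re


def normalize(name):
    """Fold case and treat runs of spaces or underscores as one underscore."""
    return re.sub(r"[\s_]+", "_", name.strip()).lower()


def recognize(line, elements, aliases=None):
    """Return (element_name, value) if the line starts with a known
    element name or alias, otherwise None.

    elements: iterable of element names as mp expects them.
    aliases:  dict mapping alias string -> element name (hypothetical shape).
    """
    aliases = aliases or {}
    # Skip leading non-alphabetic characters such as section numbers.
    stripped = re.sub(r"^[^A-Za-z]+", "", line)
    # Assumption: "name: value" layout; without a colon the whole line
    # is treated as the name and the value is empty.
    head, _, value = stripped.partition(":")
    table = {normalize(e): e for e in elements}
    table.update({normalize(a): e for a, e in aliases.items()})
    key = normalize(head)
    if key in table:
        return table[key], value.strip()
    return None
```

For instance, recognize("1.5 spatial domain: foo", ["Spatial_Domain"]) finds Spatial_Domain despite the section number, the space, and the lower case. A real pre-parser would also have to decide where the recognized element belongs in the hierarchy, which this sketch does not attempt.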