 |
| Interactive mode |
 |
| Short statistics report |
 |
| Detailed statistics report |
 |
| 'Strict' metadata snippet |
 |
| Metadata snippet with ESRI extension elements |
|
| The output above was obtained from a subset of the Glynn County, GA structures data produced as part of a FEMA / Heinz Center coastal erosion economic impact study. The dataset has over one hundred attributes. Here is the metadata for the complete dataset, produced before the advent of DBFmeta. |
|
|
|
- Name/Version Reviewed: DBFmeta version 1.7
- Reviewer / Date: Hugh Phillips / June 8, 2002
- Date and version of last known release: November 4, 2004 (version 1.13)
- Other relevant information and tips:
- To perform a similar analysis on attribute or tabular data associated with ArcInfo geodatabases, coverages, or standalone INFO files, use ArcToolbox or INFODBASE to export the INFO file to dBASE format, or use ArcView to load the INFO file and export the loaded table as a .dbf file, then run DBFmeta on the resulting file. This process has two potential limitations: the field type descriptions reported for the .dbf file may differ from the original ArcInfo field types as a result of the conversion, and attribute field names will be truncated to the ten-character limit allowed for a .dbf file.
- Function:
- A utility to extract attribute metadata from standalone .dbf files or .dbf files associated with ESRI shapefiles.
- Background Information
- Get binaries and source code here
- Platform: UNIX, MS-Windows 95, 98, ME, NT, 2000
- Principal Contact: Peter N. Schweitzer, USGS, pschweitzer@usgs.gov
- Status: Functional, general release
- Metadata Storage Structure: not applicable
- Description:
-
DBFmeta is a standalone metadata utility that performs statistical tests on the fields of .dbf files and creates a statistics report and an mp-compatible metadata snippet from its findings. These functions directly address difficult issues in metadata preparation and data quality checking. The tool can be run hands-off or in an interactive mode that prompts the user for values it cannot determine itself. Writing about this tool hardly does it justice; to really appreciate its value you have to run it on a difficult file and observe the results.
When a data set has many attributes, some of them enumerated domains with many enumerations, preparing the Detailed_Description section of metadata under Entity_and_Attribute_Information is a laborious task. DBFmeta can be a lifesaver in this situation.
The statistical report produced by DBFmeta is valuable for detecting dirt in attributes and for providing information that can be used to improve or modify the metadata snippet produced by the tool. By default, the tool sends an abbreviated statistical report to the console; a detailed statistical report can be requested as a user option. Either report may be redirected to an output file. The value occurrence frequency analysis performed by the tool identifies unwanted duplication of values and subtle unwanted value differences caused by misspelling, bad entry, non-entry, and flags in numeric fields. The tool also reports the maximum and minimum non-flag values and the number of positive, negative, and zero values in numeric fields.
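The kind of frequency analysis described above can be illustrated with a minimal Python sketch. It is not DBFmeta's code: the function names, the sample column, and the choice to count flag values among the negatives are all assumptions made for illustration.

```python
from collections import Counter

def value_frequencies(values):
    """Count how often each distinct value occurs in one attribute column."""
    return Counter(values)

def numeric_summary(numbers, flags=()):
    """Summarize a numeric column, leaving flag values out of min/max.

    Assumption: the sign counts cover every value, flags included.
    The review does not say how DBFmeta treats flags in these counts.
    """
    real = [n for n in numbers if n not in flags]
    return {
        "min": min(real) if real else None,
        "max": max(real) if real else None,
        "positive": sum(1 for n in numbers if n > 0),
        "negative": sum(1 for n in numbers if n < 0),
        "zero": sum(1 for n in numbers if n == 0),
    }

# Hypothetical character column: one misspelling and one blank entry
land_use = ["Residential", "Residential", "Commercial",
            "Residental", "", "Commercial"]

# Listing values alphabetically with their counts makes the
# single-occurrence misspelling and the empty value easy to spot.
for value, count in sorted(value_frequencies(land_use).items()):
    print(f"{value!r}: {count}")
```

A value that occurs once next to a near-identical value that occurs many times is the typical signature of a data entry error, which is why a frequency listing is useful for quality checking.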
DBFmeta quickly creates a metadata snippet that includes each attribute and the Domain Values it has guessed from the data type of the attribute and the statistics of the attribute values. The snippet wisely includes the elements of a complete Detailed_Description framework whose values only the metadata creator can supply (Attribute_Definition, Enumerated_Domain_Value_Definition, Attribute_Units_of_Measure) and omits the mindless mandatory elements (such as Enumerated_Domain_Value_Definition_Source and Attribute_Definition_Source) that few metadata creators would fill in. By default, the tool also produces metadata elements and values describing the data type of the attribute according to the Arc8 ESRI profile; a user option (-strict) suppresses this output. The snippet produced by DBFmeta may subsequently be pasted (using tkme, for example) into a metadata document under the Entity_and_Attribute_Information section.
DBFmeta assigns attributes to only two types, Enumerated Domain and Range Domain. There are no statistical tests that the tool could perform to distinguish between Enumerated, Codeset, and Unrepresentable Domains. Defaulting to Enumerated Domain for character fields is an efficient mechanism because all the values that occur are reported in case they are needed for the metadata. The metadata creator can readily substitute either of the two more compact domain representations (Codeset or Unrepresentable) for the longer Enumerated Domain when appropriate.
If the attribute is character type, it is assigned as enumerated domain, all unique values are given as alphabetically sorted enumerations in the metadata snippet, and the frequency of each enumeration is reported in the detailed statistics report. If the attribute is a Boolean field, it is reported as string type and enumerated as 'T', 'F', or '?' for true, false, and empty respectively. If the attribute is numeric, the situation can be much more complex: the attribute may be guessed and reported as range domain only, range domain with a flag enumeration, or enumerated domain. How well DBFmeta guesses depends somewhat on the number of records it has to test; the more it has, the better it will guess.
If the attribute type is integer and there isn't a lot of variety in the values, the attribute is assumed to be enumerated domain and the enumerations are listed in the metadata snippet. If there are only one or two negative values, those are assumed to be flag values (like -999) and written as enumerations, and the remaining numbers greater than or equal to zero are written as range domain. All other cases of integers are written as pure range domain. Date type fields are analyzed as 8-digit integers.
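The integer rules above can be sketched roughly as follows. This is an illustration of the described behavior, not DBFmeta's actual algorithm: the review does not say how much variety counts as "a lot" or whether "one or two negative values" refers to distinct values, so the `max_distinct` and `max_flags` thresholds and the distinct-value interpretation are assumptions.

```python
def guess_integer_domain(values, max_distinct=10, max_flags=2):
    """Guess a domain for an integer attribute.

    max_distinct and max_flags are illustrative thresholds chosen for
    this sketch; they are not taken from DBFmeta.
    """
    distinct = sorted(set(values))

    # Little variety: list every value as an enumeration.
    if len(distinct) <= max_distinct:
        return {"domain": "enumerated", "enumerations": distinct}

    # One or two distinct negative values: treat them as flags
    # (such as -999) and describe the rest as a range.
    negatives = [v for v in distinct if v < 0]
    rest = [v for v in distinct if v >= 0]
    if 1 <= len(negatives) <= max_flags and rest:
        return {"domain": "range+flags", "flags": negatives,
                "min": rest[0], "max": rest[-1]}

    # Everything else: a pure range over the observed values.
    return {"domain": "range", "min": distinct[0], "max": distinct[-1]}
```

With these assumed thresholds, a column of the values 0 through 49 plus a single -999 would be reported as a range of 0 to 49 with -999 as a flag enumeration.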
If the attribute type is floating point, it will be reported as a pure range domain field, and the minimum and maximum values observed will be reported as the Range_Domain_Minimum and Range_Domain_Maximum. Strictly speaking, this is not in accordance with the definitions for those elements, but it is usually impossible to satisfy the CSDGM definition for them, and the actual observed minimum and maximum values have a lot of value.
Conversion of ESRI ArcInfo coverages to shapefiles may result in the inclusion of ArcInfo internal fields in the .dbf file associated with the shape, and in field name truncation (to the ten-character field name limit of a .dbf file). These internal fields are typically of little value in metadata and are well known to the ESRI GIS user. For those reasons, DBFmeta excludes the shape, fnode_, tnode_, lpoly_, rpoly_, length, area, perimeter, cover_, and cover_id fields from its output metadata snippet. The last two fields are not always identifiable by DBFmeta because of truncation, and other fields may also be truncated so that they are no longer unique. In the former case, DBFmeta assumes the field is an interesting attribute and analyzes its contents; in the latter case it can still complete its analysis, but it advises the user that the field names are not unique.
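The truncation problem described above is easy to reproduce. The following Python sketch shows how the ten-character limit can make distinct field names collide; the coverage name "structures" and the parcel_value field names are hypothetical, and the case-insensitive comparison is an assumption of the sketch.

```python
from collections import Counter

DBF_NAME_LIMIT = 10  # field-name length limit of a .dbf file

def truncate_field_names(names, limit=DBF_NAME_LIMIT):
    """Cut each field name to the .dbf length limit."""
    return [name[:limit] for name in names]

def non_unique_names(names):
    """Return the names that occur more than once (case-insensitive)."""
    counts = Counter(name.lower() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)

# Hypothetical coverage called "structures": its cover_ and cover_id
# fields would be named structures_ and structures_id.
original = ["area", "perimeter", "structures_", "structures_id",
            "parcel_value_2000", "parcel_value_2001"]

short = truncate_field_names(original)
print(short)
# ['area', 'perimeter', 'structures', 'structures', 'parcel_val', 'parcel_val']

print(non_unique_names(short))
# ['parcel_val', 'structures']
```

In this example the two internal fields lose the trailing "_" and "_id" that would identify them, and two ordinary attributes become indistinguishable by name, which are the two situations the review describes.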
- Cost: none
- Notable Plus:
- Because the tool operates in command line mode, it is readily incorporated into scripts for batch mode metadata extraction from multiple .dbf files
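As a rough illustration of such a script, the Python sketch below builds one DBFmeta command per .dbf file in a folder. Several details are assumptions rather than documented behavior: that the executable is reachable as `dbfmeta` on the PATH, that the input file comes first and the output file second on the command line, and the `.met` output extension, which is purely hypothetical. Running the tool without parameters prints its actual usage instructions, which should be checked first.

```python
import subprocess
from pathlib import Path

def build_commands(folder, exe="dbfmeta"):
    """Build one command line per .dbf file found in folder.

    Assumption: the tool is called as 'exe input output'.
    Verify this against the tool's own usage text.
    """
    commands = []
    for dbf in sorted(Path(folder).glob("*.dbf")):
        out = dbf.with_suffix(".met")  # hypothetical output file name
        commands.append([exe, str(dbf), str(out)])
    return commands

def run_batch(folder, exe="dbfmeta"):
    """Run the tool once per file and report any non-zero exit codes."""
    for cmd in build_commands(folder, exe):
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            print(f"{cmd[1]}: exited with code {result.returncode}")
```

Calling `run_batch("data")` would process every .dbf file in a folder named data. Giving each input its own output name matters in batch use, because an existing output file with a supplied name is overwritten.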
- Notable Minus: -
- Metadata Exchange
- DBFmeta can output its metadata snippets as indented ASCII text, SGML or XML
- Usability
-
DBFmeta is accompanied by a documentation file that adequately describes the function, usage, and output of the tool. Additionally, when the tool is invoked without parameters it returns usage instructions. It isn't even necessary to include the .dbf extension in the input filename for DBFmeta to correctly identify the appropriate input file. When invoked with the path to a valid .dbf file, it sends metadata output to an output file with a default name (dbfmeta.out). If no output filename is supplied and a file with the default name already exists, DBFmeta will say so and ask for a different output filename. If an output filename is supplied and a file by that name already exists, the existing file will be overwritten.
The interactive mode of operation is not recommended, especially for data files with many attributes. The user is only prompted for attribute definitions and for units of range domain values. This (thankfully) means the user must use an alternate method to fill in definitions for enumerations. It is impossible to resume an interactive session with DBFmeta once it has been interrupted (by a ^C, say, or a power glitch); one must start over from the beginning with the first attribute.
- Administrative
-
The tool operates on 32-bit versions of MS-Windows and is compiled for common UNIX variants. The source code is available for compilation on other platforms. The tool may be downloaded as a standalone executable (23KB) or as part of a suite of metadata tools (1.7 MB with documentation and source). DBFmeta is a self-contained application that does not depend on any other software components for operation.
To make the tool readily usable from a DOS command window, the administrator need only ensure that the location of dbfmeta.exe is included in the PATH environment variable, or that the tool is launched via a batch file whose location is included in the PATH. Most administrators would benefit from periodically downloading and installing the complete metadata tool suite to ensure they have the most up-to-date version of all of the metadata tools developed by Peter Schweitzer.
- Tool Reliability:
-
The tool was faultless in tests when supplied with valid parameters and a valid .dbf file. Its response to bad user input is not always elegant, but some responsibility for correct use must be assigned to the user. The following cases of bad user input were tested and DBFmeta's reactions observed:
- When supplied with an input filename for a file that does not have a .dbf extension (even though its content is that of a valid .dbf file), DBFmeta returns 'Error: could not open input file file_name_without_dbf_ext.'
- When presented with an input file with a .dbf extension that is not actually a valid .dbf file, DBFmeta sends a long stream of WARNING messages to the console, which (in the case tested) eventually ended in an error message from DBFmeta announcing that it was having trouble allocating space.
- Sample of Interface: Command line driven in a DOS Window
|