We have been running DBText for Libraries for about seven years. During that time a lot of garbage has accumulated in the database. There are three areas of concern. I will be covering them and my solutions later.
Whenever we do a mailing, we manage to find and fix problems in the borrowers file. We run the names and addresses through MyMailingList for each mailing, which adds the ZIP+4 to the ZIP codes and checks the addresses for validity. You would think that it would be easy to export the borrowers file in a comma-delimited format and import it into Koha. However, there are a number of problems.
- Embedded CR/LFs: When an address has multiple lines, such as a subdivision and a street address, the staff has entered a carriage return between them. These totally confuse any program trying to read the comma-delimited file.
- Blank names: Apparently someone started to enter a new user and accidentally hit save before entering any data. This happens often enough to cause some problems.
- Missing fields: We will list and drop any borrowers that do not have a street address or city. State defaults to Florida and we can fix the ZIP code using MyMailingList.
- Koha has separate fields for street number and street name, so our single address line has to be split.
- We have a number of types of entries in the borrowers file, such as businesses, publicity contacts, or potential donors. Each type has to be mapped to a Koha patron category.
- DBText has a facility for multiple entries in a field. It exports them separated by a pipe (|) symbol.
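The borrower cleanup described in the list above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the column names (name, address, city, state, phone) are hypothetical and not taken from the actual DBText export, the export is assumed to quote any field that contains a line break, and the street-number split only handles addresses that begin with the number.

```python
import csv
import io
import re

# Hypothetical column names; the real DBText export headers may differ.
REQUIRED = ("name", "address", "city")


def clean_borrower(row):
    """Return a cleaned copy of one exported row, or None if it should be dropped."""
    # Replace embedded CR/LFs (and the spaces around them) with ", ".
    cleaned = {
        key: re.sub(r"\s*[\r\n]+\s*", ", ", (value or "").strip())
        for key, value in row.items()
        if key is not None
    }
    # Drop blank names and records without a street address or city.
    if any(not cleaned.get(field) for field in REQUIRED):
        return None
    # State defaults to Florida when it is missing.
    cleaned["state"] = cleaned.get("state") or "FL"
    # Split a leading house number from the rest of the address.
    match = re.match(r"(\d+[A-Za-z]?)\s+(.*)", cleaned["address"])
    if match:
        cleaned["streetnumber"], cleaned["street"] = match.groups()
    else:
        cleaned["streetnumber"], cleaned["street"] = "", cleaned["address"]
    # DBText separates multiple entries in one field with a pipe symbol.
    cleaned["phones"] = [p.strip() for p in cleaned.get("phone", "").split("|") if p.strip()]
    return cleaned


def load_borrowers(text):
    """Split exported CSV text into cleaned rows to keep and raw rows to list and drop."""
    kept, dropped = [], []
    # newline="" leaves line breaks inside quoted fields for the csv module to handle.
    for row in csv.DictReader(io.StringIO(text, newline="")):
        cleaned = clean_borrower(row)
        if cleaned is None:
            dropped.append(row)
        else:
            kept.append(cleaned)
    return kept, dropped
```

Calling load_borrowers() on the exported text returns the cleaned rows plus the rows that were dropped, so the dropped ones can be listed and reviewed by hand. Addresses where the subdivision comes before the street line would need extra handling before the number can be split off.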
The catalog has many of the same problems as the borrowers file: embedded CR/LFs, blank entries, and missing fields. These records should be converted to MARC records for import into Koha. Other problems include:
- Originally the media type field was free-text entry. That gave rise to “Book”, “book”, “ Book” and any other way you could mis-enter the word. DBText recognizes them all as a book, but exports them as is.
- Title could be missing. If there is no title or author, the record is dropped.
- Call numbers started out as Library of Congress call numbers. We are in the process of recoding all the fiction as FIC plus the first three letters of the author’s name. Some of the LoC call numbers have been simplified in recent years.
- Early cataloging was wildly inconsistent. In recent years we have had the help of professionals to do the cataloging.
- As a specialized library, our subject headings are more detailed than those of a general library.
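Two of the catalog fixes above lend themselves to a short Python sketch: collapsing the media type variants, and building the new fiction call numbers. The canonical media type table is a hypothetical placeholder, and the sketch assumes that authors are stored as "Last, First" and that the call number is written with a space after FIC; neither detail is confirmed by our data.

```python
# Hypothetical canonical spellings; the real list of media types would go here.
CANONICAL_MEDIA_TYPES = {
    "book": "Book",
    "dvd": "DVD",
}


def normalize_media_type(raw):
    """Map variants such as 'book', ' Book' or 'BOOK' to one canonical spelling."""
    # Trim the ends and collapse runs of whitespace.
    tidy = " ".join(raw.split())
    # Unknown values come back tidied but otherwise unchanged, for manual review.
    return CANONICAL_MEDIA_TYPES.get(tidy.casefold(), tidy)


def fiction_call_number(author):
    """Build 'FIC' plus the first three letters of the author's name."""
    # Assumes "Last, First"; keep only the letters of the part before the comma.
    surname = author.split(",")[0]
    letters = "".join(ch for ch in surname if ch.isalpha())
    return f"FIC {letters[:3].upper()}" if letters else "FIC"
```

For example, normalize_media_type(" Book") gives "Book" and fiction_call_number("Hemingway, Ernest") gives "FIC HEM". Records with no usable author come back as a bare "FIC" so they can be found and fixed by hand.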
Serials and Archives
This should be relatively straightforward once I understand what Koha wants. We generally have one record for each serial or archive box. For serials, the holdings field is a list of all the issues and is updated as new ones arrive. The archive records basically describe the contents of a box.