February 19th, 2014

Potential Problems with Indexes, Images and Transcripts

The three main ways that you’ll find data represented on the internet are through images, indexes and transcriptions. Images are usually scans of original documents; a transcription holds the full text of a document in a file; and an index contains a list of names, with or without additional details about the people listed, and may direct you to where the original document is held.

An online index would ideally lead you to a full transcription of the original document you seek, which could then be compared against a scanned image of the original. But we don’t live in a perfect world, and in spite of the great technological progress we’ve made, digital images take up a lot of disk space and bandwidth, and so are rarely found online. It is simply more financially feasible to provide text-only data, both for free websites run by volunteers and for commercial sites. Images can be, and are, supplied economically for census returns, however, because the massive demand for them covers the cost of providing them.

Believe it or not, as rare as images of original documents are on the internet, transcriptions are even harder to find. It takes a great deal of time and effort to prepare a transcription, and for many documents one is not really required. Indexes that are linked to images are therefore the major source of most genealogical data to be found on the internet. Keep in mind, especially if you’re a beginner, that you can’t accomplish all your genealogical research on the internet. It is a valuable tool, but everything you find will eventually have to be authenticated by checking it against the original source. Let’s take a look at some of the potential problems you might experience with indexes.

Potential Index-Related Problems

The majority of indexes on the internet today are made available through the valuable contributions of time and resources by other genealogists. Because they often work from microfilms or digitized versions of original documents, it is understandable that information is sometimes transcribed inaccurately. Ideally indexes would be created by professional palaeographers with an excellent knowledge of place names and surnames, but this is not the case even on large-scale projects, whose data is usually transcribed and input by clerical workers.

Data validation is also an area of concern when browsing indexes on the internet. Of course it is much easier to check the authenticity of twentieth or twenty-first century records than that of records from previous centuries, so unfortunately validation has been overlooked by some projects. The “let’s don’t and say we did” attitude has led to the critical failure of some indexes, and consequently of many genealogists’ family histories. Many spelling errors and typos are made when transcribing, and these mistakes end up being published. I have come across surnames with numbers in the middle of them, misspelled surnames, and even gender misidentification; one website had over one thousand female Johns in its index.
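If you keep your own copy of index entries in a spreadsheet or file, a few of these problems can be caught automatically before you rely on them. The short Python sketch below is only an illustration: the field names (surname, forename, gender), the tiny list of usually-male names, and the sample records are all invented for the example, and a real project would need far more careful rules. It simply flags surnames that contain digits and entries such as a female John.

```python
import re

# Hypothetical, deliberately tiny list of forenames that are usually male.
# A real check would need a list suited to the period and region.
USUALLY_MALE_NAMES = {"john", "william", "thomas"}


def flag_index_entry(entry):
    """Return a list of warnings for one index entry (a dict of strings)."""
    warnings = []
    surname = entry.get("surname", "")
    forename = entry.get("forename", "")
    gender = entry.get("gender", "")

    # A digit inside a surname almost always points to a keying error.
    if re.search(r"\d", surname):
        warnings.append("surname contains a digit")

    # A usually-male forename recorded as female deserves a second look.
    if forename.strip().lower() in USUALLY_MALE_NAMES and gender.strip().upper() == "F":
        warnings.append("forename and gender disagree")

    return warnings


# Invented sample records, for illustration only.
entries = [
    {"surname": "Sm1th", "forename": "Mary", "gender": "F"},
    {"surname": "Carter", "forename": "John", "gender": "F"},
    {"surname": "Hughes", "forename": "Thomas", "gender": "M"},
]

for entry in entries:
    print(entry["surname"], flag_index_entry(entry))
```

A flagged entry is not necessarily wrong; it is only a prompt to go back and check that record against the original document.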

These are some of the things you’ll want to look out for when using an online index, and the reason that online information should always be verified via comparison to the original documentation. On a positive note, the greatest value of an index is that it can tell you where to find that original document.