Statistical Policy Working Paper 19 - Computer Assisted Survey Information Collection



 

 

 

 

                  MEMBERS OF THE FEDERAL COMMITTEE ON

                        STATISTICAL METHODOLOGY

 

                             (April 1990)

 

                       Maria E. Gonzalez (Chair)

                    Office of Management and Budget

 

 

Yvonne M. Bishop

Energy Information Administration

 

Warren L. Buckler

Social Security Administration

 

Charles E. Caudill

National Agricultural Statistical Service

 

John E. Cremeans

Office of Business Analysis

 

Zahava D. Doering

Smithsonian Institution

 

Joseph K. Garrett

Bureau of the Census

 

Robert M. Groves

Bureau of the Census

 

C. Terry Ireland

National Computer Security Center

 

Charles D. Jones

Bureau of the Census

 

Daniel Kasprzyk

Bureau of the Census

 

Daniel Melnick

National Science Foundation

 

Robert P. Parker

Bureau of Economic Analysis

 

David A. Pierce

Federal Reserve Board

 

Thomas J. Plewes

Bureau of Labor Statistics

 

Wesley L. Schaible

Bureau of Labor Statistics

 

Fritz J. Scheuren

Internal Revenue Service

 

Monroe G. Sirken

National Center for Health Statistics

 

Robert D. Tortora

Bureau of the Census

 

 

 

 

 

                                PREFACE

 

 

The Federal Committee on Statistical Methodology was organized by the

Office of Management and Budget (OMB) in 1975 to investigate

methodological issues in Federal statistics.  Members of the

committee, selected by OMB on the basis of their individual expertise

and interest in statistical methods, serve in their personal capacity


rather than as agency representatives.  The committee conducts its

work through subcommittees that are organized to study particular

issues and that are open to any Federal employee who wishes to

participate in the studies.  Statistical Policy Working Papers are

prepared by the subcommittee members and reflect only their individual

and collective ideas.

 

The Subcommittee on Computer Assisted Survey Information Collection

investigated the use of computers in collecting survey information. 

This report covers the different ways in which small computers can be

used to improve data collection.  For example, the report describes

computer assisted telephone interviewing (CATI), computer assisted

personal interviewing (CAPI), data collection using touchtone

telephones, and voice recognition.  More than in most working papers,

the relevance of the information in this report will fade quickly.

 

Various methodological issues are also addressed in this report.  For

example, issues discussed include human-machine interfaces, software

development, hardware planning, and computer security.

 

The Subcommittee on Computer Assisted Survey Information Collection

was chaired by Terry Ireland of the National Computer Security Center,

Department of Defense.

 

                                   i

 

 

                      CASIC Subcommittee Members

 

 

C. Terrence Ireland, Chair

National Computer Security Center

(Defense)

 

Thomas Anastasio

National Computer Security Center 

(Defense)

 

Martin Baum

National Center for Health Statistics 

(Health and Human Services)

 

William Blackmore

Energy Information Administration

(Energy)

 

Richard Clayton

Bureau of Labor Statistics (Labor)

 

Ann Ducca

Energy Information Administration

(Energy)

 

Ralph Gillman

Energy Information Administration

(Energy)

 

Maria E. Gonzalez, Ex officio 

Office of Management and Budget 

(Executive Office of the President)

 

Stuart Katzke

National Institute of Standards and Technology 

(Commerce)

 

George Kraft

National Institute of Standards and Technology 

(Commerce)

 

Cathy Mazur

National Agricultural Statistical Service 

(Agriculture)

 

John Sietsema

National Center for Education Statistics 

(Education)

 

                                  ii

 

 

                            Acknowledgments

 

     The idea to develop a Statistical Working Paper on the use of

computers to support the collection of survey information was first

put forward by Yvonne Bishop of the Energy Information Administration. 

Ms. Bishop has a special interest in data collection techniques that

do not involve an interviewer.  With the advice of members of the

Federal Committee on Statistical Methodology (FCSM), Maria Gonzalez

organized a subcommittee with an expanded scope to examine a range of

computer methodologies that supported the collection of information:

the Subcommittee on Computer Assisted Survey Information Collection

(CASIC).  The members of the CASIC Subcommittee further expanded the

report to include the three important methods of data collection:

Computer Assisted Telephone Interviewing (CATI), Computer Assisted

Personal Interviewing (CAPI), and Computer Assisted Self Interviewing

(CASI).  For each related technological area, from software interfaces

to computer security, the CASIC Subcommittee investigated and wrote

sections of the working paper that showed the application of these

areas to CATI, CAPI, and CASI.  The CASIC Subcommittee thanks the

members of the FCSM for their advice and comments on several drafts of

the working paper.  Special thanks go to Charles Caudill (NASS) and

Joe Garrett (Census) for their in-depth comments on the various

drafts.

 

 

                                  iii

 

 


 

 

 

        COMPUTER ASSISTED SURVEY INFORMATION COLLECTION (CASIC)

 

                           TABLE OF CONTENTS

 

Part I. Executive Summary                                            1

     A.   Introduction                                               1

     B.   Computer Assisted Survey Information Collection            2

 

Part II. Introduction                                                3

     A.   Objectives, Scope, and Users                               3

     B.   Federal Information Processing Standards                   8

     C.   Organization of Report                                     9

 

Part III. Options for Automated Statistical Surveys                 11

     A.   Computer Assisted Telephone Interviewing (CATI)           11

     B.   Computer Assisted Personal Interviewing (CAPI)            15

     C.   Computer Assisted Self Interviewing (CASI)                17

 

Part IV. Methodological Issues                                      25

     A.  Human-machine Interfaces                                   25

     B.  Software Development                                       32

     C.  Data Collection Programs                                   36

     D.  System Interfaces For Data Conversion                      41

     E.  Computer Security                                          44

     F.  Hardware Planning                                          50

     G.  Network Planning                                           54

 

Part V. References                                                  63

 

Part VI. Appendices                                                 67

     A.  Costs                                                      67

     B.  Quality Improvements offered by CASIC                      73

     C.  Survey Examples                                            78

     D.  Taxonomy                                                   94

     E.  Glossary                                                   96

 

                                   v

 

 

I.   Executive Summary

 

I.A. Introduction

 

     Surveys have used computers since the Bureau of the Census

obtained the UNIVAC I.  Since that breakthrough, the power of rapid

calculating has been applied to almost every phase of the survey

process, including sample design, sample selection, and estimation. 

The most important implication of these applications is that survey

practitioners can now consider a growing range of techniques that were

not affordable, or even thought of, before the availability of

inexpensive and fast calculating capability.

 

     The last major survey operation to benefit from automation is

data collection.  Computers were first applied to collection using

mainframes to control certain aspects of telephone collection, and

Computer Assisted Telephone Interviewing (CATI) was born.  The first

applications of CATI provided a flood of research worldwide evaluating

the impact of this technique on the survey error profile and costs. 

CATI is now used to help interviewers in all collection activities,

including scheduling calls, controlling detailed interview branching,

editing and reconciliation, thus providing much greater control over

the collection process and reducing many sources of error. 

Simultaneously, a tremendous storehouse of information is captured by

the computer to provide additional insight into the data collection

process.

 

     In just two decades, CATI has become a standard collection

vehicle grounded strongly in a firm body of research.

 

     The ongoing advances in computer technology, and particularly the

arrival of microcomputers, continue to offer survey practitioners more

fertile ground for improving the quality of published data.  The first

portable computers were quickly pressed into service to duplicate the

advantages of CATI in a personal visit environment.  Thus, Computer

Assisted Personal Interviewing (CAPI) grew from the seeds of CATI.

 

     While CATI and CAPI represent advances for surveys requiring

interviewers, microcomputers are now finding important roles in self-

administered questionnaires, where interviewers are not needed.  These

roles take advantage of more advanced and widely available

technology to allow respondents to complete the

questionnaire without the assistance of an interviewer.  Prepared Data

Entry (PDE) allows respondents who have a compatible microcomputer or

terminal to access and complete the questionnaire directly on their

screen.

 

     Touchtone Data Entry (TDE) allows respondents to call and answer

questions posed by a computer using the keypad of their touchtone

telephone for well-controlled and inexpensive collection.  As an

extension of this approach, recently developed techniques in

 

                                   1

 

 

 

 

Voice Recognition Entry (VRE) allow respondents to answer questions by

speaking directly into the telephone.  The computer translates the

respondent's answers into text for verification with the respondent

and then stores the text in a data base.

 

     These and other collection methods will continue to evolve out of

the work now underway.  New technology will assuredly bring more

options for survey practitioners to consider.

 

     The use of these collection methods, while bringing needed

improvements in the quality of collected data, has created other

challenges.  These automated collection methods are made possible

through the close interaction of statisticians, subject matter experts

and colleagues in the computer sciences.  To use these methods

effectively, each profession must learn and use the models and

techniques of the other professions.  This close relationship will

continue to grow, with advances in each field supporting advances in

the others.

 

     The goal of this report is to profile several automated survey

collection methodologies and provide a glimpse of what future

technological advances may offer to survey operations.

 

     The selection of one or more of these collection methods depends

on a clear understanding of computer applications.  Software and

hardware selection can be essential to success, as may be the use

networks for the computers.  As with any survey method, the need to

assure the confidentiality of the data gathered and stored by the

computers is critical.

 

     This report discusses several data collection methodologies now

being used in Federal agencies in terms of procedures, impact on

quality, and costs.  It also discusses the significant issues

surrounding the use of advanced technologies to augment survey data

collection.

 

 

I.B.  Computer Assisted Survey Information Collection (CASIC)

 

     For this report, the Subcommittee defines Computer Assisted

Survey Information Collection as those information gathering

activities using computers as a major feature in the collection of

data from respondents, and in the transmission of data to other sites for

post-collection processing.  It is in this area of survey operations

that technology is now having the greatest impact.

 

                                   2

 

 

 

 

II. Introduction

 

II.A. Objectives, Scope, and Users

 

     The Subcommittee on Computer Assisted Survey Information

Collection was established in October 1988 to document and discuss the

status and potential use of advanced technology for collecting

statistical data and transmitting it to central processing sites, and

the conceptual and practical issues surrounding implementation.  High

quality published data begins with collecting high quality data from

respondents.  Much of survey processing addresses, and compensates

for, weaknesses in the quality of the collected data and gaps left by

uncollected data.  The survey questionnaire, received on time,

completely filled out and accurate, can reduce post-collection errors

and their related costs.

 

     The Computer Assisted Survey Information Collection Subcommittee

of the Federal Committee on Statistical Methodology has studied the

various implications of the vast computing power now available to

support statistical surveys and is providing this information for use

throughout the Federal Government.

 

Objectives

 

     The primary objective is to describe emerging methods of

interactive electronic data collection and transmission, potential

benefits, and current examples of their use in Federal surveys.  This

report also covers techniques and appropriate references to the

literature.

 

     A secondary objective is to consider specific methodologies and

related issues stemming from the use of computer assisted statistical

surveys.  Also addressed are other practical considerations involving

human-machine interfaces, software design, hardware features, data

transmission and computer security.  The issues involve such factors

as quality, costs, and respondent reaction to computerized surveys.

 

     Some advantages of automated surveys are:

 

     a.   improved data quality from (1) the introduction of automated

          questionnaire branching, editing features, and computer

          utility support; and (2) a shorter processing path from data

          collection to data processing (e.g., reduced keying errors

          because keying of the paper questionnaire is no longer

          necessary).

 

     b.   improved timeliness of data capture by the elimination of

          some data entry steps and of extensive editing.

 

                                   3

 

 

 

 

 

     c.   increased flexibility in data gathering (e.g., for

          conducting multiple version questionnaire surveys involving

          question reordering and different natural languages).

 

     In deciding which collection method to use, quality is a relative

concept, shaped by the tradeoff between cost and benefit.  The

choice of a data collection method is usually based on a combination

of performance and cost factors.  Together they determine affordable

quality.  For traditional collection methods, these factors and the

decision-making process are usually well-known.  Now, as technology

progresses, new methods are being tested that expand the array of

potential collection tools and challenge the survey designer to

reevaluate old cost/performance assumptions.

 

     These semi-automated collection applications fall naturally into

three areas: (1) Computer Assisted Telephone Interviewing (CATI) where the

interviewer and respondent talk over a telephone, limiting their

personal interactions while maintaining the substantial flexibility

provided by a telephone; (2) Computer Assisted Personal Interviewing

(CAPI) where the interviewer and respondent talk directly across the

table, although this direct access comes with the cost of additional

logistical problems; and (3) Computer Assisted Self Interviewing

(CASI), a newly coined phrase to describe situations where the

interviewer is replaced by interaction with the computer.

Subcategories include Prepared Data Entry (PDE) where the respondent

uses a computer terminal; and Touchtone Data Entry (TDE) and more

recently, Voice Recognition Entry (VRE) where the respondent interacts

with a computer over a phone line.

 

     However, computer applications are not limited to obtaining data

from respondents.  In addition, the prompt transmittal of reported

data to the processing facility and the conversion of data to proper

formats are important to the publication of timely and relevant

information.

 

     New options will encourage reconsideration of old assumptions

about quality, cost, and technology.  Decisions made years ago in an

era of fewer alternatives should be reviewed periodically.  Many

factors can change in a short period. Only a few years ago, automation

costs were driven by the scarcity of mainframe hardware capacity.  Now

the labor involved in developing specialized systems dominates

automation costs.  Portable and desktop microcomputers were not widely

available at the beginning of this decade.  Now, widely available,

inexpensive and powerful, they are an assumed part of the work

environment.  The tough questions involve the selection of the

appropriate system configuration.

 

     The general goal of this report is to challenge Federal survey

managers to reconsider their operations in light of recent changes

 

                                   4

 

 

 

 

 

in survey methods available, or attainable through new technology, and

to reassess their methods of providing information to the public that

is accurate, timely and relevant.

 

 

Scope

 

     Automated data collection includes three major groups of people: 

the respondents, the interviewers, and the designers and developers of

the system and procedures for collection.  This report covers the

essential factors involved in successfully meeting the requirements

of each group.

 

     The survey operations considered in this report include the

computer-related activities of design and development of the

questionnaire, interviewing, data entry, editing and follow-up for

nonresponse or edit reconciliation, data transmission and data

conversion.

 

     The critical activities of sample design, sample selection and

estimation are not included in the scope of this report.  Still, the

choice of an automated collection method is important to these

activities.  This choice must be an integral part of the survey

design.  For example, the decision to use CATI to improve collection

of time critical data may provide the sample designer with additional

flexibility to consider techniques that require rigorous sample

control or complex questionnaire branching logic.

 

Respondents

 

     The respondent must be considered the primary user of any survey

vehicle, whether automated or not, and all aspects of the response

environment must be developed with the respondent in mind.

     The cooperation of respondents is the single most critical factor

in survey operations, and they must be treated with the greatest care. 

Even one-time surveys must strive to leave the respondent with the

feeling of contribution and importance, and the willingness to

participate in future surveys.  Thus, our primary job is to consider

computer-related techniques that allow the respondent to answer the

survey completely and accurately in a natural environment.

 

     Automated collection methods provide survey managers with

opportunities to improve control and reduce sources of error.  These

methods also can be designed to capture workload and performance data

in the background while interviews are conducted.  However, these

features must not interfere with the natural interactions during the

interview.

 

     The transition to automated surveys presents additional

challenges.  For example, in a switch from mail questionnaires to

CATI, the surveyor must work with the respondents to remove their

                                   5

 

 

 

 

uncertainties about the transition in order to retain their continuing

cooperation.

 

     The arrival of a variety of automated self-response methods

involving computerized questionnaires presents new challenges for

ensuring that the respondent is sufficiently knowledgeable and

comfortable dealing directly with the computer.  As always, the

respondent must be trained in the use of the collection process. 

Whether by simple instructions or more formal procedures manuals, the

surveyor must work diligently to develop simple, clear directions for

use, or risk losing the full cooperation of the respondent.  For

example, in the use of PDE, respondents must interact directly with

computer displays.  This requires understandable questions, adequate

help facilities, and a clear set of allowable answers.  Finally, just

as managers must worry about interviewers' illness, absence,

vacations, and vacancies, designers of automated self-response systems

must include emergency back-up procedures to assure that respondents

can complete the survey.

 

     The design of the human-machine interface requires a clear

understanding of what the respondent expects.  Do people react

differently to questions presented on paper, posed by telephone

interviewers, or displayed or spoken by a computer?  Also, what

information is lost by changing

from personal visits, where the interviewer can assess a variety of

non-verbal clues, to telephone collection, or automated self-response

where voices are not directly heard? What are the differences in

application of these techniques in household versus establishment

surveys?

 

     While new automated methods provide many features attractive to

survey designers, new responsibilities come with their use.  The

respondent must be assured of the confidentiality of the data

provided.  Confidentiality is the cornerstone of respondent

cooperation, from the interview through final processing, estimation,

and storage of microdata.  Whereas face-to-face interviews provide an

environment where the respondent can assess and control access by

others, use of telephone collection and transmission of self-reported

data creates new problems in confidentiality.  The integrity and

authenticity of the respondent's answers during the transmission

process is a related issue.  The ability to transmit large volumes of

data from remote sites may only partially solve collection problems

in some surveys that require actual signatures and protection of the

transmitted data.

 

 

Interviewer

 

     The second most important user is the interviewer.  The systems

provided to help in the interview process must be easy to use, must

work consistently and must provide improvements in the interview

environment.  Early use of CAPI required interviewers to

                                   6

 

 

 

 

 

carry the first generation of portable computers to the respondent's

home.  These heavy machines were often left in automobiles until the

interviewer could determine that the respondent was home.  The result was

reduced productivity and higher costs.

 

     Interviewers must believe that computer assistance will improve

their effectiveness.  They need to be convinced that the computer is

simply a tool to speed and simplify their work.  CATI, CAPI and CASI

support specific wording for each question, and simplify moving to the

next question, which is often dependent on previous answers.  However,

these systems can be over-developed so that interviewers are left

little or no discretion for judgment or contribution.  The result may

be low morale, indifference, deviation from established procedures,

and high turnover rates.

 

System Designers

 

     The third important user is the system designer who may use the

computer environment to design the survey and to lay out the

procedures for its use.  Besides the ease of use to both respondent

and interviewer, the decisions made early in the development process

carry over to the ongoing use and maintenance of the system for years. 

The design environment is similar to that used in any software

development process.  Software tools that support this "software

engineering" process should give flexibility to the designer and

provide for long-term maintenance of the survey.

 

     System designers face difficult choices, such as building

customized systems from scratch versus linking standardized "off the

shelf" software packages.  The inevitable limitations must be compared

against reduced maintenance and lower start-up costs.

 

                                   7

 

 

 

 

II.B. Federal Information Processing Standards

 

     Today, more than ever, information is the force that drives the

activities of the Federal Government, and information processing

systems are the mechanisms that process, store, and transfer this

information.  Information processing standards play an increasingly

important role in the strategies of Federal Agencies to make more

effective use of their information processing systems by providing

needed interoperability of systems and equipment, portability of data

and software, and methods for protecting data and computers from

accidental and intentional harmful events.  CASIC systems, like other

Federal information processing systems, will be more effective if they

implement standards that provide for interoperability, portability,

and security.

 

     Within the Federal Government, the National Institute of

Standards and Technology (NIST) has the responsibility of promulgating

Federal Information Processing Standards and Guidelines for hardware,

software engineering, electronic document interchange, data

management, ADP operations, computer security, and ADP related

telecommunications.  In addition, NIST develops conformance tests for

its standards where appropriate.  Developers of computer assisted

statistical survey systems should use NIST's standards and guidelines

whenever possible during the design, implementation, and operation

of their systems.  A reference to NIST's standards program and

available standards and guidelines can be found in Section V under the

heading of "Standards." Additional information about NIST's program

may be obtained from:

 

     Program Coordination and Support Group

     National Computer Systems Laboratory

     Building 225, Room B151

     National Institute of Standards and Technology

     Gaithersburg, MD 20899

     Telephone:  (301) 975-2833

 

 

                                   8

 

 

 

II.C. Organization of the Report

 

     This report is intended to provide reference and guidance for

survey practitioners across the Federal Government in planning and

refining data collection methods.  By sharing information and

experiences, agencies can gain from and add to the effectiveness of

governmental survey activities.  The potential audience is much

broader than those involved in statistical surveys.  Many of the

methods described and the technological issues discussed are

applicable to any information collection activity, including the

collection of management information, program cost, productivity, and

workload data.

 

     Part III covers the three major areas of CATI, CAPI, and CASI where

the computer supports survey information collection.  Each major

application is defined and current survey application experiences are

described.  Each discussion describes the impact on specific survey

error components and potential for future applications.

 

     Part IV provides a discussion of broad technological and

developmental issues in the use of computer assisted surveys.  The

areas selected for consideration are: the human-machine interface;

software development; data collection systems; systems interfaces for

data conversion; computer security; hardware planning; and network

planning which includes electronic mail.

 

     Part V contains references organized by categories consistent

with the organization of the report.

 

     Part VI contains the appendices.  Appendix VI.A provides a

discussion of cost measurement relating to use of computers to collect

survey information.  Appendix VI.B provides a general discussion of

the improvements of quality that can be expected with the use of

computers.  Appendix VI.C provides a series of survey efforts

currently underway, with a point of contact for additional

information.  Appendix VI.D lays out a suggested classification model

for surveys that depend on computer support.  It is consistent with

the various models in the body of this report.  Appendix VI.E contains

a glossary of words in active use where computers and surveys come

together.

 

                                   9

 

 

 

III. Options for Automated Statistical Surveys

 

III.A. Computer Assisted Telephone Interviewing (CATI)

 

Definition

     

     Computer Assisted Telephone Interviewing or CATI is a computer

assisted survey process that uses the telephone for voice

communications between the interviewer and the respondent.

 

     CATI replaces traditional paper-and-pencil questionnaire

interviewing.  The computer displays the questionnaire to the

interviewer, who then relays each question over the telephone to the

respondent.  The answers are given to the interviewer for entry into

the computer.  The questions are structured so that the computer can

examine previous answers to select the next question in sequence. 

Computer-generated help facilities can be initiated by the

interviewer on command.

 

     The interview environment can be computer generated or handled

manually by the interviewer.  As the CATI systems grow in

sophistication, many manual functions will be taken over by the

computer: sampling unit selection, scheduling of telephone calls,

automatic dialing, and callbacks to respondents who are not reached on

the initial call.

 

     Data collected by CATI should have significantly fewer errors

than data collected manually because the interviewer can directly

validate respondent data that fail internal and historical edit checks. 

Time and cost requirements for data collection, validation, and data

conversion should be reduced.  Computer controlled questionnaires make

it possible to use more sophisticated designs than can be administered

with paper-and-pencil forms.  They can include complex logic

structures and questions finely tailored to the circumstances

associated with a specific sampling unit.

 

 

Examples of Current Use

 

     The exact number of CATI installations throughout the world is

unknown.  It probably is more than 1,000 considering the number of

countries, universities, and private sector vendors and survey

research installations involved in surveys.  In 1988, the U.S.

Government had 51 cooperating CATI centers.

 

     Both opinion and factual data are collected using CATI.  Most

questionnaires contain a mix of these data types.  Questionnaires

range from several questions with very little data validation to

several hundred questions customized for specific respondents,

providing the ability to collect the same data conveniently in

different respondent environments.

 

                                  11

 

 

 

 

 

     The National Agricultural Statistics Service (NASS) within the

United States Department of Agriculture (USDA) executed its first CATI

questionnaire (Multiple Frame Cattle Survey) during 1982 in California

using four workstations and completing 100 interviews.  The

questionnaire consisted of 41 questions.  Today the largest known CATI

questionnaire is the December Agricultural Survey.  It is used in 14

states with questionnaires customized for each state.  This survey has

over 200 questions with production items recorded in units convenient

to the respondent and converted to a common unit for data validation

and recording purposes.

 

     Today, NASS conducts a total of nine recurring CATI surveys.  The

surveys are monthly, quarterly and annual.  In 1988, NASS completed

125,000 CATI interviews using 183 data collection work stations in 14

remote sites located in state statistical offices.  Besides the

recurring CATI activity, NASS conducted three special data collections

in 1988 and two already were scheduled for 1989.  The questionnaires

were developed over a very short period.  Training time was short. 

The data collection period was somewhat short (3 days to 2 weeks). 

NASS found that CATI lends itself very well to applications with short

implementation schedules.  Field testing of the questionnaires is

efficient because once a problem area is identified, the questionnaire

can be modified and tested on another respondent in generally less

than an hour.

 

     Also, the Bureau of Labor Statistics (BLS) currently uses CATI in

17 States to collect monthly data on employment, hours and earnings

from 6,000 respondents.  BLS further uses CATI (1) to collect Consumer

Price Index (CPI) housing data; (2) to collect hours at work and

hours paid as an input to productivity measures; and (3) for special

purpose studies to support Department of Labor initiatives.  In

addition, BLS uses CATI methods to conduct telephone record check

surveys to improve data quality.

 

Computing Environment

 

     The uses of CATI are limited only by the capability of telephone

technology and the use of personal interviewers.  CATI is one of

several phases of the total data collection process.  It can be used

for nonresponse follow-up where initial contact is made by CATI, mail,

or CAPI.

 

     The ability to use varied data collection techniques is

contingent upon the ability to develop computer questionnaires with

common software that can support the various data collection options. 

Common software is important to assure that the same data are

collected and the same validations are applied.

 

     The computer has to be responsive in delivering sample units and

questions to the interviewer.  The computer response times for both

interviewer and respondent must be less than what they would

                                  12

 

 

 

 

 

perceive as an unnecessary delay.  For example, experience has shown

that longer than a second between questions is too long for an

impatient respondent.  A wait of more than half a second for the display

of the next question is too long for the interviewer.  During this

period the computer may be required to access several databases and do

complex mathematical computations which would include logical

decisions affecting subsequent questions.

 

     The computer must deliver a different sampling unit in less than

10 seconds, and ideally in less than five.  During this period the

machine may have to query several potential respondent queues that

relate to scheduled callbacks in different time zones; to previous

busy signals to be retried every 15 minutes; to special handling of

specific respondents by specific interviewers; to the generation of

new sampling units; and to the disposition of the completed interview

as correct.
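
     The queue handling described above can be organized as a single

priority queue ordered by the time each case next becomes due.  The

following minimal illustration (Python; the case identifiers,

reasons, and delays are invented) is one way such a scheduler might

look:

     # Illustrative sketch of sampling-unit delivery.  Callbacks, busy-signal
     # retries, and fresh sample all wait in one queue ordered by due time.
     import heapq
     import time

     queue = []  # entries are (due_time, case_id, reason)

     def schedule(case_id, reason, delay_seconds):
         heapq.heappush(queue, (time.time() + delay_seconds, case_id, reason))

     def next_case():
         """Return the first case that is due, or None if nothing is ready."""
         if queue and queue[0][0] <= time.time():
             return heapq.heappop(queue)
         return None

     schedule("unit-017", "busy signal", 15 * 60)   # retry busy numbers in 15 minutes
     schedule("unit-204", "callback", 2 * 60 * 60)  # respondent asked for a later call
     schedule("unit-305", "fresh sample", 0)        # new sampling unit, available now
     print(next_case())                             # -> the fresh unit, the only case due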

 

     The software that drives the questionnaire must be easy for the

interviewer to use.  Question paths through a questionnaire must be

simple and easy for the interviewer to handle.  Menus with abbreviated

questions or questionnaire areas are desirable.  Skipping back to an

earlier question, changing that answer, and establishing another route

through the questionnaire must be easy and quick to do.  Commands must

be standardized for use in related surveys to enable "second nature"

reactions by the interviewer in any given situation.

 

     The design of a CATI questionnaire poses problems beyond the

design of standard questionnaires.  If the designer has problems

developing the questionnaire, the interviewers will almost surely find

it difficult to use.  The objectives of the survey questions in a

computerized questionnaire may be no more complex than questions used

in pencil-and-paper surveys.  However, the flexibility provided by

automated question paths makes their design more difficult: all

possible paths and branching must be worked out in advance, and there

may be significant differences in question wording and in their number. 

Automatic sampling unit management can pose some difficult logic

problems for the automated survey designer.  Data validation using

historical or internal data correlations is a complex logic problem,

but is essential for recurring surveys.  Well designed computer

environments provide the interviewer with the ability to review the

respondent's answers for correctness and to annotate unusual

circumstances.
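
     One common form of historical validation compares the current

report with the same respondent's prior report.  The sketch below

(Python; the 50 to 200 percent tolerance is an invented example, not

a rule taken from any survey described here) illustrates the idea:

     # Illustrative sketch of a historical edit check.
     def historical_edit(current, previous, low=0.5, high=2.0):
         """Flag a value outside 50-200 percent of the prior report."""
         if previous == 0:
             return current != 0  # any change from zero deserves a probe
         return not (low <= current / previous <= high)

     if historical_edit(current=4200, previous=1800):
         print("Edit failure: verify with the respondent before accepting.")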

 

     Before the computer questionnaire designer can begin, the

questions must be developed by the survey staff using knowledge of

statistical theory and specific subject matter.  This survey staff

also must be well versed in face-to-face, self-administered, and

telephone questionnaire design.  In the face-to-face interview the

interviewer can offer explanations of the question, then probe for

additional information; and if necessary, provide the respondent

                                  13

 

 

 

 

 

with the paper version of the questionnaire.  The respondent can

study, read ahead, reflect, and finally answer with a clear

understanding of the meaning of the question.

 

     For a self-administered questionnaire, the respondent no longer

has the benefit of the interviewer, but still can examine the

questionnaire in detail.  In telephone interviewing the respondent may

not have the form in-hand and thus may be missing the visual clues

needed to understand the question.  Therefore, questions used in

telephone interviewing should be structured using single concept

questions.  Some simple applications rely less on posing very

structured questions and more on a "forms-screen" approach.  This

approach replicates the survey form on the computer screen.  Edit

failures may be highlighted, perhaps with a different color, and the

interviewer is trained to ask probing questions to reconcile suspected

inconsistencies in the responses.

 

                                  14

 

 

 

 

 

III.B. Computer Assisted Personal Interviewing (CAPI)

 

Definition

 

     Computer Assisted Personal Interviewing (CAPI) is a personal

interview conducted usually at the home or business of the respondent

using a portable personal computer.  In many respects it differs from

CATI only in that the interviewer and the respondent are in the same

room.  As with CATI, the questionnaire is programmed into the

computer with all the necessary logic to control the question path --

the logical flow of the questions based on such factors as previous

answers -- and provides both for computer generated editing by

pointing out inconsistencies to the interviewer and for direct editing

by the interviewer.  The system must be self-contained as the

interviewer does not have immediate access to supervisory assistance

or to other data sources.  The interviewer reads aloud each question

as it appears on the screen and records the respondent's answer in the

computer while providing interactive assistance to the respondent.

 

Examples of Current Use

 

     CAPI is currently being used by the National Center for Health

Statistics (NCHS) for the implementation of the National Health

Interview Survey (NHIS).  The Census Bureau is performing the field

data collection for NCHS.  The NHIS is a household survey conducted in

approximately 50,000 households per year.  CAPI has been used to

collect a portion of the survey data: the AIDS supplement

questionnaire that requires approximately 15 minutes to complete.  The

1990 Health Promotion and Disease Prevention Questionnaire of the NHIS

will be fielded in January 1990.  Major tests of CAPI have been

conducted by the Bureau of the Census and the Research Triangle

Institute.  National Analysts conducted a nationwide CAPI for the USDA

sponsored 1987 Nationwide Food Consumption Survey.  The Bureau of

Labor Statistics used CAPI for establishment record check surveys. 

National Opinion Research Center also is experimenting with CAPI.  In

Europe, CAPI has been used by the Netherlands Central Bureau of

Statistics to collect data for the Netherlands Labor Force Survey. 

The U.K. Office of Population Censuses and Surveys has also carried

out a major test of CAPI.  Most of these efforts are at an early stage

of CAPI development.

 

Potential Uses

 

     CAPI can be used for all household surveys and establishment

surveys, and the software can be used for any of the other automated

data collection mechanisms.  As the technology improves to provide

lighter computers with longer battery life and user friendly software,

CAPI will be used more often, particularly for quick turnaround

surveys.  Procedures for developing CAPI

 

                                  15

 

 

 

 

questionnaires are similar to those for CATI.  However, greater

emphasis must be placed on help features because the CAPI interviewer

cannot rely on nearby experts.

 

     The type of resources and expertise needed to apply CAPI

technology to a survey are dependent on the availability of a good

authoring system.  If an authoring system is readily available, the

CAPI survey instrument can be prepared by the typical survey

instrument designer with little or no computer experience.  Computer

programming assistance will be needed to write the case management and

output portions of the software.  Usually these portions of the survey

vary with each survey or survey instrument; therefore they must be

custom programmed.  On the other hand, if an authoring system is not

available, the entire CAPI instrument must be custom programmed with

either a general purpose language or a special purpose CAPI language. 

In either case, computer programming expertise is required.  The level

of expertise is dependent on the language selected.  In addition, the

survey instrument preparation will require the services of a survey

instrument designer who will need to work very closely with the

computer programmers.

 

                                  16

 

 

 

 

 

III.C. Computer Assisted Self Interviewing (CASI)

 

Definition

 

     Computer Assisted Self Interviewing (CASI) has been introduced

into this report as a category to cover a new but growing area of

computer assisted surveys that involves data collection without the

direct presence of an interviewer.  CASI can take several different

forms that are differentiated by the collection method.  These include

Prepared Data Entry (PDE) where the respondent answers questions

displayed on a computer terminal; Touchtone Data Entry (TDE) where the

respondent answers computer generated questions by pressing buttons on

a telephone; and Voice Recognition Entry (VRE) where the respondent

answers questions by speaking directly into a telephone.  We consider

each in turn.

 

 

Background

 

     Self-response data collection has always been used for many

surveys that are mailed out.  This form of self-response collection

features simplicity in administration leading to low initial overhead

when compared to CATI and CAPI.  However, mail self-response

necessarily involves a reduction in control over the collection

process.  It is difficult for the survey practitioner to assess the

status of the collection effort, e.g., whether the responses are in

transit or still in the respondents' hands.  Extensive mail or

telephone follow-up involves great costs, perhaps offsetting the

original simplicity of mail, and risks ongoing cooperation, especially

if the response is "in the mail."

 

     In annual or quarterly surveys, mail may be the appropriate

vehicle.  In time critical surveys, the characteristics of mail

collection leave wide gaps in control.  Computer Assisted Self-

Response methods now being introduced into surveys hold great promise

to maintain the advantages of mail self-response, while improving

control and the ability to intervene in the collection process.

 

 

Definition - Prepared Data Entry (PDE)

 

     Prepared Data Entry (PDE) places the respondent in direct contact

with a computerized questionnaire through a computer terminal.  In a

sense the computer is acting as the interviewer in a manner similar to

CATI or CAPI interviewers.

 

     The respondent uses a personal computer or terminal to fill out

the survey questionnaire interactively.  As each item appears on

screen, instructions and definitions for that item appear on a split

screen or are accessible by pressing a help key.  As data are entered,

range and consistency checks are automatically applied and

                                  17

 

 

 

 

anomalies pointed out to the respondent.  The response to previous

items may control the question path of the questionnaire.  Because of

the lack of an interviewer to help the respondent, the guidance

provided by the program must be substantial and the computer literacy

of the respondent is essential, at least at this stage of development.
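
     The range and consistency edits described above amount to a

small set of rules applied as each item is keyed.  A minimal sketch

follows (Python; the items, limits, and consistency rule are

hypothetical):

     # Illustrative sketch of PDE range and consistency edits.
     RANGES = {"hours_worked": (0, 168), "hours_paid": (0, 200)}

     def check_item(item, value, answers):
         """Return a list of messages to show the respondent."""
         problems = []
         low, high = RANGES[item]
         if not (low <= value <= high):
             problems.append(f"{item} must be between {low} and {high}.")
         # Hypothetical consistency rule across items already answered.
         if item == "hours_paid" and value < answers.get("hours_worked", 0):
             problems.append("Hours paid is less than hours worked; please verify.")
         return problems

     print(check_item("hours_paid", 30, {"hours_worked": 40}))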

 

     This category of automated data collection programs includes a

rapidly expanding set of respondent initiated data entry and

transmission methods.  These methods are directly dependent upon the

computer and telecommunications hardware available to the data

providers.  Individuals, small businesses, or reporting agents can

enter data into a personal computer in response to pre-programmed

floppy disks and mail the disks to the collecting agency.  Firms with

modems can transmit the data through telephone lines directly to the

collecting agency's mainframe, or via an electronic mail service. 

Larger firms with mainframes can download the data to a PC, then

either transmit directly from the PC over a modem to the agency's

mainframe or place the data on a diskette and mail it to the agency.

 

     These methods eliminate the need for rekeying the data and the

attendant risk of data entry errors.  The transmission methods

using telephone lines save several days in each collection cycle by

eliminating dependence on the physical transportation of machine-

readable data whether by mail or special couriers.  The data must be

checked to detect and correct errors introduced during transmission.
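
     One simple way to detect such errors is to transmit a checksum

with the data and recompute it on receipt.  The sketch below (Python;

the record layout is invented, and no agency named in this report is

implied to use this particular scheme) shows the principle:

     # Illustrative sketch of transmission error detection by checksum.
     import hashlib

     def checksum(payload: bytes) -> str:
         return hashlib.md5(payload).hexdigest()

     data = b"period=1990-01;employment=1482;hours=38.5"
     sent = (data, checksum(data))          # transmitted together

     received_data, received_sum = sent     # what arrives at the agency
     if checksum(received_data) != received_sum:
         print("Transmission error detected; request retransmission.")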

 

Examples of Current Use

 

     In the early 1980's, the Internal Revenue Service (IRS) decided

that the electronic transmission of returns by tax preparers to IRS

would be both a practical and cost-beneficial alternative to the

mailing of paper tax returns when a refund is claimed.  According to

the Agency, the benefits of electronic filing would include: (1)

reduced manual labor costs required to process, store, and retrieve

returns; (2) faster processing and retrieval of tax data; and (3)

reduced interest IRS must pay to taxpayers who file timely refund

returns that are not processed on time by the IRS.

 

     Further, IRS reports show that electronically transmitted returns

are processed with significantly fewer errors than paper returns. 

According to IRS figures for the 1988 filing season, as of April 29,

1988, 20 percent of paper returns processed by IRS had errors and only

5.5 percent of those filed electronically had errors.  For taxpayers,

electronic filing can mean refunds up to 3 weeks sooner, and because

IRS can deposit these refunds directly into taxpayer bank accounts,

refunds may arrive 3 to 4 days earlier

 

                                  18

 

 

 

 

 

than that.  For tax preparers, the ability to provide electronic

filing services to taxpayers promises a competitive business edge.

 

     The Petroleum Supply Division (PSD) of the Energy Information

Administration (EIA) decided in 1987 to investigate electronic forms

submission to collect the Petroleum Supply Reporting System (PSRS)

survey forms.  Ten of the major petroleum companies that file the

mandatory "Monthly Refinery Report" were contacted to assess their PC

and communications capabilities.  The respondents contacted showed

interest in investigating the use of PC's to collect this data.  Often

they were already using PC's for business, personal or academic

purposes.  The respondents either had a PC in their office area or had

access to one in another office.  Software such as Lotus 1-2-3 and

dBASE III could usually be found on these PC's.  Some PC's were

equipped with communications capabilities and those respondents were

already using telephone lines for company reporting.  It appeared to

be the appropriate time for the PC to enter the PSRS data collection

process.

 

     Early in 1988, PSD developed the Petroleum Electronic Data

Reporting Option (PEDRO) and began providing its respondents with a

software diskette by which they could create an electronic image of

the form on a PC screen and enter their data in the appropriate cells. 

Firms having the necessary software capabilities can use their data

base to feed data directly to the electronic survey form eliminating

keying and transcription errors.  User-friendly software with help

functions has been added to data entry functions to provide quick

reference to definitions, conversion factors or other information to

speed the completion of the survey form.  This eliminates the need to

search hard-copy files for survey forms instructions, product

definitions, conversion tables, etc.

 

Definition -- Touchtone Data Entry

 

     Touchtone Data Entry (TDE) has been used for many years in the

private sector for a growing range of applications.  TDE, also known

as voice response, is used for banking by telephone, call routing,

college class registration and "talking yellow pages" to name just a

few.  The process is simple.  The caller initiates a call to a

computer which asks a series of questions.  The caller answers using

the touchtone keypad and the tones are recognized by the computer. 

The process offers inexpensive collection because there are few

ongoing labor costs after development.

 

     In a survey environment, TDE may be applied where the desired

responses are numerical, or when responses can be linked to a

numerical code, such as "yes" is "1" and "no" is "0."  As in other

applications, the respondent initiates the call to the collection

computer which controls the flow of the interview.  The computer asks

questions in either a synthesized voice or from a file of

 

                                  19

 

 

 

 

 

digitized phrases prerecorded by a human speaker.  After each

question, the respondent keys the answer.  The computer also repeats

each entry for verification directly with the respondent, and an

acknowledgement is required, such as "1" equals "correct."
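
     The exchange described above reduces to a loop of prompt, entry,

and read-back.  A minimal sketch (Python; the questions are

hypothetical, and the console routines stand in for the telephone

hardware):

     # Illustrative sketch of a TDE exchange.  Keyed tones arrive as digit
     # strings; each entry is read back and must be confirmed ("1" = correct).
     QUESTIONS = ["Enter total employment.", "Enter total hours paid."]

     def collect(read_tones, speak):
         answers = []
         for q in QUESTIONS:
             while True:
                 speak(q)
                 entry = read_tones()
                 speak(f"You entered {entry}.  Press 1 if correct, 2 to re-enter.")
                 if read_tones() == "1":
                     answers.append(entry)
                     break
         return answers

     print(collect(read_tones=input, speak=print))  # console stand-in for the phone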

 

     TDE offers many advantages over other collection methods.  In

repetitive surveys, the respondent retains a single form for monthly

or quarterly calls, reducing the costs of both postage and the labor

involved in mail handling, both outgoing and incoming.  Costs for data

entry and data verification are eliminated.  Most importantly, the

uncertainty about sample status is minimized.  The status of the

sample can be assessed through analysis of the received calls versus

the list of active TDE respondents.  Informed judgments can be made

about the timing and extent of the nonresponse workload.  No time is

lost while survey forms are in the mail or waiting for data entry. 

This is especially important for time-critical surveys.
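
     Assessing sample status then reduces to comparing the log of

received calls against the list of active respondents.  A brief

illustration (Python; the identification numbers are invented):

     # Illustrative sketch of TDE nonresponse follow-up.
     active = {"1001", "1002", "1003", "1004"}  # respondents expected to call
     called_in = {"1002", "1004"}               # IDs logged by the computer
     print(sorted(active - called_in))          # -> ['1001', '1003'] need reminders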

 

     TDE also offers convenience for the respondent.  The computer is

always available to accept the calls.  For busy respondents who are

frequently out of the office or away from home, in meetings or

traveling, this feature may be preferable to scheduling calls in

advance and risking interruptions and repeated callbacks.  TDE

reporting may require less time than CATI.

 

     TDE has some limitations that should be carefully addressed in

each survey environment.  First, not all respondents have touchtone

phones.  Thus, implementation of TDE would likely be in combination

with other collection modes, adding to the complexity of survey

management.  As with mail collection, the respondent also may need to

be reminded to call in, although a simple advance notice postcard has

proven very successful when properly timed.

 

Examples of Current Use

 

     The only known survey application of TDE is the Current

Employment Statistics (CES) survey at the Bureau of Labor Statistics

(BLS).  The CES program covers over 300,000 non-farm business

establishments monthly.  The data items are few, essentially

employment, hours paid, and earnings, and the CES is conducted by mail

in conjunction with each state, the District of Columbia, Puerto Rico,

and the Virgin Islands.  Collection of CES data is time critical. 

Preliminary estimates are published after 2 weeks of collection. 

Thus, the time lost due to the variability of the mails has a severe

impact on response rates.

 

     Initial experiments were done using CATI.  Large scale tests of

CATI collection, involving 13 states and over 5000 respondents

monthly, successfully showed the ability to collect data from the vast

majority of respondents in time for the first publication.  More than

half the CATI sample was drawn from chronically late

 

                                  20

 

 

 

respondents.  Response rates are routinely 85 percent versus 50

percent for mail.

 

     The higher costs of CATI stimulated interest in TDE self-

response.  The results of small scale tests in 4 states suggest that

TDE can retain high response rates over a sustained period.  Calls

average less than 2 minutes, and about 25 percent of respondents are

given short reminder calls just before the collection deadline.  BLS

is expanding TDE use to over 15 states during 1990.

 

     Procedurally, the combination of advance notice postcards, timed

to arrive during the reference period, and short nonresponse calls

provides a strong, inexpensive collection process.  TDE respondents

receive a package of materials that explains the new collection method

and how it differs from mail and telephone collection.  First-time TDE

users are requested to call the computer on a test basis using special

codes before they are asked to submit real data.  The machine readable

data are uploaded to mainframes for further editing and

reconciliation.

 

     The respondents chosen for the first TDE tests were drawn from

those under CATI collection.  In this way the higher costs of CATI can

be offset by savings from TDE.  Other TDE tests targeted mail

respondents who generally reported on time.

 

     The widespread use of touchtone systems has spawned an industry-

wide working group to standardize features (e.g., the key on the

telephone) to simplify user access.

 

 

Definition -- Voice Recognition Entry

 

     Voice Recognition Entry (VRE) is just developing as a technology. 

The characteristics of VRE are essentially the same as TDE.  The

respondent initiates the call to the computer, but instead of using

the touchtone keypad, the respondent speaks the answers: in this

application, the spoken digits 0 through 9 and "yes" and "no."  Both

"oh" and "zero" are recognized.

 

     There are two essential features for VRE systems.  First, they

should provide speaker independent recognition, meaning that almost

any voice can be recognized without any "training" of the system. 

Some systems require extensive training of the software for each

voice.  While this is used in some office dictation systems, it is

probably impractical for survey operations.  Also, systems should

provide for rapid entry of responses using continuous or connected

digits.  These features are commercially available for both

microcomputers and minicomputer applications.
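
     As a rough sketch of the vocabulary handling just described, the

following fragment (written in Python purely for brevity of

illustration) normalizes recognizer output into survey answers,

treating "oh" and "zero" alike.  The recognizer is assumed to return

one word per utterance; it is a hypothetical stand-in, not the BLS

system.

     DIGITS = {"zero": "0", "oh": "0", "one": "1", "two": "2",
               "three": "3", "four": "4", "five": "5", "six": "6",
               "seven": "7", "eight": "8", "nine": "9"}

     def interpret(token):
         """Map one recognized word to a digit or yes/no answer."""
         token = token.lower()
         if token in DIGITS:
             return DIGITS[token]        # "oh" and "zero" both give 0
         if token in ("yes", "no"):
             return token
         return None                     # unknown word: caller reprompts

     def read_number(tokens):
         """Assemble connected-digit entry, e.g. one-two-five -> "125"."""
         digits = [interpret(t) for t in tokens]
         if None in digits or "yes" in digits or "no" in digits:
             raise ValueError("unrecognized utterance; reprompt")
         return "".join(digits)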

 

     VRE also has limitations in application.  First, VRE is only

applicable to respondents with access to a phone, a small but

 

                                  21

 

 

 

unavoidable problem.  Recognition accuracy is the primary determinant

of respondent acceptance.  The system in use at the Bureau of Labor

Statistics was designed using speech profiles drawn from the mid-

western states.  Dialects from other regions may reduce the accuracy

of the recognition, leading to respondent frustration and low

acceptance.  Early test results suggest that recognition remains high

in Maine, the home of a very difficult dialect for the speech

interpreting algorithms.  More testing is planned to decide the limits

of current technology.  Improving recognition accuracy is the primary

objective of the companies involved in speech research and

development.

 

     Development of VRE is presently limited because there are few

current applications to provide advance training and public

acceptance.  Early results suggest that respondents familiar with TDE

and VRE prefer the latter as more "natural."  This finding points out

the differences in questionnaire design.  TDE questions ask

respondents to "enter" data, whereas VRE respondents are asked

questions in a manner similar to CATI because the responses are

spoken.  Recently, experiments using voice recognition have begun to

appear, conveniently providing training for future survey respondents. 

Also, the similarities between TDE and VRE may minimize acceptance

problems.

 

     Both TDE and VRE applications at BLS use short questionnaires. 

These techniques may limit the length of the survey, but this requires

testing.  They provide convenience and low costs, but respondents may

balk at long lists of questions and the current limitation on the

range of allowable answers to numbers and a few words.  VRE offers a

variety of interesting research problems in speech recognition and

natural language understanding.  These systems have not yet come into

widespread use.

 

 

Examples of Current Use

 

     The BLS is now conducting tests of voice recognition in the CES

survey.  The procedures will parallel those used for TDE and will

assess the effectiveness of VRE for the entire U.S. population.  They

will examine any limitations involving multiple telephone systems,

geographic distances, and respondents' acceptance.  Acceptance by

respondents has been high.

 

Potential Uses

 

     These computer assisted self-response methods have wide potential

applications.  Ideal surveys are repetitive, short and numerical,

especially if the data are entered into a computer before the call is

made.

 

                                  22

 

 

 

     TDE has been considered for screening eligible respondents from

the population.  Since eligibility is usually determined by very few

criteria, a mailed form could direct the respondent to call in the

answers to one or two questions to a central computer.  After entering

the unique identification number, the respondent would answer these

questions.  Then the survey manager would use the machine readable

file for nonresponse follow-up and subsequent sampling.
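
     The flow just described can be sketched as follows (in Python,

used here only for illustration).  The play_prompt() and read_keys()

routines are hypothetical stand-ins for the touchtone interface, and

the identification length and questions are invented for the example.

     import csv

     def play_prompt(text):          # stand-in for synthesized voice
         print("PROMPT:", text)

     def read_keys(n):               # stand-in for n touchtone keys
         return input("keys: ")[:n]

     def screening_call(outfile="screened.csv"):
         play_prompt("Enter your identification number.")
         ident = read_keys(9)
         play_prompt("How many persons live in this household?")
         persons = read_keys(2)
         play_prompt("Press 1 for yes or 2 for no: is anyone employed?")
         employed = read_keys(1)
         with open(outfile, "a", newline="") as f:    # machine readable
             csv.writer(f).writerow([ident, persons, employed])

     if __name__ == "__main__":
         screening_call()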

 

     BLS is considering TDE for pilot tests of survey supplements and

other special one-time surveys to reduce costs and add valuable

control, to augment or replace the traditional mail process, and to

gain experience in the design and use of TDE systems.  The logical

extension of existing TDE and VRE technology is the linking of them

into a single system.  For example, respondents call the system, which

then asks the respondent to respond by touchtone.  If the tone is not

recognized, the respondent is automatically switched to a VRE

component.  A third feature would be available to record changes in

the respondent's attributes (e.g., name or address), or to record

open-ended responses for later transcription -- voice mail. 
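
     The routing logic of such a combined system might be sketched as

below, where detect_tone(), recognize_speech(), and record_message()

are hypothetical stubs for the touchtone, VRE, and voice mail

components.

     def play(prompt):                     # stub: voice output
         print("PROMPT:", prompt)

     def detect_tone(timeout):             # stub: no tone was heard
         return None

     def recognize_speech():               # stub: VRE recognition result
         return "125"

     def record_message():                 # stub: voice mail capture
         print("recording response for later transcription")

     def collect_answer(prompt):
         play(prompt)
         answer = detect_tone(timeout=5)   # try touchtone entry first
         if answer is None:                # tone not recognized --
             answer = recognize_speech()   # switch to the VRE component
         return answer

     if __name__ == "__main__":
         for p in ("Enter total employment.", "Enter total hours paid."):
             print("stored:", collect_answer(p))
         play("Press 9 to report a change of name or address.")
         if detect_tone(timeout=3) == "9":
             record_message()              # third feature: voice mail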

 

     Self-response methods are not limited to survey applications. 

Any ongoing project that collects cost, workload or other management

data could use self-response methods for inexpensive collection.  For

example, a large copier company uses TDE for collecting billing

information.  Equipment renters are required to call in the monthly

usage levels by entering copier usage as touchtone data.  The computer

then generates a bill in response to the touchtone entry.  Also, the

U.S. Postal Service uses TDE to link callers to prerecorded tapes

covering the most frequently asked questions.  The BLS will begin

using similar technology to answer routine inquiries for economic

information.

 

Future

 

     Voice technology is still being developed.  "The NIST report

argues that the most natural mode of data collection is not paper or

keyboards, but speech" (William Nicholls, 1989).  Recorded voices are

currently being used in some surveys.  Speech technology includes

voice simulation, which is useful today in TDE applications.  While

numerical and other very limited vocabularies are being used in data

collection, it will be some time before automated speech systems will

be used to recognize free-form human speech in a telephone interview

or in a personal interview setting.

 

Summary

 

     Some items to consider when deciding between data collection

methods are as follows:

 

                                  23

 

 

 

 

     1.   CATI offers cost savings over the personal interview setting

          and would be useful for a large, complex survey environment. 

          However, it misses people without telephones.

 

     2.   CAPI retains the benefits of a personal interview setting

          where response rate is important, and does not require a

          telephone.

 

     3.   TDE is cheaper than CATI, but cannot handle complex

          surveys, and respondent acceptance is a concern.

 

     4.   PDE is typically used in an establishment survey.  It does

          not require a separate key-entry stage, but requires

          respondents to have access to a terminal, typically a PC.

 

     5.   VRE will see only specialized application in the medium

          term.

 

     Whichever technique is selected, the integration of the

electronic data collection method into the computer based survey system

should be considered.  For example, address labels and other

administrative items must be created from the sample database, then

the interview proceeds, editing is done, and the resulting data are

fed into the analysis or summary system.

 

     Also, the decision maker should consider whether to use a single

or mixed mode of data collection.  Two examples of mixed modes are the

Census" integrated CATI/CAPI design, or the BLS" integrated TDE/CATI

design.  William Nicholls comments that "In the long run, the best

data collection strategy for establishment surveys may prove to be a

readiness to accept whatever combination of methods the respondent

finds most convenient."  The creation of new technologies and

improvements to existing technologies will continue to have an effect

on data collection methodology.

 

                                  24

 

 

 

IV.  Methodological Issues

 

IV.A. Human-Machine Interfaces

 

Introduction

 

     The design of the interface between a person and a computer can

decide the success or failure of the interaction.  Although the

situation is improving, there is generally too little attention paid

to the effect of interface design on user performance.  Interface

design is often not considered until the last stages of software

development when the total design has already been "locked-in."

 

     Automated surveys will involve people with widely differing

abilities using machines ranging from manual data-entry devices to

powerful computers.  Interface issues will reflect this diversity in

people and machines.  There is no one interface that will satisfy all

needs.  The relative importance of a given interface issue will depend

entirely on the context of the person-machine environment.  Nonetheless,

there are some guiding principles of user interface design.

 

     CASIC benefits from consideration of user-related factors in

interactive systems, interaction styles, interaction devices, response

time considerations, system messages, printed manuals, online help,

tutorials, and development styles.  Many of these topics involve

detailed consideration of how to present the computer power to the

user.  For example, interaction styles can be broken down into command

languages that the user must learn before using the computer, menus

that guide the user through the necessary procedures, and the direct

manipulation of objects whose icon representation appears on the

screen.  Similarly, interaction devices can take on many forms --

keyboards, function keys, pointing devices, speech recognition,

displays, printers, etc.

 

     Techniques for automated information collection include CATI,

CAPI, computer assisted self-response surveys, and prepared data

submission on tape.  Except for tape submission, these techniques

involve user interface design considerations.  All must be

successfully used with little or no training.  The user interface must

be "self-evident." Error recovery is important.  The user must be

protected from making errors wherever possible.  When it is possible

for the user to err, the recovery procedures must be positive,

helpful, and easy to follow.

 

User of the Interface

 

     It is essential to determine who the user of the interface will

be before designing the interface.  In automated statistical surveys,

a user may be a well trained and highly motivated survey

 

 

                                  25

 

 

 

professional.  At the other end of the range, the user may be a first-

time or only grudgingly cooperative survey respondent.  Even within

somewhat narrow user populations, there will be differences among

users that can affect the usefulness of the interface.  It may not

even be possible to design an interface that perfectly suits a single

user because the user is subject to changes over time due to personal

factors, new experiences, and changing needs.  A user-interface design

team should include an applied psychologist to help determine the

psychological profile and needs of the user.  The personality,

training, and experience of the potential users are large factors in

determining the most appropriate interaction style or styles for the

user interface.

 

Interaction Styles

 

     The choice of interaction style is also affected by the hardware

to be used in the survey.  Survey techniques that make use of

computers with standard input/output devices can use command

languages, menus or direct manipulation.  Command languages are used

to interact directly with the operating system of the computer.  They

allow a wide range of system functions -- storage, deletion, copying

and printing of files -- to be done.  The cost is a steep learning

curve to master the commands.  Command languages, while hard to learn,

are also easy to forget.  They can be intimidating to novice users who

realize that information can be lost or damaged by poorly chosen

commands.  On the other hand, a person familiar with command languages

can work rapidly and effectively.  For some people, mastery of a

command language is a source of pride which provides a sense of

satisfaction and motivation for good job performance.

 

     Menu selection represents another approach to interaction style. 

Menus present the user with a set of only those choices that are

appropriate at a given time.  The choices are often numbered or

lettered so the user can choose by entering the appropriate number or

letter from a keypad or keyboard.  Sometimes the choices are keyed to

the first letter of the line containing the choice.  Then, the

designer must be sure to avoid duplicate use of the starting letters. 

Some menus use pointing devices such as cursor keys, a trackball, a

joystick, or a mouse to highlight choices.  The user moves the

pointing device to make a choice, then pushes a button to make the

selection.  Also, menus may offer only single-line choices.  For

example, a menu may ask for confirmation of a request by entry of y

(for yes) or n (for no).

 

     Menus are often organized hierarchically in graphs -- data

structures used to represent relationships among objects.  Family

trees are a form of graph that show the relationships of a person to

other family members.  Airline route maps are graphs that show paths

the airline follows in flying between locations.  With menus, the user

is essentially "flying" by making selections from the

 

                                  26

 

 

 

graph of menus (the technical term is "walking").  Selection of one

item from a menu takes the user on a different path through the graph

than does selection of another item.  Graph structures can ease the

design problem for complex user interfaces, but also can lead to user

confusion.  The user must be able to maintain a sense of location in

relation to previous choices made.  The user also must be given easy

access to "escape hatches" if an unwanted path (undesired choices) has

been walked on the graph.  CATI and CAPI designs rely heavily on

complex branching structures to control the interview.  The menus and

list of allowable responses must be clear, exhaustive and enable the

interviewer to retain effective control.
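
     A minimal sketch of such a menu graph follows; the menu contents

are invented for illustration.  The printed path preserves the user's

sense of location, and choice 0 is the "escape hatch" that backs up

one level.

     MENUS = {
         "main":       [("1", "Household items", "household"),
                        ("2", "Employment items", "employment")],
         "household":  [("1", "Number of persons", None)],
         "employment": [("1", "Hours worked", None)],
     }

     def walk(start="main"):
         path = [start]                    # the user's sense of location
         while path:
             menu = path[-1]
             print(" > ".join(path))       # show where the user is
             for key, label, _ in MENUS[menu]:
                 print(" ", key + ".", label)
             print("  0. Back")
             choice = input("? ").strip()
             if choice == "0":
                 path.pop()                # escape hatch: back one level
                 continue
             for key, label, target in MENUS[menu]:
                 if choice == key and target:
                     path.append(target)   # walk forward on the graph
                 elif choice == key:
                     print("selected:", label)

     if __name__ == "__main__":
         walk()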

 

     Direct manipulation (DM) interfaces offer a third approach to

interaction style.  In DM, the user is given the impression of directly

interacting with the objects of interest.  As an example of a DM

interface, consider a modern word-processing system.  The screen

representation of the document is made to be as close to the

appearance of the finished document as possible.  This is sometimes

called WYSIWYG (pronounced "whizzi-wig"), for "What You See Is What

You Get." The user operates directly on the screen representation of

the document and immediately sees the results of the operation.  Many

commercially available graphical interfaces show how far DM can go

toward helping the user.  A mouse is typically used as the pointing

device to objects on the screen.  A typical screen object is an icon

that symbolically represents the object.  To delete a file, for

instance, the user simply points to the file name and "drags" it over

to a trashcan icon.

 

     Menu selection and direct manipulation are important user

interface techniques in situations that involve novice users with

little opportunity for training.  Although the interfaces must

accommodate novice users, they also must be flexible enough to avoid

frustrating more experienced users.  Direct manipulation can

accommodate novice and experienced users equally.  Menu systems should

allow experienced users to "select ahead" or to revert to a command

language style of interaction.

 

     Survey techniques that do not use more-or-less standard computers

will raise unique interface issues.  Alphabetic input, such as name

entry, in telephone keypad-entry systems raises the question of letter

assignment to keys that have multiple letters on them.  Disambiguation

may be possible when the entries can be compared to a fixed list of

permissible entries.
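
     The comparison can be sketched as below; the keyed digit sequence

is matched against a fixed list of permissible entries (here invented

names).  Keys 2 through 9 carry the letter groups of the standard

keypad, which omits Q and Z.

     LETTERS = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
                "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY"}
     KEY_OF = {ch: key for key, group in LETTERS.items() for ch in group}

     def to_keys(word):
         """Translate a word into the digit sequence that keys it."""
         return "".join(KEY_OF[ch] for ch in word.upper() if ch in KEY_OF)

     def disambiguate(keyed, permitted):
         """Return the permissible entries matching the keyed digits."""
         return [w for w in permitted if to_keys(w) == keyed]

     print(disambiguate("62463", ["MAINE", "OHIO", "TEXAS"]))   # ['MAINE']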

 

     Speech recognition and synthesis devices have the potential for

radically changing the preferred interaction style in user interfaces. 

Although speaker-independent recognition of free-form spoken natural

language is still in the future, rapid technological advances are

being made in the ability to recognize automatically a subset of

articulated words.  Advances are also being made in the ability to

synthesize natural-sounding speech under computer

 

                                  27

 

 

 

 

control.  The best form of human-machine interface in any given

situation or for any specialized group of users is still a research

question.  A poor choice of interface can lead to degradation of the

quality of the survey due to user errors and frustration.

 

     Some survey techniques are already speech based.  In CATI and

CAPI, the user interacts with a speaking and listening person who is

visually and manually interacting with a computer.  The person

conducting the survey uses common sense to interact with the

respondent.  Although there are substantial efforts to imbue a

computer with common sense, practical use of this research remains in

the future.  Thus, the effective replacement of the human interviewer

by a computer also remains in the future.

 

Error Avoidance and Recovery

 

     Whenever possible, interfaces should be designed so that errors

are not possible.  The nature of potential errors in a given interface

must be thoroughly understood to lessen the probability of their

occurrence and the cost of recovering from them.  When a particular

sequence of operations is necessary to do a complex operation, the

interface should be designed to combine the entire sequence into a

single operation.  This will reduce the number of operations required

of the user (who probably thinks of the sequence as one operation

anyway).  All displays must have consistent layouts so the user does

not have to spend time and mental energy scanning the screen for

information.

 

     The interaction style can have a profound effect on errors. 

Properly designed menu systems can reduce errors by simply not

offering poor choices.  Choices offered must be clearly labelled.  The

consequences of a choice must be shown before the choice is made. 

There must be consistency between menus.  For example, a choice common

to all menus (such as Cancel Menu), must appear in the same place in

each menu and must have the same consequence (such as reversion to the

previous menu).

 

     Error messages should be designed to help the user.  The messages

should be specific, positive in tone, and constructive.  They should

tell the user what can be done to correct the error.  Whenever an

error is made, the user must have a clear and easily followed path to

recovery.  This not only reduces the seriousness of the consequences

of the error, but increases the user's confidence even in the face of

a few errors.

 

     Adequate training can help to reduce errors and increase

respondent acceptance.  Certainly, respondents should be trained

before using the system.  Good training can be reinforced by providing

on-line or telephone-accessible help and on-line tutorials.  On-line or

telephone-accessible help gives the user an

 

                                  28

 

 

 

immediate reminder about proper operation of the system.  On-line

tutorials allow the user to review the correct procedures.

 

Design of Automated Forms

 

     In general, automated forms should not be automated versions of

the manual forms they replace.  They should be designed from scratch

to take account of the opportunities and limitations introduced by

automation.  Sometimes, it might be appropriate to maintain the same

"look and feel" between a manual form and its automated counterpart. 

For instance, user training might be reduced by minimizing changes. 

In these cases, the form designers should compare the benefits of

staying with the old form with the costs of designing a new form.

 

     Automation provides opportunities for higher productivity, lower

error rates, and greater user satisfaction than manual methods.  Repetitive

information can be automatically filled in from one form to another. 

Automatic editing for internal consistency and logical consistency

should help to lower error rates.  Automated forms also can provide

on-line help and tutorials for the user.

 

     Automated forms need not even look like paper forms.  The user

can be led through an interactive dialogue while the computer does the

data formatting.  Form fill-in is just one interactive style.  Menu

selection has already been mentioned as another style.  Form designers

should consider using hypertext, a recent development in interactive

systems which provides a browsing environment.  For example, the

reader can display a definition simply by pointing at a word or phrase

with a mouse.  Hypertext would allow non-linear traversal of forms, as

appropriate for the data being filled in.  For example, in surveying

for medical information, gender data can be used to steer the user

around inappropriate survey questions.

 

     Form designers should have a repertoire of techniques for

designing and testing forms.  Expert systems might be developed to

help in form design and interaction design.  Effort placed in

designing expert systems would pay off handsomely in easing individual

design tasks.  Such systems also should produce forms that are more

consistent and complete than forms produced in a paper environment.

 

Quality Measures

 

     It is critically important to test user interfaces before

presenting them to the users.  Professor Ben Shneiderman of the

University of Maryland has identified five goals that lend themselves

to precise measurement:

 

                                  29

 

 

 

 

 

     1.   Time to learn - how long does a typical user take to learn

          to use the system?

 

     2.   Speed of performance - how long does it take to carry out a

          benchmark set of tasks?

 

     3.   Error rate - how many and what kinds of errors are made by

          typical users?

 

     4.   Subjective satisfaction - how much do users like using the

          system?

 

     5.   Retention over time - how well do users maintain their

          knowledge?
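
     As a small illustration, two of these measures can be computed

directly from logged benchmark sessions; the log format below is an

assumption made for the sketch, not a prescribed instrument.

     # one record per completed benchmark task in a usability test
     sessions = [
         {"user": "A", "task_seconds": 140, "errors": 2},
         {"user": "B", "task_seconds": 95,  "errors": 0},
         {"user": "C", "task_seconds": 180, "errors": 5},
     ]

     n = len(sessions)
     speed = sum(s["task_seconds"] for s in sessions) / n    # goal 2
     errors = sum(s["errors"] for s in sessions) / n         # goal 3
     print("mean task time %.0f seconds, mean errors %.1f"
           % (speed, errors))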

 

 

     It is not enough to guess how well a system meets these quality

measures.  It is essential to test the system.  A testing laboratory

is essential for any significant design work.  Design groups may build

in-house laboratories, or may seek help from existing laboratories. 

It often happens that persons who are skilled in computer programming,

data collection techniques, or statistical methods are not fully aware

of the skills and deficiencies of the user population.  It is not a

good idea to concentrate the entire design effort in the hands of task

specialists.  The human factors role must be an integral part of every

design team.  Large teams might include psychologists, sociologists,

and other human factors specialists.  Smaller teams should at least

assign one team member the role of human factors specialist.  If

nothing else, this person can play "devil's advocate" to be sure the

appropriate questions are raised.

 

     Data about user performance under current conditions must be

collected before beginning new systems.  It will not be possible to

determine the relative quality of a new system unless quantitative

measures of the quality of the old system are available.  The first

task of the design team must be to develop guidelines for the design. 

Such items as menu selection formats, terminology, screen layout, data

entry formats, error messages and recovery procedures, on-line help,

and training should be considered and decided upon before any other

significant design work is begun.

 

     Rapid prototyping is a powerful technique which allows iterative

convergence to a design.  Partial system implementations are made

quickly, presented to potential users, and tested.  Further

development is based on these interim tests.  Because each step in the

development cycle is small, and tested incrementally, only small

corrections in direction are needed at each step.  Conceptual errors

are quickly uncovered and are easy to correct.  Rapid prototyping

methods contrast sharply with the more conventional "waterfall" design

methodology.  The waterfall method requires detailed up-front

specification of the design, with a

                                  30

 

 

 

 

 

full-blown design flowing down to a full-blown implementation.  While

this method may be appropriate in situations where the goal is clearly

understood at the start, it has the disadvantage that changes made in

any phase of the design tend to be large and expensive.  This usually

discourages change and leads to acceptance of a lower-quality product

or total abandonment of the design.  A disadvantage of rapid

prototyping is that formal specifications and documentation may never

get produced in the flush of excitement over the rapidly evolving (and

working) system.  The waterfall methodology is appropriate as the

final phase of a rapid prototype design.  Because rapid prototyping

quickly produces a working model and deep understanding of goals and

tradeoffs, waterfalling can be effectively used to provide the missing

rigor and discipline.

 

     Evaluation must continue even after a design has been completed

and fielded.  On-line suggestion boxes and trouble reports, designed

right into the survey forms, provide easy channels of communication

between the user and the designers.  A user who suggests improvements

or reports trouble should receive prompt responses and fixes.  Large

surveys might consider the use of a commercial bulletin board system

as the communications medium for problems, suggestions, and fixes.

 

                                  31

 

 

 

IV.B. Software Development

 

Introduction

 

     There are two types of software that will be discussed in this

section: software that helps in the creation of a survey questionnaire

and software that makes up the actual programming code to execute the

survey questionnaire in the field.  This distinction is directly

analogous to the usual notion of a high-level programming language

(e.g., FORTRAN, COBOL) in which you describe the problem in terms that

humans can understand.  This high-level description is then passed to

a compiler that translates the description into an application program

the computer can understand.  For convenience, refer to the survey

creation software as the survey definition process and to the use of

the resulting application program as the survey application process.

 

     Most of the discussion will relate to the creation software. 

Historically, software development for automated field data collection

began with a mainframe application for CATI.  As hardware technology

progressed, CATI was moved first to a minicomputer and then to a

microcomputer.  The CAPI application became possible with the

development of the "light weight" portable microcomputer.  Software to

produce an automated questionnaire is perhaps the most important and

potentially the most costly ingredient in the automated field data

collection equation.  Ideally, such software should be available off-

the-shelf.  Although there have been several attempts to develop such

software, success has been limited.

 

 

     To date, the development of automated questionnaire software has

been done in one of two ways.  The questionnaires are custom

programmed using one of a variety of general programming languages

(e.g., Pascal, C, FORTRAN), or they are custom programmed using a

specialized CAPI/CATI programming language.

 

     The specialized languages generally provide a means to describe a

variety of attributes: the question text; the answer text; the type of

answer expected (e.g., single, multiple, fill-in, free text);

question paths (e.g., simple -- go to the next question in order -- or

complex -- based on the answers to previous questions or some related

calculation); response editing (e.g., restrictions to specific values

or a range of values); and, in some instances, screen layout design. 

In either case, the development of an automated questionnaire usually

has required the skill of a computer programmer.
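
     A toy example of such a specialized description, expressed here

as Python data read by a small engine, shows question text, a range

edit, and a skip pattern based on a previous answer; it mimics no

particular commercial package.

     QUESTIONS = [
         {"id": "employed", "kind": "yn",
          "text": "Is anyone in the household employed? (y/n)"},
         {"id": "hours", "kind": "int", "range": (0, 168),
          "text": "Total hours worked last week?",
          "ask_if": lambda a: a["employed"] == "y"},    # complex path
         {"id": "age", "kind": "int", "range": (0, 120),
          "text": "Age of reference person?"},
     ]

     def run(questions):
         answers = {}
         for q in questions:
             if "ask_if" in q and not q["ask_if"](answers):
                 continue                          # skip pattern
             while True:
                 raw = input(q["text"] + " ").strip().lower()
                 if q["kind"] == "yn" and raw in ("y", "n"):
                     break
                 if q["kind"] == "int" and raw.isdigit():
                     low, high = q["range"]
                     if low <= int(raw) <= high:   # range edit
                         break
                 print("Please re-enter.")         # edit failure
             answers[q["id"]] = raw
         return answers

     if __name__ == "__main__":
         print(run(QUESTIONS))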

 

                                  32

 

 

 

Flexibility

 

     There are several issues that need to be considered in the

development or purchase of existing software for automating field data

collection of survey questionnaires.  Among these considerations is

the level of flexibility needed.  Flexibility is defined in terms of

the amount of control the automated questionnaire exercises over the

conduct of the survey and in terms of the features available to design

an automated questionnaire.

 

     With respect to the control, consideration must be given to the

extent the automated questionnaire will allow the interviewer or

respondent to exercise control over the conduct of the interview. 

That is, should the person controlling the interview have the same

control as in a paper-and-pencil survey -- total freedom to roam

anywhere in the questionnaire and change answers at any time -- or

should the automated questionnaire be designed to limit the person

collecting the data to a specific process and set of skip patterns,

or some level in-between?  If in-between, what is that level?  The

answer to these questions is critical because the software selected, particularly if

it is a specialized package, might not have the specific capabilities

needed to implement the desired design.  The design of the

questionnaire software also will be affected dramatically by the level

of flexibility chosen.

 

     With respect to software flexibility, there are several

capabilities that should be considered.  These capabilities are:

 

     1.   The question types: open ended, closed ended, single value,

          multiple values.

 

     2.   Case management: administration of each questionnaire, e.g.,

          status of completion, restart of an incomplete questionnaire.

 

     3.   Back-up: ability to back up to any question in the survey

          and change an answer, with the system thereafter

          automatically following the skip patterns implied by the

          changed answer.

 

     4.   Editing: ability to perform edits such as consistency,

          range, and specific value or values.

 

     5.   Screen manipulation: ability to create any screen design

          desired.

 

     6.   Comments: ability for person recording answers to record

          comments associated with any question.

 

                                  33

 

 

 

     7.   Skip patterns: simple and complex, e.g., skip based on

          answers to previous questions or some arithmetic

          calculation.

 

     8.   Context sensitive help: ability to get help based on place

          in survey.

 

     9.   Rostering: ability to handle household member enumeration,

          identification, and skip patterns based on the individuals.

 

     10.  Output format: the form in which collected data are stored,

          e.g., a flat file.

 

     11.  Accessibility of collected data: how easy it is to access

          the data, e.g., for quality control.

 

     12.  Coding: ability to code collected data automatically or

          manually.

 

     13.  Authoring system: ability to create questionnaire and

          software to execute the survey questionnaire (program code)

          simultaneously with no computer programming skills.

 

     14.  Output reporting: reports about the functioning of the data

          collection process and about the actual data collected.

 

 

     This list of features is not exhaustive, but does contain the most

important features determining the level of flexibility.

 

 

Range

 

     There are several additional factors that are important to the

decision of level of flexibility and software design.  These factors

are the size and complexity of the survey questionnaire and the period

between major changes in the questionnaire or the preparation of an

entirely new questionnaire.  Complexity is defined by the number of

different question types, complexity of skip patterns, and need for

rostering.  Size and complexity are directly proportional to software

development time.  The shorter the period between major software

developments, the greater the requirement for a user-friendly

authoring system.  An authoring system significantly decreases

development time and decreases computer programmer dependency.  The

size of the questionnaire also may impact the hardware and software

requirements.  Several software packages have certain restrictions

that may be affected by the size of the application.

 

                                  34

 

 

Automated Forms Design

 

     Unlike CAPI and CATI software, there are many off-the-shelf

software packages that can produce automated forms for computer

assisted data entry.  Many CAPI and CATI specialized software packages

also can be used for this function.

 

Training

 

     The amount and type of training required to use selected survey

questionnaire development software is dependent upon the level of

user-friendliness of the software.  For example, programming the

questionnaire in Pascal would require considerably more skill and

therefore more training than programming the questionnaire using an

authoring system.  Usually, it is necessary to have a skilled computer

programmer working with the survey questionnaire designer in order to

use the current software.  Under these circumstances the questionnaire

is most likely to be a pencil-and-paper questionnaire programmed for

the computer rather than one designed for the computer.  Computerized

questionnaires will improve in quality as their designers come to

understand and use the environment provided by the computer.

 

     Software documentation for the specific survey questionnaire

should be complete enough to ensure easy revision of the questionnaire

by someone other than the original author.  For the general

programming languages there are many software packages available to

help in such documentation.  The liberal use of comments in the computer

programming code also is a good way of providing additional

documentation.

 

                                  35

 

 

 

IV.C. Data Collection Programs

 

Introduction

 

     When producing a survey, several factors will affect the

selection of a data collection method.  The three primary factors are

cost of resources, the time available to collect, edit, and summarize

the data, and the desired quality.  Because it is unusual to have all

three in abundance, trade-offs must be considered.

 

     Several other important factors relate to the design and

operation of the survey, and will affect the cost, timing, and quality

factors.  First, the survey may be one-time or ongoing.  A one-time

survey may want to maximize quality for a fixed cost, whereas an

ongoing survey may want to maximize quality for a minimized cost.  With

ongoing surveys automated capabilities can evolve over extended

periods thereby spreading out the costs.  The second factor is the

target population, and whether it is a household or an establishment. 

The chance of finding PC's in establishments is greater than in

households, and not all households have telephones.  The third

factor is the operational nature of the survey, that is whether the

setup should be centralized or decentralized, and whether the PC's

would be networked.  Lastly, the sample size and complexity of the

questionnaire are relevant.

 

     The remaining nine factors relate to the characteristics of the

technology used to collect data.

 

     1.   The Speed at which data may be entered is determined by the

          technology's hardware (such as XT, AT, or 386 PC's, disk

          speeds, and phone lines) and software (the complexity of the

          questionnaire and therefore the length of the program).

 

     2.   The Size of the machine can refer to its weight or

          ungainliness (which is important in situations where it must

          be moved around) or its available memory (which limits the

          amount of data and the complexity of the program that can be

          stored on the machine).

 

     3.   The portability of a computer's software is important in

          situations where data collection is carried out on different

          computer systems.

 

     4.   The Type of Display selected may be based on environmental

          factors (whether conditions are indoors and usually fixed,

          or outdoors and variable, in which case screen color is

          important), and on the complexity of the questionnaire (and

          therefore screen size).

 

                                  36

 

 

 

 

     5.   The Mode of Data Entry varies from keyboard, to push button

          phone, to voice data entry.

 

     6.   Data Verification is based on the importance of quality, the

          complexity of the data, and other factors such as hardware speed

          and available memory.

 

     7.   The Database Generation refers to the way in which the data

          is brought together and integrated with the rest of the

          survey system.  This may mean using telecommunications, or

          simple computer tasks.

 

     8.   The Hardware selected is based on cost, amount of time

          available, data quality desired, power of the machine,

          amount of memory, and other available features.

 

     9.   Training is important in any survey, and the amount of time

          available and the background of the staff dictate the

          technology chosen.

 

 

     The priorities of these factors and the relationships between

them help to decide which data collection strategy to use.  A

discussion of these factors with regard to CATI, CAPI, and other

methods follows.

 

 

CATI

 

Introduction

 

     In a CATI interview, the interviewer is helped by an interactive

computer system.  It provides data quickly and offers good

reliability, but a substantial cost investment is required to purchase

and set up the system.  The cost investment may be greater than other

electronic data collection techniques, but it saves money over face to

face interviews, since data entry is combined with data collection. 

It also can be used for follow-up of nonrespondents or edit failures,

or key-in of mail questionnaires.  It can be used in a household or

establishment survey with complex questionnaires (typically a new or

infrequent survey where time series interruptions will not cause

problems, and where sample size is large, or small and used over a

longer period). It can be operated in a centralized or decentralized

manner, but it requires the respondent to have a telephone.

 

     Hardware: The first generation consisted mostly of mainframe

based systems, but the current generation consists of either multiuser

minicomputer systems, or distributed systems over a PC local area

network (LAN).  The minicomputers are often UNIX-based and

 

                                  37

 

 

 

used mainly in large centralized facilities that require greater

resources to pay for specialized support staff.  The PC's are mostly

DOS-based and are used in multi-location facilities.  An added benefit

of PC's (even in large facilities) is that many clusters of networks

can be used, and PC's can be added one at a time (lower initial cost).

 

     Speed: With minicomputers, the speed between questions could slow

as the number of interview stations increases, or if another computer

intensive program is run.  With PC's on a LAN, the speed between

interviews could slow as more stations are added to the network. 

Eventually, faster computers will solve this problem.

 

     Size: The organization of the system (centralized or

decentralized) and the hardware (minicomputers or PC's) will affect

size requirements.  The system can range from a single stand-alone PC

to 100 or more workstations on a mainframe system.  The PC's and

minicomputers usually have from 5 to 60 networked workstations.

 

     Portability: The software should run on multiple hardware

platforms with different operating systems.  It should be written in a

portable language and use common user interface standards.  Today,

software costs are increasing while hardware costs are decreasing. 

Portable software should provide a cost savings across different

hardware platforms.

 

     Displays: The use of color can aid the interviewer, but the Color

Graphics Adaptor (CGA) standard is not clear enough for use over a

long time.  Either the non-composite monochrome, the higher resolution

Enhanced Graphics Adaptor (EGA), or the very high resolution Video

Graphics Array (VGA) standard should be used.  However, EGA and VGA

are more expensive.

 

     Data Entry: Screens can be item based, screen based, form based,

or a combination of these.  Movement between items can be forward

only, or forward and backward.  Most systems have question skipping

and branching capabilities, interviewer notes can be added, and the

interviewer can resume at the point where the previous session ended.

 

     Data Verification:  The data quality is improved by incorporating

longitudinal (historical) editing, arithmetic calculations, range, and

consistency checks.
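
     These edit types can be sketched against a CES-style record as

follows; the 50 percent longitudinal tolerance and the ranges are

illustrative assumptions, not published edit specifications.

     def edit_failures(report, previous):
         fails = []
         if not (0 <= report["hours"] <= 99):              # range check
             fails.append("hours out of range")
         if report["employment"] > 0 and report["hours"] == 0:
             fails.append("employment without hours")      # consistency
         old = previous.get("employment")                  # longitudinal
         if old and abs(report["employment"] - old) / old > 0.5:
             fails.append("employment moved more than 50 percent")
         return fails

     print(edit_failures({"employment": 130, "hours": 0},
                         {"employment": 80}))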

 

     Database Generation: Outputs consist of an audit trail and

response data. Often numeric and open ended data is stored separately,

then linked by respondent number.  Some systems include cross-

tabulation capabilities, and the ability to generate accurate and

timely reports is a benefit.

 

                                  38

 

 

 

     Training: One benefit is that centralized supervision and

monitoring is available (on-line and audio-visual).  It helps the

supervisor identify interviewers who need more training.

 

 

CAPI

 

Introduction

 

     In CAPI, the equipment is less expensive than CATI, but travel

costs are higher.  It requires the same amount of time as personal

interviews, but data quality is improved and the separate data entry

step is eliminated.  One advantage of the personal interview setting is

that it yields higher response rates.

 

     Hardware: The following criteria can be used to evaluate

potential portable computers: interview duration and complexity,

memory capacity, weight, power source and duration, screen size and

legibility, disk type and capacity, speed, serviceability (important

because service centers might not be locally available), portability,

durability, price, ease of use and software compatibility.

 

     Speed: The speed depends on the computer hardware and complexity

of the questionnaire.

 

     Size: A larger portable computer would be needed to put a complex

questionnaire in 2 languages.  Even a small portable computer is not

necessarily portable as many have complained that they are too heavy

to carry around for very long.  Electrical outlets are not always

available.  The battery power required for additional memory and for

disk drives can add substantially to the weight requirements. 

Although small portable computers can be used on a table top or in

one's lap, interviews conducted on the doorstep require handheld

computers.  That technology is coming but has yet to arrive for

general use.  A smaller portable computer, or one with a different

keyboard would be needed for this environment.

 

     Portability: As in CATI, the questionnaire writing software is

often portable from one type of hardware to another.

 

     Displays: Different portable computers have different size

screens with various readability factors.  The various lighting

conditions that would be met in the field is also a factor.  For

example, a "back light" screen is required for dim lighting

conditions. If the interviews are conducted outdoors, glare reflection

is a problem.

 

     Data Entry: Often the software that was designed for CATI is also

used for CAPI.  It provides forward and backward movement, and

incorporates skipping and branching between questions.

 

                                  39

 

 

 

     Data Verification: Similar to CATI, improved data quality results

from reduced clerical and machine activities, and being able to

incorporate various editing techniques.

 

     Database Generation: Data output can be consolidated more rapidly

due to reduced clerical and machine activities. Data transmission

options are mail, courier, or phone lines.  Data security and the

quality of phone lines may be factors against using phone lines.

 

     Training: Basic interview skills are considered very important

(even more so than computer knowledge).  With this assumption,

training should focus on the computer and questionnaire details.  Training

materials can include a tutorial (helps coordinate the different

learning rates), self study materials, and hands on practice with

interviews.  Good software and manuals are also important.

 

 

CASI

 

     Data collection using TDE requires the respondent to have a

touchtone telephone, and a dedicated computer with a multiple phone

line capability at the other end.  One benefit to the respondent is the

convenience of calling in at any time.

 

     Existing TDE systems limit editing primarily because of limits on

hardware capacity, lack of visual cues, and the restriction to push

buttons on the telephone.  However, the computer can synthesize the

answer and play it back to the respondent thereby providing the

opportunity to verify or correct the answer.  TDE offers lower cost

than CATI (less labor and mail costs, with key-entry costs borne by the

respondent), and the data quality is good.  TDE has been able to

retain very high response rates over long periods when coupled with

appropriate nonresponse prompting.
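
     The synthesize-and-play-back verification amounts to a simple

confirmation loop, sketched below with printed text standing in for

synthesized speech and keyed input for touchtone entry.

     def confirmed_entry(prompt):
         while True:
             value = input(prompt + " ")
             # played back to the respondent for verification
             print("You entered", value + ".")
             print("Press 1 to accept or 2 to re-enter.")
             if input() == "1":
                 return value

     if __name__ == "__main__":
         print("accepted:", confirmed_entry("Enter total employment."))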

 

     VRE again requires only a telephone and carries a cost profile

similar to TDE.

 

     Surveys which use PDE require the respondent to have access to a

microcomputer.  Data can be entered using the keyboard or a file

containing the data can be imported.  Displays are typically an

electronic image of the form on the screen.  Error checking and other

edits can be included, after which the data is transmitted back to the

required agency where it is combined with other data.  Computer -

security issues are important here.  Integrity checks to make sure the

data received is the same as the data sent must be part of the system. 

Appropriate manuals and other training materials including on-line

help should be provided.  This type of data collection would be

worthwhile in an establishment survey where respondents report data

monthly, quarterly, or over a given period.

 

                                  40

 

 

 

IV.D. System Interfaces for Data Conversion

 

Introduction

 

     Automated submission of data has the benefit of reducing

reporting errors because a keying step can be eliminated.

Traditionally, respondents entered data onto paper forms which were

mailed to a central site where they were keyed into a computer system. 

With automated data submissions, intermediate keying steps can be

eliminated.

 

     Automated data transmission requires hardware and software

compatibility between the respondent site and the Federal site.  In

recent years the number and types of software and hardware options

have greatly multiplied into the current myriad of products and

technologies on the market.  Due to these developments, Federal

agencies are often looking at heterogeneous sources for data

transmission.

 

     Federal agencies conduct many surveys with many types of

respondents.  These data sources, such as state and local governments

and businesses, will increasingly have capabilities for reporting data

in an automated way.  Many now have personal computers (PC's) while

others have only mainframes available.  Complexity arises as Federal

agencies, looking at a mix of hardware and software technologies

available at respondent sites, must select the best way to collect

data from these heterogeneous sources.

 

 

Planning for System Interfaces

 

     Managers of data collection projects can expect interface

problems, but these problems can be minimized by good planning. 

Knowledge about the availability of communications capability,

hardware, and software at respondent sites will aid managers in their

planning for system interfaces for data collection.

 

 

Communications Capability

 

     Perhaps the most important issue for system interfaces is

communications.  Communications may be thought of as networking or as

linking technologies together.  With networking capability, data can

be transmitted across telephone lines or special private line

arrangements such as local area networks (LAN's).  See the section on

Networks Planning in this report for a discussion of networking

issues.  A related issue is maintaining the confidentiality of data

transmitted in such a manner.  See the section on Computer Security in

this report.

 

                                  41

 

 

 

Hardware

 

     Hardware is needed at both the respondent site and the Federal

site for data transfer.  The type of hardware available at the

respondent site will often decide what options the Federal survey

managers will offer for submitting data.  It may be necessary for the

Federal site to have hardware for data conversion available, for

example, hardware to read both 5 1/4 inch and 3 1/2 inch diskettes. 

Also, communications may need to be set up between hardware devices. 

The section on Hardware Planning in this report discusses these issues

further.  Three common types of hardware links are discussed below.

 

     Mainframe to Mainframe: Data can be transmitted from one

     mainframe to another via a communications network.  Either the

     respondent or the Federal site can specify record layout and

     formatting instructions for data submission.  Front-end

     processors can do data conversion before the data are sent to the

     host computer.  Another option is submission of a computer tape

     in a specified format.

 

     PC to PC: A link between two PC's can be established using a

     network system.  Another way to transmit data from one PC to

     another is to mail the data on diskette.  The record layout and

     diskette format would be agreed upon by the respondent and the

     Federal site.  Because diskette sizes vary, the Federal site may

     need conversion hardware and software to read diskettes of

     different sizes.  Another option is to provide software on a

     diskette to the respondents.

 

     Mainframe to PC: This type of hardware link combines the options

     described above.  Again, a link can be established using a

     communications network.  If the PC is at the respondent site, a

     diskette with software may be provided to set up the PC to send

     data over to the mainframe in the appropriate format.

 

Software Compatibility

 

     Although Federal survey managers usually cannot provide hardware

to respondent sites to use for data transmission, they often can

provide software for this purpose.  If the respondent's software is

used, the Federal site must have the same software or be able to

convert the data to the correct format.  Not only can different

software products be incompatible, but two versions of the same

software product can be incompatible.  One version may have a higher

level of functionality than the other.  Again, there must be planning

for document transfer.  See the section on

 

                                  42

 

 

 

Software Development in this report for more guidance on planning for

software compatibility.

 

                                  43

 

 

 

IV.E. Computer Security

 

Introduction

 

     Computer security refers to the continued operation of computer

applications at acceptable levels of risk to the organization(s) being

supported by the applications.  Risk is usually measured in terms of

potential loss, specifically losses that occur from:

 

     1.   Disclosure of information to unauthorized parties (i.e.,

          loss of confidentiality),

 

     2.   Modification or other adverse actions that affect the

          expected quality of information (i.e., loss of integrity),

          and

 

     3.   Destruction or other adverse events that affect either the

          availability of the information when it is needed or the

          availability of the computer system to process that

          information (i.e., denial of service/loss of availability).

 

The types of losses described above can result from accidental and

intentional events, as well as from natural hazards.

 

     When estimating risk, it is important to consider direct losses

(e.g., the cost to replace modified or destroyed information), as well

as indirect losses (e.g., the inability of the organization to meet

its mission which can lead to public embarrassment, congressional

wrath, loss of lives, legal actions, competitive disadvantage, etc.).

After estimates of risk are derived, it is necessary to select and

implement cost-effective safeguards (e.g., physical, administrative,

technical, management) to reduce these risks to acceptable levels.

 

     With respect to automated statistical surveys, the types of

losses discussed above can occur during data entry from the

respondent, during transmission of the survey information to the host

computer system, and within the host system.  While the ideas

discussed below are generally applicable to all of the survey types

addressed in Section III of this report, this section will focus on

surveys collected through or with the use of a computer where the

following occurs:

 

     1.   Data entry using a terminal or computer system to collect

          the response information (i.e., not directly applicable to

          response information collected over the telephone).  The

          data entry process may "batch" the respondent's information

          for later transmission to the host computer for processing

          or may have the respondent connected

 

                                  44

 

 

 

          directly to the host system where the survey data is being

          captured in real-time (and may be processed in real-time).

 

     2.   Transmission of the response information over

          telecommunications lines/circuits, including future ISDN

          networks discussed above, and transmission on magnetic media

          (e.g., floppy disk) through public and private mail delivery

          services, and

 

     3.   Receipt and processing of the survey information by a host

          computer system.

 

Problem Areas

 

Data Entry

 

     During the data entry process, the following issues need to be

addressed with respect to computer security.

 

     Identification and Authentication:  Respondents and other users

of computer systems that are used to collect survey information must

be positively identified and authenticated to assure the validity of

the survey and to hold users accountable for their accidental or

intentional actions.  While passwords are still the most widely used

method of authenticating the user's claim of identity, other methods

such as biometrics and smartcards can be used when increased

protection is desired--usually at increased cost.  Passwords can be

effective for authentication when used in accordance with FIPS 112,

Password Usage Standard.

 

     Access Control: Access to information on computer systems should

be strictly controlled so that users only have access to information

they are authorized to see or change.  Most commercial computer

systems provide mechanisms that support this function.  Systems that

appear on the National Computer Security Center's Evaluated Products

List contain operating system level access controls that provide

protection from unauthorized disclosure of information.  Access

controls are important on multi-user systems that are used to collect

survey data in order to prevent the survey data from being

intentionally or accidentally read, modified or destroyed.

 

     Accountability: Unless computer systems contain mechanisms for

recording and analyzing users' computer security relevant actions, it

will not be possible to hold users accountable for actions that cause

computer-related losses.  When users know that a computer system has

an effective audit trail collection and processing mechanism, they are

less likely to make mistakes or to attempt unauthorized access to

information for fear of being caught. When survey data is collected on

systems that provide

 

                                  45

 

 

 

accountability mechanisms, it will be easier to determine if the

survey data have been tampered with or have been disclosed to

unauthorized users.
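
     A minimal sketch of audit trail collection, assuming a simple
append-only log file; the record layout and file name are
hypothetical:

     import json, time

     def audit(logfile: str, user: str, action: str, outcome: str):
         # Append one time-stamped, security-relevant event per line.
         record = {"time": time.strftime("%Y-%m-%dT%H:%M:%S"),
                   "user": user, "action": action, "outcome": outcome}
         with open(logfile, "a") as f:
             f.write(json.dumps(record) + "\n")

     # audit("survey.log", "interviewer01", "modify item 123", "granted")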

 

     Confidentiality: Besides access controls discussed above for
preventing survey data from being disclosed to unauthorized
individuals, cryptography can be used to protect data while it is
being stored in a computer system or on other magnetic media such as
floppy disk or magnetic tape.  FIPS 46, the Data Encryption Standard
(DES), defines the only government-wide standard for encrypting and
decrypting unclassified computer data.  Since the DES has also been
widely accepted by the commercial sector, there are many off-the-shelf
products that can be purchased for implementing DES cryptographic
protection.
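
     The DES algorithm itself is not part of the Python standard
library, so the sketch below substitutes the Fernet interface of the
third-party "cryptography" package (AES-based) purely to show the
encrypt-at-rest pattern; it is a stand-in, not the FIPS 46 algorithm.

     # Stand-in only: Fernet (AES-based), not the DES of FIPS 46.
     from cryptography.fernet import Fernet

     key = Fernet.generate_key()     # the key must itself be protected
     cipher = Fernet(key)

     ciphertext = cipher.encrypt(b"respondent 123: income = 45000")
     assert cipher.decrypt(ciphertext) == b"respondent 123: income = 45000"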

 

     Integrity: During data entry, the integrity of survey data can be
affected by entering false/inaccurate data or by modifying data
already entered.  Approaches for addressing these issues include:

 

     1.   Editing through the use of error-detecting or error-
          correcting software that determines the reasonableness of
          input data with respect to any number of criteria such as
          the character composition of the data input, numerical
          bounds checks, data dependent checks on previously entered
          data, etc. (see the sketch following this list).

     2.   Access control (see above) that prevents unauthorized users
          from gaining access to the survey data.

     3.   Cryptographic checksum, as defined in FIPS 113, the Data
          Authentication Standard, that places a cryptographic "seal"
          on the survey data for the purpose of detecting modification
          of the survey data from some initial state.  This technique
          is useful when the survey data are stored in computer memory
          or on magnetic media such as floppy disk or magnetic tape.

     4.   Accountability, the primary method for detecting
          modification of survey data by individuals who ARE
          AUTHORIZED (i.e., access controls do not apply) to access
          the data.  While accountability is effective against both
          accidental and intentional modification, authorized users
          who intentionally modify data can subvert accountability
          controls if they have a high degree of technical knowledge
          about the computer system.

     5.   Software-engineering assurance techniques, which should be
          used in developing the data entry and other system software
          to preclude errors from being introduced into the survey
          data through faulty software.
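
     As promised in item 1, a minimal sketch of editing at data entry;
the field names and bounds are hypothetical examples:

     def edit_check(field: str, value: str) -> list:
         # Return a list of reasonableness problems (empty if none).
         errors = []
         if field == "employees":
             if not value.isdigit():
                 errors.append("must be numeric")
             elif int(value) > 500_000:
                 errors.append("outside numerical bounds")
         elif field == "state":
             if not (value.isalpha() and len(value) == 2):
                 errors.append("must be a two-letter code")
         return errors

     # edit_check("employees", "12x4") -> ["must be numeric"]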

 

     Restart/Backup/Recovery:  It is necessary to plan for

restart/backup/recovery activities whenever the data entry process is

interrupted or the survey data is destroyed.  Techniques such as

maintaining backup files, permitting restart points in the data entry

process, and planning for an alternative data entry processing

capability are all directed at maintaining continuity in the data

entry process.
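
     A minimal sketch of one such technique, checkpointing entered
responses so that data entry can resume after an interruption; the
file name and format are hypothetical:

     import json, os

     CHECKPOINT = "entry_checkpoint.json"    # hypothetical file name

     def save_checkpoint(responses: dict) -> None:
         # Write to a temporary file, then rename, so a crash during
         # the write cannot corrupt the previous checkpoint.
         tmp = CHECKPOINT + ".tmp"
         with open(tmp, "w") as f:
             json.dump(responses, f)
         os.replace(tmp, CHECKPOINT)

     def load_checkpoint() -> dict:
         # Resume from the last checkpoint, or start fresh.
         if os.path.exists(CHECKPOINT):
             with open(CHECKPOINT) as f:
                 return json.load(f)
         return {}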

 

Transmission

 

     During transmission, the respondent's survey data are sent from
the data entry system to the host system that will process the survey
data.  While authentication applies primarily to transmission of
survey data through telecommunications networks, confidentiality and
integrity techniques are applicable both to telecommunications
networks and to mail delivery of magnetic media.

 

     Authentication of host computers (e.g., the host computer of the
data entry system) to the transmission network is required and
provided by most telecommunications networks to prevent unauthorized
use of the network and to facilitate billing for network services.
Sometimes, depending on the sensitivity of the survey data, it might
be necessary to have the transmission network authenticate itself to
the data entry host system before sending such data over the network.
In this way, the data entry system can be sure that the survey
information is being sent over the actual network rather than being
given to an intruder that is spoofing the data entry system into
giving the intruder the survey data.  If the network lacks the
capability for authenticating itself, then the techniques used for
confidentiality and integrity described below may be considered as
alternative methods of protection.

 

     Confidentiality: The most common technique for preventing

disclosure of information within transmission networks is to use

cryptography.  As discussed above, the DES is the only government-wide

standard for encrypting and decrypting unclassified computer data.

 

     Integrity: Integrity with regard to transmission of survey data
is the assurance that the survey data have not been altered, either

accidentally or intentionally, during the transmission process. 

Cryptographic checksum techniques, as described above in the section

on Data Entry Integrity, are effective in providing this protection.
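
     A minimal sketch of the seal-and-verify pattern.  As a hedged
stand-in it uses HMAC-SHA-256 from the Python standard library rather
than the DES-based message authentication code of FIPS 113; the shared
key is hypothetical.

     import hashlib, hmac

     SHARED_KEY = b"agreed between sender and receiver"  # hypothetical

     def seal(message: bytes) -> bytes:
         return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()

     def verify(message: bytes, mac: bytes) -> bool:
         # Recompute the seal and compare in constant time.
         return hmac.compare_digest(seal(message), mac)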

 

     Availability/Reliability of Network Services: Sometimes,
particularly in real-time data collection and transmission, continuity
of the transmission service can be very important to the success of
the survey activity.  Discontinuities due to the unavailability of the
network or some of its intermediate nodes, or due to noise in the
transmission lines, can result in survey data being lost, erroneous,
or delayed.  This could be particularly annoying to a respondent who
has to keep repeating the survey data entry process or is
unnecessarily prompted for nonresponse.  It is possible to minimize
such problems by using networks that provide error
detecting/correcting procedures, dynamic routing around unavailable
nodes, and other services that assure network availability and
reliability.

 

Host Computer System

 

     Computer security concerns at the host computer are similar to

those at the data entry computer.  The reader should refer back to

these discussions to supplement the material contained in the

corresponding areas below.

 

     Identification and Authentication: All users of the host system,

including the respondent data entry system, should be required to

identify and authenticate themselves to the host system to assure the

validity of the survey and to hold users accountable for their

accidental or intentional actions.  The same authentication techniques

that were discussed for the data entry system apply to the host

system.

 

     Access Control: Access to information on the host systems should

be strictly controlled so that users only have access to information

they are authorized to see or change; in particular only authorized

users should be permitted to access survey data on the host system.

 

     Accountability: The host computer system should contain
mechanisms for recording and analyzing users' computer security
relevant actions in order to hold users accountable for actions that
cause computer-related losses, particularly losses to the survey data.

 

     Confidentiality: Besides access controls discussed above for

preventing survey data from being read by unauthorized individuals,

cryptography can be used to protect data while it is being stored in

the host system or on other magnetic media such as a floppy disk or

magnetic tape.  As with the data entry system, the DES should be used

for this purpose.

 

     Integrity: On the host computer, the integrity of survey data can
be affected by entering false/inaccurate data during the data entry
process or by modifying data already entered.  Approaches for
addressing these issues include:

 

     -    editing

 

     -    access control

 

     -    cryptographic checksum

 

     -    accountability

 

     -    software engineering/assurance techniques

 

     Restart/Backup/Recovery: This is necessary when the host computer
system's processing is interrupted or the survey data are destroyed.
Techniques such as maintaining backup files, permitting restart points
in the host's processing sequence, and planning for an alternative
host processing capability are all directed at maintaining continuity
in the host's processing of the survey data.

 


 

 

 

IV.F Hardware Planning

 

Introduction

 

     Hardware issues are related to the type of Computer Assisted

Statistical Survey system and the particular software to be used.  The

adage that says to "choose the software first and then the hardware"

may be accurate if the software is already available.  If software
needs to be developed, however, it may be better to settle the main
hardware issues first.

 

     Hardware issues may be divided into the types of hardware needed

and the criteria used for selecting products.  We will explore these

issues for current and forthcoming products.

 

 

Current Hardware - General Issues

 

     There are certain hardware issues that arise no matter what the
application.  They may be categorized into ergonomic, performance,

capacity, and cost issues.  Ergonomic issues include keyboard layout

and touch (a tactile response reduces input errors), screen visibility

and readability, and adjustability of the computer.

 

     Performance and capacity can usually be improved only at higher

cost.  However, if the hardware is optimally designed for the

application in mind, no higher cost may be incurred.  For example,

performance can be further divided into CPU and I/O speed.  It may

suffice to maximize only CPU or I/O speed.  Software techniques also

may be employed to improve performance: use a RAM disk for files that

are frequently accessed, delay I/O operations until they can be more

conveniently done, and use machine language routines for CPU intensive

operations.

 

     Core memory requirements are driven by software needs.  The main

question is whether the DOS RAM address space of 1 megabyte is

sufficient or not. If it isn't, various options are available.  By

swapping pages of memory in and out as needed, the address space can

be expanded.  Note that extra memory is not usable without a software

driver.

 

Respondent Data Entry

 

     If respondents will be using their computers, try to find out as

much as possible about the machines they have.  Respondents may not

have access to a personal computer (PC) even in a large company.  For

example, an accounting department may have a mainframe, but not a PC. 

IBM-compatible computers are the most common in the business world,

but they may be earlier models.  Software that respondents will be

using should be tested on minimal

 


 

 

hardware configurations.  Don't assume that respondents have extended
or expanded memory.  A hard disk probably can be assumed.

 

     The 5 1/4" diskettes are now the most common but the new 3 1/2"

diskettes are coming into use.  Capability for reading either type

would be helpful.  There is a compatibility problem between 5 1/4"

high density (1.2 megabyte) disk drives and lower density drives.  The

latter cannot always read disks formatted by the former even at lower

densities.  Also, writing high density data on a lower density disk

can corrupt the contents.

 

 

CATI

 

     Computers for CATI must support interactive processing, e.g., a

multi-user mini-computer or a PC network.  Speed is the most important

factor.  The time from entry of one item to display of the next should

be less than two seconds.  To minimize data transfer problems, the

system used for data entry should be the same or compatible with the

one used for subsequent processing.

 

CAPI

 

     The main criteria for CAPI computers are screen readability,

speed, and weight.  Many portable computers are too heavy and awkward

to carry around.  A truly portable computer is necessary.  While the

lightest portable computers now weigh 4 to 7 pounds, the screens on
these machines may not be good enough.  Full-sized screens with good

visibility require extra battery power that implies a total weight of

about 10 lbs.

 

     Screen visibility and readability have come a long way.  Many

types of screens are available: cathode-ray tube (CRT), liquid crystal
display (LCD), backlit supertwist LCD, gas plasma, DC plasma, and

electroluminescent display.  Quality varies so much from vendor to

vendor and within each type that it is difficult to make

generalizations.  Factors to judge include screen contrast,

resolution, blur when scrolling, size, adjustability, and power

consumption.  The screen should be tested in environments that

approximate actual interview conditions such as dim lighting.

 

     Good performance is now available, but the cost can be high.  The

3 1/2" diskettes are used on portable computers; their smaller size

and harder cover make them preferable.  The carrying case should
protect the computer if it is dropped or banged.  It also should have
a government emblem or insignia to identify the interviewer.

 

     The battery charge on a portable computer may last up to four
hours, but some models have portable battery packs that can be
inserted as needed.  Respondents might allow the use of their AC
outlets.

 


 

 

 

A low battery indicator is helpful; nickel-cadmium batteries should

not be recharged before power runs out.  A car battery adapter is

useful on the road.

 

CASI

 

     Touchtone data entry (TDE) and voice recognition entry (VRE)

require special hardware cards and sufficiently powerful computers. 

The current BLS TDE configuration uses a 286 PC with 640k RAM.  One PC

can support many phone lines.  BLS estimates that for a survey with

1.5 minute calls received during a 2 week collection period, one phone

line is needed for every 500 respondents so that during peak

collection periods respondents will get a busy signal less than 5% of

the time.
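
     A rough check of that sizing rule, as illustrative arithmetic
only; the 8-hour day and the clustering of calls near deadlines are
assumptions, not BLS figures:

     # Illustrative arithmetic for one line per 500 respondents.
     respondents_per_line = 500
     minutes_per_call = 1.5
     collection_days = 10                  # two 5-day weeks

     total_call_minutes = respondents_per_line * minutes_per_call  # 750
     line_minutes = collection_days * 8 * 60                       # 4800

     print(f"average occupancy: {total_call_minutes / line_minutes:.0%}")
     # About 16% on average; since calls cluster near deadlines, it is
     # the peak-period load that drives the under-5%-busy requirement.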

 

     Facsimile (FAX) transmission requires a hardware card or a

separate FAX machine.  There are machines that combine FAX, image

scanning, laser printing, and photocopying.

 

     Telecommunication usually means analog transmission over phone

lines.  Digital computers must have a way of sending and receiving

analog signals; the device that handles this is called a modem.  The

main distinction between different modems is the speed of

transmission.  Bits per second (also erroneously called baud) rates of

1200 and 2400 are the most common, while 300 and 9600 are also used.

As a rule of thumb, about one byte is transmitted per 10 bits because

of parity and stop bits.  Therefore, sending and receiving a large

data set can take a long time.  Software should have error checking

capabilities.
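
     A worked example of the rule of thumb, assuming an uncompressed
one-megabyte data set; the figures are illustrative arithmetic only:

     # Transfer time under the 10-bits-per-byte rule of thumb
     # (8 data bits plus start/stop and parity overhead).
     def transfer_minutes(n_bytes: int, bits_per_second: int) -> float:
         return n_bytes * 10 / bits_per_second / 60

     for bps in (300, 1200, 2400, 9600):
         print(bps, round(transfer_minutes(1_000_000, bps), 1), "min")
     # A one-megabyte data set takes about 69 minutes at 2400 bits per
     # second and over 9 hours at 300 bits per second.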

 

Future Hardware

 

     Besides general technology trends (smaller, faster, less

expensive, more capable machines), a few specific observations can be

made.  International standards are taking on a new importance. 

Standards committees are no longer just reacting to de facto market

standards but are taking the lead before products are developed. 

Compatibility and interconnectivity with other products are often as

important as the capabilities of a product itself.

 

     The future for portable computers is bright.  Color screens and

more memory and disk space are going into smaller and lighter

machines.  Handheld PC's are starting to appear.  Computers the size
of today's miniature calculators are not far off.  Cellular telephones
will be combined with portable computers.  Peripherals such as
printers are becoming more portable.

 

     Electronic Data Interchange (EDI) is changing business practices
by automating orders, invoices, etc.  As this becomes more widespread,
surveys could be designed to "piggyback" onto EDI to take advantage of
the systems already in place.  Wide area computer networks with
electronic mail are becoming more like public utilities.

 

     Developments in digital telecommunications (e.g., the Integrated
Services Digital Network, or ISDN) will have many hardware
implications -- see Network Planning.  Modems will no longer be

necessary because the entire path from computer to computer will be

digital.  Data transfer rates will be much faster.

 

     Optical and optical-electronic technologies are dramatically

increasing data storage capacities.  High definition television (HDTV)

and digital video interactive (DVI) will intensify graphic

applications.  Improved optical character recognition (OCR) will help

the transition from paper to completely electronic representation.

 


 

 

 

IV.G Networks

 

Introduction

 

     The computer revolution has come upon us in a series of waves:

the first computers transformed the speed of computation by several

orders of magnitude; improved technology provided computer access to

large organizations; personal computers provided computers to

everyone; and the relatively recent introduction of computer networks

created the information community which has brought information to

everyone.

 

     Networks have made possible the development of information

utilities that serve the entire spectrum of the human community,

providing services from computer games to newspapers for anyone owning

a personal computer.  The pervasiveness of these information services

enables survey information to be collected locally and transmitted
directly to a central processing utility.

 

     The Arpanet developed by the Department of Defense was the first

widespread network to join researchers, system developers, and

administrators into an information community.  Although electronic

mail or E-mail was the immediate gain from this network, the ability

to transfer files of data, to access remote databases, and to use the

computing services of a geographically remote computer showed the real

value of a network.

 

     Access to computer networks by the public has increased

dramatically as the network cost for an individual has dropped to the

cost of a local phone call.  Some commercial services cost less than a

monthly phone bill for unlimited access.  A new network technology is

about to transform our ability to use the distributed processing

systems available on a network by dramatically increasing the amount

of data that can pass over these networks.

 

Data Collection

 

     Networks will have a profound effect on data collection.  They

will provide the opportunity for close contact between the interviewer

and the respondent.  For example, CATI provides limited voice

interaction over a telephone.  Networks will provide visual and audio

interaction with television or computer screens.  They will enable the

interviewer to display previously collected data to the respondent,

and to use graphical diagrams and pictures to convey the conceptual

background to questions.  Moreover, they will provide the opportunity
for more frequent updates to survey information, updates that match
the data requirements rather than economic constraints.

 

     High-speed networks will put interviewers in closer contact with

experts who can resolve troublesome issues while a survey is

 


 

 

 

being conducted.  For example, CAPI interviewers do not have immediate

access to their supervisors.  With high bandwidth networks, the

CAPI interviewer can contact a supervisor in much the same way as a

CATI interviewer.

 

     The net result should be greater interaction and reduced costs as

the network bandwidth increases by an order of magnitude over the next

decade.

 

Background

 

     There has been a separate, and independent, evolution of networks

in this century for the transport of voice and data.  The classical

voice network was based on the telephone handset that converts speech

into electrical signals which are transported over the local loop via

a twisted pair of copper wires to a telephone system end-office. 

Traditionally, the signalling involved has been analog (the

transported signal varies continuously in time) and the communication

link established between two telephone handsets has been termed an

analog voice transmission circuit.

 

     The human ear is an extremely good filter, and has permitted

analog voice circuits to be established in which the analog voice

signal was noisy.  A good ear and contextual information made it

possible to understand the communication.  As the separation between

two handsets engaged in an analog voice communication link increased,

the electrical signals required amplification for continued

distribution.  Such amplifiers are often called repeaters and they had

the unfortunate characteristic of amplifying both the noise and the

electrical voice signal being transmitted.  Consequently, it was very
difficult to remove specific noise components from the analog voice
signal.

 

     The analog telephone handset is connected to a local exchange or

end-office.  This is nothing more than a local switch that is in turn

connected to a trunk exchange.  This trunk exchange, in North America,
is a five-level hierarchical arrangement of switches for routing

telephone calls.  It forms a circuit switched network that is

connected to an international access exchange and provides the

capability of global voice telephone communications. The network

described here was still made up of twisted-wire copper pairs in the

local loop and electromechanical switches that performed routing of

the voice calls until 1966.

 

     With the appearance of very large scale integrated (VLSI)
technology -- the computer on a chip -- the network switches evolved

into electronic switching systems.  The intelligence in the switches

allowed the established transmission fabric to be rendered more cost-

effective by simplifying maintenance through higher reliability

features and better strategically planned network maintenance.  As the

employment of more sophisticated electronics

 


 

 

 

was accelerated in the switching matrix, the conversion of analog

voice signals to purely digital signals led to the appearance of

dedicated digital networks.  While the handset in most installations

remains analog, the local switch to which the handset is attached

performs an analog to digital conversion of the initial voice signal. 

From there the signal is entirely digital.  The digital transmission

networks that are dedicated to voice and data are called Integrated

Digital Networks (IDNs).  Standards have now emerged internationally
under the guidelines of the Consultative Committee for International
Telephony and Telegraphy (CCITT).  The digital transmission systems
are rapidly evolving toward IDNs that are interoperable and that make
use of the intelligence associated with each digital network switch,
because each such switch may be regarded as a generalized computer.

 

     In making distinctions between data applications and voice

applications using a modern IDN, networks used for data applications

can be characterized according to the activities of the terminals on

the network:

 

     1.   Start-stop terminals are used to generate interactive data

traffic to and from the computer.  This traffic tends to be

          low speed with occasional bursts as the computer responds to

          an interactive request for a specific file to be

          transferred.

 

     2.   Batch data transfers and data display image transfers that

          occur as bursts of data that can be placed on the network.

 

     3.   Continuous data traffic that is typically carried by circuit

          switched IDNs at data rates from 2.4 to 64 Kbits/sec
          (thousands of bits per second).  The data traffic within the

          network is often combined from separate low data rate bit

          streams, and interleaved into a single 64 Kbits/sec data

          channel for transmission across the network.  A packet

          switched network decomposes a digital message into smaller

          chunks of bits (typically 1008 bits or 2000 bits) and routes

          these chunks, called packets, through the network from a

          source to a destination on an end-to-end basis.
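
     A minimal sketch of that decomposition, using the 1008-bit
(126-byte) packet size mentioned above; the header fields are
hypothetical simplifications of a real packet format:

     # Decompose a message into 126-byte (1008-bit) packets.
     PACKET_BYTES = 126

     def packetize(message: bytes, source: str, dest: str) -> list:
         packets = []
         for seq, start in enumerate(range(0, len(message), PACKET_BYTES)):
             packets.append({"source": source, "dest": dest, "seq": seq,
                             "data": message[start:start + PACKET_BYTES]})
         return packets

     def reassemble(packets: list) -> bytes:
         # The network may deliver packets out of order; sort by sequence.
         ordered = sorted(packets, key=lambda p: p["seq"])
         return b"".join(p["data"] for p in ordered)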

 

Current Network Systems

 

     The modern communications environment may be regarded as made up

of three basic functional blocks:

 

     1.   User terminals that support a human interface with the

          network.  They allow a human to interact with

 


 

 

 

 

 

          another user terminal or a computer connected via the

          network.

 

     2.   A communications network that is transparent to the user and

          provides conventional information transfer capabilities.

 

     3.   Information service centers that provide computing functions

          at the center.

 

     Network systems break down conceptually into Local Area Networks
(LAN) and Wide Area Networks (WAN).  The IEEE definition for a LAN is
a "data communication system that allows a number of independent
devices to communicate with each other."  A WAN is one that covers a
much larger area (e.g., nationwide or worldwide), and has one or more

computer nodes that are central to the operation of the network. 

These specialized computer nodes support the routing -- storing and

forwarding -- of packets of information.

 

     The simplicity of Local Area Networks makes them useful for

specialized applications within a small organization.  They can

continue to operate with some of their devices broken or down, because
the failure of any one unit does not affect the operational status of
the others.

Moreover, LANs promote and extend a cooperative work environment for

both people and machines.

 

When discussing LANs, an understanding of the following terms is

important:

 

     Centralized    main or host computer that does all

                    data processing;

 

     Distributed    some remote computers do their own

                    processing;

 

     Gateway        hardware and software for two

                    technologically different networks

                    to communicate with each other;

 

     Bridge         linking two technically similar

                    networks to one another;

 

     Servers        network peripherals that support

                    specialized use by the entire

                    network community, e.g., file

                    storage servers and printers.

 

     These elements make up a multilayered communications facility

that represents a multitude of telecommunications networks that must

interoperate on both a national and a global scale.  Because

telecommunications have developed in different ways in various foreign

countries, there has been a continuing pressure for standards and for
the cooperation of all countries in the efforts of the CCITT.  This

overall international telecommunications

 


 

 

 

environment supports a communications arrangement that may be

logically segmented into:

 

     1.   A public communications network layer.  The public network

          (at least, in the United States) is required to provide

          uniform service of good quality and on an equal access

          basis.  It must permit uniform management of the network

          across the nation, and it must exhibit acceptable

          reliability characteristics to the public user.  The

          regional Bell operating companies provide local public

          telephone service.

 

     2.   A business communications network layer.  In this category,

          the communications structure is privately owned and

          operated.  There is a multiplicity of these proprietary
          networks developed by private companies to reduce the
          communications costs to corporations.  Tymnet, AT&T, and the
          regional holding companies assist corporations in building
          such private structures.  Private networking will most
          likely increase in the future, but it may be implemented as
          virtual private circuits using the intelligent digital
          networks (IDNs) of the 1990's.

 

     3.   A business distribution network layer.  This type of network

          transmits from one site and is received by many sites. 

          Cable TV and the broadcasting of commercial television shows

          are examples.

 

     The evolution of the IDNs must support the following operational

characteristics of the local and national telecommunications system:

 

     1.   The current arrangement of public telephone networks and

          packet switched networks does not support the simultaneous

          operation of voice and data services.  The simultaneous

          transmission of speech, data, telemetry and signalling will

          be natural in future IDN networks.

 

     2.   The message content must be transparent to the various
          services employed by the network.

 

     3.   The embedded base of existing network equipment must be
          accessible by the evolving IDN.  Such things as classical

          two-wire telephony must be supported.

 

     4.   The security and privacy of information must be available

          for all users of the network.

 


 

 

 

     5.   The appropriate levels of network management for handling

          accounting, performance, configuration control, reliability

          and security of information must be available on the

          network.

 

Planned Systems

 

     The ultimate evolution of the current intelligent digital network

is the Integrated Services Digital Network (ISDN) which has been

emerging in the industrialized nations for the last ten years.  It is

a technology that ultimately will place end-to-end digital signalling

capability throughout the network.  It has been slowed because of two

major factors:

 

     1.   The lack of standardization between vendors of transmission

          equipment within the United States and Canada, as well as

          widely divergent option selections that are specified by

          CCITT in terms of its so-called ISDN standard reference

          model.  This latter situation has resulted in the inability

          of the Postal, Telephone and Telegraph Agencies of various

          nations including the United States to establish ISDN

          environments that could exchange information.  The ability

          to exchange information is called interoperability.  Two

          ISDN networks that can exchange information transparent to

          two end users, one on each network, are said to be able to

          interwork.

 

     2.   The enormous established base of analog switching equipment. 

          This base is measured in the 10's of billions of American

          dollars and represents an investment by service providers
          such as AT&T and end user organizations that cannot simply
          be replaced in a short period.

 

     The United States government through the Brooks Act of 1987 has
mandated that all agencies of the government must move to a common
communication backbone that is to be an ISDN environment as soon as
acceptable standards can be put in place.  The National Institute of
Standards and Technology (NIST) has been actively pursuing the
realization of standards since February, 1988.  The General Services
Administration (GSA), with the awarding of the FTS2000 contract to
AT&T and Sprint, is now working to develop an ISDN migration plan that
will be acceptable to all government agencies.  This plan may have to
proceed on an agency-by-agency basis because different agencies will
have unique problems in their telecommunications environment.  The
result is to be an intelligent network that will offer many services
using digital signalling, and that will provide individual users with
an extremely friendly interface with their ISDN workstations (i.e.,
handsets, PC's, integrated voice, data, and video consoles).

 

     To the user, the ISDN environment appears as a highly intelligent

network in which, aside from the network access points, no clear

distinction can be made as to where their personal computer or

mainframe ends and the network begins - in a sense, the computer

becomes a part of the network and the network appears as a

geographically dispersed computing environment.  In essence, the

intelligence that resides in the individual switching machines is made

available to the users of the network as a menu of services which can

enhance the capability of the user to do a variety of functions.  In

an attempt to capture the needs of the user, NIST and the industrial

telecommunications community created the North American ISDN User

Forum in the Spring of 1988.  This forum has been generating user

applications for ISDN.  As of June 1989, 81 applications had been

cataloged.

 

     Because of the high-level of intelligence invested in the ISDN

environment, such concerns as user authentication at both the sending

and receiving ends, end-to-end integrity of a message, and security of

the information sent, can be dealt with by the network in a manner

transparent to the user.

 

     It must be recognized that the ISDN environment is a multimedia

services facility that allows end-to-end transport of voice, data or

slow-scan video.  Facsimile (FAX) transmission is also part of this

media mix.  The current ISDN implementations in North America can

support a maximum bit rate per channel of 64 Kbits/sec.  This is

called narrowband-ISDN.  A separate standardization process is also

taking place in North America and around the world.  It is called

broadband-ISDN, with anticipated bit rates of more than 600 Mbits/sec,
an increase by a factor of roughly 10,000 over narrowband-ISDN.  This

network will provide services with an ultimate impact on the business

and commercial customers of North America that will be larger than all

the capabilities now associated with narrowband-ISDN.  The usage of

broadband-ISDN, in conjunction with rewiring the North American

continent with fiberoptic circuits, will revolutionize information

processing.

 

     With the emergence of a single, seamless ISDN communications
fabric, the proliferation of private networks should be greatly
reduced in both private industry and the Federal Government.  This
should substantially reduce the costs of network operations,
administration and maintenance.  In particular, one governmental
agency has estimated an annual cost savings of $7 million in just
moving to an ISDN environment, in terms of the reduction of network
management charges.  These savings do not address the potential
increases in productivity through the acquisition of the new user
services provided by an ISDN facility.  The NIU-Forum is considering
the cost-benefit concerns of organizations as they move to a fully
ISDN-equipped telecommunications environment.  This work


 

 

 

helps the unsophisticated user to use the intelligent network to carry

out well-defined functions such as efficient data collection.

 

     A further aspect of an ISDN environment is that the network could

act as a highly intelligent protocol converter.  In a sense, it could

function as a concurrent multiple gateway between many different types

of data networks.  Uploading and downloading of data would be taken

care of automatically and in a manner transparent to the users. 

Verification of the data sent on an end-to-end basis also would be

done automatically by the network.  In a multi-media environment,
media conversions (voice-to-data, data-to-image, image-to-data,
data-to-voice, and image-to-voice) also could be done by the ISDN
facility.  The key here is the high intelligence of the network, and
the transparency of the ISDN operations to its attached user
community.

 


 

 

 

 

V.   REFERENCES

 

A. CATI

 

Curry, Joseph; "Computer Assisted Telephone Interviewing: Technology
and Organization Management"; Sawtooth Software; June 17, 1987.

 

Groves, Robert M.; et al., editors; Telephone Survey Methodology; John
Wiley & Sons; 1988.

 

Nicholls, William L.; "The Impact of High Technology on Data

Collection"; CATI Research Report No. GEN-1; Bureau of the Census;

February 24, 1989.

 

Werking, George; Tupek, Alan; and Clayton, Richard; "CATI and
Touchtone Self-Response Applications for Establishment Surveys";
Journal of Official Statistics; Vol 4; No. 4; 1988; pp 349-362.

 

B. CAPI

 

Danielsson, L.; and Maarstad, P.A.; "Statistical Data Collection with
Handheld Computers - A Test in the Consumer Price Index"; Unpublished
report of Statistics Sweden; Orebro, Sweden; 1982.

 

National Center for Health Statistics; "Report of the 1987 Automated

National Health Interview Survey Feasibility study - An Investigation

of CAPI": November, 1988.

 

National Center for Health Statistics and Bureau of the Census;
"Report of the 1987 Automated National Health Interview Survey
Feasibility Study, An Investigation of Computer Assisted Personal
Interviewing"; U.S. Department of Health and Human Services; National
Center for Health Statistics; November, 1988.

 

Netherlands Central Bureau of Statistics; "Automation in Survey

Processing"; Select Report 4; Central Bureau of Statistics; Voorburg,

Netherlands; 1987.

 

Nicholls, William L.; "The Impact of High Technology on Data
Collection"; CATI Research Report Number GEN-1; U.S. Department of
Commerce; Bureau of the Census; February 24, 1989.

 

Rice Jr., Stewart C.; Wright, Robert A.; and Rowe, Ben; "Development
of Computer Assisted Personal Interview for the National Health
Interview Survey 1987"; Proceedings of the Survey Research Methods
Section, American Statistical Association; 1988.

 

Rothchild, Beth B.; and Wilson, Lucy B.; "Nationwide Food Consumption
Survey 1987: A Landmark Personal Interview Survey Using Laptop
Computers"; Proceedings of the Bureau of the Census Fourth Annual
Research Conference; pp 347-356; U.S. Department of Commerce; Bureau
of the Census; 1988.

 

Sebestik, Jutta; Zelon, Harvey; DeWitt, Dale; O'Reilly, James M.; and
McGowan, Kevin; "Initial Experiences with CAPI"; Proceedings of the
Bureau of the Census Fourth Annual Research Conference; pp 357-365;
U.S. Department of Commerce; Bureau of the Census; 1988.

 

van Bastelaer, Alois; Kessemakers, Frans; and Sikkel, Dirk; "Data
Collection with Hand-Held Computers: Contributions to Questionnaire
Design"; Journal of Official Statistics; Vol. 4; No. 2; pp 141-154;
1988.

 

C. CASI

 

Clayton, Richard L.; and Winter, Debbie L.S.; "Voice Recognition and
Voice Response Applications for Data Collection in a Federal/State
Establishment Survey"; Official Proceedings of Military and Government
Speech Tech '89; Media Dimensions; November, 1989.

 

Ponikowski, Chester; and Meily, Sue; "Use of Touchtone Recognition
Technology in Establishment Survey Data Collection"; Presented at the
First Annual Field Technologies Conference; St. Petersburg, Florida;
1988.

 

Werking, George; Tupek, Alan; and Clayton, Richard; "CATI and
Touchtone Self-Response Applications for Establishment Surveys";
Journal of Official Statistics; Vol 4; No. 4; 1988; pp 349-362.

 

D. Human-Machine Interfaces

 

Card, S.K.; Moran, T.P.; and Newell, A.; The Psychology of
Human-Computer Interaction; Lawrence Erlbaum Associates; Hillsdale,
NJ; 1983.

 

Conklin, Jeff; "Hypertext: An Introduction and Survey"; IEEE
Computer; pp 17-41; September, 1987.

 

Norman, Donald A.; and Draper, Stephen W., editors; User Centered
System Design; Lawrence Erlbaum Associates; Hillsdale, NJ; 1986.

 

Hartson, H.R. (ed); Advances in Human-Computer Interaction; Ablex

Publishing Co.; Norwood, NJ; 1985.

 

Myers, Brad A.; Creating User Interfaces by Demonstration; Academic
Press; San Diego, CA; 1988.

 


 

 

 

Shneiderman, Ben; Designing the User Interface; Addison-Wesley;

Reading, MA; 1987.

 

Shu, Nan C.; Visual Programming; Van Nostrand; New York, NY; 1988.

 

E. Computer Security

 

Department of Defense; Trusted Computer System Evaluation Criteria;

DoD 5200.28-STD; 1985.

 

Federal Information Processing Standards Publication (FIPS PUB) 39;

Glossary for Computer Systems Security; February, 1976.

 

Federal Information Processing Standards Publication (FIPS PUB) 46-1;
Data Encryption Standard; January, 1988.

 

Federal Information Processing Standards Publication (FIPS PUB) 73;

Guidelines for Security of Computer Applications; June, 1980.

 

Federal Information Processing Standards Publication (FIPS PUB) 112;

Standard on Password Usage; May, 1985.

 

Federal Information Processing Standards Publication (FIPS PUB) 113;
Standard on Computer Data Authentication; May, 1985.

 

Gasser, Morrie; Building a Secure Computer System; Van Nostrand
Reinhold; New York; 1988.

 

National Institute of Standards and Technology Publication List 91;
Computer Security Publications; January, 1988.

 

Pfleeger, Charles P.; Security in Computing; Prentice Hall; New

Jersey; 1989.

 

F. Networks

 

Arni, D.; "Standards in Process: Foundations and Profiles of ISDN and
OSI Studies"; National Telecommunications and Information
Administration; Report 84-170; U.S. Department of Commerce;
Washington, DC; December, 1984.

 

Browne, T.; "Network of the Future"; Proceedings of the IEEE;

September, 1986.

Lutchford, J.; "CCITT Recommendations on the ISDN: A Review"; IEEE

Journal on Selected Areas in Communications; May, 1986.

 

Madron, Thomas W.; Local Area Networks: The Second Generation; John

Wiley and Sons; 1988.

 


 

 

 

Stallings, W.; Handbook of Computer-Communications Standards, Volume
1: The Open System Interconnection (OSI) Model and OSI-Related
Standards; MacMillan; New York; 1987.

 

Stallings, W.; ISDN: An Introduction; MacMillan; New York; 1989.

U.S. Department of Commerce; "NTIA TELECOM 2000: Charting the Course
for a New Century"; National Telecommunications and Information
Administration; NTIA Special Publication 89-21; U.S. Department of
Commerce; Washington, DC; October, 1988.

 

G. Applications

 

Clayton, Richard L.; and Harrell, Louis J., Jr.; "Developing a Cost
Model for Alternative Data Collection Methods: Mail, CATI, and TDE";
ASA Proceedings of the Section on Survey Research Methods; 1989.

 

Energy Information Administration; "PEDRO - Respondent User Guide to

the Petroleum Electronic Data Reporting Option"; Version 3.0; February

3, 1989.

 

Groves, Robert M.; Survey Errors and Survey Costs; John Wiley and
Sons; New York; 1989.

 

Statistical Policy Working Paper 15; "Quality in Establishment

Surveys"; Office of Management and Budget; July, 1988.

 

H. Standards

 

National Institute of Standards and Technology Publication List 58;
Federal Information Processing Standards Publications; June, 1989.

 


 

 

 

VI. Appendices

 

Appendix VI.A. Costs

 

Introduction

 

     The choice of a collection method is usually based on a

combination of performance and cost factors.  For traditional methods,

these factors are easily identified and the selection of a collection
mode is not difficult.  With recent technological advances, new

methods described in this report expand the array of potential

collection tools and challenge the survey designer to reevaluate old

cost and performance assumptions.  The decision of which method or

methods to use is now more difficult.

 

     This section reviews the structure of costs in the data

collection function covering several collection methods including

mail, CATI, CAPI, TDE and VRE.  It also briefly describes the impact

of automated collection on costs, particularly versus mail operations. 

This profile of costs is limited to data collection; considerations
of impact on sample design, questionnaire changes, edits, and other
issues are excluded.

 

Collection Methods Defined

 

     CATI: The application of CATI is usually considered to address

timeliness and other quality problems.  The computer assists by

automatically controlling questionnaire branching, conducting on-line

editing for reconciliation directly with the respondent, scheduling

future calls and capturing a variety of management information about

the interview.  Thus, most data collection activities are conducted

through the CATI system.  The use of CATI generally vastly reduces or

eliminates routine mail handling activities and postage costs.  CATI

adds new costs in equipment purchase and replacement and telephone

charges.

 

     CAPI: This method extends the benefits of controlled branching

and on-line edit reconciliation to improve the quality of data

collected by personal interviewing.  In surveys already using personal
visit collection, CAPI adds direct costs of computer hardware for each

data collector and software design and maintenance.

 

     Self-response -- Prepared Data Entry: By offering Prepared Data

Entry to respondents, the collecting agency adds the costs of software

design and maintenance, and possibly the costs of telephone charges

for electronic transmission of the completed questionnaire.

 

     Self-response -- Touchtone Data Entry and Voice Recognition
Entry: These methods include many of the same sample monitoring

 


 

 

 

features of CATI and eliminate many of the labor-intensive activities

associated with the traditional mail methods.  TDE and VRE methods are

currently used as a replacement of mail collection.  By comparison,

the regular mail handling to and from respondents is reduced to a

single postcard to remind the respondent that it is time to call in

their data.  TDE and VRE further reduce manual operations by

transferring key entry to the respondent.  Short nonresponse calls may
be employed to remind respondents to call in their data as publication
deadlines approach.  While reducing labor costs, TDE and VRE involve

added costs for computer hardware and software development and

maintenance.

 

 

Cost Model

 

     The data collection function is the series of activities that

follow sample selection and precede estimation.  Data collection is

comprised of a series of activities for capturing the data, converting

the data to machine-readable form, performing editing and edit

reconciliation, and follow-up for nonresponse.  The conduct of these

activities varies greatly under mail, CATI, CAPI and self-response

modes (PDE, TDE and VRE).  Major recurring cost categories for these

collection modes are outlined in Table 1.

 

Table 1. Major Recurring Cost Categories by Collection Mode

 

Major Cost Categories         Mail CATI CAPI PDE  TDE  VRE

 

LABOR

mail out                      x              x    x    x

mail return                   x              x

data entry                    x    x    x

edit reconciliation           x    x    x         x    x

nonresponse follow-up         x         x    x    x    x

software development/maint.        x    x    x    x    x

interviewer training               x    x    x

 

NON-LABOR

postage                       x              x    x    x

telephones                         x              x    x

computer hardware                  x    x         x    x

travel                                  x

 

 

     The cost categories presented in Table 1 can be used to evaluate
the costs of other collection methods.  By comparing the activities of

the alternative method to the current method, a rough determination of

affordability can be made.  Detailed cost studies would be necessary

for each specific survey application.
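
     A hedged sketch of such a comparison, driving the "x" marks of
Table 1 with unit costs; every dollar figure below is an invented
placeholder, not a measured cost, so real use requires the detailed
cost studies noted above:

     # Hypothetical per-1,000-respondent costs for each category.
     CATEGORY_COST = {
         "mail out": 400, "mail return": 400, "data entry": 900,
         "edit reconciliation": 700, "nonresponse follow-up": 600,
         "software development/maint.": 800, "interviewer training": 300,
         "postage": 500, "telephones": 450, "computer hardware": 650,
         "travel": 1200,
     }

     # The categories each mode incurs (the "x" marks of Table 1).
     MODE_CATEGORIES = {
         "Mail": ["mail out", "mail return", "data entry",
                  "edit reconciliation", "nonresponse follow-up",
                  "postage"],
         "TDE":  ["mail out", "edit reconciliation",
                  "nonresponse follow-up",
                  "software development/maint.", "postage",
                  "telephones", "computer hardware"],
     }

     for mode, categories in MODE_CATEGORIES.items():
         print(mode, sum(CATEGORY_COST[c] for c in categories))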

 


 

 

 

Assumptions

 

     Realistic assumptions are a vital part of an analysis of costs. 

Several assumptions should generally be made about the level of

workload and equipment requirements.  These may include the number of
units per CATI interviewer during the normal collection period, and the

number of minutes per interview.  The TDE cost assumptions include the

length of the average call, effects of peak calling periods, the

number of incoming lines per TDE board, and the average proportion of

units receiving nonresponse prompting actions.  Also, the number of

boards that can be placed in the microcomputer should be included.

 

     The following factors, independent of collection mode, should be

included in the model: salaries and benefits, administrative overhead

allocations, standard non-personnel services, postage, amortization of

computer hardware to cover replacement, and telephone charges,

including fixed monthly line charges and variable call costs.

 

     The following factors are generally difficult to quantify and

often cannot be treated equally for all methods: start up costs for

research and development, ongoing systems design and maintenance,

training, and emergency back-up features for CATI and TDE.

 

 

Other Important Considerations

 

     Critical decisions concerning changes in the data collection

methods are not made solely on costs; there are many other

considerations to include in these decisions.

 

     Organizational Impact: The design of an effective production

environment is essential to timely, ongoing output of data.  For

example, the success of CATI and TDE in compressing the collection

period may pose peak period staffing problems.  Also, the cost model

assumes that managers can perfectly capture and reallocate resources as

collection methods change.  For example, TDE eliminates key entry. 

The costs are only truly saved if these resources can be captured and

reinvested in new equipment and telephone charges, and with remaining

savings redirected toward improving the quality of other survey

functions.  It is assumed that postage savings likewise are
identifiable and may be similarly captured and redirected.

 

     Staffing for Research and Development: The development of new

techniques usually requires a small staff dedicated to achieving the

change desired.  Also, this staff must have a variety of skills,

including economics, statistics, methods test design, computer systems

design, questionnaire development, and analytical, writing, and

presentation skills.  This combination of individuals may be difficult

to identify and remove from ongoing production

 


 

 

 

tasks.  Given the frequency of new issues and problems, this group may

require special attention from management and latitude in trying

creative approaches to solving the wide range of problems that will

inevitably arise in development efforts.

 

     Systems Design, Programming, and Maintenance: There are

significant start up costs, although these can be easily amortized

over large, recurring surveys.  These costs will vary with the

complexity of the application and the experience of the development

staff.  Ongoing maintenance depends on the frequency and magnitude of

the changes.

 

     Training: Training requirements for staff to maintain manual

operations, such as would be needed under mail, are small.  Under

CATI, a broader range of skills is required, including telephone

communications skills and some working knowledge of the computer.  The

TDE system requires little special knowledge, keeping costs low.

 

     Emergency Procedures: As we increasingly rely on technology to do

work for us, we are increasingly at risk when it fails.  All

implementation approaches should include back-up procedures and

equipment at appropriate locations to ensure uninterrupted service to

respondents.  Telephone based methods may require back-up computers   

and associated equipment standing ready for instant replacement.  In
addition, TDE and VRE applications should consider establishing "call
forwarding" services ready to route incoming TDE and VRE calls to an
alternative collection site if the primary collection computers
malfunction.

 

Quality Costs

 

     The costs of quality are notoriously difficult to identify. 

Often, it is easier to invert this idea to address costs of poor

quality.  For example, address refinement workload for solicitation is

a cost of poor quality in the sample frame.  Some edit reconciliation

activities compensate for poor quality of collected data that may stem

from deficiencies in concept or questionnaire design.  Efforts

expended to prevent future costs of poor quality, while often

difficult to justify, generally pay off in lower ongoing costs.

 

 

Future Costs

 

     The choice of collection mode, or combination of modes, will
depend on the particular survey application and the existing cost
structure.

However, it is important to view investments in data collection over

the long-term as the relative costs of each of the above inputs do not

remain constant over time.  Table 2 shows recent annual data on cost

trends for the major cost inputs.

 


 

 

 

     Labor and labor-intensive inputs, such as postage, are

increasingly more expensive, while capital-intensive factors, such as

telephones and computers, become less expensive.  Based on these data,

and other historical cost trends, there may be a growing advantage to

switching to collection methods that use less labor and more capital.

 

Table 2. Recent Annual Changes in Costs of Inputs into Data Collection

 

Cost Category  Recent Annual Cost Changes (source)

 

Labor:         +5.8% for state and local government employee

               compensation (ECI for the 12 month period ending June

               1989)

 

Postage:       +4.5% for 1st class postage (U.S.P.S. rate increase in
               April 1988 to 25 cents)

 

Telephones:    -1.3% for interstate toll calls (CPI-U unadjusted

               change December 1988 to December 1989)  

               -2.5% for intrastate toll calls (CPI-U unadjusted

               change December 1988 to December 1989)

 

Travel:        +3.9% for private transportation (CPI-U unadjusted

               change December 1988 to December 1989)

 

Computers:     -10.0% for microcomputers (PPI experimental price

               indexes for the 12 months ending January 1990)

 

     Survey managers should project unit costs for their surveys for

alternative collection methods over a ten year period using recent

price trends.  This approach illustrates that decisions to implement

alternative methods should be viewed in terms of estimates of future

price levels.  Decisions on conducting research and development

testing need not await a current favorable cost benefit situation.
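
     A hedged sketch of such a projection, compounding the Table 2
rates over ten years; the assumption that each recent rate persists
unchanged is purely for illustration:

     # Compound Table 2's recent annual rates of change over ten years.
     ANNUAL_CHANGE = {
         "labor": 0.058, "postage": 0.045,
         "telephones (interstate)": -0.013, "travel": 0.039,
         "computers": -0.100,
     }

     YEARS = 10
     for item, rate in ANNUAL_CHANGE.items():
         factor = (1 + rate) ** YEARS
         print(f"{item}: x{factor:.2f} after {YEARS} years")
     # Labor rises by roughly 76% while computer costs fall by roughly
     # 65%, the basis for the labor-to-capital shift suggested above.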

 

Conclusion

 

     The decision on exactly how to use each collection mode will vary

by survey application.  For example, CATI and TDE could be combined to

address chronically late mail respondents.  These units will first be
converted to CATI collection to improve their reporting behavior in
terms of timeliness and accuracy.  These units will remain under CATI
collection for about 6 months, a period adequate for reducing
nonresponse problems, determining exact data availability dates (for

subsequent nonresponse prompting), educating respondents to the

importance of their data

 


 

 

 

and reinforcing timely reporting behavior.  Then, the units will be
converted to TDE collection to reduce costs while retaining sample
control.  Voice recognition collection could be used for those units

without touchtone phones or for those respondents who prefer voice

collection.

 

     The approach outlined here is a basic tool for survey managers in

assessing the potential application of new collection methods.  Survey

researchers should not be dissuaded by current costs  from considering

the use of automated collection methods.  Recent cost trends suggest

that the cost-effectiveness of collection methods changes over time. 

This should be considered in decisions concerning choice of collection

methods for the future.

 

                                  72

 

 

 

Appendix VI.B. Quality Improvements Offered by CASIC

 

     Quality problems generally result from inadequate planning or

control of one or more steps in the survey process.  CASIC cannot

replace or compensate for poor planning, but it may offer vast

improvements in control by reducing manual intervention, promoting

consistent procedures, using supplementary data sources, and applying

on-line editing to improve the accuracy of the data collection process.

 

     The automation of the questionnaire is the primary way CASIC

improves control: it offers consistent procedures, on-line editing,

and the use of other information to monitor and control the interview

in ways that would otherwise be too difficult or burdensome for the

interviewer.

 

     While CASIC offers the potential for improvements, actual

reductions in error components can only be made through efforts to

delineate error potential and to incorporate specific error-reducing

techniques in the questionnaire.

 

     Some error reductions may be great, and others may be small. 

However, none will result without thorough evaluations of error

sources and planning to address each.  Often, knowledge of the

magnitude of various errors may be necessary to decide on the cost-

effectiveness of addressing some error sources.

 

     The automation of the data collection process directly reduces

some sources of error.  For example, telephone collection of data may

reduce the potential for processing error resulting from mailing the

wrong form to a respondent.  Other indirect benefits can be obtained

through automation, including reductions in coverage error.  For

example, on-line evaluation of respondent characteristics provides

immediate identification of out-of-scope respondents.

 

     This section discusses several error components that may be

reduced through CASIC methods.  The structure, definitions and

background of this discussion were derived from Statistical Policy

Working Paper 15, entitled "Quality in Establishment Surveys." Readers

are encouraged to refer to this document for more information on error

definitions, sources, control methods, and measurement aspects.

 

Specification Error

 

     Specification error occurs at the planning stage of a survey when

specification is inadequate or inconsistent with the objectives of the

survey.  It can result from the difficulty of measuring abstract

concepts or from poorly worded questionnaires and instructions.

 

                                  73

 

 

 

     CASIC methods may reduce specification errors in several ways. 

For example, difficult concepts may require very detailed

questionnaires with complex branching patterns to obtain correct

measures. CATI and CAPI can allow greater flexibility in structuring

questionnaires than would be possible using paper forms.  Also,  CASIC

provides a means for correcting specification error once identified. 

If one or more questions are difficult to use during collection, or

responses seem improper, corrections can be made centrally and

software transferred quickly to all collection points.  Given printing

timing and costs, use of paper forms probably would not allow such

mid-stream changes and the survey results could be severely

compromised.
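
     To make the branching idea concrete, here is a minimal sketch of

a skip pattern expressed as data, so that a specification fix can be

made centrally and the corrected instrument redistributed to all

collection points.  The question texts and branches are hypothetical.

     # A questionnaire whose skip pattern is data, not printed layout.
     QUESTIONS = {
         "Q1": {"text": "Did this establishment operate in March?",
                "branch": lambda ans: "Q2" if ans == "yes" else "END"},
         "Q2": {"text": "How many employees were on the payroll?",
                "branch": lambda ans: "Q3"},
         "Q3": {"text": "Total hours paid?",
                "branch": lambda ans: "END"},
     }

     def interview(answers):
         """Walk the skip pattern; 'answers' maps question id to response."""
         qid, collected = "Q1", {}
         while qid != "END":
             ans = answers[qid]          # in CATI/CAPI, keyed live
             collected[qid] = ans
             qid = QUESTIONS[qid]["branch"](ans)
         return collected

     print(interview({"Q1": "yes", "Q2": "52", "Q3": "2080"}))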

 

     While developing and printing questionnaires, skip pattern

indicators may be omitted, or the tedious work of proofreading

multiple variations may lead to errors.  CASIC instruments are just

as susceptible to this error as are written forms.  Automated

questionnaires, and the associated code, must be checked thoroughly to

ensure their accuracy.  Forms also may be faint or smudged, leading

to difficulty for the respondent.

 

     Traditional methods for measuring specification error include

record check studies, cognitive studies, questionnaire pretests, and

comparison of results with independent estimates.  CASIC can

contribute to these approaches.  First, record check surveys that

scrutinize detailed definitional areas may be very complex.  Such

detailed branching is a strength of CATI and CAPI.

 

 

Coverage Error

 

     Coverage error includes both undercoverage, the exclusion of in-

scope units, and overcoverage, the inclusion of out-of-scope units. 

CASIC may reduce overcoverage if the questionnaire includes checks for

scope-determining characteristics.  Data for sample units failing

these criteria may be noted for review or exclusion, or the interviews

may be ended rather than wasting time.  Also, duplication errors,

stemming from duplicates on the sample frame, may be identified through

an automated records review at any point during collection.  Again,

such benefits are only possible with initial planning.

 

 

Response Error

 

     Response error is the difference between the correct value and

the value collected.  Respondent error is the failure to report the   

correct value, and interviewer error is the failure to record the data

properly.

 

     Respondent error may be controlled by comparing current data to

previously reported data.  Such on-line logic and internal

 

                                  74

 

 

 

consistency edits can identify and resolve response errors directly

with the respondent, rather than waiting for post-collection editing

to catch errors for often-spotty reconciliation follow-up.  The power

of an automated questionnaire also reduces interviewer error through

instantaneous editing of any data entry mistake large enough to

trigger an edit failure.
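
     As a minimal sketch of such an on-line edit, the fragment below

compares a keyed value against the value reported last period and

flags an implausible change for immediate reconciliation with the

respondent.  The 30 percent tolerance is an illustrative assumption.

     def edit_against_prior(current, prior, tol=0.30):
         """Return None if the entry passes, else a message for the
         interviewer to resolve with the respondent on the spot."""
         if prior > 0 and abs(current - prior) / prior > tol:
             return ("Entry %s differs from last period's %s by more "
                     "than %.0f%%; please verify." % (current, prior,
                                                      tol * 100))
         return None

     print(edit_against_prior(current=480, prior=210))  # flagged
     print(edit_against_prior(current=215, prior=210))  # passes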

 

     Interviewer consistency also may be controlled by monitoring

interviewer practices and assuring conformance with specified

procedures.  Most large, centralized CATI facilities allow supervisors

to listen to interviews in progress and to view screens simultaneously.

 

Nonresponse Error

 

     Nonresponse errors follow from failures to collect complete

information from all units in the selected sample.  There are three

types of nonresponse error: noncontacts, unit nonresponse and item

nonresponse.  Each can be addressed through CASIC methods. 

Noncontacts of selected units may be the result of interviewer

oversight, failure to locate the designated respondent due to

incorrect address or telephone number, or failure to get the form to

the respondent.  CASIC cannot address weaknesses in mailing procedures

except by replacing them with accurate telephone contact.  This would,

of course, place an additional burden on the accuracy of telephone

numbers.

 

     Interviewer oversight can be addressed by monitoring sample

status data that can be collected during interviews.  For example, a

detailed CATI system may capture information each time a call is

placed, the number of attempts made to each number and the result of

each attempt, such as "no answer" or "busy." Noncontacts may then be

classified as not attempted versus unsuccessful attempts.
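
     A minimal sketch of that classification, using hypothetical

call-attempt records of the kind such a CATI system might capture:

     from collections import defaultdict

     attempts = [                      # (sample unit, result of attempt)
         ("unit-01", "busy"),
         ("unit-01", "no answer"),
         ("unit-02", "completed"),
     ]
     sample = ["unit-01", "unit-02", "unit-03"]

     by_unit = defaultdict(list)
     for unit, result in attempts:
         by_unit[unit].append(result)

     for unit in sample:
         results = by_unit[unit]
         if "completed" in results:
             status = "interviewed"
         elif results:
             status = "unsuccessful attempts (%d)" % len(results)
         else:
             status = "not attempted"  # possible interviewer oversight
         print(unit, "->", status)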

 

     Unit nonresponse occurs when no information is received from the

respondent.  The survey designer must strive to make reporting as easy

as possible to reduce intentional nonresponse.  Almost any effort that

improves the respondent's understanding of the survey is worth the

cost.  The convenience of reporting is essential, as is the clearest

and shortest possible interview.  One CATI application reduced sample

attrition by over one third compared to mail, an improvement

attributed mostly to strong scheduling, to building rapport with the

respondent, and to providing information about the importance of the

survey and its timing needs.

 

     Item nonresponse occurs when the respondent does not answer

certain questions during the interview.  This error may occur when the

respondent's cost of compiling data is too great, or the data are not

easily available during the collection period.  Of course, some data

may be sensitive or confidential.

 

                                  75

 

 

 

     Item nonresponse also may occur through the failure of the

interviewer to ask questions or follow procedures.  It is in this area

that CASIC is most beneficial.  Because software controls the

interview, CATI and CAPI interviewers cannot make errors of omission

or purposely skip questions.

 

     Another important part of reducing item nonresponse is to use a

priori knowledge about the respondent.  For example, in establishment

surveys, information about the record keeping practices of the

respondent may be retained on the computer for access during the

interview, allowing special branching to elicit firm-specific

data.  This approach would generally be too cumbersome without

computer assistance.

 

 

Processing Error

 

     Processing error stems from the faulty use of correctly designed

survey methods.  It encompasses many collection and post-collection

errors, including errors in printing the questionnaires.  Also,

processing error may arise from clerical handling of forms, whether

in mailing or key entry.

 

     CASIC methods, by reducing or eliminating these labor-intensive

and error-prone activities, can substantially reduce processing

errors.  CASIC respondents in recurring surveys may receive a mailed

form once per year rather than once each month or quarter, reducing

the opportunity for mail-related errors.  All CASIC methods ensure

that data entry and other coding is done by a well-trained interviewer

or by the actual respondent, thus reducing keypunch error.  All CASIC

procedures should include repetition of the incoming data for

verification with the respondent.  CATI and CAPI interviewers repeat

the data aloud as they are keying it, and CASI methods must provide

for repeating the data for verification by the respondent.  On-line

edits again play a role in assuring that data errors are caught

before they reach the post-collection stage.

 

     Another source of processing error is data processing by

computer.  All the benefits of CASIC methods described above may be

diminished by errors in computer processing.  Failures in designing

and constructing CASIC methods may substantially reduce data quality. 

For example, poor branching or non-exhaustive response options may

prevent knowledgeable interviewers or self-response system users from

properly completing interviews.

 

     Quality, as discussed above, is often defined in terms of

statistical error or lack of accuracy.  However, the idea of quality

contains several other elements.  For example, the element of

timeliness is critical to most surveys.  Accurate data that arrive too

late to be of use are of low quality.  The use of CASIC methods, like

CATI and TDE, has proven useful in improving the

 

                                  76

 

 

 

timeliness of data in one large establishment survey, thus offering

the potential to reduce the number and magnitude of estimate

revisions.  Quality also includes costs.  Two identical products with

differing costs are of different quality.

 

     Also, quality control should be applied to the process of methods

development.  A high quality CASIC application must be easily

understood and easily used by interviewers and respondents.  Anything

less is of low quality.

 

Conclusion

 

     CASIC methods have great potential for improving the control over

data collection activities and the quality of the resulting data as it

moves toward the post-collection survey functions.  This discussion of

survey error and the application of CASIC methods is not exhaustive of

either current or potential approaches.  Many other creative

approaches will be developed to further use the power of computers to

aid in improving the quality of Federal surveys.

 

     Equally important to add to the discussion of quality is a

caution that the mere use of CASIC methods does not automatically

guarantee higher data quality.  Failures in designing and testing

questionnaires or in using other standard survey practices will

inevitably result in data quality problems.

 

     The increased reliance on software development has important

implications for hiring and training skilled survey designers. 

Statistical methods knowledge and experience alone are not sufficient

qualifications to achieve satisfactory results.  Previously distinct

boundaries between occupational groups will continue to blur or

disappear.  In the future, survey design will likely be increasingly

accomplished through teams of skilled workers from different

occupations.  Just as statisticians must be familiar with software

design techniques to understand their implications, systems analysts

and programmers must be familiar with the statistical aspects of the

survey and questionnaire design.  Managers of automated surveys cannot

avoid having a background in all aspects of the design, implementation

and maintenance of integrated systems.

 

                                  77

 

 

 

Appendix VI.C. Survey Examples

 

     The following pages provide additional examples of current

CASIC applications.  Each provides a point of contact for further

information.

 

                                  78

 

 

 

            National Agricultural Statistics Service (NASS)

                         Agricultural Surveys

 

Collection Type -- CATI

 

Point of Contact

 

USDA - NASS

CATI Section, Survey Management Branch

Research and Application Division

1400 Independence Avenue

Washington, DC 20250

 

 

Type of Data to be Collected

 

     The Agricultural Surveys are conducted in January, March, June,

July, September, and December to collect data on crops, livestock,

grain stocks, and other information from farmers.  Starting with the

March 1987 survey, data were collected using Computer Assisted

Telephone Interviewing (CATI) to replace the paper-and-pencil mode. 

CATI is a computer-driven telephone interviewing system developed to

replace a paper questionnaire with a more efficient, error-reducing

questionnaire.  It can edit the data as it is entered by accepting

only valid responses; checking sums and edit limits; carrying forward

responses required for subsequent questions; and refusing answers

inconsistent with current or historical responses.  CATI provides

question branching and some systems can handle each state's customized

version of the questionnaire.  Currently, 14 of the 45 field offices

are collecting data with CATI using 183 calling stations, and in 1989,

over 70,000 farmers were contacted to obtain Agricultural Survey data. 

CATI usage will expand rapidly with the installation of the new PC

Local Area Networks (LANs) in the field offices.  By 1992, all 45

field offices will be equipped with a PC LAN, and there should be

about 750 calling stations available for making CATI calls.

 

Approach to Respondents

 

     The Agriculture Surveys CATI application is written using the

Computer Assisted Survey Execution System (CASES) software developed

by the University of California at Berkeley.  CASES has an automated

sample delivery system that is in use; an automated call scheduling

and dialing option will be initiated in the future.  Other features

make CASES one of the most powerful systems on the market today. 

These include interactive editing (coding), sample management,

record keeping, Conversational Survey Analysis (CSA), audit trails,

jump-back menus, and full-screen mode with cursor control.  The

interview sessions are initiated by the interviewer.

 

                                  79

 

 

 

The computer program controls branching to or skipping among

questions, and validates the data as it is entered.  In addition, the

interviews are more personalized, probing questions are standardized,

use of historic data is standardized, and the questions can be more

sophisticated than those on paper questionnaires.

 

Transmission

 

     Data collected via CATI is currently uploaded to an IBM

mainframe leased from the Martin Marietta Corporation, where a SAS

edit is done and the data summarized.  Since the survey data is

currently collected via different modes (CATI, telephone, paper

personal interview, and mail), it is necessary to convert the data to

one standard system for summarization.

 

 

Factors Affecting Choice of Method

 

     The implementation of CATI for collecting Agricultural Survey

data has resulted in higher quality data and a reduction in time and

cost of collection.  This is due to combining the collection, entry,

validation, analysis, and conversion of data.  More complex

questionnaire design is possible since the program controls branching

and logic.  CATI works particularly well in situations where a short

implementation schedule exists.

 

Quality Issues

 

     Significantly fewer errors occur, as data is validated at the

time it is reported and keyed.  The data validation currently includes

internal data checks but some work has been done on using historic

edit checks as well.  Since the program controls the logic, it is

assured that all questions are asked consistently.  A totally menu-

driven system is being designed and will be in operation soon.

 

                                  80

 

 

 

National Health Interview Survey (NHIS)

Computer Assisted Personal Interview (CAPI) Case Study

 

Collection Type -- CAPI

 

Point of Contact

 

Division of Health Interview Survey

National Center for Health Statistics

3700 East-West Highway

Hyattsville, MD 20782

(301) 436-7085

 

 

Type of Data to be Collected

 

     The case study involved the collection of health data from

approximately 500 households in two Census Regions: Chicago and

Charlotte. The questionnaire consisted of the NHIS core questionnaire 

that contains more than 600 questions on the composition of the

household, demographic characteristics, health status of the

individuals, health care visits and incidents, and other pertinent

health care data.  The respondents are contacted at their residence,

and are not contacted again unless the interview was not completed on

the initial visit or additional clarifications are needed.  Because

this effort was a feasibility study for CAPI, only a small portion of

the normal survey respondents were contacted.  The normal survey size

is 50,000 households per year.

 

Approach to Respondents

 

     CAPI was used to obtain the survey information.  A portable

computer containing the survey questionnaire was carried into the

household by the interviewer.  The portable computer was a Toshiba

1100+ weighing approximately 10 lbs.  The survey questionnaire was

programmed in the Computer Aided Survey System (CASS) language

developed by Dawn and Charles Palit at the University of Wisconsin. 

The interviewer conducted the survey by reading the questions from the

computer screen and entering the answers on the keyboard.

 

Transmission

 

     The survey questionnaire data is collected on 3 1/2" floppy disks

by the interviewer.  The disks are collected from each interviewer in

the region, merged at the regional office, and then mailed to the

computer center in North Carolina for uploading to the mainframe

computer.

 

                                  81

 

 

 

Factors Affecting Choice of Method

 

     The choice of CAPI provided several advantages.  First, it

improved the timeliness of survey data availability through the

ability to put the survey into the field quickly and the subsequent

elimination of keying the completed questionnaires.

 

     Second, it improved data quality because (1) significant editing can

be done as a part of the data collection process; (2) there is greater

flexibility for questionnaire design, e.g., more opportunity to make

changes closer to the field implementation date; (3) good measurements

for non-sampling error are easily provided as a part of the process;

and (4) immediate interviewer quality control is available from an

analysis of the data, e.g., time to complete a section or the entire

questionnaire.

 

                                  82

 

 

 

                 Current Employment Statistics Survey

                      Bureau of Labor Statistics

 

Collection Type -- CATI, TDE, VRE

 

Point of Contact

 

Division of Monthly Industry Employment Statistics

U.S. Bureau of Labor Statistics

Room 2089, 441 G Street, N.W.

Washington, D.C. 20212

(202) 523-1446

 

Type of Data to be Collected

 

     The Current Employment Statistics (CES) survey collects data from

over 300,000 nonagricultural business establishments each month

covering employment, hours and earnings.

 

     The CES is voluntary and is conducted in a Federal-State

cooperative system in which BLS provides the statistical standards and

procedures for use in each state and the District of Columbia, Puerto

Rico, and the Virgin Islands.  In this way, the resulting data can be

aggregated to national totals and are comparable among the states,

which produce estimates at the state and metropolitan area levels.

 

     The national data are first published after only two weeks of

collection.  Then, based on additional sample receipt, revised

estimates are published after 3 more weeks of collection, followed by

final estimates after a total of 8 weeks of collection.  The short

collection period poses the toughest problem for the CES survey.

 

 

Approach to Respondents

 

     Under mail collection, respondents return the form sometime after

their data become available.  Given the very short, two week

collection period before the publication of preliminary estimates, any

delay in completing the form, or returning it to the state has severe

implications for response rates.  Under CATI collection, respondents

are called on a pre-arranged date, if possible, the same day as the

firm's data are available.  The data are entered and edited during

this call, and the next month's call is scheduled.

 

     The conversion of respondents from mail to CATI includes sending

selected units a package of materials with information on the

importance and uses of the CES data, and instructions on

 

                                  83

 

 

 

reporting by telephone.  As respondents are converted to TDE or VRE

collection, another package is sent containing instructions on how to

participate using these methods.

 

     Under TDE and VRE, respondents receive an "Advance Notice"

postcard during the reference period that serves as a reminder that it

is time to call in their data.  The collection microcomputer is

available 24 hours a day, 7 days a week, to receive calls.  A few

days before

the end of each collection period, the TDE and VRE collection files

are checked, and those respondents for which data are missing receive

a short call to ask that the data be called in.
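
     A minimal sketch of that end-of-period check, assuming a

hypothetical layout in which the collection file lists one reporting

unit per record:

     # Units assigned to TDE/VRE collection, and units with data on file.
     expected = {"0012", "0047", "0093", "0110"}
     received = {"0012", "0110"}

     # Respondents with missing data receive a short prompting call.
     to_prompt = sorted(expected - received)
     print("Call to request data from:", ", ".join(to_prompt))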

 

     After the first month of collection by TDE or VRE, respondents

are called to discuss the new method, to identify and correct any

problems that may have been encountered, and to ensure trouble-free

collection.

 

 

Transmission

 

     Under the mixed mode of collection in the CES program, responses

are received by mail, CATI collection and TDE self-response.  In the

Federal/State cooperative system, the state collects the microdata,

through the appropriate mix of methods, for electronic transmission to

the central computing facility in Washington.  The State data are then

aggregated for the production of national estimates.  At each level,

the microdata are subjected to rigorous editing with logic,

consistency, and longitudinal checks.

 

Factors Affecting Choice of Method

 

Timeliness

 

     BLS has been conducting research and development in the area of

computer assisted methodology since 1984.  Currently, over 5300 units

are collected via CATI each month.

 

     The use of CATI within the CES program is limited by the

resources available.  The current implementation strategy is based on

targeted use of CATI for specific segments of the sample which warrant

special treatment and commitment of funds.  These segments include

large, "certainty" units, and late respondents.  These units are

converted to CATI collection for a short period, usually 6 months, to

educate respondents on the importance of the CES data and the

reporting timing requirements and to improve reporting habits.  After

reporting improves, these units will be returned to either TDE self-

response collection, or mail, if there is no access to a touchtone

phone.  Thus, CATI is seen as a transitional tool

 

                                  84

 

 

 

for improving the overall timeliness of the CES sample over a period

of just a few years.

 

Costs

 

     While CATI is a very strong method for improving timeliness,

it is currently more expensive than the mail collection process that

has been used for decades.  The high costs of CATI prompted BLS to

pursue development and testing of TDE and VRE methods.  These

automated self-response methods offer lower costs by reducing or

eliminating many manual activities and postage involved in mail

collection.  Respondents without touchtone phones will be collected

using voice recognition.

 

 

Quality Issues

 

     By every measure, CATI proved superior to mail collection, and

TDE has shown the ability to maintain high response rates over

extended periods of more than two years.  The tests of VRE collection

show similar ability to maintain high response rates.

 

 

Performance Measure           Collection Method

 

                              Mail      CATI      TDE/VRE

 

Sample received for:

preliminary estimates         50%       85%       85%

 

revised estimates             75%       99%       99%

 

final estimates               87%       100%      100%

 

Sample attrition

(annual rate)                 10-15%    2-4%      2-4%

 

     Besides reducing nonresponse error for the preliminary estimates,

the CES program uses a CATI system to evaluate and correct response

error.  Large-scale tests using telephone record check surveys have

shown that this approach is useful for ensuring that the reported data

conforms as closely as possible to CES definitions.

 

                                  85

 

 

 

                Energy Information Administration (EIA)

             Reserves Information Gathering System (RIGS)

                             Form EIA-23

 

Collection Type -- PDE

 

Point of Contact

 

Reserves and Natural Gas Division

Energy Information Administration

1114 Commerce St., Room 804

Dallas, Texas 75242-2899

(214) 767-2200

 

Type of Data to be Collected

 

     There are approximately 600 respondents who are oil or gas well

operators who produce at least 400,000 barrels of crude oil or 2

billion cubic feet of gas annually.  There are 15 detailed questions

in this annual survey.  A system of reporting on PC diskettes was set

up on an operational test basis for the collection of 1988 data.  Ten

percent of 1988 production was reported with RIGS.

 

Approach to Respondents

 

     The questionnaire runs on IBM PC compatible computers with at

least 360K bytes of RAM and two floppy drives or a floppy drive and a

hard disk drive.  The user only needs to know basic DOS functions. 

The program is menu driven, and on-line help is available as well as a

toll-free telephone hotline during business hours.  It comes with a

fifty-page User's Guide.

 

Transmission

 

     Respondents copy the data files onto a floppy disk and mail the

disk (with the cover page sent to them) to EIA.  They also have the

option of sending in the original paper form.

 

Factors Affecting Choice of Method

 

     RIGS was developed to provide respondents with an alternative,

more user-friendly means for reporting data.  The PC compatible

computer was chosen because of its wide availability.  Use of the mail

avoids security concerns about data transmission.  EIA processing is

done on a secure machine.

 

                                  86

 

 

 

Quality Issues (Human Interface)

 

     RIGS includes data edit checks to prevent inadvertent entries and

an on-line correction capability.  Company totals are automatically

calculated.  Respondents are requested to keep a copy of the data

files and a printed copy of the output in case EIA's quality control

analysts need to contact them.  Reduction of follow-up calls is a

significant benefit.

 

                                  87

 

 

 

Internal Revenue Service

Electronic Filing System Office

 

Collection Type -- PDE

 

Point of Contact

 

Operations and Marketing Branch

Electronic Filing System Office

Internal Revenue Service

1111 Constitution Avenue, N.W.

Washington, DC 20224

(202) 535-6394

 

 

Type of Data to be Collected

 

     In the early 1980's, the Internal Revenue Service (IRS) decided

that the electronic transmission of returns by tax preparers to IRS

would be both a practical and cost-beneficial alternative to the

mailing of paper tax returns when a refund is claimed.  According to

the Agency, the benefits of electronic filing would include: (1)

reduced manual labor costs required to process, store, and retrieve

returns, (2) faster processing and retrieval of tax data, and (3)

reduced interest IRS is required to pay to taxpayers who file timely

refund returns, but who are not issued refunds within the interest-

free period allowed to the IRS to process these refunds.

 

     Further, IRS reports show that electronically transmitted returns

are processed with significantly fewer errors than paper returns. 

According to IRS figures for the 1988 filing season, as of April 29,

1988, 20 percent of paper returns processed by IRS had errors and only

5.5 percent of, those filed electronically had errors.  For taxpayers,

electronic filing can mean refunds up to 3 weeks sooner, and because

IRS can deposit these refunds directly into taxpayer bank accounts,

refunds may arrive 3 to 4 days earlier than that.  For tax preparers,

the ability to provide electronic filing services to taxpayers

promises a competitive business edge.

 

Approach to Respondents

 

     In 1986, the program was initially tested in three metropolitan

areas, and five preparers electronically filed 24,820 returns to the

Cincinnati Service Center.  In 1987, 69 preparers in 7 metropolitan

areas electronically filed 77,612 returns.  For the 1988 filing

season, IRS expanded its electronic filing program to 16 IRS districts

and a second service center in Ogden, Utah.  With the expansion in

1988, the number of preparers increased to 2,339.  Of that total,

1,114, or about half, filed all of the 583,077

 

                                  88

 

 

 

electronic returns for 1988.  Furthermore, H & R Block offices

accounted for 82 percent of the total returns filed electronically

during the 1988 filing season.

 

 

Transmission

 

     To operate electronic filing at each of the two service centers

in 1988, IRS bought the International Business Machines Corporation

(IBM) Series I computer, a local area network, and the related

computer software.  The network has IBM and IBM-compatible personal

computers, high-resolution graphics display workstations, laser

printers, tape drives, and optical disk drives.  IRS uses the Series I

to receive preparers' transmissions of electronic returns and to

transmit certain information to preparers.  The local area network was

expected to do two primary functions: (1) retrieve and visually

display the electronic returns on the tax examiners I workstations for

error correction, and (2) permanently store these returns.

 

     The basic components needed to prepare and transmit electronic

returns include a computer, IRS-approved software to prepare tax

returns, and the communications equipment and IRS-approved software to

transmit the returns to IRS.  In addition, IRS tests and verifies the

preparers' competence in transmitting electronic returns.

 

     The electronic filing process begins when a preparer transmits

electronic returns to the service center.  The Series I receives the

transmission and writes the data onto a magnetic tape.  The tape is

then manually transferred from the Series I to the service center

mainframe computer for processing.  The mainframe generates an

acknowledgment file specifying the received returns and whether each

is accepted or rejected, and then writes this file onto magnetic tape. 

This tape file is hand carried from the mainframe to the Series I for

electronic transmission to the individual preparers.  Mainframe

processing also identifies electronic returns containing errors. 

After IRS corrects the errors, tapes containing data from accepted

error-free returns are sent with data from returns filed on paper to

the IRS National Computer Center in Martinsburg, West Virginia, where

the master files of tax account data are updated.

 

                                  89

 

 

 

Energy Information Administration

Petroleum Electronic Data Reporting Option (PEDRO)

 

Collection Type -- PDE

 

Point of Contact

 

Petroleum Supply Division

Energy Information Administration

1000 Independence Avenue, S.W.

Washington, D.C. 20585

 

Type of Data to be Collected

 

     The Petroleum Supply Division (PSD) of the Energy Information

Administration (EIA) decided in 1987 to investigate electronic forms

submission to collect the Petroleum Supply Reporting System (PSRS)

survey forms.  Ten of the major petroleum companies who file the

mandatory "Monthly Refinery Report" were contacted to assess their PC

and communications capabilities.  The respondents contacted showed

interest in investigating the use of PC's to collect this data.  Most

of these were already using PC's for business, personal, or academic

purposes.  The respondents either had a PC in their office area or had

access to one in another office.  Software such as Lotus 1-2-3 and

dBASE III could usually be found on these PC's.  Some PC's were

equipped with communications capabilities, and those respondents were

already using telephone lines for company reporting.  It appeared to

be the appropriate time for the PC to enter the PSRS data collection

process.

 

Approach to Respondents

 

     Early in 1988, PSD developed the Petroleum Electronic Data

Reporting Option (PEDRO) and began providing its respondents with a

software diskette by which they could create an electronic image of

the form on a PC screen and enter their data in the appropriate cells. 

Firms having the necessary software capabilities can use their

database to feed the data directly to the electronic survey form,

eliminating keying and transcription errors.  User-friendly software

with help functions has been added to data entry functions to provide

quick reference to definitions, conversion factors or other

information to speed the completion of the survey form.  This

eliminates the need to search hard-copy files for survey forms

instructions, product definitions, conversion tables, etc.

 

                                  90

 

 

 

Transmission

 

     The data received on EIA survey forms are subjected to rigorous

edit tests before they are accepted for inclusion in the EIA database. 

These data are later summarized to produce EIA publications and

reports used by the industry, the Congress, and the public. 

Timeliness and accuracy are needed in every step of the data

collection process.  Collecting data via electronic means allows EIA

to pursue another approach to saving time by providing respondents

with electronic forms software that also does the survey edits and

isolates anomalies for review before submitting the survey response to

EIA.  Issues which would require an EIA data analyst to contact a

respondent by telephone for resolution are highlighted immediately. 

This allows the respondent to correct any errors or attach a

resolution indicator/comment to explain any anomalies.  Additional

telecommunications software has been added to allow the direct link

between the respondent's PC and the EIA system.  Now the capability

exists on a PC to create, quality check, and transmit an electronic

file directly to EIA.  This file is immediately accessed by EIA

processing software and security and data transmission integrity tests

are done.
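
     A minimal sketch of such respondent-side editing, with a

hypothetical balance rule standing in for the actual survey edits:

     def run_edits(form):
         """Return a list of edit failures; empty means the form passes."""
         issues = []
         if form["receipts"] + form["production"] != form["total_supply"]:
             issues.append("receipts + production should equal total_supply")
         return issues

     # The respondent corrects a flagged value or attaches a
     # resolution comment explaining the anomaly before transmission.
     form = {"receipts": 120, "production": 75, "total_supply": 200,
             "comments": {}}
     for issue in run_edits(form):
         form["comments"][issue] = "difference is a reclassified stock change"
     print(form["comments"])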

 

     The PEDRO software contains electronic forms for data entry and

software for statistical editing, and establishes a communications

link between the respondent's PC and the EIA Computer Facility.  The

functions are menu-driven and use macro languages and script files to

eliminate rudimentary tasks.  The PEDRO system only requires that the

respondent's PC run DOS software and be equipped with

telecommunications capability.

 

                                  91

 

 

 

Energy Information Administration (EIA)

Annual Survey of Nuclear Utilities

 

Collection Type -- PDE

 

Point of Contact

 

Nuclear and Alternate Fuels Division

Energy Information Administration

1000 Independence Avenue, S.W.

Washington, D.C. 20585

(202) 254-5558

 

Type of Data to be Collected

 

     The Nuclear and Alternate Fuels Division of the EIA conducts an

annual survey of nuclear electric utilities that own commercial

nuclear reactors.  The EIA collects data on over 100,000 nuclear fuel

assemblies that are owned and managed by these utilities.  These data

are collected in support of the programs of the Department of Energy's

Office of Civilian Radioactive Waste Management.  A system of

reporting on PC diskettes was set up in 1986 and began with the

collection of 1985 data.

 

Approach to Respondents

 

     The respondents are supplied with a program diskette containing

compiled software and a data diskette.  The data diskettes have the

respondent's prior data submissions that are needed for comparison

purposes and space for the current submission.  The respondents load

the program and data diskettes on their compatible PC's and enter the

current data, which is verified by the data entry program as it is

keyed.  They print a copy of the data submission, sign a certification

statement for it, and return the printed copy and statement to the EIA

with the diskette.

 

Transmission

 

     The diskettes are mailed from the EIA to the respondents and the

completed data diskettes are returned to the EIA by mail.  

Telecommunication between the EIA and the respondents is not needed. 

When the diskettes are received at the EIA, they are loaded onto a PC

and checked.  The data are uploaded from the PC's to the EIA mainframe

over local telephone lines.  Note that since these data are for public

utilities, they are in the public domain and thus not confidential or

proprietary.  Certain issues of data security do not apply for this

survey.

 

                                  92

 

 

 

     The diskette form of submission is preferred, but not mandatory. 

Respondents have the option of filing a paper form.  Now there are

approximately 70 utilities required to report for approximately 125

reactors, and all reports are filed on diskette.

 

Factors Affecting Choice of Method

 

     The major advantages of the diskette collection are:

 

     Data accuracy has been improved by (1) editing the data as they

     are keyed and (2) in some instances, data entry by technical

     rather than clerical personnel.  The second reason suggests that

     a higher level of technology in data collection may result in

     the availability of a higher level of respondent skill to

     complete the survey.

 

     More data, including data of a more complex nature, can be

     collected using the diskettes compared to using paper forms.

 

     Data are available sooner.

 

     In planning such a system, government agencies must be careful to

create a system that does not require or endorse a particular brand

of hardware or software.  Software licensing agreements also must be

carefully reviewed to ensure they are not violated when software is

provided to respondents.

 

                                  93

 

 

 

Appendix VI.D. A Taxonomy of Information Gathering Using a Computer

 

     During this study there have been wide-ranging discussions on

naming conventions for information gathering using a computer.  The

discussion has been so wide-ranging that the name of the committee has

changed at least 3 times.  This note was originally titled "Acronyms

for Survey Technologies." However, it provides a good model of the

different procedures for collecting information with computer

assistance.  The title of this section has been changed to reflect

this model.

 

     We can distinguish two aspects of the data collection process

which may include automation: (1) assistance during the interview and

(2) interaction with the respondent.  A computer or other technology

may be involved in one or both.  Here is a system of acronyms using

codes to show how each part is handled:

 

     Operation types:

 

     CA = computer assisted 

     MA = manually assisted

 

     Interaction types:

 

     PI = personal interviewing (person to person)

     SI = self interviewing (respondent reads the questions)

     TI = telephone interviewing (person to person on the phone) 

     TO = touchtone interviewing (respondent responds on the phone to a

     machine that discerns touchtones)

     VI = voice recognition interviewing (respondent talks on the

     phone to a machine that discerns voices)

 

From these we get various possibilities, old and new:

 

     CAPI = computer assisted personal interviewing

     CASI = computer assisted self interviewing

     CATI = computer assisted telephone interviewing

     CATO = computer assisted touchtone interviewing

 

     MAPI = manually assisted personal interviewing

     MASI = manually assisted self interviewing

     MATI = manually assisted telephone interviewing

 

 

A third aspect in some cases is how the data are sent to the

processing center:

 

     MA = mail

     NE = network (wide area computer network)

     TE = telephone line (direct line to computer)
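
     As a minimal illustration, the fragment below composes and

expands the two-part acronyms from the operation and interaction

codes defined above:

     OPERATION = {"CA": "computer assisted", "MA": "manually assisted"}
     INTERACTION = {
         "PI": "personal interviewing",
         "SI": "self interviewing",
         "TI": "telephone interviewing",
         "TO": "touchtone interviewing",
         "VI": "voice recognition interviewing",
     }

     def expand(acronym):
         """Expand, e.g., 'CATI' to 'computer assisted telephone
         interviewing'."""
         op, inter = acronym[:2], acronym[2:]
         return "%s %s" % (OPERATION[op], INTERACTION[inter])

     for a in ("CAPI", "CASI", "CATI", "CATO", "MAPI", "MASI", "MATI"):
         print(a, "=", expand(a))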

 

                                  94

 

 


 

 

 

Appendix VI.E. Glossary of Technical Terms

 

286, 386       Short for 80286, 80386.

 

80286, 80386   Microprocessors from Intel used in PC's.

 

ASCII          American Standard Code for Information Interchange; a

               seven bit representation of alphanumeric characters and

               control codes.

 

ASCII file     A file with ASCII codes; loosely, a text file.

 

AT             The name for the second microprocessor generation of

               personal computers.  These personal computers use the

               80286 microprocessor.

 

Audit trail    A record of changes made to a data set over its

               lifetime.

 

Authoring      Computer software that allows a non-computer

system         programmer to write a CAPI survey questionnaire

               instrument.

 

Batch          Computer processing with no human involvement after

               start-up; the opposite of interactive.

 

Baud           Baud rate; the number of times per second that a signal

               in a communications channel changes states; often

               confused with bps.

 

Benchmark      The use of some standard computer program (e.g., a sort

               program) to measure the use of computer resources in a

               particular environment.  This could include

               computational speed and storage resources.

 

Bit            Binary digit; symbolically, a one or zero.

 

bps            Bits per second; the number of bits transmitted each

               second over a communications channel.

 

Bridge         A communications channel between two technically

               similar networks.

 

Byte           Eight bits.

 

CAPI           Computer Assisted Personal Interviewing is a personal

               interview usually conducted at the home or business of

               the respondent using a portable computer.

 

                                  96

 

 

 

Case           Portion of the CAPI software that handles the

management     administrative management of the survey. This portion

               usually includes keeping track of the status of each

               interview, interviewer assignments, and other similar

               administrative tasks.

 

CASI           Computer Assisted Self Interviewing (CASI) involves

               data collection without the direct presence of an

               interviewer.  CASI can take several different forms

               which are differentiated by the means of collection. 

               These include Prepared Data Entry (PDE) where the

               respondent answers questions displayed on a computer

               terminal; Touchtone Data Entry (TDE) where the

               respondent answers computer generated questions by

               pressing buttons on a telephone; and Voice Recognition

               Entry (VRE) where the respondent answers questions by

               speaking directly into a telephone.

 

CASIC          Computer Assisted Survey Information Collection.

 

CATI           Computer Assisted Telephone Interviewing (CATI) is a

               computer assisted survey process which uses the

               telephone for voice communications between the

               interviewer and the respondent.

 

 

CCITT          Consultative Committee for International Telephony and

               Telegraphy; a standards-setting organization whose

               guidelines have become international standards in the

               area of computer networks.

 

Centralized    CATI interviews carried out from one central location

CATI           (e.g., nationwide).

 

Centralized    Main or host computer provides all of the

computing      processing power.

 

Chip           See microchip.

 

CPU            Central processing unit; the computer part which

               interprets and executes instructions.

 

CRT            Cathode ray tube; the most common type of computer

               screen.

 

Decentralized  CATI interviews carried out from several geographically

               dispersed locations (e.g., states).

 

                                  97

 

 

 

Distributed    Computing power is distributed over a number

processing     of computers which may be co-located or geographically

               distributed.

 

Disk           A circular, magnetized medium which holds electronic

               data.

 

Disk drive     A device which reads a disk electronically.

 

Diskette       A floppy disk.

 

DM             Direct Manipulation: A type of human-computer interface

               which accentuates the user's feeling of directly

               operating on responsive display objects. Example:

               Macintosh user interface.

 

DOS            Disk operating system; an abbreviation for MS-DOS or

               PC-DOS, the original operating system for IBM PC's.

 

Download       The process of transferring a file from a mainframe

               computer or host to a connected personal computer or

               terminal.

 

EDI            Electronic data interchange; the automated exchange of

               business information such as invoices.

 

Establishment  Business.

 

Floppy disk    A bendable disk, usually 5 1/4 inches in diameter,

               although increasing use is being made of unbending

               disks 3 1/2 inches in diameter.

 

Gateway        A communications channel used to pass data between two

               different networks.

 

Hard disk      An unbendable disk and its disk drive; holds more data

               than a floppy disk.

 

I/O            Input and output.

 

IDN            Integrated Digital Networks; digital transmission

               networks which are dedicated to voice and data.

 

ISDN           Integrated services digital network; an emerging

               technology which offers many new telecommunication

               services such as the mixing of the transmission of

               voice and data.

 

                                  98

 

 

 

File server    A computer, usually on a Local Area Network, that

               provides a group of users with facilities to store

               and access their files.

 

GB             Gigabyte(s).

 

Gigabyte       Loosely, one billion bytes; strictly, 1,073,741,824 (2

               to the 30th power) bytes.

 

Interactive    Computer processing which prompts for and accepts human

               input.

 

KB             Kilobyte(s).

 

Kilobyte       Loosely, one thousand bytes; strictly, 1024 (2 to the

               10th power) bytes.

 

LAN            Local area network; the interconnection of

               microcomputers at one site.

 

Mainframe      A large computer; often designed to serve many users at

               one time, although some mainframes, often called

               supercomputers, are designed to provide high-speed

               computing; their purchase costs are often in excess of

               a million dollars.

 

MB             Megabyte(s).

 

Megabyte       Loosely, one million bytes; strictly, 1,048,576 (2 to

               the 20th power) bytes.

 

Microchip      A printed circuit etched on a silicon chip.

 

Microcomputer  A small computer, e.g., costing less than $10,000.

 

Microprocessor A CPU on a microchip.

 

Minicomputer   A medium sized computer; larger than a microcomputer

               but smaller than a mainframe; costing on the order of

               $100,000.

 

MS-DOS         Microsoft's DOS for PC's.

 

On-line        (1) A peripheral device is on-line when it is connected

               and ready for use; (2) involving interactive use of a

               computer.

 

One-time       Non-repeating survey.  Data is collected once, or over

               great intervals (e.g., 5-10 years).

 

Ongoing        Repetitive survey (e.g., weekly, monthly or yearly).

 

                                  99

 

 

 

PC             Personal computer; broadly speaking, any microcomputer;

               narrowly speaking, an IBM-compatible computer; even

               more narrowly speaking, IBM's first microcomputer.

 

PC-DOS         IBM's version of MS-DOS (they are virtually identical).

 

PDE            See Prepared Data Entry

 

Prepared       Prepared Data Entry (PDE), where the respondent

Data Entry     answers questions displayed on a computer terminal.

 

Print server   A computer, usually on a Local Area Network, that

               provides a group of users with a range of printing

               services.

 

Question path  See skip pattern.

 

RAM            Random access memory; the core memory for a computer's

               CPU.

 

RAM disk       RAM used as if it were disk space.

 

Sampling Unit  A selected element for data collection in a survey,

               usually selected from a defined population of units by

               a random mechanism.  In a survey of households in a

               state, the sampling unit is the household.

 

Skip pattern   The sequence in which questions are asked in a survey

               questionnaire instrument; this sequence is often based

               on the answer to each question.

 

Target         The collection of survey units about which a

population     measurement is desired; to quantify it, a sample is

               obtained and an estimate is calculated.

 

TDE            See Touchtone Data Entry

 

Touchtone      Touchtone Data Entry (TDE) allows respondents to

Data Entry     call and answer questions posed by a computer using the

               keypad of their touchtone telephone for well-controlled

               and inexpensive collection.

 

User-friendly  Software that provides an interface to the

software       user that is simple and intuitive, thus making the

               software easy to use.

 

UNIVAC I       The name of the first digital computer in widespread

               commercial use.

 

                                  100

 

UNIX           An operating system initially designed for small

               computers, but currently in use over a wide range of

               computers.

 

Upload         The process of transferring a file from a personal

               computer or terminal to a mainframe computer or host.

 

Voice          Voice Recognition Entry (VRE) allows respondents to

Recognition    call and answer questions posed by a computer by

Entry          speaking directly into the telephone.  The machine

               translates the incoming sounds for verification with

               the respondent and storage in a data base.

 

WAN            Wide Area Network.

 

Waterfall      A straightforward approach to software development

methodology    by stepping through specification, design,

               implementation, debugging and testing without ever

               looking back -- as opposed to moving back and forth

               between these steps as the objectives become more

               clearly understood.

 

WYSIWYG        Pronounced whizzy-wig.  What You See Is What You Get. 

               A style of presentation to users in which the displayed

               material is essentially identical in form to the final

               product.  Example: modern word processing software.

 

XT             The name given by IBM to an early version of the

               Personal Computer which had internal disk storage

               (i.e., a hard disk) that could hold 10 or more

               megabytes of data.

 

 

                                  101

 

 

 

                        Reports Available in the

                Statistical Policy Working Paper Series

 

 

     1.   Report on Statistics for Allocation of Funds (Available

          through NTIS Document Sales, PB86-211521/AS)

     2.   Report on Statistical Disclosure and Disclosure-Avoidance

          Techniques (NTIS Document Sales, PB86-211539/AS)

     3.   An Error Profile: Employment as Measured by the Current

          Population Survey (NTIS Document Sales PB86-214269/AS)

     4.   Glossary of Nonsampling Error Terms: An Illustration of a

          Semantic Problem in Statistics (NTIS Document Sales, PB86-

          211547/AS)

     5.   Report on Exact and Statistical Matching Techniques (NTIS

          Document Sales, PB86-215829/AS)

     6.   Report on Statistical Uses of Administrative Records (NTIS

          Document Sales, PB86-214285/AS)

     7.   An Interagency Review of Time-Series Revision Policies (NTIS

          Document Sales, PB86-232451/AS)

     8.   Statistical Interagency Agreements (NTIS Document Sales,

          PB86-230570/AS)

     9.   Contracting for Surveys (NTIS Document Sales, PB83-233148)

     10.  Approaches to Developing Questionnaires (NTIS Document

          Sales, PB84-105055/AS)

     11.  A Review of Industry Coding Systems (NTIS Document Sales,

          PB84-135276)

     12.  The Role of Telephone Data Collection in Federal Statistics

          (NTIS Document Sales, PB85-105971)

     13.  Federal Longitudinal Surveys (NTIS Document Sales, PB86-

          139730)

     14.  Workshop on Statistical Uses of Microcomputers in Federal

          Agencies (NTIS Document Sales, PB87-166393)

     15.  Quality in Establishment Surveys (NTIS Document Sales, PB88-

          232921)

     16.  A Comparative Study of Reporting Units in Selected Employer

          Data Systems (NTIS Document Sales, PB90-205238)

     17.  Survey Coverage (NTIS Document Sales, PB90-205246)

     18.  Data Editing in Federal Statistical Agencies (NTIS Document

          Sales, PB90-205253)

     19.  Computer Assisted Survey Information Collection (NTIS

          Document Sales, PB90-205261)

 

 

Copies of these working papers may be ordered from NTIS Document

Sales, 5285 Port Royal Road, Springfield, VA 22161 (703) 487-4650

 

 
