Question generation


Question generation process

The Q attributes for a specific authentication session are selected from a predefined group of M attributes available to the entire authentication process. Before explaining how the questions are formed once the set of Q attributes is known, it is necessary to consider several criteria for selecting the group of M attributes, so that the challenge-question design satisfies the security, usability, and privacy requirements of the authentication system. Since the security of the overall system is directly linked to the confidentiality of the challenge question answers, a conscious decision must be made to classify data, i.e. attribute values, as they are created, stored or transmitted. The classification of data should determine the extent to which the data need to be controlled. This is achieved by forming an attribute metadata table containing all relevant information about each attribute, such as its name, definition, data type and format, permitted ranges of values, null rules for handling blank, zero or null values, and level of sensitivity. This knowledge is used in the first phase of our program to choose the M attributes from among the numerous attributes in the database, and it also helps determine the format of the challenge questions, since the question format depends on the type of the attribute value.
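As an illustration only, the metadata table and the two selection steps described above (M attributes from the database, then Q attributes per session) might be sketched as follows. The class name `AttributeMetadata`, the functions `select_pool` and `select_session_attributes`, the integer sensitivity scale and the use of a sensitivity band as the selection criterion are all assumptions made for this sketch; the source does not specify them.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class AttributeMetadata:
    """One row of a hypothetical attribute metadata table."""
    name: str
    definition: str
    data_type: str               # "numerical", "categorical" or "complex"
    value_format: str            # e.g. "integer", "ISO date"
    min_value: Optional[float]   # permitted range; None when not applicable
    max_value: Optional[float]
    null_rule: str               # how blank, zero or null values are handled
    sensitivity: int             # assumed scale: higher means more sensitive


def select_pool(table: Sequence[AttributeMetadata],
                lowest: int, highest: int) -> List[AttributeMetadata]:
    """Pick the M candidate attributes whose sensitivity lies in an assumed band."""
    return [a for a in table if lowest <= a.sensitivity <= highest]


def select_session_attributes(pool: Sequence[AttributeMetadata],
                              q: int) -> List[AttributeMetadata]:
    """Draw Q distinct attributes for one authentication session."""
    if not 0 < q <= len(pool):
        raise ValueError("q must be between 1 and the pool size")
    # SystemRandom draws from the OS entropy source, so the choice is
    # harder to predict than with the default pseudo-random generator.
    return secrets.SystemRandom().sample(list(pool), q)
```

Calling `select_pool(table, 2, 4)` on a table of such rows would yield the M candidates, and `select_session_attributes(pool, 2)` would then return two of them for a session. The sensitivity band is a placeholder for whatever security, usability and privacy criteria the real system applies.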

According to the type of its value, an attribute can be defined as numerical, categorical, or complex structured. Numerical data are the most common data type used to store information. Categorical or discrete data have a finite, but possibly large, number of distinct values with no ordering among them, such as geographic location, profession, student courses, etc. Complex data have sophisticated data structures, such as set- or list-valued data, documents, multimedia data, etc. Because results obtained by statistical means are more intuitive to represent and understand, the other data types should, whenever possible, be transformed into numerical data. Hence, the process of generating question formats is mainly focussed on this type of data.

Types of data attributes
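A minimal sketch of the three-way classification, with one possible numerical transformation, is given below. The mapping from Python types to attribute kinds and the choice to turn a set- or list-valued attribute into its element count are assumptions for illustration; the source does not say which transformations are actually used.

```python
from numbers import Real
from typing import Any, Optional


def attribute_kind(value: Any) -> str:
    """Classify a stored value as numerical, categorical or complex."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; treat it as a two-valued category.
        return "categorical"
    if isinstance(value, Real):
        return "numerical"
    if isinstance(value, str):
        return "categorical"
    return "complex"


def to_numerical(value: Any) -> Optional[float]:
    """Return a numerical form of the value, or None if no obvious one exists."""
    if attribute_kind(value) == "numerical":
        return float(value)
    if isinstance(value, (list, tuple, set, frozenset)):
        # Assumed transformation: a list-valued attribute becomes its size,
        # e.g. a list of student courses becomes the number of courses.
        return float(len(value))
    return None
```

Under these assumptions, a list of two student courses would be converted to 2.0, while a city name would stay categorical and return `None`.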

Some techniques used to generate metadata for numerical attributes are introduced and explained here. The idea is to divide the range of the attribute values into intervals using well-known data discretisation techniques. The range of values allowed for a given attribute can be defined logically, by its meaning, or experimentally, by the extremes of the attribute values found in the entire data set in the database. The five-number summary, in which each numerical attribute is accompanied by five numbers derived from the quartiles of the data set, is also used in this work as a measure of data dispersion. The five numbers, Q0, Q1, …, Q4, are the minimum, the first quartile, the median, the third quartile and the maximum, respectively. They can then be used in questions of the type "Is your attribute_name less/greater than median?", "…greater than the_first_quartile and less than the_third_quartile?", etc.
Using the 5-number summary
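
The five-number summary and the question templates quoted above can be sketched as follows. This is an illustrative sketch, not the system's implementation: the function names, the pairing of each question with a predicate that gives the expected answer, and the choice of the "inclusive" quantile method (which treats the data as the whole population) are assumptions.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Summary = Tuple[float, float, float, float, float]
Question = Tuple[str, Callable[[float], bool]]


def five_number_summary(values: Sequence[float]) -> Summary:
    """Return (Q0, Q1, Q2, Q3, Q4): min, first quartile, median, third quartile, max."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (float(min(values)), q1, q2, q3, float(max(values)))


def build_questions(attribute_name: str, summary: Summary) -> List[Question]:
    """Build yes/no questions, each paired with a predicate giving the expected answer."""
    _, q1, q2, q3, _ = summary
    return [
        (f"Is your {attribute_name} less than {q2}?",
         lambda v: v < q2),
        (f"Is your {attribute_name} greater than {q2}?",
         lambda v: v > q2),
        (f"Is your {attribute_name} greater than {q1} and less than {q3}?",
         lambda v: q1 < v < q3),
    ]
```

For the values 1, 2, 3, 4, 5 the summary is (1.0, 2.0, 3.0, 4.0, 5.0), so the first question for a hypothetical attribute `age` reads "Is your age less than 3.0?". The expected answer for a user is obtained by applying the predicate to that user's stored value. The quartiles also act as the cut points of an equal-frequency discretisation of the attribute range into four intervals.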


This information is also available to view as a PowerPoint presentation.