Neeraj Bhatia's Blog

November 20, 2010

How to Generate Random Data for Test Cases using DBMS_RANDOM in Oracle

Filed under: Oracle - General — neerajbhatia @ 08:42
Tags: , ,

Generating random numbers is a capability of any application software system, and so of Oracle database. And that’s my today’s point of discussion. I am sure most of the readers must be already aware about the Oracle-supplied DBMS_RANDOM package and its capabilities. So this post is more for me to document my learning for future references and for the novice members of our Oracle community.

There are various applications of random numbers, one of them is to prepare data set for testing purposes. As Jonathan Lewis discussed in philosophy post, one doesn’t has to wait for the various scenarios to come, he can always construct relevant test cases and develop his knowledge. Obviously first step is to build relevant and appropriate testcases where underlying data is simulated to the actual scenario. This is where random number generation comes into the picture. In another example web-based application developers may want to generate random password strings for the new registered users and later, on first login ask for password reset.

Regular blog readers must have noticed several “innovative” ways of generating random numbers through dbms_random. dbms_random package is one of the package available in Oracle to generate random numbers, I said “one of the package” as in release 10g, much-better dbms_crypto package was introduced which can also generate random numbers. Although you can write PL/SQL code to generate random numbers but Oracle suggests that because DBMS_RANDOM calls Oracle’s internal random number generator so it’s always faster and efficient.

The dbms_random package is created as part of Oracle database installation. The script dbmsrand.sql which can be found in the built-in packages source code directory ($Oracle_Home/rdbms/admin) contains the source code for this package’s specification. This script creates sys.dbms_random and its public synonym with the same name and grant execution privilege to public.

There can be two areas of discussion w.r.t. dbms_random: Oracle’s logic of random number generation and second how to use procedures to generate random numbers in various practical scenarios. The second area recently come to my attention and I think it’s worth to discuss it. Readers are welcome to read dbmsrand.sql script for understanding of Oracle’s algorithm of random number generation; I may discuss it later.

Concept

In software application terms random number generator is a computer program designed to generate a sequence of numbers or character strings that lack any pattern, i.e. appear random. There are two principal methods used to generate random numbers. One measures some physical phenomenon that is expected to be random and then compensates for possible biases in the measurement process. The other uses computational algorithms that produce long sequences of apparently random results, which are in fact completely determined by a shorter initial value, known as a seed. The latter type is often called pseudorandom number generators. A “random number generator” based solely on this way cannot be regarded as a “true” random number generator, since its output is inherently predictable.

So any pseudo-random number formula depends on the seed value to start the sequence. If you start with the same seed, you will get the same sequence of values from the formula. To create a random and unpredictable sequence, the seed must be a truly random number. To get this truly random number for the seed, most programs use the current date and time, converted to an integer value (for example, converted to the number of seconds that have elapsed since January 1, 1970). Since this is a different number every time you start the program, it makes a good seed. Some programs generate multiple seed values through some logic and then take the average of these to generate sequence of random numbers.

Getting started with dbms_random

As discussed, there are two steps to generate the random numbers: first initialize the package and second generate the actual data. By initializing we mean providing a seed value. By default, the package is initialized with the current user name, current time down to the second, and the current session. Following is an excerpt from the “dbmsrand.sql” script, which do implicit initialization.

-- initialize if needed

IF (need_init = TRUE) THEN

seed(TO_CHAR(SYSDATE,'MM-DD-YYYY HH24:MI:SS') || USER || USERENV('SESSIONID'));

If the package is seeded with the same seed and then accessed in the same way, it will produce the same results in both cases. This is important in cases when one wants to generate re-producible test case. In such a case, same seed is recommended.

Summary of dbms_random Subprograms

Following chart displaying the subprograms of DBMS_RANDOM package. The names in the ovals are procedure, while names in the rectangles are functions. Also, subprograms with shaded-boxes (i.e. Initialize, Random and Terminate) are obsolete now.

 

dbms_random_subprograms

1)   Initialize Procedure: The syntax of Initialize procedure is dbms_random.initialize(val in binary_integer). It initializes the package with a specified seed value. In 10g this procedure is obsolete as it simply calls the SEED Procedure.

2)   Seed Procedure: This procedure resets the seed that is automatically set by Oracle. This is an overloaded procedure and accepts number or character string as an input.

3)   Normal Function: Returns random numbers in a normal distribution. This function returns random numbers in a standard normal distribution.

4)   Random Function: Generates a random number greater or equal to -power(2,31) and less than power(2,31). This procedure is obsolete in 10g and replaced by Value function.

5)   Value Function: Again this function is overloaded and without any arguments it returns a random number, greater than or equal to 0 and less than 1 with 38-digit precision means 38 digit to the right of the decimal . Alternatively, this function returns a random number which is greater than or equal to low and less than high values passed as arguments.

6)   String Function: This function generates a random string. It takes two parameter opt and len which are for string type to be generated and the length of the string. Following are some pre-assigned values to the opt parameter, providing any other character as an argument will result in uppercase alpha characters.

  • ‘u’ OR ‘U’ – Returns string in uppercase alphabetic characters
  • ‘l’ OR ‘L’ – Returns string in lowercase alphabetic characters
  • ‘a’ OR ‘A’ – Returns string in mixed case alphabetic characters
  • ‘x’ OR ‘X’ – Returns string in uppercase alpha-numeric characters
  • ‘p’ OR ‘P’ – Returns string in any printable characters.

7)   Terminate Procedure: When we are finished with the package we can call Terminate procedure to close the package. However, this procedure is obsolete now so it does nothing.

Examples

So far we have covered the theoritical aspects of dbms_random package. It’s time to see some examples. Let’s start our discussion with normal() function to generate random numbers which are in normal distribution. The following code will generate 100,000 numbers which are in the normal distribution.

test@ORADB10G> exec dbms_random.seed(0);
PL/SQL procedure successfully completed.

test@ORADB10G> select trunc(dbms_random.normal * 1000) from dual connect by level <=1e5;

It’s worth to note that I’ve called seed() procedure, thus making this code reproducible. Following graph is generated after applying the truncate function on the data. This is a famous bell-curve showing that data is following normal distribution.

Numeric Data: Let’s suppose we want to generate 5 random numbers between 1 and 10,000 with 2-digits precision.

test@ORADB10G> exec dbms_random.seed(0);
PL/SQL procedure successfully completed.

test@ORADB10G> select trunc(dbms_random.value(1,10000),2) rand_num from dual connect by level <=5;
RAND_NUM
----------
634.58
8284.76
2318.97
2183.62
3702.22

If we want to generate random numbers less than 10,000 and divisible by 100 or in other words only numbers in hundreds. Here is the code to achieve this:

test@ORADB10G> execute dbms_random.seed(0);
PL/SQL procedure successfully completed.

test@ORADB10G> select trunc(dbms_random.value(1,10001),-2) Rand_Hundreds from dual connect by level <=5;
RAND_HUNDREDS
-------------
600
8200
2300
2100
3700

Character String: Let’s suppose we want to generate dummy names in the form of first name space last name where both fist name and last name are 4-10 characters variable-length string in proper case (the first letter of each word in uppercase, all other letters in lowercase).

test@ORADB10G> exec dbms_random.seed(0);
PL/SQL procedure successfully completed.

test@ORADB10G> select initcap(dbms_random.string('L',dbms_random.value(4,10)))
        || ' ' || initcap(dbms_random.string('L',dbms_random.value(4,10))) Rand_Names from dual connect by level <=5;

RAND_NAMES
-------------------------
Vgfj Lyyv
Hmtvmie Etsdpdajyg
Awzadckd Fvjji
Cridnrykc Ewpe
Umvyvpjx Xreohmnvu

Seems the names of aliens :-)

Random Dates: Generating random values are tricky and needs understanding of Julian date formats. Oracle can store dates in the Julian formats. In this format integer offsets from January 1, 4712 B.C. in the past are stored. The format mask ‘J’ can be used with date functions (TO_DATE or TO_CHAR) to convert date data into Julian dates and similarly TO_NUMBER function can be used to use Julian dates in calculations.

Having said this, you can generate random dates in a given range by finding the integer that corresponds to your desired start date, and then adding a random integer to it. For example, suppose you want to generate random dates between 1-Jan-1980 and today (i.e., 19th Nov, 2010). Here is a sample code you can use:

test@ORADB10G> execute dbms_random.seed(0);
PL/SQL procedure successfully completed.

test@ORADB10G> select to_date(to_char(to_date('01/01/1980','dd/mm/yyyy'),'J') + trunc(dbms_random.value(1,11280)),'J') Rand_dates from dual connect by level <=5;
RAND_DATES
----------
16-DEC-81
02-AUG-05
28-FEB-87
29-SEP-86
08-JUN-91

This code is adding random numbers in the range 1-11280 to the base date (1/1/80). The maxinum value 11280 is number of days between the base date and today.

Take another example where we want to generate random dates in last one year, in other words after 19th Nov 2009.

test@ORADB10G> execute dbms_random.seed(0);
PL/SQL procedure successfully completed.

test@ORADB10G> select sysdate- trunc(dbms_random.value(1,365)) Rand_dates from dual connect by level <=5;
RAND_DATE
---------
26-OCT-10
21-JAN-10
26-AUG-10
31-AUG-10
07-JUL-10

So far we’ve discussed to generate totally random data, however if want to have some control on the data being generated, means repeation based on some logic, that’s where modulus functions comes handy. Following code can be used if you want to generate numbers between 1 to 10. Please note that data generated in this fashion is not random, but mod() function is very useful in some cases.

test@ORADB10G> select mod(rownum,5)+1 Mod_10 from dual connect by level <=10;
 MOD_10
----------
2
3
4
5
1
2
3
4
5
1

That’s really it. I’ve tried to cover different scenarios to generate random numbers, however one can write complex code that suits his scenario. I hope this makes your purpose.

Cheers!

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 159 other followers