I’ve been asked quite a few times about the difference between number and character columns in Oracle. So I decided to write this article in order to explain how things work.
Many people, mainly those with programing background, assume that Oracle, like programming languages, contains character columns that work like STRING in programing languages, and number columns that work like INTEGER, DOUBLE and similar in programing languages. If this is the case, working with numbers in Oracle is much more efficient than text, the value has a static size of 4 or 8 bytes, the CPU works with number natively in math calculations, and therefore, number columns are better (when relevant).
However, this is not the case.
In Oracle, the common numeric datatype is NUMBER. This datatype is of variable length, meaning, it doesn’t always take 4 or 8 bytes. Moreover, its internal structure is totally different than integer or double or any other numeric type in programing languages. And even more importantly, there are also misconceptions about performance.
What is this NUMBER Datatype?
Let’s start with the basics, the NUMBER datatype and its parameters.
The NUMBER datatype contains, obviously, number only values. The value can be a whole number or a number with decimal point. When defining a NUMBER column, we specify NUMBER(X,Y). X represents the total number of digits in the entire number, and Y represents how many digits out of the total can be located after the decimal point. The default for X is 38 (which is the maximum Oracle allows), and for Y is 0 (whole numbers only).
How Much Space does NUMBER Take?
In order to understand how Oracle saves numbers, we will use 2 built-in functions: vsize and dump. For those of you who are not familiar with these functions, vsize returns the real size (in bytes) of a specific value, and dump shows us the real internal representation of the value.
We’ll start with a simple example, the numbers 1, 1000 and 1001, and check how much space they take (we already said that the size is dynamic). To do this we will use the vsize function:
SQL&amp;gt; create table test_number(n number); Table created. SQL&amp;gt; insert into test_number values(1); 1 row created. SQL&amp;gt; insert into test_number values(1000); 1 row created. SQL&amp;gt; insert into test_number values(1001); 1 row created. SQL&amp;gt; select n,vsize(n) from test_number; N VSIZE(N) ---------- ---------- 1 2 1000 2 1001 3
As we can see, the numbers 1 and 1000 take 2 bytes, while 1001 takes 3 bytes.
How is NUMBER Saved in Oracle?
Now let’s see how Oracle saves these numbers. To do that we’ll use the dump function.
SQL&amp;gt; select n,dump(n) from test_number; N DUMP(N) ---------- ------------------------- 1 Typ=2 Len=2: 193,2 1000 Typ=2 Len=2: 194,11 1001 Typ=2 Len=3: 194,11,2
First, we need to understand what the output of dump is. The output consists of 3 parts: Typ is the datatype, len is the size in bytes, and after the colon we see the data itself.
In this case, the type is 2, which is NUMBER. Note the size difference (as we saw with the vsize function) and the difference in the data.
Now we will understand how we get from the data we see in the output of dump to the number that Oracle saves. Warning, this is a little bit techy, I hope you’ll follow.
The data from the dump function consists of 2 parts:
- The first byte (8 bits) also consists of 2 parts: the first bit (the most significant bit) is the sign. It will be 1 if the number is positive and 0 if it is negative. The other 7 bits represent the exponent of the number, or the location of the decimal point.
- All other bytes contain the number itself in base 100, meaning, each byte contains 2 decimal digits of our number.
To visualize it, dump of a number looks like this:
SEEEEEEE DDDDDDDD DDDDDDDD …..
The first byte contains the sign bit (marked as S, 1 for positive, 0 for negative), the other bits in this byte (marked as E) are the exponent and all other bytes (marked as D) are the data.
With positive numbers, we need to do the following to translate the data we see to the actual number:
- To get the real exponent, we need to subtract 64 from the first byte (after we removed the sign bit). Also, note that the base here is 100 and not 10, so exponent X means 100x and not 10x.
- We need to subtract 1 from every data byte (every byte except the first one).
The number 1001 is represented by 3 bytes: 194,11,2. The binary form of 194 is 11000010. We can see that the number is positive and the exponent is 2 (1000010 is 66, we subtract 64 and get 2). The second byte is 11, subtract 1 and get 10. The third byte is 2, subtract 1 and get 1 (which is actually 01). Our number, then, is 0.1001 with exponent 2 (100 power by 2 is 10000), so the number is 1001.
The number 111.222 is represented by 194,2,12,23,21. We’ve already seen that 194 means positive number with exponent 2. Subtract 1 from every byte and we’ll get: 01,11,22,20. Our number is 0.01112220 with exponent 2 (which is 10000), which is 0111.2220 (or 111.222).
For negative numbers we need to perform 2 additional steps:
- The most significant bit in the first byte will be 0. In order to calculate the exponent we will need to perform ones’ complement (which is simply flipping all the ones into zeroes and vice versa). After the conversion, we will subtract 64, the same as with positive numbers.
- In the data bytes, we need to subtract 1 like in positive numbers. However, the numbers are saved as 100 minus the real number. So after subtracting 1, we will need to subtract the value from 100 in order to get the real data.
And one last thing about negative numbers, they always end with 102.
The number -12300 is 60,100,78,102. 60 in binary is 00111100 which means that this is a negative number. The ones’ complement is 1000011, which is 67, so the exponent is 3. The last byte is always 102. We will subtract 1 from the other data bytes and get 99,77. After subtracting from 100 we will get 01,23. Our number is 0.0123 with exponent 3 which is -12300.
This is Strange…
I guess that at some stage along the article you asked yourselves: OK, we got it, but why did they do it like this? Why would they do something so complicated? I don’t know the full answer, but I do know this:
- If Oracle had used numbers like in a programing language (integer, long, double, etc.), there would have been full dependency on the operating system. Numbers representation in 32-bit systems is different than in 64-bit systems and between different operating systems (Linux, Windows, different UNIX systems…). In this case it would have been very difficult to move data between different systems (even export and import).
- In the programing languages model, the biggest number we can use is 10 digits (in 32-bit systems) or 19 digits (in 64-bit systems), while in Oracle today we can save up to 38 digits. Maybe 19 digits number is large enough for most uses, but I guess that 10 digits number is not.
- One last thing, and this is what Tom Kyte answered when I asked him exactly this, is precision. If we use double or float there is some data loss. A small portion of the number might get lost (e.g. the number 1 is actually 0.99999999 which is translated to 1 when using it). In financial system this is crucial. It can cause real loses, which is really unacceptable. On the other hand, I read that numbers in Oracle can also be inaccurate in certain cases. You are welcome to read about it in Oracle Magazine of July-August 2011 on page 61 under the title “what is 1/19”. Here is a link: http://www.oraclemagazine-digital.com/oraclemagazine/20110708?pg=64#pg64
I’ll close with what I started, performance. As I stated at the beginning, there are people who think that NUMBER columns are more efficient than text (CHAR and VARCHAR2) because it is native to the CPU. Generally, this is true, but not in Oracle’s model. We saw that Oracle’s model is not really “numbers” and not native to the CPU (and us…). The database needs to perform all kind of manipulations in order to be able to do mathematical calculations. If so, the performance of numbers will not be much better than text.
I’m not telling you to save numbers as strings in the database, that is really bad. But if you want to use strings as primary key or as a main property in your table, you don’t need to add a meaningless numeric column, just to index a number instead of a string.
Bottom line, sorting a number column, or scanning an index of numbers is not much more efficient than sorting or scanning an index of strings.
Don’t believe me? Try it yourselves.
Everything I explained here about the structure of numbers is in MOS note# 1007641.6 (it seems that this note is no longer available, thanks for that jkstill)
Performance tests I did by myself.
Hope you enjoyed, even though it was techy and a bit long.