Coding of numbers in computers: Complete file

2023-11-07 23:00:00

Because processors, and digital electronic systems more generally, operate only on binary digits (bits), processing numbers requires encoding them.

The different representations involve several aspects:

the presentation of formats;

basic arithmetic operations (addition, subtraction, multiplication, division) with possible overflow problems;

the instructions provided for these operations, which vary from one instruction set to another.

N-bit integer formats represent unsigned or signed integers. While positive numbers always have the same representation, several representations of negative numbers have been defined: sign and magnitude, one's complement, and two's complement, the latter being the only one used for decades. For integer operations, the number of output bits differs from the number of input bits: an n-bit + n-bit addition yields an (n+1)-bit result, and an n-bit × n-bit multiplication yields a 2n-bit result. Handling the carry (addition) or the high n bits (multiplication) raises issues for both the scalar and the SIMD instructions of the different instruction sets.
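
As a concrete illustration of these points, here is a minimal Python sketch (not taken from the original text) showing the two's complement encoding and the widening of addition and multiplication results for 8-bit operands:

```python
# Illustrative sketch: two's complement coding and the widening of
# addition/multiplication results for n-bit integers.

def to_twos_complement(value: int, n: int) -> int:
    """Encode a signed integer into its n-bit two's complement pattern."""
    return value & ((1 << n) - 1)

def from_twos_complement(pattern: int, n: int) -> int:
    """Decode an n-bit two's complement pattern back to a signed integer."""
    sign_bit = 1 << (n - 1)
    return (pattern ^ sign_bit) - sign_bit

n = 8
a, b = 100, 100

# 8-bit + 8-bit addition: the exact sum needs 9 bits (carry out of the MSB).
exact_sum = a + b                      # 200, does not fit in signed 8 bits
wrapped_sum = from_twos_complement(to_twos_complement(exact_sum, n), n)
print(wrapped_sum)                     # -56: overflow when the result is kept on 8 bits

# 8-bit * 8-bit multiplication: the exact product needs 16 bits.
exact_product = a * b                  # 10000
low = exact_product & 0xFF             # low 8 bits kept by a narrowing multiply
high = exact_product >> 8              # high 8 bits that must be handled separately
print(exact_product, high, low)        # 10000 39 16
```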

The single-precision (32-bit) and double-precision (64-bit) floating-point formats have been standardized since the mid-1980s (IEEE 754). They are presented along with the newer 16-bit and 8-bit reduced formats used in deep neural networks. Block floating-point formats and decimal floats are also presented. The Posit format, proposed by J. L. Gustafson as an alternative to the IEEE 754 floating-point formats, is presented and discussed.
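
As a concrete illustration of the 32-bit format, here is a minimal Python sketch (not from the original text) that unpacks a value into its sign, biased exponent, and fraction fields:

```python
# Illustrative sketch: unpacking the IEEE 754 single-precision (binary32)
# fields of a float value.

import struct

def decode_float32(x: float):
    """Return the (sign, biased exponent, fraction) fields of x as a binary32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]   # raw 32-bit pattern
    sign = bits >> 31                                      # 1 bit
    exponent = (bits >> 23) & 0xFF                         # 8 bits, bias 127
    fraction = bits & 0x7FFFFF                             # 23 bits
    return sign, exponent, fraction

sign, exponent, fraction = decode_float32(-6.25)
# -6.25 = -1.5625 * 2**2, so the unbiased exponent is 2 and the stored
# exponent is 2 + 127 = 129; the fraction encodes 0.5625 on 23 bits.
print(sign, exponent, fraction)        # 1 129 4718592
```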

While the integer and floating-point formats that have existed for decades are used in general-purpose processors, computational models such as neural networks, together with energy-consumption constraints, have led to the emergence of reduced formats that can be added to some general-purpose instruction sets or implemented in specialized processors, IP blocks, FPGAs, etc.
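
As an example of such a reduced format, here is a rough Python sketch (an illustrative assumption, not from the original text: bfloat16 conversion by simple truncation, ignoring rounding and special values) that shortens a binary32 value to the 16-bit bfloat16 encoding, which keeps the 8-bit exponent but only 7 fraction bits:

```python
# Illustrative sketch: converting a binary32 value to bfloat16 by truncation.
# bfloat16 keeps the sign and 8-bit exponent of binary32 but only 7 fraction
# bits; real hardware typically uses round-to-nearest-even, which this
# simplified version ignores.

import struct

def float32_to_bfloat16_bits(x: float) -> int:
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16                # drop the 16 low fraction bits

def bfloat16_bits_to_float32(bits16: int) -> float:
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]

x = 3.14159265
bits = float32_to_bfloat16_bits(x)
print(hex(bits), bfloat16_bits_to_float32(bits))   # 0x4049 -> 3.140625
```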
