Home / Expert Answers / Computer Science / floats-are-represented-in-binary-in-three-parts-the-sign-bit-the-expo-pa276

(Solved): Floats are represented in binary in three parts: the sign bit, the expo ...



Floats are represented in binary in three parts: the sign bit, the exponent, and the fractional part (called binary representWell start with functions that give us pieces of our 16 -bit float. Well reuse these in future functions.
Now well make soNow we want to check for these signals. As before, examining the implementation of is_infinity() may help figure out

Before implementing conversion, well want a function to help us debug. This function provides an internal view of our float.

Well want to use many of the functions weve defined above to help us convert a 32 -bit float to a 16 -bit float. If youd l

Lastly well want to create a human-readable representation of our 16 -bit float. Please use the functions weve defined buil???????

Floats are represented in binary in three parts: the sign bit, the exponent, and the fractional part (called binary representation or internal form). Using the 32 -bit float representation, the sign bit takes 1 bit, the exponent takes 8 bits, and the fractional part takes 23 bits: 00000000000000000000000000000000 In this lab, however, we will be using a modified 16-bit version of a float with much less precision. In this modified 16 -bit float, the sign bit takes 1 bit, the exponent takes 5 bits, and the fraction takes 10 bits: 0000000000000000 The following rules hold when converting our miniature float \( f \) from its internal form to base-2 exponential format: 1. If exponent \( =11111 \) and fraction \( =0 \), the value represented is \( \pm \infty \) 2. If exponent \( =11111 \) and fraction \( \neq 0 \), the value represented is \( N a N \) (not a number) 3. If exponent \( =00000, f=(-1)^{\text {sign }} \times 0 \). fraction \( \times 2^{-14} \) 4. If \( 00001<= \) exponent \( <=11110, f=(-1)^{\text {sign }} \times 1 \).fraction \( \times 2^{\text {exponent }-15} \) The exponent - 15 part of the expression above is called subtracting the float's "bias". The main difference between the 16 -bit float and the 32 -bit float is the number of bits and the value of the bias. When working with 32 -bit floats, the bias is 127 . The lab12.h file contains a type definition of this float, called float16. Note that we will only be working with in binary representation; using non-bitwise operations or conversions on is undefined. You are allowed to use functions in and any other libraries imported in lab12.h. We'll start with functions that give us pieces of our 16 -bit float. We'll reuse these in future functions. Now we'll make some functions to handle the special signals in a float. See the top of this part to learn more about these signals. First we'll implement functions that return 32 -bit floats giving these special signals. is already implemented. Examine its structure, since is very similar to implement. Now we want to check for these signals. As before, examining the implementation of is_infinity() may help figure out Before implementing conversion, we'll want a function to help us debug. This function provides an internal view of our float. It's the 16-bit equivalent of the function format_float32_bits() in lab12b.o. We'll free the returned string in our testing. We'll want to use many of the functions we've defined above to help us convert a 32 -bit float to a 16 -bit float. If you'd like to manually debug, take advantage of format_float16_bits() and the pre-defined format_float32_bits () and run some custom tests in your main( ) function. // Converts a 16 -bit float into a 32 -bit float and returns it. The 32-bit float \( / / \) needs to have (approximately) the same value as the original 16-bit float. \( / / \) We'll be looking for an accuracy of 3 decimal points \( / / \) //f the floati6 was NaN or a signed infinity, the float16 should also be NaN \( / / \) or the same signed infinity. Consider using the float32_nan() and \( / / \) float32_infinity() from before float float16_to_float32(floatli6 f) \{ ? Lastly we'll want to create a human-readable representation of our 16 -bit float. Please use the functions we've defined building up to this, otherwise it'll be really messy. We'll free the returned string in our testing.


We have an Answer from Expert

View Expert Answer

Expert Answer


Here is your answer Floats are represented in binary in three parts: the sign bit, the exponent, and the fractional part (called binary representation
We have an Answer from Expert

Buy This Answer $5

Place Order

We Provide Services Across The Globe