In this class, we look at how Python strings are encoded in UTF-8 and how they are stored in memory.
0:00 Intro
0:54 Communication requires a global standard
1:15 UTF-8 chart
1:40 Demonstration on memory storage
2:08 String storage
2:47 Integer and string storage differences
3:17 Latin characters example
Welcome back. In today’s lesson, we will explore how characters and strings are represented in memory.
In our previous session, we examined how the system handles integer storage. By comparing these two methods, you will gain a clearer perspective on how data is managed at a low level. This foundational knowledge is essential for our upcoming lessons on type casting, where we will learn how to convert data between different types efficiently.
Strings are composed of various characters, including letters like 'a' and 'b,' as well as special symbols like question marks and currency signs. While these characters are easily accessible on our keyboards, global digital communication requires a universal standard to ensure that data is understood consistently across all systems. One such widely adopted standard is UTF-8 encoding. In Python 3, UTF-8 is the default encoding for source code and for converting strings to bytes.
Let’s examine the UTF-8 encoding table. In this standard, the uppercase letter 'A' is assigned the decimal value 65. When you type a capital 'A' on your keyboard, the system stores that value—65—in a single byte of memory. It is important to remember this specific one-byte allocation for the character 'A.' Let’s look at a demonstration to see how this is handled at a low level.
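To check the value of 'A' yourself, here is a minimal sketch using Python's built-in `ord()` and `str.encode()`. The variable names are only illustrative, and the bytes shown are the UTF-8 encoded form of the character, which is a simplified view of what the interpreter keeps in memory.

```python
# Look up the decimal value of 'A' and its UTF-8 encoded form.
letter = "A"
code_point = ord(letter)           # decimal value assigned to the character
encoded = letter.encode("utf-8")   # bytes object holding the UTF-8 encoding

print(code_point)                  # 65
print(len(encoded))                # 1 (a single byte)
print(format(encoded[0], "08b"))   # 01000001 (the bit pattern of 65)
```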
Similarly, let’s locate the digit '9' in the encoding table. Notice that when you press '9' on your keyboard, it is mapped to the decimal value 57 in memory. Now, consider a string variable, such as x = "A9". In this case, the computer stores a sequence of characters. Let’s observe the following demonstration to see how these individual character values are organized and stored together in memory.
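The string "A9" can be inspected the same way. This is a small sketch, assuming we look at the UTF-8 encoded bytes rather than the interpreter's internal object layout; `byte_values` and `bit_patterns` are names chosen for illustration.

```python
# Encode the string "A9" and look at each byte in order.
x = "A9"
encoded = x.encode("utf-8")

byte_values = list(encoded)                          # one decimal value per byte
bit_patterns = [format(b, "08b") for b in encoded]   # the same bytes as bits

print(byte_values)    # [65, 57]
print(bit_patterns)   # ['01000001', '00111001']
```

The two bytes sit one after the other: first the value for 'A', then the value for '9'.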
Now let's understand the difference.
Take an example: x = 9 and y = "A9".
In the first case, 9 is an integer, so the binary value of 9 is stored in memory.
In the second case, 9 is a character, so the binary value of 57 is stored in memory.
At the memory level, the two types are stored differently. This is the key point to remember.
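The comparison can be sketched in a few lines. This is a simplified view: a Python integer is a full object with extra bookkeeping, so the 8-bit patterns below only illustrate the values being stored, not the exact memory layout.

```python
x = 9       # integer
y = "A9"    # string; y[1] is the character '9'

int_bits = format(x, "08b")            # bit pattern of the integer 9
char_code = ord(y[1])                  # decimal value of the character '9'
char_bits = format(char_code, "08b")   # bit pattern of that value

print(type(x).__name__, x, int_bits)             # int 9 00001001
print(type(y).__name__, char_code, char_bits)    # str 57 00111001
print(x == char_code)                            # False
```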
Let’s look further into the UTF-8 encoding table. For example, the character 'Latin Capital Letter S with Acute' (Ś) is assigned the decimal value 346. Because 346 is larger than 127, the largest value UTF-8 stores in a single byte, the system requires two bytes of memory for this character rather than one. The same logic applies to all global scripts, including Chinese, Hindi, and Japanese characters: each character is assigned a specific, fixed decimal value (its Unicode code point), and UTF-8 encodes that value in one to four bytes. I encourage you to search for the full UTF-8 table online to appreciate the vast scale of this encoding system. You should now have a clear understanding of how Python represents and stores characters in memory. Happy learning!
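To see the byte counts for yourself, here is a short sketch. The sample characters (one each from Latin, Hindi, Chinese, and Japanese) are chosen only for illustration and are not taken from the lesson's table.

```python
# Compare the decimal value and UTF-8 byte count of a few characters.
samples = ["A", "Ś", "अ", "中", "あ"]

code_points = {ch: ord(ch) for ch in samples}
byte_counts = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    print(ch, code_points[ch], byte_counts[ch], list(ch.encode("utf-8")))

# A 65 1 [65]
# Ś 346 2 [197, 154]
```

The Hindi, Chinese, and Japanese samples each take three bytes in UTF-8.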
Visit our learning app at www.wisdomers.in for quizzes and projects.