CS50 — Lesson 5 notes — Data Structure (array, tries, linked list ,binary tree, other abstract data types, hash tables)
26-Oct-2021
Project完成了! 我又踏上了website的旅程, 繼續由CS50開始進發!!!
(最近推的人直播量和長度都上升了, 看不完… …)
Going through:: memory like a canvas, what you do with those memory
Arrays
if you already have 3 numbers in the array, how would you add one more number into the array?
→ You need to delete and add the new array with new number of data
;( It takes the time as Big O of n to insert. If the data is large, you need to take O(n)
:) It takes O(log n) for searching the element → much faster (if using binary search)
:) Easy for sortinng than the others
For Omega notation, for search and add, it could take for only omega of constant steps.
Struct typedef → create one new structure with different type of data type inside of it
* →the pointer to the address of certain data type
. → access the field inside data struct defined
These three syntax could be mixed into one syntax ->, going to array’s field’s address
Linked list
-ask for malloc for a number and a pointer to point to the next number
- the last one would point to null (0x0)←default to be nothing
nul vs null → nul <-the end of string, null → pointer value of pointing nothing
Quickly freeing the memory ←-????? if use malloc????
;( twice memories required
:( Search is linear search, more time is required(O(n)//Ω(constant))
:) Insertion is much easier O(constant)//Ω(constant)
:( Take long time to sort (linear)
:) No need a big chunk of continuous memory like array required
Each combination of (number + pointer to the next number) would be called node (data structure that encapsulates information node)
typedef struct node
{
int number,
struct node *next, <-- not a must but they usually use this one
}
node;
C does not have node as data structure, so you need to declare in line 1 so that it could recognize in line 4 and 6.
To remind you, in line 6,we add node there is to eliminating the time of typing node every time calling node.
Why it is not an int *next but struct node *next?
It would throws away the pointer following the next number if only the int was pointed to — but pointing to the node (whole stuff) would allow compiler acknowledge there are pointer following the int
start with creating the listlist *node = NULL; <-- make sure it is pointing at nothing
node *n = malloc(sizeof(node)); <-- may have garbage value, malloc wont initialize the value for you (is that mean the value is not even declare in this stage? -- > this sentence is already declaration)if (n != NULL)
{
(*n).number = 1; <-- ()go to there first
equal to n->number = 1;
n->next = NULL;
}
My explanation: if the pointer *n point to the address already stores information, go to the number field of the address and set it to 1.Then the list should point to the first value assigned
list = n;
set list to the address of stored in n
What is valgrind ./file?
Show the bytes that you used
free → free the pointer that point to certain address, so that it could point to the new one (indeed freeing the address stored in pointer)
introduce function realloc(array_to_resie, size of memory that you would allocate (malloc)) → then you don’t need to copy of the information in the array by your own
but you still need to add the new number in tmp then assign the address in tmp to list, then you need to free list
Array VS Malloc
Array(static allocation): decide size of array beforehead
Malloc(dynamic allocation): realloc could be use to resize the array
:( They are both contiguous memory, if there are no such memory exist, you may fail to store information
Linked List
The node could be stored in non-contiguous memory and uses pointer to stitch them together.
typedef struct node
{
int number,
struct node *next,
}
node;//the list point to nothing
node *list = NULL;
//find space for the first node
node *n = malloc(sizeof(node));if(n != NULL)
{
list = n;
(*n).number = 1;
n->number = 1;
n->next = NULL;
}node *n = malloc(sizeof(node));if(n != NULL)
{
list->next = n;
(*n).number = 2;
n->number = 1;
n->next = NULL;
}node *n = malloc(sizeof(node));if(n != NULL)
{
list->next->next = n; <--poorly designed(could be done in loop)
(*n).number = 2;
n->number = 1;
n->next = NULL;
}
Memory could be used more efficiently
No need to reallocate memory (reserve time)
The memory should be freed reversely.
why need to free(list) and free(list -> next)?
free(list) is freeing node 1; free(list->next) (next mean pointing to node with number 2) is freeing node 2
But this sequence is bad, as if you free list first, then computer cannot go access to list in the list->next
Fail to use list[1] to print the linked list → it is not contiguous
When do we need to free the pointers?
ONLY ADDRESS that return malloc to you require to be free → how about tmp
Trees (second dimensional)
Binary search trees

Rules:
1. Any left tree is smaller than the root(the most upper one)
2. Any right tree is larger than the root

:)Search and insert is O(log n) (if not balanced, could be O(n))
:( Memory would take three times of array
Search in Binary Tree
using bool ← stored in library stdbool.h
Why not using 0 and 1 instead of true and false
return → short circuit the code and leave, but return with value means you should return sth to the computer
Hash tables
allows you to associate keys and value
Take the input and get its output location accordingly
(take in the input → hash function) → put input into suitable bucket
What happen if the initial char is the same?
Hash table would have array of alphabets and let the data point. The element inside array would be linked list.
You can generate list friend does not john.
Hash functions take in input and produce out as location
eg, a → hash →0 // z → hash →25
You may increase the number of bucket (spaces for each one with same first and second letter)
:) Search for constant time (O(n) ← each one have the same name)← faster than linked list
:( Memory required increased dramatically
Tries (retrieval)
Create tree out of array
Use binary tree to link the node presents by array
Will end with boolean value/true values to tell where the word stops
Node could be reuse in this case

Why this one need to be series?
我認為是因為一個字都是由幾組array組成, 不一組一組地說明的話, 萬一有第二組字開頭一樣, 而中間開始不一樣電腦就不能判斷要如何group下去。
For search O(1) ← max will only be depending on the number of letter including in the name instead of the amount of data we have
:( SOOOOO many unused memory, and many memory used
Abstract data structures
data structure → queues: FIFO properties (first-in-first-out)
~ add sth: enqueue// remove sth: dequeue
-if you use array, as example, but array didn’t move the data forward ← Out of memory //// they may use linked list instead
data structure → stack: LAFO properties (last-in-first-out)
- gmail/ tray in restaurant
data type: dictionary
~ with keys and values, words and definitions