2020-12-14 updated:
Add 'when to use ... algorithm' section.
Preface
This is an ABC-level introduction to computer science algorithms. It mainly includes the following parts: general introduction, several useful data structures, Divide and Conquer, Greedy algorithms, Dynamic Programming, Graph Traversal, Backtracking, Branch and Bound, the Lower Bound Theorem, and NP.
Introduction
 Algorithm: a precise statement of how to solve a problem on a computer

 Property of an algorithm: it terminates in a finite amount of time
 Design of an algorithm:
 Devise a method
 Express the method (with pseudo-code, a flowchart, ...)
 Validate the design (proof of correctness)
 Design of an algorithm does not include implementing it in a high-level language
 Recursion: A recursive algorithm calls itself on a smaller input.
Algorithm name(params)
begin
basic step #when the input is the smallest part
name(smaller part)
combine subsolutions of smaller part
end
 Validation: often use induction to prove the correctness
Analysis of algorithm:
 determine the time and space requirement (complexity) of algorithm
 since space(memory) becomes cheap, we focus on time complexity
 purpose of analysis:
 see if the algorithm meets the speed requirement before more investment
 compare between different algorithms with same purpose

Time complexity T(n): the number of operations in the algorithm, as a function of the input size n. In the RAM model, we treat each arithmetic/boolean operation and each relational operation (comparison) as one operation.
 Space complexity S(n): number of space needed. (e.g., only a variable is declared, then S(n) = 1)
 Order of Growth: we care about the behavior when n is large, and we ignore constant factors. We use 'Big-O' notation.
 Big-O. If f(n) <= k*g(n) for some constant k > 0 and all large n, then f = O(g)
 Big-Ω. If f(n) >= k*g(n) for some constant k > 0 and all large n, then f = Ω(g)
 Big-Θ. If k1*g(n) <= f(n) <= k2*g(n) for constants k1, k2 > 0 and all large n, then f = Θ(g)
 Choose the tightest and simplest g() that you can find: keep the largest-order term, drop all the other terms, drop the constant factor.
 Master Theorem: finds the time complexity of a recursive algorithm. Let T(n) = aT(n/b) + f(n). If
 f(n) = O(n^{(log_b a) - ε}) for some ε > 0, then T(n) = Θ(n^{log_b a})
 f(n) = Θ(n^{log_b a}), then T(n) = Θ(n^{log_b a} * log n)
 f(n) = Ω(n^{(log_b a) + ε}) for some ε > 0 and a*f(n/b) <= c*f(n) for some c < 1, then T(n) = Θ(f(n))
Data Structure
 Data structure: an organization of a data set + some operations to be performed on it (insert, delete, ...). The actual organization of the data (e.g., a tree) is called the built-in organization.
 Design of a data structure:
 input: the specifications of data set(data type, size..) + specifications of operations
 output: the organization of data set + algorithm to perform the operations
 When we need a data structure: when some operations are called many times on a data set, an efficient data structure can make them much faster.
Stack
 The stack is a data structure that follows 'last in, first out' (LIFO). It supports 3 operations:
 push(S,a): add new data a to S
 pop(S): delete and return the most recently added entry
 top(S): return but do not delete the most recently added entry
 Data organization: array with an index to the next empty slot / linked list with a pointer to the most recently added entry
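The operations above can be sketched in Python with a list (a minimal sketch; the `Stack` class name and `is_empty` helper are illustrative, not from the notes):

```python
# LIFO stack backed by a Python list; the end of the list is the top.
class Stack:
    def __init__(self):
        self._data = []

    def push(self, a):
        self._data.append(a)          # add a to the top

    def pop(self):
        return self._data.pop()       # delete and return the most recent entry

    def top(self):
        return self._data[-1]         # return the most recent entry, keep it

    def is_empty(self):
        return not self._data
```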
Queue
 The queue is a data structure that follows 'first in, first out' (FIFO). It supports 2 operations:
 enqueue(Q,a): insert a new entry to Q
 dequeue(Q): return and delete the oldest added entry
 Data organization:
 Array: index head pointing to the oldest entry; index tail pointing to the next empty slot.
 linked list: one pointer head to the oldest entry; one pointer tail to the most recently added entry.
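A minimal Python sketch of the two queue operations, using `collections.deque` (which gives constant-time operations at both ends; the `Queue` class name is illustrative):

```python
from collections import deque

# FIFO queue: enqueue at the tail, dequeue from the head.
class Queue:
    def __init__(self):
        self._data = deque()

    def enqueue(self, a):
        self._data.append(a)         # tail: most recently added entry

    def dequeue(self):
        return self._data.popleft()  # head: oldest entry

    def is_empty(self):
        return not self._data
```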
Records(class)
 A record is an aggregate of several elements called fields.
 A pointer is an address: the index of the 1st byte of the physical record in memory.

 Use 'new' to create a predefined record. It calls the OS to find and reserve a free chunk of memory large enough to hold an entire record. The chunk is empty and you can fill in the field information.
 A self-referenced record is a record where at least one field is a pointer to a record of the same type.
Linked list
 The organization is a sequence of self-referenced records: every record has a pointer field that points to the next record.
 Two pointers in the structure: Pstart to the first record, Pend to the last record.
 Three actions: insert(), find(), delete()
 Used in situations where one-direction navigation is enough; can implement stacks and queues.
 Doubly-linked list: add another pointer to the previous record besides the pointer to the next record.
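A minimal Python sketch of the singly linked list with its Pstart/Pend pointers (class and attribute names are illustrative; delete() would follow the same traversal pattern as find()):

```python
# Self-referenced record: each node's `next` field points to the next record.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    def __init__(self):
        self.pstart = None            # pointer to the first record
        self.pend = None              # pointer to the last record

    def insert(self, value):
        # Append at the end; Pend makes this constant time.
        node = Node(value)
        if self.pstart is None:
            self.pstart = self.pend = node
        else:
            self.pend.next = node
            self.pend = node

    def find(self, value):
        # One-direction navigation from Pstart.
        p = self.pstart
        while p is not None and p.value != value:
            p = p.next
        return p                      # node, or None if not present
```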
Graph
 A graph G(V,E) consists of a finite set of vertices V and some edges E: directed/undirected, weighted/unweighted.
 graph can represent map, network, relationship...
 adjacency: if (x,y) in E, we say x,y are adjacent/neighbors. With direction, if (x,y) is an edge, then x is adjacent to y and y is adjacent from x.
 degree: undirected: the number of neighbors of x; directed: in-degree: number of edges coming into x; out-degree: number of edges leaving x.
 simple path: a path without repeated nodes.
 length of path: the number of its edges / the sum of the weights of its edges.
 distance of x,y: the length of the shortest path from x to y.
 adjacency list: an array A[1:num(V)] of pointers to linked lists, where A[i] points to the start of the linked list containing all the nodes adjacent from node i.
Tree
 A tree is an undirected, connected, acyclic graph: there is exactly one simple path between each pair of nodes.
 root: a node is assigned as the root.

 depth: the depth of x is the distance (number of edges) from the root r to x.
 height: the height of x is the distance from x to its furthest descendant.
 depth/height of tree: the height of the root r.
 level: root level = 0, then 1, 2, ... The depth of x = the level of x. The height of the tree = the largest level of a descendant.
Binary tree
 A rooted tree where each node has at most 2 children(left/right child).
 each node can be represented as a record(field: data, left/right child, parent).
 perfect binary tree: a binary tree where each non-leaf node has 2 children and all leaves are at the same level. Nodes are labelled from the root as 1, 2, ... top to bottom, left to right. (Note: levels are counted from 0.) The canonical labels of the children of node n are 2n and 2n+1.
 almost complete binary tree: the binary tree containing the first n nodes of a perfect binary tree (every level except possibly the last is full)
BinarySearchTree
 Binary Search Tree is a binary tree which has a comparator: all the nodes in the left subtree of x have keys <= key of x; all nodes in the right subtree of x have keys > key of x.
 Search(T,a): T(n) = O(depth(T)) + 1 = O(height)
Search(T,a)
begin
# T is a pointer to the root
nodeptr p
p = T
while (p != null) and (p.value != a):
if a < p.value: # no '=' in this if-else
p = p.leftchild
else:
p = p.rightchild
end while
return p # pointer to a, or null if a does not exist
end
 Insert: must be inserted as a leaf. T(n) = T(search) + T(create a new node) = O(height) + O(c) = O(height)
procedure Insert(T,a)
nodeptr p
p = T
boolean done = False
while (done == False):
if a <= p.value:
if p.leftnode == null:
p.leftnode = new <node>
p.leftnode.value = a
done = True
else:
p = p.leftnode
end if
else:
if p.rightnode == null:
p.rightnode = new <node>
p.rightnode.value = a
done = True
else:
p = p.rightnode
end if
end if
end while
end
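The Search and Insert pseudo-code above can be sketched in Python (a minimal sketch; `BSTNode` and the attribute names are illustrative, and Insert descends to a leaf exactly as described):

```python
# Binary search tree: left subtree keys <= node key, right subtree keys > node key.
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def search(root, a):
    p = root
    while p is not None and p.value != a:
        p = p.left if a < p.value else p.right   # no '=' in this branch test
    return p                                     # node holding a, or None

def insert(root, a):
    if root is None:
        return BSTNode(a)
    p = root
    while True:                                  # new key is always a leaf
        if a <= p.value:
            if p.left is None:
                p.left = BSTNode(a)
                break
            p = p.left
        else:
            if p.right is None:
                p.right = BSTNode(a)
                break
            p = p.right
    return root
```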
 Delete(T,a): The procedure is too long, so I show only its structure here. The time analysis mainly includes 2 searches (one full search in step 1 and one smaller search in step 5) and several constant-time operations. T(n) = O(height)
 Search(T,a); if a not found, return null.
 If a found, use p as the pointer of a.
 If p is a leaf, remove a (set p.parent's child pointer to null).
 If p has one child, replace p's location with p's child (p.parent points to p.child).
 If p has two children, search for the largest node in the left subtree of p. Use q as the pointer to that largest node.
 use q's value for p.
 If q has no child, then remove q.
 If q has one child, go to step 4.
 q can't have two children (if q had a right child, q would not be the largest node in the subtree)
Heap
 A heap is an almost-complete binary tree where each node's value is smaller than its children's values (a min-heap). A heap has a comparator and two operations: deletemin(), insert(H,a).
 Insert(H,a). The while loop takes O(height). In a heap, height = O(log n), so T(n) = O(log n)
procedure Insert(H,a)
create a new node (n+1) and let node.value = a
let x point to this new node
while (x.value < x.parent.value) and (x is not root):
swap x and x.parent 's value
x = x.parent
end while
end procedure
 Deletemin(). While loop takes O(logn) time. Other step takes constant time. So T(n) = O(logn).
function Deletemin(H)
begin
x = root of H
r = key of x
let n = the last node
x.key = n.key # move the last key to the root
remove n
# restore the heap
while x.key > any x.child.key:
swap x's key with the smaller child's key (call that child p)
x = p
end while
return r
end Deletemin
 Implementing heaps with an array: node i is the ith entry of the array; its value is Array[i].
 As in a perfect binary tree, the children of node i are 2i and 2i+1; the parent of i is floor(i/2). Swapping nodes i,j = swapping A[i] and A[j].
 Physically, insert and deletemin also take O(log n): each swap takes constant time, and 'insert' starts by appending to the end of the array, which also takes constant time.
 The tree is the conceptual implementation; the array is the physical implementation.
 Creation of heap:
 Insert n times: T(n) = O(nlogn)
 Recursive method(not introduced): T(n) = O(n)
 Heap sort:
 build the input array into a heap
 Deletemin n times and store the return to the output list.
 T(n) = T(build) + nT(deletemin) = O(n)+nO(logn) = O(nlogn)
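The array-based heap and heap sort above can be sketched in Python (a minimal sketch; index 0 is left unused so that the 1-indexed parent/child arithmetic from the notes applies directly, and the heap is built by n inserts, i.e., the O(n log n) creation method):

```python
# Array-based min-heap: children of i are 2i and 2i+1, parent is i // 2.
class MinHeap:
    def __init__(self):
        self.a = [None]                  # index 0 unused; root is a[1]

    def insert(self, x):
        self.a.append(x)                 # append = create the new last node
        i = len(self.a) - 1
        while i > 1 and self.a[i] < self.a[i // 2]:
            self.a[i], self.a[i // 2] = self.a[i // 2], self.a[i]
            i //= 2                      # bubble up: O(log n)

    def deletemin(self):
        a = self.a
        r = a[1]
        a[1] = a[-1]                     # move the last key to the root
        a.pop()
        i, n = 1, len(a) - 1
        while 2 * i <= n:                # restore the heap: O(log n)
            c = 2 * i
            if c + 1 <= n and a[c + 1] < a[c]:
                c += 1                   # pick the smaller child
            if a[i] <= a[c]:
                break
            a[i], a[c] = a[c], a[i]
            i = c
        return r

def heapsort(items):
    h = MinHeap()
    for x in items:                      # build: n inserts, O(n log n)
        h.insert(x)
    return [h.deletemin() for _ in items]   # n delete-mins
```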
UnionFind
In this section we show the design process based on a special data set and two specified operations.
 data set: n disjoint sets, each with one element. Conceptually, each set is a rooted tree (we don't care about its internal shape). Operations:
 Union(A,B): union the two input sets
 Find(x): find the set that contains entry x
 first physical implementation: a single array Parent[1:n]. If i is a root, Parent[i] = 0; otherwise Parent[i] is the index of i's parent.
Procedure Union(i,j) # i and j are roots
Parent[i] = j
end Union
# T = O(1)
function Find(x)
r = x
while Parent[r] != 0:
r = Parent[r]
end while
return r
end Find
# T = O(height)
 First revision (union by size)
 Union: make the root of the larger tree the parent of the smaller tree's root
 Store the tree size in the root's entry of Parent
 When unioning i (larger) and j (smaller): Parent[i] = Parent[i] + Parent[j]; Parent[j] = i
 T(find) = O(log n)
 Second revision: (Path Compression)
 Find: make each node on the path from x to the root r an immediate child of r.
Function Find(x):
int tem, s, r
r = x
while Parent[r] > 0:
r = Parent[r]
end while
# r is now the root
s = x
while s != r:
tem = s
s = Parent[s] # advance first, then re-parent
Parent[tem] = r # make the path node a child of the root
end while
return r
end Find
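Both revisions together can be sketched in Python (a minimal sketch with 0-indexed elements; here a root stores its negative tree size in `parent`, so `parent[i] < 0` marks a root, a common convention that packs size and root-flag into one array):

```python
def make_sets(n):
    # n singleton sets; each root stores -size.
    return [-1] * n

def find(parent, x):
    r = x
    while parent[r] >= 0:        # climb to the root
        r = parent[r]
    while parent[x] >= 0:        # path compression: re-parent path nodes to r
        parent[x], x = r, parent[x]
    return r

def union(parent, a, b):
    # Union by size: the larger tree's root becomes the parent.
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return
    if parent[ra] > parent[rb]:  # less negative = smaller tree
        ra, rb = rb, ra
    parent[ra] += parent[rb]     # absorb the smaller tree's size
    parent[rb] = ra
```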
Divide and Conquer
 general steps
 examine the size of the problem
 if small enough, directly solve the basic problem
 if not, divide the problem into several small parts
 solve each small part by recursively calling the algorithm
 merge the subsolutions of the small parts into a global solution
Merge Sort
Procedure MergeSort(in A[1:n],i,j; out B[1:n])
generic C[1:n]
if i == j:
B[i] = A[i]
else:
m = floor((i+j)/2)
MergeSort(A,i,m,C)
MergeSort(A,m+1,j,C)
Merge(C,i,j,B)
end if
end MergeSort
Procedure Merge(in C,i,j; out B[1:n])
int k = floor((i+j)/2), u = i, v = k+1, w = i # u,v traverse the two halves, w traverses output B
while (u <= k) and (v<= j):
if C[u] < C[v]:
B[w] = C[u], w++, u++
else:
B[w] = C[v], w++, v++
end if
end while
# deal with unbalanced case
if (u>k):
B[w:j] = C[v:j]
elif (v>j):
B[w:j] = C[u:k]
end if
end Merge
 Time of Merge is O(n): there are at most n comparisons, and each comparison comes with a constant-time insertion. Time of MergeSort: T(n) = 2T(n/2) + cn ⇒ T(n) = O(n log n).
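The merge-sort recursion can be sketched in Python (a minimal sketch that returns a new sorted list rather than writing into an out-parameter B; the merge loop mirrors the two-pointer Merge above, including the unbalanced-leftover step):

```python
def merge_sort(a):
    if len(a) <= 1:                     # basic step: size-1 input
        return list(a)
    m = len(a) // 2
    left = merge_sort(a[:m])            # solve each half recursively
    right = merge_sort(a[m:])
    out, u, v = [], 0, 0
    while u < len(left) and v < len(right):   # at most n comparisons
        if left[u] <= right[v]:
            out.append(left[u]); u += 1
        else:
            out.append(right[v]); v += 1
    out.extend(left[u:])                # deal with the unbalanced case
    out.extend(right[v:])
    return out
```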
Quick Sort
 Partition the input array A around an element a: put every entry <= a to the left of a and every entry > a to the right of a. Partition takes O(n) because n comparisons happen.
Procedure QuickSort(in/out A[1:n],i,j)
if i >= j:
return
r = Partition(A,i,j) # select a partition element and partition around it
QuickSort(A,i,r-1)
QuickSort(A,r+1,j)
end QuickSort
 Time complexity: T(n) = T(r-1) + T(n-r) + O(n) # left sort + right sort + partition
 worst case (unbalanced): r = 1. T(n) = T(0) + T(n-1) + cn ⇒ T(n) = O(n^2).
 average case: T(n) = O(n log n)
 The constant factor of QuickSort is smaller than that of other sort algorithms, so it is quicker on average.
 Partition algorithm 1: at most n/2 swaps; everything else is constant time per step. T(n) = O(n).
Function Partition(in/out A, i, j) # the pivot is A[i]
p = i, q = j
while (p<q):
while (p<j) and (A[p]<=A[i])
p++
end while
while (q>i) and (A[q]>=A[i])
q--
end while
if (p<q):
swap(A[p],A[q])
p++
q--
end if
end while
swap(A[i],A[q]) # put the pivot at its final position
return q
end Partition
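QuickSort with the two-pointer partition can be sketched in Python (a minimal in-place sketch using A[i] as the pivot, assuming 0-based indices; the swap counters p++/q-- after a swap are omitted because the inner scans re-advance them):

```python
def partition(a, i, j):
    # Pivot is a[i]; p scans right past small items, q scans left past big ones.
    p, q = i, j
    while p < q:
        while p < j and a[p] <= a[i]:
            p += 1
        while q > i and a[q] >= a[i]:
            q -= 1
        if p < q:
            a[p], a[q] = a[q], a[p]
    a[i], a[q] = a[q], a[i]      # pivot lands at its final index q
    return q

def quicksort(a, i=0, j=None):
    if j is None:
        j = len(a) - 1
    if i >= j:
        return a
    r = partition(a, i, j)
    quicksort(a, i, r - 1)       # left part: entries <= pivot
    quicksort(a, r + 1, j)       # right part: entries > pivot
    return a
```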
Order Statistics
 Basic D&C method
Function Select(A[1:n],k)
if n == 1:
return A[1]
end if
p = Partition(A,1,n) # p is the pivot's index
if p == k:
return A[p]
elif p > k:
return Select(A[1:p-1],k)
else:
return Select(A[p+1:n],k-p)
end if
end Select
 Time analysis:
 worst case: T(n) = max(T(r-1), T(n-r)) + cn = T(n-1) + cn ⇒ T(n) = O(n^2).
 average: T(n) = O(nlogn)
 Second D&C method: QuickSelect.
 The main structure is the same, but use wise_partition to avoid unbalanced splits. T(n) <= 20cn ⇒ T(n) = O(n)
function wise_partition(A[1:n])
nm = get_wise_partition_element(A[1:n]) # index of the median of medians
swap(A[1],A[nm])
r = partition(A) # A[1] now holds the wise element, so partition around it
return r
end wise_partition
function get_wise_partition_element(A[1:n])
m = ceil(n/5)
divide A into arrays of length 5 # A[1:5],...
sort each small array # constant time per array of 5
B[1:m] = the array of middles of the sorted small arrays
mm = quickselect(B[1:m],m/2) # the median of the medians
return mm
end
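The median-of-medians selection can be sketched in Python (a minimal list-based sketch rather than an in-place partition; `wise_select` returns the k-th smallest with k 1-based, and the three-way split handles duplicates of the wise element):

```python
def wise_select(a, k):
    # Returns the k-th smallest element of a (k is 1-based).
    if len(a) <= 5:
        return sorted(a)[k - 1]          # basic step: sort a tiny array
    # Middles of the sorted groups of 5 (last group may be shorter).
    middles = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    mm = wise_select(middles, (len(middles) + 1) // 2)   # median of medians
    left = [x for x in a if x < mm]
    right = [x for x in a if x > mm]
    equal = len(a) - len(left) - len(right)
    if k <= len(left):
        return wise_select(left, k)
    if k <= len(left) + equal:
        return mm
    return wise_select(right, k - len(left) - equal)
```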
Binary Search
 We have a sorted array X[1:n] and a number a. We try to find its index in X. T(n) = O(log n).
function binary_search(X[1:n],a)
if n == 1:
if X[1] == a:
return 1
else:
return -1 # not in X
end if
if a == X[n/2]:
return n/2
elif a < X[n/2]:
return binary_search(X[1:n/2-1],a)
else:
return binary_search(X[n/2+1:n],a) # remember to offset the returned index
end if
end binary_search
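A minimal Python sketch of binary search, using lo/hi bounds instead of slicing so that the returned index needs no offsetting (indices are 0-based here; -1 means the value is absent):

```python
def binary_search(x, a, lo=0, hi=None):
    # x is sorted; returns the index of a in x, or -1 if a is not present.
    if hi is None:
        hi = len(x) - 1
    if lo > hi:
        return -1                     # not in X
    m = (lo + hi) // 2
    if a == x[m]:
        return m
    if a < x[m]:
        return binary_search(x, a, lo, m - 1)   # search the left half
    return binary_search(x, a, m + 1, hi)       # search the right half
```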
 Advantage of D&C compared to a straightforward loop:
 D&C subproblems can run in parallel, which physically accelerates the computation
Greedy Algorithm
 General Strategy
 select the best (varies from question to question) item from the remaining input
 remove it from input and add it to output
 Selection sort: in each step, scan the remaining items and pick the smallest one. T(n) = O(n^2).
 Heapsort: still greedy, but uses the better structure min-heap. T(n) = T(create) + n*T(delete_min) + n*O(c) = O(n log n)
Function HeapSort(input A[1:n]; output B):
# preprocess
H = create_heap(A)
k = 1
while k <= n:
x = delete_min(H)
B.append(x)
k++
end while
end HeapSort
 Optimal Merge: use minheap. I feel minheap is good for greedy algorithm since it always search for the 'smallest' item. T(n) = O(nlogn)
Procedure OptimalMerge(input A1,...,AN):
while (there is more than one array left):
find the two arrays with the smallest numbers of items
merge the two selected arrays into one new array
end while
 Prove non-optimality of a greedy algorithm (by counterexample):
 construct an input
 based on greedy algorithm, generate the output
 Manually give a better output solution

 Knapsack Problem (greedy is optimal for the fractional version): select the item with the highest price per unit. If the whole item fits within the remaining capacity, take all of it; else, take the largest fraction that fills the capacity.
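The fractional-knapsack rule can be sketched in Python (a minimal sketch; the item representation as (price, weight) pairs and the function name are illustrative):

```python
def fractional_knapsack(items, capacity):
    # items: list of (price, weight) pairs; returns the maximum total price.
    total = 0.0
    # Greedy: consider items by price per unit weight, best first.
    for price, weight in sorted(items, key=lambda t: t[0] / t[1], reverse=True):
        if weight <= capacity:
            total += price                       # take all of it
            capacity -= weight
        else:
            total += price * capacity / weight   # largest fitting fraction
            break
    return total
```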
Minimum Spanning Tree
 Spanning Tree: T is a spanning tree of graph G if T has all nodes of G and each edge of T is from G. If G is weighted, the spanning tree with the smallest sum of weights is called the minimum spanning tree.
ComputeMST(input: W; output T)
put all nodes of W into T with no edges
while T has fewer than num(nodes)-1 edges:
select the smallest-weight edge e from W
remove e from W
if adding e to T doesn't create a cycle:
add e to T
end if
end while
return T
end ComputeMST
 Use appropriate structures. To find the smallest edge ⇒ min-heap. To check whether it makes a cycle ⇒ union-find (if both endpoints have the same root, the edge makes a cycle). The while loop runs up to E times instead of n-1 times because T can stay unchanged in many rounds. T(deletemin) = O(log E). T(union) = O(log n). E >= n-1, so T(n) = O(E log E)
function ComputeMST(input: W; output T)
put all nodes of W into T with no edges
H = create_heap(edges[1:E])
Parent[1:V] = [0,...,0]
while T has fewer than V-1 edges:
e = deletemin(H) # e = (a,b)
r_a = Find(a); r_b = Find(b)
if r_a != r_b:
add e to T
Union(r_a,r_b)
end if
end while
return T
end function
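The heap-plus-union-find MST algorithm (Kruskal's algorithm) can be sketched in Python (a minimal sketch; edges are (weight, a, b) triples with 0-indexed nodes, and the union-find here uses the simpler root-points-to-itself convention with path halving):

```python
import heapq

def kruskal(num_nodes, edges):
    # edges: list of (weight, a, b); returns the list of MST edges.
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    heap = list(edges)
    heapq.heapify(heap)                     # O(E) heap creation
    tree = []
    while heap and len(tree) < num_nodes - 1:
        w, a, b = heapq.heappop(heap)       # smallest-weight edge: O(log E)
        ra, rb = find(a), find(b)
        if ra != rb:                        # same root would mean a cycle
            tree.append((w, a, b))
            parent[ra] = rb                 # union
    return tree
```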
 Proof of optimality. Let T be the generated tree and T' be an MST. Transform T' into T without changing the total weight:
 Select the minimum-weight edge e in T - T'. Add this edge to T'.
 This addition makes a cycle in T'. Drop an edge e' in T' - T to break the cycle. (The weight of e' must equal the weight of e.)
 Keep doing this until T' becomes T.
 If no such step can be performed, then T = T' already (there is only one such tree).
SingleSource Shortest Path (SSSP)

 Special path: let Y be a set of nodes. A special path to a node outside of Y is a path in which every intermediate node belongs to Y. DIST[n] = the smallest length of a special path to node n. DIST[n] >= shortest-distance[n]; when n is in Y, DIST[n] = shortest-distance[n].
 Greedy algorithm: let s be the starting node; Y = {s} initially. In each step, select the node outside of Y with the smallest DIST, add it to Y, and update the DIST of the other nodes. The outer loop runs n times and the inner loop n times, so T(n) = O(n^2)
Function SSSP(input W; output D)
array Y[1:n] = [0,...,0]
Y[s] = 1
array D[1:n]; D[i] = W[s,i]
for i = 2 to n:
node_selected = argmin{ D[v] : Y[v] == 0 }
Y[node_selected] = 1
for each node with Y[node] == 0:
D[node] = min(D[node], D[node_selected] + W[node_selected,node])
end for
end for
return D
end SSSP
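The O(n^2) greedy SSSP (Dijkstra's algorithm) can be sketched in Python (a minimal sketch; W is assumed to be an n x n weight matrix with float('inf') for missing edges and 0 on the diagonal, and s is a 0-based source index):

```python
def sssp(W, s):
    # Returns D, the shortest distances from s to every node.
    n = len(W)
    INF = float('inf')
    in_Y = [False] * n
    in_Y[s] = True                     # Y = {s} initially
    D = list(W[s])                     # D[i] = W[s][i] initially
    D[s] = 0
    for _ in range(n - 1):
        # Select the node outside Y with the smallest DIST.
        sel, best = -1, INF
        for v in range(n):
            if not in_Y[v] and D[v] < best:
                sel, best = v, D[v]
        if sel == -1:
            break                      # remaining nodes are unreachable
        in_Y[sel] = True
        for v in range(n):             # update DIST of nodes outside Y
            if not in_Y[v]:
                D[v] = min(D[v], D[sel] + W[sel][v])
    return D
```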
 Proof that when v is in Y, D[v] = shortest-distance[v]. Proved by induction and contradiction.
 Assume the first u nodes added to Y all satisfy the claim. Now we come to the next node v.
 Assume D[v] != d[v]. Then, by definition, d[v] < D[v].
 Then the shortest path to v must leave Y at some first node P outside Y. Let Q be the portion of the shortest path from s to P. We have
 D[P] <= len(Q) (by the definition of D, since Q is a special path) <= d[v] < D[v].
 So D[P] < D[v]. But the algorithm selected v next, which means D[v] should be the smallest among all nodes outside Y. Contradiction. Proved.
Dynamic Programming

Principle of Optimality: a problem satisfies the Principle of Optimality (POO) if every subsolution of any optimal solution of the problem is an optimal solution to the corresponding subproblem. In general we prove this by contradiction (assume one subsolution is not optimal; then it is easy to see the whole solution is not optimal).
 Design steps
 Develop mathematical notation for the problem, solutions, and subsolutions
 Prove the principle of optimality
 Write the relationship between the solution and the subsolutions
 Design an algorithm to calculate the subsolutions and the solution bottom-up (with POO, this guarantees we generate an optimal solution)
Allpairs shortest path problem
 k-special path: a path from i to j where the label of each intermediate node is <= k.
 Let A(k)[i,j] be the length of the shortest k-special path from i to j.
 A(0)[i,j] = W[i,j].
 The shortest distance from i to j is A(n)[i,j].
 POO is proved previously: subpath of shortest path is shortest.
 Recurrence relationship:
 Divide the k-special paths into two groups.
 Group A: the k-special paths that do not go through node k.
 Group B: the k-special paths that go through node k.
 So A(k)[i,j] = min(min(group A), min(group B))
 Group A = the (k-1)-special paths
 A(k)[i,j] = min(A(k-1)[i,j], min(group B))
 For each k-special path in group B, i -> k -> j, let Q = i -> k and R = k -> j. Then len(Q) = A(k-1)[i,k] and len(R) = A(k-1)[k,j]
 min(group B) = A(k-1)[i,k] + A(k-1)[k,j]
 A(k)[i,j] = min(A(k-1)[i,j], A(k-1)[i,k] + A(k-1)[k,j])
Function APSP(input: W; output: A)
for i in 1 to n:
for j in 1 to n:
A(0)[i,j] = W[i,j]
for k in 1 to n:
for i in 1 to n:
for j in 1 to n:
A(k)[i,j] = min(A(k-1)[i,j], A(k-1)[i,k] + A(k-1)[k,j])
end for
end for
end for
return A
end APSP
 Reduce space complexity: the original space complexity is O(n^3). When we compute A(k) we no longer need A(k-1), so we can update a single array in place: A[i,j] = min(A[i,j], A[i,k] + A[k,j]). Now the space complexity is O(n^2). The time complexity doesn't change: O(n^3).
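The O(n^2)-space version of APSP (the Floyd-Warshall algorithm) can be sketched in Python (a minimal sketch; W is assumed to be an n x n matrix with float('inf') for missing edges and 0 on the diagonal):

```python
def apsp(W):
    # Returns the matrix of shortest distances between all pairs.
    n = len(W)
    A = [row[:] for row in W]          # A(0) = W; updated in place afterwards
    for k in range(n):                 # allow node k as an intermediate node
        for i in range(n):
            for j in range(n):
                A[i][j] = min(A[i][j], A[i][k] + A[k][j])
    return A
```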
Optimal Binary Search Tree
 Return a binary search tree for which the expected search cost is minimized.
 For every missing leaf, create an imaginary node E_s. Let r_{ij} be the index of the root of subtree T_{ij}.
 C(T) = \sum_{s=1}^n P_s (depth(a_s) + 1) + \sum_{s=0}^n Q_s depth(E_s)
 Using POO, we get C(T_{ij}) = C(T_l) + C(T_r) + W_{ij}, where W_{ij} = P_{i+1} + ... + P_j + Q_i + ... + Q_j, T_l = T_{ik}, T_r = T_{(k+1)j}
 So the recurrence relationship is C(T_{ij}) = min_k(C(T_{ik}) + C(T_{(k+1)j}) + W_{ij}); C(T_{ii}) = 0
 Dynamic programming tends to have T(n) = O(n^3). It is more time-consuming than the greedy method, but more powerful. It only works when POO holds.
Graph Traversal
 A graph traversal technique visits all the nodes of a graph in a systematic way. For binary trees, there are in-order, pre-order, and post-order traversals. T(n) = O(n) since every node is traversed once.
Procedure InOrder(T):
if T == null:
return
inorder(T.left)
visit(T)
inorder(T.right)
end inorder
 Graph search methods: depth-first search and breadth-first search.
 Sorting a binary search tree: an in-order traversal of a BST visits the keys in sorted order. T(sort BST) = O(n).
Depth First Search
 DFS steps
 find an unvisited node v, treat it as the current node
 select an unvisited neighbor of current node, visit it, treat it as the current node
 if there is no unvisited neighbor of current node, backtrack to its parent, treat the parent as the current node.
 repeat steps 2-3 until no more unvisited nodes can be reached (i.e., the current tree is finished)
 if there are still unvisited nodes, go back to step 1 (they belong to other trees)

DFS implementation. We deal first with the node that came last, so we use a stack.
 When visiting a node, push it onto the stack (top).
 When backtracking from a node, pop it. The new current node is the parent of the popped node, which is now the top of the stack.
 When the stack is empty, there is no unvisited node left in this tree (steps 1-4)
 Time complexity: each edge is dealt twice and each node is processed once so T(n) = O(E+V)
Procedure DFS(input:G)
while G has unvisited node:
node x,y,v
stack S
v = an unvisited node of G
visit(v)
push(v,S)
while S is not empty:
x = top(S)
if x has unvisited neighbor:
y = an unvisited neighbor of x
visit(y)
push(y,S)
else:
pop(S)
end if
end while
end while
end DFS
 If the graph is connected, DFS generates a spanning tree. If all weights are equal, it is also an MST.
Procedure RecursiveDFS(G,s)
visit(s)
while s has unvisited neighbor a:
RecursiveDFS(G,a)
end while
end RecursiveDFS
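The stack-based DFS above can be sketched in Python (a minimal sketch; the graph is assumed to be an adjacency dict mapping each node to its neighbor list, and the function returns nodes in visiting order):

```python
def dfs(graph):
    # Visits every node of every component; returns the visiting order.
    order = []
    seen = set()
    for start in graph:                # step 1: pick an unvisited root
        if start in seen:
            continue
        seen.add(start)
        order.append(start)
        stack = [start]
        while stack:
            x = stack[-1]              # current node = top of the stack
            unvisited = [y for y in graph[x] if y not in seen]
            if unvisited:
                y = unvisited[0]       # visit one unvisited neighbor
                seen.add(y)
                order.append(y)
                stack.append(y)
            else:
                stack.pop()            # backtrack to the parent
    return order
```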
BreadthFirst Search
 steps
 select an unvisited node as the current node (and only this node is treated as the root node), visit it, its level is current level.
 For each node on the current level, in the order where the current level nodes are visited, visit all its unvisited neighbors. The new visited nodes form a new level.
 repeat step 2 until no more node can be visited.
 If there is still unvisited node, repeat step 1. (another tree)
 implementation: we first deal with the nodes on the lower level, so first-come-first-served: a queue.
 when visiting a node, add it to the queue.
 the next current node (in step 2, the next node whose neighbors are visited) is the head of the queue; get it and remove it from the queue.
Procedure BFS(G)
while G has unvisited node
queue Q
node x,y,v
v = an unvisited node of G # root
enqueue(v,Q)
visit(v)
while Q is not empty:
x = dequeue(Q) # the next current node
for each unvisited neighbor y of x:
visit(y)
enqueue(y,Q)
end for
end while
end while
end BFS
 Time complexity is T(n) = O(E+V)
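The queue-based BFS can be sketched in Python (a minimal sketch for a single component; the graph is assumed to be an adjacency dict, and the returned `dist` levels are the unweighted shortest distances mentioned in the application below):

```python
from collections import deque

def bfs(graph, root):
    # Returns (visiting order, level/distance of each reached node).
    dist = {root: 0}
    order = [root]
    q = deque([root])
    while q:
        x = q.popleft()               # next current node = head of the queue
        for y in graph[x]:
            if y not in dist:         # unvisited neighbor
                dist[y] = dist[x] + 1 # neighbors form the next level
                order.append(y)
                q.append(y)
    return order, dist
```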
 Application: Single-Source Shortest Path.
 The BFS path from the root to a node is a shortest path (in number of edges).
Biconnectivity
 Definition: The deletion of any single node still leaves the graph connected.
Backtracking
 Backtracking is a systematic method to generate all objects of a given combinatorial family (finitely many objects of finite size)
 uniform representation
 each object in the family is represented by an array X[1:n] with a fixed n
 each element of X takes a value from a predefined set S = {a0, a1, ..., am}
 the elements of X must satisfy some constraints C
 each C-compliant instance of X is an object of the combinatorial family represented by (X[1:n], S, C)
 The algorithm is a DFS-like traversal of the tree that represents the entire solution space.
 the root is the starting point X[1].
 Each path from the root to a leaf is an object X.
 When we backtrack past the root, the algorithm finishes: we have found all the paths (objects)
Procedure Backtrack():
int r = 1
int X[1:n]; X[i] = a0 for all i
while r > 0:
Genenext(X,r)
if X[r] == a0: # no valid value left at position r
r--
elif r == n: # a complete object
print(X)
else:
r++
end if
end while
end Backtrack
Procedure Genenext(X,r):
X[r] = the next value after X[r] in S
while X[r] <= am:
if X[1:r] satisfies the constraints C:
return
else:
X[r] = the next value after X[r]
end if
end while
X[r] = a0 # wrapped around: signal backtrack
end Genenext
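The (X, S, C) framework can be sketched in Python for a toy family (a minimal recursive variant rather than the iterative Genenext loop; the example family, binary arrays of length n with no two adjacent 1s, is illustrative and not from the notes):

```python
def backtrack(n, values=(0, 1)):
    # Generates all X[0:n] over S = values satisfying the constraint C:
    # no two adjacent entries are both 1.
    out = []
    x = [None] * n

    def ok(r):
        # Does the partial assignment x[0..r] still satisfy C?
        return r == 0 or not (x[r] == 1 and x[r - 1] == 1)

    def extend(r):
        for v in values:
            x[r] = v
            if ok(r):                      # prune invalid partial objects
                if r == n - 1:
                    out.append(x.copy())   # a complete object
                else:
                    extend(r + 1)          # go deeper in the solution tree
        x[r] = None                        # backtrack
    extend(0)
    return out
```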
Branch and Bound
 BB is a general optimization technique that applies where greedy and dynamic programming fail. But it is slower: it usually takes exponential time.
 general idea
 It is like a BFS for the optimal solution in the solution space, but not all nodes get expanded. Rather, a selection criterion determines which node to expand (select the node with the least estimated cost).
 Another criterion tells the algorithm when the optimal solution has been found (see Termination below).
 The closer the predictor is to the true cost, the faster it gets us to the optimal solution
 Predictor (approximate cost function):
 BB chooses the live node with the best predictor value (least cost)
 BB then expands all of its children
 BB computes predictor values for the children
 Termination: when the chosen best live node is a leaf, then terminate
Procedure BB(input: X):
Begin
E = root node
heap H
while True do
if E is a leaf node:
print(the path from E to root)
return # this is the optimal solution
else:
expand(E)
end if
if H is empty:
print(no solution)
return
end if
E = deletemin(H) #go to next point
end while
end BB
Procedure Expand(E):
begin
generate all children nodes of E
calculate predictor value of each child node # this varies question to question
insert child node to heap H
end expand
 Proof of optimality: in a minimization problem, if ĉ (c-hat, the predictor) and c (the true cost function) satisfy the following two properties, then the first expanded node that is also a leaf is the optimal solution:
 for each node n, ĉ(n) <= c(n)
 for each leaf node n, ĉ(n) = c(n)
 How to find a good ĉ?
 express c mathematically, then relax some constraints
 the relaxation should make ĉ fast to compute
 don't relax too much
 make sure the two properties above are satisfied
 BB can also be used to find suboptimal solutions: stop when a 'good-enough' solution is found.
Lower Bound Theorem
 l(n) is called a lower bound for problem P if every algorithm for P takes time T(n) = Ω(l(n))
 we search for the highest lower bound, which captures the intrinsic complexity of the problem.

The size of the output is always a lower bound. The size of the input is sometimes a lower bound (when the whole input must be examined).

If the lower bound is l(n) and we have an algorithm whose time complexity is also l(n), we say the lower bound is achievable.

u(n) is called an upper bound for P if there is an algorithm with T(n) = O(u(n)). We want a small upper bound: we keep improving until we develop an algorithm whose time complexity matches a known lower bound. Then there is no room for improvement; at that point T(n) = Θ(l(n)) and the algorithm is optimal (in speed).
Lower bound on (comparisonbased) sorting algorithm
 Sorting is Θ(n log n).
 Since we know sorting is O(n log n) (e.g., MergeSort), we go on to show the lower bound.
 Here we show the proof for comparison-based sorting; the bound applies to every comparison-based sorting algorithm. To do this, we build a model that fits each such algorithm.

The model is a comparison tree: a binary tree of comparison nodes. The running time of the algorithm is the length of a path from the root to a leaf. The tree has N = n! leaves (one per permutation of the n inputs, where n is the input size), so the height is at least log(N) = log(n!) = Ω(n log n). So T(n) = Ω(n log n).
NP problem
 Polynomial-time problem: a problem is polynomial if there is a solution whose time is O(n^c) for a constant c.
 Exponential-time problem: a problem is exponential-time if
 we can't find a polynomial-time solution, and
 we can find a solution whose time is O(n^{u(n)}) where u(n) → infinity as n → infinity
 Satisfiability Problem(SAT):
 input: a boolean expression F.
 yes-no question: is there an assignment to the variables of F so that F evaluates to 1?
 definition of NP problem: a yes-no problem is said to be NP if
 its solution comes from a finite set of possibilities, and
 it takes polynomial time to verify the correctness of a solution
 NP algorithm. An NP algorithm is an algorithm that has 2 stages:
 a guessing stage using choose() to find a candidate solution
 a verification stage using polynomial time to verify the correctness of the solution
 another definition of NP problem: a problem is NP if there exists an NP algorithm for it.
 Here is an example: an NP algorithm for kclique
function kclique(input: G,k)
array X[1:k]
for i in 1 to k:
X[i] = choose(a vertex of G) # guessing stage
end for
# verification
for i in 1 to k:
for j = i+1 to k:
if (X[i] == X[j]) or ((X[i],X[j]) is not an edge):
return no
end if
end for
end for
return yes
end kclique
 A transform (of problem P to problem R) is an algorithm that
 takes polynomial time to convert any instance I_P of P into an instance I_R of R, and
 guarantees Answer(Q_P, I_P) = Answer(Q_R, I_R)
 Reduction: we say P reduces to R if there is a transform from P to R. Denote this as P <=_p R; here 'p' stands for polynomial time.
 If P reduces to R and R is polynomialtime, then P is polynomialtime.
 The relationship implies that R is at least as hard as P (or even harder than P). So <= here means 'easier than'.
 NP-complete: a problem P is NP-complete if:
 P is an NP problem
 all NP problems reduce to P (P is the 'most difficult' among all NP problems / as hard as every NP problem)
(The second criterion can be replaced by: there exists an NP-complete problem R that reduces to P)
 Cook's Theorem: SAT (the satisfiability problem) is NP-complete. We can use this theorem to prove the NP-completeness of other problems.
 NP-hard: a problem P is NP-hard if all NP problems reduce to P. (That is, an NP-hard problem doesn't need to be NP itself; NP-complete ⊂ NP-hard.)
 Typically, if a yes-no problem is NP-complete, we say the original (optimization) problem is NP-hard.
When to use ... algorithm?
We should apply an appropriate algorithm to each different question. Here are some discussions.
Divide and Conquer

 If a big problem can be divided into independent, non-overlapping subproblems and we have a method to merge the subsolutions.
 Else, if the subproblems overlap, we may try dynamic programming.
Greedy algorithm

 Greedy algorithms are an optimization technique (maximize or minimize something).
 The solution must consist of a set of items. A greedy algorithm selects the locally-optimal item one after another. Greedy algorithms in general do not guarantee optimality.
Dynamic Programming

 Dynamic programming is also for optimization. So when you see 'maximize' or 'minimize', think about dynamic programming.
 Dynamic programming synthesizes the subsolutions into a solution, trying all the possible splits (instead of all combinations, because by using min() or a similar method only a few combinations are kept) before it arrives at the optimum. Dynamic programming guarantees global optimality when the POO (principle of optimality) holds.
 Compared with divide and conquer: D&C only splits at some predefined point (e.g., the midpoint); dynamic programming splits at each possible point and tries every split. And in dynamic programming, the subproblems are not solved independently.
 We use dynamic programming when the problem can be divided into overlapping subproblems and the optimal solution can be built from optimal subsolutions (it is important that we can reuse previous subsolutions; I think that's the difference from brute force).
 Differences from brute force:
 Dynamic programming uses 'memoization', which means it reuses the solutions of subproblems, while brute force doesn't.
 Dynamic programming tries all splits but only keeps a few good combinations, while brute force tries all combinations.
Graph Traversal
 We need to use graph traversal (search) when we have a graph structure.
 Why use a graph structure: a graph is a powerful and versatile structure for representing relationships.
Branch & Bound
 BB is designed for combinatorial optimization problems. You can use it when greedy and dynamic programming fail.