2020-12-14 updated:
Add 'when to use ... algorithm' section.

# Preface

This is an ABC-level introduction of computer science algorithms. It mainly includes the following parts: general introduction, several useful data structures, Divide and Conquer algorithm, Greedy algorithm, Dynamic Programming, Graph Traversal, BackTracking, Branch and Bound, Lower Bound Theorem, and NP.

# Introduction

1. Algorithm: a precise statement to solve a problem on a computer
2. property of algorithm: Terminate in a finite amount of time

3. Design of Algorithm:
1. Devise a method
2. Express the method (with pseudolevel-code, flowchart..)
3. validate the design (proof of correctness)
4. Design of algorithm doesn't including implementing the algorithm with high-level language
4. Recursion: A recursive algorithm calls itself on a smaller input.
Algorithm name(params)
begin
basic step #when the input is the smallest part
name(smaller part)
combine sub-solutions of smaller part
end
1. Validation: often use induction to proof the correctness

## Analysis of algorithm:

1. determine the time and space requirement (complexity) of algorithm
2. since space(memory) becomes cheap, we focus on time complexity
3. purpose of analysis:
1. see if the algorithm meets the spped requirement before more investment
2. compare between different algorithms with same purpose
4. Time complexity T(n): it is the number of operations in algorithm, as a function of input size n. In RAM model, we treat each arithmetic&boolean operations and each relation(comparison) as an operation.

5. Space complexity S(n): number of space needed. (e.g., only a variable is declared, then S(n) = 1)
6. Order of Growth: we care when n is large. And we ignore the constant factors. We use 'Big-O'.
7. Big-O. If |f(n)|<= k|g(n)|, then f = O(g)
8. Big-Ω. If |f(n)|>= k|g(n)|, then f = Ω(g)
9. Big-Θ. If |f(n)|= k|g(n)|, then f = Θ(g)
10. Choose the tightest and simplest g() that you can find: keep the largest order term, drop all the other terms, drop the constant factor.
11. Master Theorem: find the time complexity of recursive algorithm. Let T(n) = aT(n/b)+f(n). if
1. f(n) = O(n^{(log_b a)-epsilon}), then T(n) = Θ(n^{log_b a}).
2. f(n) = Θ(n^{log b a}), then T(n) = Θ(n^{log_b a}*logn)
3. f(n) = Ω(n^{(log_b a)+epsilon}) and af(n/b)<=cf(n) for some c<1, then T(n) = Θ(f(n))

# Data Structure

1. Data structure: an organization of a dataset + some operations to be performed (insert, delete..) + (actual organization of data -> called built-in organization. e.g., tree)
2. Design of a data structure:
1. input: the specifications of data set(data type, size..) + specifications of operations
2. output: the organization of data set + algorithm to perform the operations
3. When we need data structure: when some operations are called many times for a dataset, it can be more efficient if we have a efficient data structure.

## Stack

1. The stack is a data constructure that follows 'last in, first out'(LIFO). It supports 3 operations:
1. push(S,a): add new data a to S.
2. pop(S): delete and return the most recently added entry.
3. top(S): return but not delete the most recently added entry.
2. Data organization: array with an index to next empty slot/linked list with a pointer to the most recently added entry

## Queue

1. The queue is a data structure that follows 'first in, first out'. It supports 2 operations:
1. enqueue(Q,a): insert a new entry to Q
2. dequeue(Q): return and delete the oldest added entry
2. Data organization:
1. Array: index head pointing to the oldest entry; index tail pointing to the next empty slot.
2. linked list: one pointer head to the oldest entry; one pointer tail to the most recently added entry.

## Records(class)

1. Record is an aggregate of several elements called fields.
2. A pointer is an address, a index of the 1st byte of the physical record in memory.
3. Use new to create a pre-defined record. It calls the OS to find+reserve a free chunk of memory large enough to hold an entire record. The chunk is empty and you can add field information.

4. a self-referenced record is a record where at least one filed is pointer to a same type record.

1. the organization is a sequence of self-referenced record. every record has a field of pointer which point to the next record.
2. Two pointers in the structure: Pstart to the first record, Pend to the last record.
3. three actions: insert(), find(), delete()
4. used in situations where one-direction navigation is enough. can implement stack and queue.
5. Double-linked list: add another pointer to the previous record besides the pointer to next record.

## Graph

1. A graph G(V,E) consists of a finite of vertices V and some edges E: directed/undirected, weighted/unweighted.
2. graph can represent map, network, relationship...
3. adjacency: if (x,y) in E, we say x,y are adjacent/neighbors. With direction, (x,y) is edge, then x is adjacent to y; y is adjacent from x.
4. degree: undirected: the number of neighbors of x; directed: indegree: number of vertices coming to x; outdegree: number of vertices leaving x.
5. simple path: a path without repeat node.
6. length of path: number of its edges/the sum of weights of edges.
7. distance of x,y: the length of the shortest path from x to y.
8. adjacency list: an array A[1: num(V)] of pointers to linked list, where A[i] is a pointer to the start of the linked list containing all the nodes adjacenry from node i.

## Tree

1. Tree is undirected, connected, uncyclic graph: there is exactly one simple path between each pairs.
2. root: a node is assigned as the root.
3. depth: the depth of x is the distance(number of edges) from root r to.

4. height: the height of x is the distance from x to its furthest descendant.
5. depth/height of tree: height of root r.
6. level: root level = 0. then 1, 2, ... depth of x = level of x. height of tree = the largest level of descendant.

## Binary tree

1. A rooted tree where each node has at most 2 children(left/right child).
2. each node can be represented as a record(field: data, left/right child, parent).
3. perfect binary tree: a binary tree where each non-leaf node has 2 nodes and all leaves are same level. They are labelled from root as 1,2 ... top to down, left to right. (note. the level of height is from 0,...) The canonical label of children of node n is 2n and 2n+1.
4. almost complete binary tree: the binary tree containing the first n nodes of a perfect binary tree (all the non-leaf level is perfect)

## Binary-Search-Tree

1. Binary Search Tree is a binary tree which has a comparator: all the nodes in the left subtree of x have keys <= key of x; all nodes in the right subtree of x have keys > key of x.
2. Search(T,a): T(n) = O(depth(T)) + 1 = O(height)
Search(T,a)
begin
# T is pointer to root
nodeptr p
p = T
while (p != null) and (p != a):
if a< p: # no '=' in this if-else
p = p.leftchild
else:
p = p.rightchild

return p # a's pointer or null if not exist
end
1. Insert: must be inserted as a leaf. T(n) = T(search) + T(create a new node) = O(height) + O(c) = O(height)
procedure Insert(T,a)
nodeptr p
p = T
boolean done = False

while (done == False):
if a <= p.value:
if p.leftnode = null:
p.leftnode = new <node>
p.leftnode.value = a
done = True
else:
p = p.leftnode
end if
else:
if p.rightnode = null:
p.rightnode = new <node>
p.rightnode.value = a
done = True
else:
p.rightnode = p
end if
end if
end while
end
1. Delete(T,a): The procedure is too long. I show the structure here. The time analysis mainly include 2 search(one large in 1 and one small in 5) and several constant time operation. T(n) = O(height)
2. If a found, use p as the pointer of a.
3. If p is a leaf, then remove(let p.parent -> null) a.
4. If p has one child, replace p's location with p's child (p.parent <- p.child)
5. If P has two child. search the largest node in the left subtree of p. Use q as the pointer to that largest node.
6. use q's value for p.
7. If q has no child, then remove q.
8. If q has one child, go to 4.
9. q can't have two children(then it can't be the larger than its right child)

## Heap

1. Heap is a almost-complete binary tree where x's value is smaller than x.child's value. Heap has a comparator and two operations: delete-min(), insert(H,a).
2. Insert(H,a). The while loop takes O(height). In heap, height = logn. So T(n) = O(logn)
procedure Insert(H,a)

create a new node (n+1) and let node.value = a
let x point to this new node

while (x< x.parent) and (x not root):
swap x and x.parent 's value
x = x.parent
end while
end procedure
1. Delete-min(). While loop takes O(logn) time. Other step takes constant time. So T(n) = O(logn).
function Delete-min(H)
begin
x = root of H
r = key of x
remove r from x
let n = the last node.
let x.key = n.key
remove n
# restore the heap
while x.key > any x.child.key:
swap x's key with the smaller x.child (:= p)'s key
x = p
end while
return r
end Delete-min
1. Implementation heaps with array: node i is ith entry of Array. The value is Array[i].
1. as perfect binary tree, children of node i is 2i and 2i+1. parent of i is i/2. Swap node i,j = swap A[i] and A[j].
2. physically, insert and delete-min also takes O(logn). Because swap takes constant time, here notice that 'insert' is actually 'append' to the end, so T(append) = constant time
3. Tree is conceptual implementation, array is physical implementation
2. Creation of heap:
1. Insert n times: T(n) = O(nlogn)
2. Recursive method(not introduced): T(n) = O(n)
3. Heap sort:
1. build the input array into a heap
2. Delete-min n times and store the return to the output list.
3. T(n) = T(build) + nT(delete-min) = O(n)+nO(logn) = O(nlogn)

## Union-Find

In this section we show the design process based on a special dataset and two specified operations

1. dataset: n disjointed set, each set has one element. Conceptually, each set is a rooted tree. We don't care about Operations:
1. Uion(A,B): union two input sets
2. Find(x): find the set that contains entry x
2. first physical implementation: a single array Parent[1:n]. If i is root, P[i] = 0; otherwise P[i] is the index of its parent.
Procedure U(i,j)
A[i] = j
end U
# T = O(1)
function Find(x)
r = x
while A[r] != 0:
r = A[r]
end while
return r
end find
# T = O(height)
1. First revision
1. Union: make the root of the larger tree the parent of smaller tree root
2. Use -size as the A
3. When union i(larger) and j(smaller), let A[i] = A[i] + A
4. T(find) = O(logn)
2. Second revision: (Path Compression)
1. Find: make each node on the path from x to root as a immediate child of r.
Function find(x):
int tem,s,r
while Parent[r] > 0:
r = Parent[r]
end while
# r is now root
s = x
while s != r:
tem = s
Parent[tem] = r # make path node a child of root
s = Parent[s]
end while

# Divide and Conquer

1. general steps
1. examine the size of the problem
2. if small enough, directly solve the basic problem
3. if not, divide the problem into several small part
4. solve each small part by recusively calling the algorithm
5. merge the sub-solutions of small part into a global solution

## Merge Sort

Procedure MergeSort(in A[1:n],i,j; out B[1:n])
generic C[1:n]
if i == j:
C[i] = A[i]
else:
m = (i+j)/2
MergeSort(A,i,m,C)
MergeSort(A,m+1,j,C)
merge(C,i,j)
end Mergesort

Procedure Merge(in C,i,j; out B[1:n])
int k=(i+j)/2, u=i, v = k+1, w = i #u,v traverse two array, w traverse output B

while (u <= k) and (v<= j):
if C[u] < C[v]:
B[w] = C[u], w++, u++
else:
B[w] = C[v], w++, v++
end if
end while
# deal with unbalanced case
if (u>k):
B[w:j] = C[v:j]
elif (v>j):
B[w:j] = C[u:k]
endif
end Merge
1. Time of Merge is O(cn). Because there is at most n times comparison and each comparison comes with some insertion. Time of Mergesort is T(n) = 2T(n/2)+cn. -> T(n) = O(nlogn).

## Quick Sort

1. Partition the input Array A around an element a: put all entry <= a to the left of a, all entry > a to the right of a. Partitioni takes O(n) because n comparison happen.
Procedure QuickSort(in\out A[1:n],i,j)
if i==j:
return
r = partition(A) # select partition index and partition around it
QuickSort(A,i,r-1)
QuickSort(A,r+1,j)
end QuickSort
1. Time complexity: T(n) = T(r-1) + T(n-r) + O(n) # left sort+right sort+ partition
1. worst case(unbalanced). r = 1. T(n) = T(0)+T(n-1) + n -> T(n) = n^2.
2. average case. T(n) = O(nlogn)
3. The constant factor of QuickSort is smaller than other sort algorithm. So it is quicker on average.
2. Partition algorithm 1. At most n/2 times of swap. Other time is constant time. T(n) = O(n).
Function Partition(input: A[1:n],i,j)
p = i, q = j
while (p<q):
while (p<j) and (A[p]<=A)
p ++
end while
while (q>i) and (A[q]>=A)
q --
end while
if (p<q):
swap (A[p],A[q])
p ++
q --
end if
end while
swap(A,A[q])
return A

## Order Statistics

1. Base D&C method
Function select(A[1:n],k)
if n = 1:
return A
end if
p = partition(A[1:n],k) # p is index
if p=k:
return A[p]
elif p>k:
select(A[1:p],k)
else:
select(A[p+1:],k-p)
end if
end select
1. Time analysis:
1. worst case: T= max(c, T(r), T(n-r))+cn = max(T(1),T(n-1))+cn = T(n-1)+cn -> T(n) = cn^2.
2. average: T(n) = O(nlogn)
2. Sceond D&C method: quickselect.
1. main structure is same, but use wise-partition to avoid unbalance.
2. T(n)<= 20cn -> T(n) = O(n)
function wise_partition(A[1:n])
nm = get_wise_partition_element(A[1:n])
swap (A,A[nm])
r = partition(A) # use first A for index= use nm for partition
return r
end wise_partition

function get_wise_partition_element(A[1:n])
m = n/5
divide A into arrays whose length = 5 #A[1:5],...
sort each small arrays #time-complexity
B[1:m] = the array of middles of the sorted small arrays
mm = quickselect(B[1:m],m/2)
return mm
end

## Binary Search

1. We have a sorted array X[1:n] and a number a. We try to find its index in X. T(n) = O(logn).
funcion binary_search(X[1:n],a)
if n == 1:
return 1
else:
return -1 #not in
if (a = X[n/2]):
return n/2
elif a< X[n/2]:
binary_search(X[1:n/2-1,a])
else:
binary_search(X[n/2+1:],a)
end if
end binary_search    
1. Advantage of D&C compared to straightforward loop method:
1. D&C sub-problems can run parallelly to physically accelerate

# Greedy Algorithm

1. General Strategy
1. select the best (vary from Q to Q) item from remaining input
2. remove it from input and add it to output
2. Selection sort: In each step, scan the remaining items and pick the smallest item. T(n) = O(n^2).
3. Heapsort: still greedy. But use better structure min-heap. T(n) = T(create) + nT(delete_min)+n*O(c) = O(nlogn)

Function HeapSort(input A; output B):
#preprocess
H = create_heap(A);
k = 1
while k<n:
x = delete_min(H)
B.append(x)
k++
end while
end HeapSort
1. Optimal Merge: use min-heap. I feel min-heap is good for greedy algorithm since it always search for the 'smallest' item. T(n) = O(nlogn)
Procedure OptimalMerge(input A1,...AN):
while (there are more than one array input):
find two array with smallest number of items
union the two selected arraies into one new array
1. Prove non-optimality of greedy algorithm(by counter-example)
1. construct an input
2. based on greedy algorithm, generate the output
3. Manually give a better output solution
2. KnapSack Problem: (optimality) select the item with the highest price per unit. If it can fit on the capacity, take all; else, take the largest fraction that fills the capacity.

## Minimum Spanning Tree

1. Spanning Tree: T is a spanning tree of graph G is T has all nodes in G and each edge in T is from G. If G is weighted, the spanning tree with smallest sum of weights is called minimum spanning tree.
ComputeMST(input: W; output T)
put all nodes in W to T with no edges
while T has less than num(nodes)-1 edges:
select the smallest weight edge e from W
REMOVE e FROM G
if adding e to T doesnt create a cycle:
end if
end while
return T
end COmputeMST
1. Use appropriate structure. To find the smallest edge -> min-heap. To check if it makes a cycle -> union-find (if they belong to same root, then cycle). In while, there are |E| times instead of n-1 times because T can keep unchaged in many rounds. T(delete) = O(log|E|). T(union) = O(logn). |E|>n. So T(n) = |E|log(|E|)
function ComputeMST(input: W; output T)
put all nodes in W to T with no edge
H = create_heap(1:|E|)
Parent[1:|V|] = [-1,...]
while T has less than |V|-1 edges:
e = delete-min(H) = (a,b)
r_a = find(a); r_b = find(b)
if r_a != r_b:
Union(r_1,r_2)
end if
end while
return T
end function
1. Proof of optimality. T be generated MST. T' be a MST. Transform T' into T without chaning the weights
1. Select min_weight_edge e in T-T'. Add this edge to T'.
2. This adding makes a cycle in T'. Drop an edge e' in T'-T to delete this cycle. (weight of e' must = weight of e)
3. Keep doing this and finally T' -> T.
4. If no steps can be operated, it means there is only one MST. T = T'.

## Single-Source Shortest Path (SSSP)

1. special path. Let Y is a set of nodes, a special path to a node outside of Y is a path where each immediate node belongs to Y. Dist[n] = the smallest distance of special path to node n. Dist[n] >= smallest distance[n]. When n is also in Y, Dist[n] = smallest distance[n].

2. Greedy algorithm: let starting node s. Y={s} initially. In each step, select the node outside of Y that has smallest DIST. add this node to Y. update the DIST of other nodes. T(out-for) = n. T(inner-for) = n. T(n) = O(n^2)
Function SSSP(input W; output D)
array Y[1:n]=[0,....0]
Y[s] = 1
array D[1:n]; D[i] = W[s,i]
for i = 2 to n:
node_selected = min(D[i]|Y[i] = 0)
Y[node_selected] =1
for node Y[node] = 0:
D[node] = min(D[node], D[node_selected]+W[node_selected,node])
end for
end for
return D
end SSSP
1. Proof when v in Y, then D[v] = smallest distance[v]. Can be proof by induction and countradiction.
1. Let the first u nodes all satisfy. Now we come to next node v.
2. we assume D[v]!=d[v]. Then based on definition, d[v]< D[v].
3. Since that, we can assume on the smallest path d[v], the path leave the shortest path at some point(we call it P) and go other lane. Let the portion from s to P in the smallest path as Q. Then we have
4. D[P] <= Q (This is because the definition of D) < dis[v] < D[v].
5. Here we get D[P] < D[v]. Based on our selection, v is next to select which means D[v] should be smallest among all outside Y node. Here's the contradiction. Proved.

# Dynamic Programming

1. Principle of Optimality: A problem satisfy the Principle of Optimality if the sub-solution of the any optimal solution of the problem is optimal solution to the sub-problem. In general we prove this by contradiction(we assume one sub-solution is not optimal, then it is easy to see the solution is not optimal).

2. Design steps
1. Develop a math notations for the problem, solutions, subsolutions
2. show the proof of optimality
3. Write the relationship between solution and subsolutions
4. design algorithm to calculate the subsolution and solution from bottom to top (with POO, it guarantee we generate optimal solution)

## All-pairs shortest path problem

1. Key-special path: is a path from i to j where the label of each intermediary node is <=k.
1. Let A(n)[i,j] be the shortest n-special path from i to j.
2. A(0)[i,j] = W[i,j].
3. shortest distance from i to j is A(n)[i,j]
4. POO is proved previously: subpath of shortest path is shortest.
2. Reccurency Relationship:
1. Divide k-special path into two groups.
1. Group A: those k-special path that not go through node k.
2. Group B: those k-special path that goes through node k.
2. so A(k)[i,j] = min(min(group-A),min(group-B))
1. group A = (k-1) special path
2. A(k)[i,j] = min(A(k-1)[i,j],group-b)
3. for each k-special path in group B, i->k->j, let Q = i->k, R = k->j. It is obvious that len(Q) = A(k-1)[i,k], len(R) = A(k-1)[k,j]
4. min(group-b) = A(k-1)[i,k]+A(k-1)[k,j]
5. A(k)[i,j] = min(A(k-1)[i,j], A(k-1)[i,k]+A(k-1)[k,j])
Function APSP(input: W; output: A)
for i in 1 to n:
for j in 1 to n:
A(0)[i,j] = W[i,j]

for k in 1 to n:
for i in 1 to n:
for j in 1 to n:
A(k)[i,j] = min(A(k-1)[i,j], A(k-1)[i,k]+A(k-1)[k,j])
end for
end for
end for
return A
end APSP
1. Reduce space complexity: The original space complexity is O(n^3). Since when we update A(k), we don't need A(k-1). So we replace as A(k)[i,j] = min(A(k)[i,j], A(k-1)[i,k]+A(k)[k,j]). Now space complexity is O(n^2). Time complexity doesnt change: O(n^3).

## Optimal Binary Search Tree

1. Return a binary search tree that the search time is minimized.
2. For every missing leaf, create an imaginary node as E_n. Let r_{ij} be the index of root of node a_{ij}
3. C(T) = \sum_{s=1}^n (depth(a_s) + 1) +\sum_{s=0}^n depth(E_s)
4. During POO, we get C(T_{ij}) = C(T_l) + C(T_r) + W_{ij}. W_{ij} = P_{i+1}+...P_{j} + Q_{i} + ... Q_{j}. T_l = T_{ik}, T_r = T_{(k+1)j}
5. So the recurrency relationship is C(T_{ij}) = min(C(T_{ik}) + C(T_{(k+1)j}) + W_{ij}); C_{ii} = 0
6. Dynamic Programming tend to have T(n) = O(n^3). It is more consuming than greedy method, but more powerful. But only work when POO is true.

# Graph Traversal

1. Graph traversal technique visits all the nodes in a graph in a systematic way. In the case of binary tree, there are in-order, pre-order, post-order search. T(n) = O(n) since all nodes get traversed once.
Procedure InOrder(T):
if T == null:
return

inorder(T.left)
visit(T)
inorder(T.right)
end inorder
1. Graph search methods: Deep-first search and Breadth-first search.
2. Sort a binary search tree: use in-order traversal to traverse the bst. We sort it. T(sort bst) = O(n).

## Depth First Search

1. DFS steps
1. find an unvisited node v, treat it as the current node
2. select an unvisited neighbor of current node, visit it, treat it as the current node
3. if there is no unvisited neighbor of current node, backtrack to its parent, treat the parent as the current node.
4. repeat the 2~3 step until there is no unvisited node that can be visited (means in the same tree)
5. if there is still unvisited node, go back to step 1 (means nodes in other trees)
2. DFS implementation. We first treat with the node that comes last. So we use stack.

1. When visiting a node, push it to the stack (top).
2. When backtracking from a node, pop it. The new current node is the parent of the popped node, and it is the top of stack.
3. when stack is empty, then no unvisited node in this tree (step1~4)
3. Time complexity: each edge is dealt twice and each node is processed once so T(n) = O(E+V)
Procedure DFS(input:G)
while G has unvisited node:
node x,y,v
stack S
v = an unvisited node of G
visit(v)
push(v,S)
while S is not empty:
x = top(S)
if x has unvisited neighbor:
y = an unvisited neighbor of x
visit(y)
push(y,S)
else:
pop(S)
end if
end while
end while
end DFS
1. If the graph is connected, DFS generates a spanning tree. If the weights are equal, it is then a MST.
Procedure RecursiveDFS(G,s)
visit(s)
while s has unvisited neighbor a:
RecursiveDFS(G,a)
end while
end RecursiveDFS

1. steps
1. select an unvisited node as the current node (and only this node is treated as the root node), visit it, its level is current level.
2. For each node on the current level, in the order where the current level nodes are visited, visit all its unvisited neighbors. The new visited nodes form a new level.
3. repeat step 2 until no more node can be visited.
4. If there is still unvisited node, repeat step 1. (another tree)
2. implementation: we first deal the nodes on lower level, so first-come-first-serve: Queue.
1. when visit a node, add it to the queue.
2. the next current node(in step 2, the next node to be visit) is the head of queue; get it and remove it from queue.
Procedure BFS(G)
while G has unvisited node
queue Q
node x,y,v
v = an unvisited node of G # root
enqueue(v,Q)
visit(v)
while Q is not empty:
x = dequeue(Q) # the next current node
for each unvisited neighbor y of x:
enqueue(y,Q)
visit(v)
end for
end while
end while
end BFS
1. Time complexity is T(n) = O(E+V)
2. Application: Single-Source-Shortest-Path.
1. The bfs path from root is the shortest path.

## Biconnectivity

1. Definition: The deletion of any single node still leaves the graph connected.

# Backtracking

1. Backtracking is a systematic method to generate all objects of a given combinatorial family (finite objects of finite-size family)
2. uniform presentation
1. each object in the family is represented by a array X[1:n] with a fixed n
2. each element of X takes value from a pre-defined set S={0,1,..}
3. the elements of X must satisfy some constraints C
4. each C-compliant instance of X is a object of the combinatorial family represented by (X[1:n], S, C)
3. The algorithm is a DFS like traversal of the tree that represent the entire solution space.
1. the root is the starting point X.
2. Each path from root to the leaf is a object X.
3. When we backtrack to the root, the algorithm finishes. We find all the path(objects)
Procedure Backtrack():
int r = 1
int X[1:n]. X[i] = a0
while r > 0:
genenext(X,r)
if X[r] = a0:
r --
elif r = n:
print(X)
else:
r ++
end if
end while
end Backtrack

Procedure Genenext(X,r):
X[r] = X[r] + 1
while X[r] <= am:
if X[r] in Constraints:
return
else:
X[r] ++
end if
end while
X[r] = a0
end Genenext

# Barnch and Bound

1. BB is a general optimization technique that applies where greedy and dynamic programming fail. But it is slower, it usually takes exponential time.
2. general idea
1. It is like a BFS for the optimal solution in the solution space. But not all leaves get expanded. Rather, a selected criteria will be used to determine which node to be expanded. （select the node with least cost to expand）
2. Another criteria will tell the algorithm when the optimal solution is found. (if the cost of an entire path )
3. the closer a predictor is to compliance, the faster it gets us to the optimal solution
4. Predictor (approximate cost function):
1. BB choose the live node with the best predictor value (least cost )
2. BB then expands all its children
3. BB compute predictor values for its children
4. Termination: when the chosen best live node is a leaf, then terminate
Procedure BB(input: X):
Begin
E = root point
heap H
while True do
if E is a leaf node:
print(the path from E to root)
return # this is the optimal solution
else:
expand(E)
end if
if H is empty:
print(no solution)
return
end if
E = delete-min(H) #go to next point
end while
end BB

Procedure Expand(E):
begin
generate all children nodes of E
calculate predictor value of  each child node # this varies question to question
insert child node to heap H
end expand
1. Proof of optimality: in minimization question, if ch(c-hat) and c(cost function) satisfy the following two properties, the first expanding node that is also a leaf node is the optimal solution.
1. for each node n, ch(n) <= c(n)
2. for each leaf node, ch(n) = c(n)
2. how to find a good ch?
1. express c mathematically. relax some constraints.
1. which makes calculation of c fast
2. don't relax too much
3. make sure the two properties in 5 are satisfied.
3. BB can be used for finding sub-optimal solutions: stop when a 'good-enough' solution is found.

# Lower Bound Theorem

1. l(n) is called a lower bound for problem P if every algorithm for P takes tim T(P) = OMEGA(l(n))
2. we search for the highest lower bound which captures the intrinsic complexity of the problem.
3. The size of output is always a lower bound. The size of input is sometimes (when the input structure is random) a lower bound.

4. If the lower bound is l(n) and we have an algorithm whose time complexity is also l(n), we say the lower bound is achievable.

5. u(n) is called a upper bound for P if there is an algorithm that T(n) = O(u(n)). We want a small upper bound: until we develop an algorithm whose time complexity is a known lower bound. Then there is no improvement space. At this time, T(n) = THETA(P). This algorithm is optimal (in speed).

## Lower bound on (comparison-based) sorting algorithm

1. Sorting is THETA(nlogn).
2. Since we know sorting is O(nlog). We go to show the lower bound.
3. here we only show the proof for comparison based sorting. But actually the theorem is universal. To do this, we build a model that fit each comparison-based sorting algorithm.
4. The model is a comparison tree. It is a binary tree of comparison nodes. The time of the algorithm is the length of a path from root to leaf. So it is log(N). Here N = n!. n is the number of input. N is the size of comparison tree. So T = OMEGA(nlogn).

# NP problem

1. Polynomial-time problem: a problem is polynomial if there is a solution to the problem whose time is O(n^c) and c is a constant.
2. Exponential-time problem: a problem is exponential-time if
1. we can't find a solution with polynomial time
2. we can find a solution whose time is O(n^{u(n)}) where u-> infinity, u(n) -> infinity
3. Satisfiability Problem(SAT):
1. input: a boolean expression F.
2. yes-no question: is there an assignment to F so that F equals to 1?
4. definition of NP-problem: a yes-no problem is said to be NP if
1. its solution comes from a finite set of possibilities
2. it takes polynomial time to verify the correctness of a solution
5. NP algorithm. An NP algorithm is an algorithm that has 2 stages:
1. a guessing stage using choose() to find a solution
2. a verification stage using polynomial time to verify the correctness of the solution
6. another definition of NP-problem: a problem is NP if there exists a NP algorithm for it.
7. Here is an example: an NP algorithm for k-clique
function kclique(input: G,k)
array X[1:|V|]
for i in 1 to |V|:
X[i] = choose()
end for

#verification
for i in 1 to |V|:
for j = i+1 to |V|:
if (X[i] == X[j]) or ((X[i],X[j]) is not an edge):
return False
end if
end for
end for
return Yes
end kclique
1. Transform (of problem P to problem R) is an algorithm that
1. takes polynomial time
2. Reduction: we say P reduces to R if there is a transform from P to R. Denote as P <=_p R. here 'p' represents polinomial time.
1. If P reduces to R and R is polynomial-time, then P is polynomial-time.
2. The relationship implies that R is at least as hard as P (or even harder than P). So <= here means 'easier than'.
3. NP-complete: a problem P is NP-complete if:
1. P is a NP problem
2. all NP problems can reduce to P (P is 'most difficult' in all NP problems/as hard as all NP problem)

(The second criteria can become: there exists an NP-complete problem R that can reduce to P)

4. Cook's Theorem: SAT(satisfication problem) is NP-complete. We can use this theorem to prove the NP-complete of other problems.
5. NP-hard: a problem P is np-hard if all NP problems can reduce to P. (That is, NP-hard doesn't need to be NP. NP-complete \in NP-hard).
1. Typically, if an yes-no problem is NP-complete, we say the original problem is NP-hard.

# When to use ... algorithm?

We should apply appropriate algorithms for each different questions. Here're some discussions.

## Divide and Conquer

1. If a big problem can be divided into independent,non-overlapping subproblems and we have a method to merge the subsolutions.

2. Else, if the subproblems are overlapping, we may try dynamic programming.

## Greedy algorithm

1. Greedy algorithm is an optimization technique (maximize or minimize something..)

2. The solution must have a set of items. Greedy algorithm selects the locl-optimal item one after another. Greedy algorithm in general does not guarantee the optimality.

## Dynamic Programming

1. Dynamic programming is also for optimization. So when you see 'maximize','minimize', think about dynamic programming.

2. Dynamic programming synthesis the sub-solutions into solution, try all the possible splits(instead of all combinations because by using min() or similar method, only several combinations will be used) before it arrives the optimality. Dynamic programming can guarantee the global optimality with POO(proof of optimality).
3. compare with Divide and conquer: dc only split at some pre-defined point(eg, mid point), dynamic split on each possible point and try every combination. And in dynamic, the sub-problems are not solved independendtly.
4. We use dynamic programming when the problem can be divided into over-lapping sub-problems and the optimal solution can be achieved by using optimal sub-solution (it is important that we can reuse previous sub-solution for solution. I think that's the difference from brute force).

5. Difference from Brute Force:
1. Dynamic programming uses 'memorization' which means it reuses the solution of sub-problems ahwile brute force doesn't.
2. Dynamic programming try all splits but only applies several good combinations while brute force try all combinations.

## Graph Traversal

1. We need to use graph traversal(search) when we have graph structure.
2. Why to use graph structure: graph is a powerful and versatile structure to represent relationship.

## Branch & Bound

1. BB is designed for combinatorial optimization problem. You can use this when greedy and dynamic algorithm fail. 