Muchang Bahng | Duke Math

If you were to invent computer science from scratch, how would I do it? We can divide them into three chronological pillars. First is the theory of algorithms. Second is the construction of computers. Third is the applications of algorithms.

The design and analysis of algorithms are much older than computers in general, having existed as early as 2000 BC when Egyptians developed methods to multiply two numbers (even more primitive cookbooks can be considered algorithms). But in order to do any algorithm, we need a way to store data, whether it's through writing in a piece of paper or storing in computer memory. This requires the use of data structures which, coupled with algorithms, can solve whatever problems that came up in our lives. Note that none of these data structures or algorithms require the use of a computer at all. Graphs could be drawn on a piece of paper, a stack of accounting bills on your desk is literally the stack data structure, and a long list of names and ages of citizens in a tablet can be considered an array (erasing someone off from a tablet would leave an empty space). These different forms of bookkeeping, through different data structures, have advantages and disadvantages.

When mechanical computers were invented in 1822 followed by the first electronic comptuers in 1943, we could rapidly scale these algorithms. Computing prime numbers did not have to be done by hand, and all the theory developed before then can be applied much more widely. But if you look at all the components in your computer, it's baffling to wonder how humanity could create such a complex and intricate object. I mean, certain smartphones that fit into your pocket is able to store the equivalent of 8 trillion on-off switches! This is why the next pillar deals with the construction of computers, which starts off with the most basic component: a transitor. This requires computer engineering and physics knowledge, but from here we slowly build upon abstractions to instruction sets and CPU design. The instruction set of a CPU is strongly tied to the architecture of a computer, and so we must introduce the Assembly language as a way to interact with the computer. Once we introduce the concept of memory (e.g. how different data types are stored), input/output devices, and databuses, this completes the von Neumann architecture of a computer. Finally, we talk about the rest of the computer, such as disks (how data persists even after the computer is turned off), GPUs, etc. Once we are finished with the computer engineering part, we now know how a computer is built. Finally, we talk about some low-level programs such as the firmware that is required to run before a computer turns on.

We have successfully built a computer, but we are not out of the construction pillar just yet. Just because we have a functional computer, it doesn't mean that it is very user-friendly. Therefore, an abstraction over the different architectures is the C programming language (plus C++ since I use it frequently). But to work with languages at all, we must need some way to turn the human-friendly language into machine code that the computer can process. This is the job of compilers, which constitutes another set of notes. Furthermore, computers are not monolithic objects that can operate one task at a time. There is a need to control multiple programs/processes and execute them concurrently, which is the job of the operating system (OS) and opens up a whole new can of worms on how to manage these processes (e.g. virtual memory, mutex locks, etc.). Note that the OS is simply another process, and in order to start the OS, the previously mentioned firmware must activate a bootloader that finds the correct operating system, and so bootloaders such as GRUB will be talked about here. Once the OS has started, this is a natural point to introduce shells (terminals) as a way to interact with the computer through the OS. Finally, we can introduce virtualization software like virtual machines (VMs) and containers (e.g. Docker) which isolate different processes as if they were running with their own operating system.

Great, now that we know how algorithms work and how to run them on computers, we can focus on the application of algorithms, our third pillar. I introduce the two most common languages in the world, Python and JavaScript, which are much higher level and used the most often when developing software or doing research. A powerful feature of the computer is the ability to connect to the world wide web, but to understand this we need to know how computers talk to each other, i.e. computer networking. On a closely related note, networking now allows multiple computers to work on one task, which is the topic of distributed systems. With the explosion of computing power and networks all over the world, the need for databases to store this exponentially growing data also motivates its own area of study.

On an individual level, there are several meta-programming tools that a developer runs into when working with code. First, you need a text editor, and I give a quick guide on NeoVim which is my go-to. Second, you need good version control, which is a systematic way to update code when working in a team, and git is universally used for this. Good version control almost always comes with automated processes and tests (usually run on Docker containers), known as continuous integration/development (CI/CD). You will see all of these concepts in (mostly) industry and (less in) research.

Finally I end with some miscellaneous applications such as blockchain and SLAM.

All of my personal notes are free to download, use, and distrbute under the Creative Commons "Attribution- NonCommercial-ShareAlike 4.0 International" license. Please contact me if you find any errors in my notes or have any further questions.

Data Structures and Algorithms [CS201, CS330]

Array-Based Data Structures: Arrays, Lists, (Singly, Doubly, Circular) Linked Lists, Stacks, Queues
Hash-Based Data Structures: Universal Hashing, Sets, Maps
Tree-Based Data Structures: Binary Trees, Trees, Heaps, Binary Search Trees, Tries, Recursion
Brute Force Algorithms: Arithmetic, Lists, Cryptography, Matrix Operations
Greedy Algorithms: Interval Scheduling, Huffman Coding
Divide and Conquer: Recursion, Merge Sort, Inversions, Selection, Quick Sort, Closest Pair of Points, Karatsuba, Strassen, Polynomial Multiplication
Graphs: DFS, BFS, DAGs, Topological Sorting, Dijkstra, Kosaraju's SCC, Bellman Ford, Graph Colorings, MST, Prim, Kruskal
Dynamic Programming: Longest Increasing Subsequence, 0/1 Knapsack, Line Breaking, Bellman Ford
Randomized Algorithms: Primality Testing
Flows and Cuts: Max-Flow Min-Cut Theorem, Max Flows
Linear Programming:

Computability and Complexity Theory

Introduction: Computers, Internet Protocol, Networks, UDP/TCP, DNS/HTTPS, Binary Representation of Data
Information Theory: Shannon Entropy, Data Compression
Finite Computation: AON-CIRC Programs, Syntactic Sugar, NAND-CIRC Programs
Uniform Computation: Infinite Functions, Deterministic Finite Automata, Regular Expressions, Turing Machines, Turing Completeness, Cellular Automata, Lambda Calculus, Uncomputable Functions
Efficient Algorithms: Modeling Running Time, Time Hierarchy Theorem, Non-Uniform Computation, Polynomial Time Reductions, Traveling Salesman Problem, Graph Problems, 3SAT, NP Hard/Completeness

Advanced Algorithms

Graph Algorithms
Geometric Algorithms
Probabilistic Algorithms

Quantum_Computing [PHYS 627]

Quantum Mechanics: Bra-ket Notation, State Space, Unitary Evolution, General & Projective Measurements, POVM, Distinguishable Quantum States, Composite Systems as Tensor Product Spaces, Heisenberg Uncertainty Principle, Entanglement
Qubits & Quantum Circuits: Superposition, Born's Rule, Change of Computational Basis, Bloch Sphere, Global & Relative Phase, Logic Gates (Pauli matrices, Hadamard, CNOT, etc.), No-Cloning Theorem, Bell States, Teleportation
Quantum Algorithms: Reversible extensions of classical gates, Quantum Parallelism, Deutch-Jozsa Algorithm

Architecture

Transistors, Circuits
How are numbers stored?
Processor Design
Ram
Disk, how memory is stored when machine is powered off
Firmware
Von Neumann Architecture
GPUs
Drivers

Assembly

C++

Basics: Variables, Expressions, Functions, Scope
Translation: Preprocessing, Directives, Compilation, Linking, Header Files, Namespaces
Types: Casting
Constant Expressions: Compiler Optimization, Constants, Compile-Time Programming, Constexpr

Compilers

Threads, Processes
Mutexes
Virtual Memory
Virtual Machines
Containers (Docker)
Shells and terminals

Operating Systems

Python

Names and Values: Rebinding vs Mutating, Immutabale and Mutabale Types
Function Closures and Variable Scopes
Lists: Implementation, Similar Built-In Data Structures
Hashing: Implementation, Similar Built-In Data Structures
Iterators and Loops
Item Assignment with Walrus Opreator
Raising Exceptions
Decorators
Composing CLasses

Javascript

Document Object Model (DOM)
JavaScript Runtime Engine
Events and Event Handling
Asynchronous Handling: Callback functions, Promises, API Handling, Async/Await

Networks

Distributed Systems

High Performance Computing (SLURM)
Edge Computing
Internet of Things

Databases [CS316]

Relational Algebra:
Functional Dependencies:
Entity-Relation Diagrams:
SQL

Tools

Text Editors
Git
CI/CD

Blockchain

Cryptography & Encoding: Symmetric, Asymmmetric Encryption, SHA256 Hash Function, Digital Signatures, Elliptic Curve Cryptography, Base58Check, WIF Format
Cryptocurrency Wallets: Bitcoin Keys, Addresses, WIF-Compression, Hierarchical Deterministic Wallets, (Hardened) (Extended) Child Key Derivation, HD Wallet Key Path
Transaction Chains: UTXO, Fees, Orphan Transaction Pool
Standard Transactions: Locking/Unlocking Scripts, Script Language, P2PKH, P2SH, P2PK, MultiSig, OP_RETURN Scripts
Bitcoin Network: Full Nodes, SPV nodes, Bootstrapping, UTXO pools, networking, three-way handshake
Blockchain: Block Header, Linking Blocks, Merkle Trees & Paths
Mining: Coinbase transaction, Proof-of-Work, Difficulty Targeting, Network Block Validation, Blockchain Forks, Hash Rate, Mining Pools, Consensus Attacks

Simultaneous Localization and Mapping [CS394]

Hardware: Sensors (GPS, LiDAR), Cameras, IMUs, Lens, Wireless Connections, Sparse/Dense Maps
Image Processing: Feature Points, ORB Algorithm, Segmentation, Image Pyramids, Camera Parameterization, World/Camera Coordinates, Intrinsic/Extrinsic Matrix
Transformations: Lie Groups, Lie Algebras, SO(3), SE(3), Euler Angles, Quaternions, Exponential Map
Frontend: Visual Odometry, Keyframes, Mappoints, Epipolar Geometry, Triangulation, Essential/Fundamental Matrix, 8-Point Algorithm,
Backend: Filters, Extended Kalman Filter (EKF), Nonlinear Optimization, Graph Optimization, Bundle Adjustment, Sliding Window Filters, Pose Graphs, Sparse Optimization
Beyond: Loop Closing, EdgeSLAM, Multi-Agent Collaborative SLAM