Muchang Bahng | Duke Math

Computer Systems & Basics

Contents

System Hardware
Program Lifecycle Phases
Navigation Shell Commands
Network & Connectivity Commands
Environment Variables
Git and Github
Python Virtual Environments

System Hardware

Non-Volatile Drive Storage

A drive is basically a computer component used to store data. It may be a static storage device (e.g. a HDD or SSD) or may use removable media (e.g. thumb, disk, CD). All drives store nonvolatile data (also called nonvolatile memory, NVM), meaning that the data is not erased when the power is turned off.

A floppy disk is a portable, circular, flexible plastic disk coated with iron oxide or another magnetic material, and it is read by a floppy disk drive. Disks came in sizes ranging from 3.5 to 8 inches in diameter, with 1.44MB being the standard capacity of the common 3.5-inch disk. When a floppy disk is inserted into a computer, a read/write head uses a magnet to polarize the iron particles in one of two directions, each representing a 0 or 1 in binary data. The head can also read these polarities in order to retrieve the data stored on the disk. Note that the head reads the disk "circularly" as the disk rotates. Each disk is divided into concentric tracks (typically 40 or 80), and each track is split into equal sectors (around 8 or more).
A hard disk drive (HDD) is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage and one or more rigid (hence the name hard), rapidly rotating platters coated with magnetic material. Data is accessed in a random-access manner, meaning that individual blocks of data can be stored and retrieved in any order. HDDs usually come inside a metal case enclosing the entire drive (3.5-inch for desktop computers and 2.5-inch for laptops). Since the data on the HDD is determined by the polarities of the magnetic material on the disks, it is sensitive to external magnetic fields that may corrupt the data. Furthermore, because the drive heads must align over an area of the disk in order to read or write data, and the disk is constantly spinning, there’s a delay before data can be accessed. The drive may need to read from multiple locations in order to launch a program or load a file, which means it may have to wait for the platters to spin into the proper position multiple times before it can complete the command. If a drive is asleep or in a low-power state, it can take several seconds more for the disk to spin up to full speed and begin operating. HDD speeds are measured in RPM, with desktop HDDs normally running at 5400-7200 RPM. As a rough guide, 5400 RPM drives offer an average read speed of about 100MB/s and 7200 RPM drives about 120MB/s.
A Solid State Drive (SSD) is an extra step up from the HDD. From the very beginning, it was clear that hard drives couldn’t possibly match the speeds at which CPUs could operate. Latency in HDDs is measured in milliseconds, compared with nanoseconds for your typical CPU. One millisecond is 1,000,000 nanoseconds, and it typically takes a hard drive 10-15 milliseconds to find data on the drive and begin reading it. The hard drive industry introduced smaller platters, on-disk memory caches, and faster spindle speeds to counteract this trend, but there’s only so fast drives can spin. Western Digital’s 10,000 RPM VelociRaptor family is the fastest set of drives ever built for the consumer market, while some enterprise drives spun as quickly as 15,000 RPM. The problem is, even the fastest spinning drive with the largest caches and smallest platters is still achingly slow as far as your CPU is concerned. Unlike HDDs, solid state drives do not need moving parts or spinning disks (hence their name). Instead, they use NAND flash memory, which is a type of non-volatile storage that erases data in units called blocks and writes data in smaller units called pages. It retains data whether the device is powered on or off. It is used not only in SSDs, but also in USB flash drives, SD cards, mobile phones, digital cameras, tablets, and others. The most fundamental unit of storage is the flash memory cell, which uses electron charge thresholds to hold bits of information, usually three bits (called TLC - triple level cell) or four bits (called QLC - quad level cell). If there is no electron charge in the cell, the cell represents 111 (for TLC) or 1111 (for QLC). Cells are arranged in a large array (consisting of millions of cells stacked on top of each other) to form a block, with a typical block size between 256KB and 4MB.
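
To make the spin delay of an HDD concrete, here is a minimal Python sketch of the average rotational latency. It assumes the head waits half a revolution on average and ignores seek time; the function name is made up for illustration.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector to rotate under the head (half a revolution)."""
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0


# Spindle speeds mentioned in the text above.
for rpm in (5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {average_rotational_latency_ms(rpm):.2f} ms")
```

Even at 15,000 RPM the wait is measured in milliseconds, which is why the gap to nanosecond-scale CPU operations cannot be closed by spinning faster.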

Furthermore, on Windows each drive on a computer is assigned a device/drive letter, a single alphabetic character A through Z. Computers containing a hard drive typically have that default hard drive assigned to the C: drive letter, and external drives may be assigned different letters, such as Google Drive being assigned the G: drive letter. You may also notice that when opening the command prompt on Windows, the leftmost letter represents which drive you are currently on. Note that "wmic" is an abbreviation of Windows Management Instrumentation Command-line. Some commands may require you to use an elevated command prompt, which can be opened by running cmd as an administrator.

C:\Users\bahng>

Windows CMD	MacOS Terminal	Task
wmic LOGICALDISK LIST BRIEF		Lists all the drives on your computer. Note that the DeviceID is simply the drive letter, the DriveType has numerical encodings (2: Removable disk, 3: Fixed local disk, 4: Network disk), the FreeSpace and Size are in bytes, and the VolumeName is the name of the disk.
wmic diskdrive get status, model		Outputs the model name of the drive along with its status. If the status is OK the health is good, and if it shows 'Pred Fail' your drive may crash soon.
chkdsk c:		Checks the file system and provides a summary of issues on the drive. If the number of bad disk sectors is not 0, get technical assistance.
dfrgui		Opens a window that tells you the types of all drives on your computer.
<letter>: (d:)		Changes the drive you are working in

We demonstrate some of the commands here.

C:\Users\bahng>wmic LOGICALDISK LIST BRIEF
  DeviceID  DriveType  FreeSpace     ProviderName  Size           VolumeName
  C:        3          838628864000                1003327844352  OS
  G:        3          71506178048                 107374182400   Google Drive

C:\Users\bahng>wmic diskdrive get status, model
  Model                      Status
  PM9A1 NVMe Samsung 1024GB  OK
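
As a cross-platform counterpart to the FreeSpace and Size columns above, the following Python sketch reads the same figures with the standard library's shutil.disk_usage. The helper name describe_disk is an assumption made for illustration; the drive queried is whichever one holds the current working directory.

```python
import shutil
from pathlib import Path


def describe_disk(path: str) -> str:
    """Summarize free and total space (reported in bytes) for the drive holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    gib = 1024 ** 3
    return f"{path}: {usage.free / gib:.1f} GiB free of {usage.total / gib:.1f} GiB"


# The anchor is the drive root, e.g. "C:\" on Windows or "/" on MacOS.
root = Path.cwd().anchor
print(describe_disk(root))
```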

Volatile, Short-Term Storage

Information travels from drives and other stores to the CPU, but the physical distance that the bits must travel across the motherboard also puts an upper limit on the retrieval speed (i.e. the speed of electromagnetic waves), especially if the distance must be covered thousands or millions of times back and forth. This limit is known as latency. This is why computers have a hierarchy of stores reserved for information that is accessed more frequently, some closer to the CPU and others even within the CPU itself!

Random Access Memory, or RAM, is short-term memory that acts as a cache for the CPU and is 50-200 times faster than a regular SSD. It is volatile, meaning that all its contents are erased when the computer shuts down. Of the generations covered here, the most recent is DDR4, preceded by DDR3, DDR2, DDR, and SDRAM. Typical RAM speeds are
- DDR4: 2133 MHz, 2400 MHz, 2666 MHz, 3200 MHz
- DDR3: 1066 MHz, 1333 MHz, 1600 MHz, 1866 MHz
where the MHz value represents how many million times per second the RAM can access its memory. However, some motherboards are limited in the RAM speed they can handle, and in these cases the system will throttle a faster RAM stick down to the supported speed. In terms of capacity, the following gives us nice benchmarks.
- 4-8 GB: Laptops for web browsing and light gaming (e.g. my Macbook Air 2019)
- 16-32 GB: Laptops for gaming (possibly heavy) and programming
- 64-128 GB: Crazy stuff.
- 256 GB: This basically means you have a RAM that is pretty much the size of a typical SSD.
Finally, there are broad categories of RAM.
- Static RAM (SRAM) holds its data for as long as power is supplied and therefore doesn't need to be refreshed to keep the data intact (hence the name static). Note that this does not mean that SRAM is nonvolatile. Because it is fast, SRAM is typically used in CPU caches or video cards.
- Dynamic RAM (DRAM) requires a periodic refresh of power in order to function. The capacitors that store data in DRAM gradually discharge energy (no energy means the data becomes lost). DRAM is found in systems memory and video graphics memory.
- Synchronous Dynamic RAM (SDRAM) is DRAM that operates in sync with the CPU clock, which means that it waits for the clock signal before responding to data input. This is advantageous since the CPU can process overlapping instructions in parallel, known as pipelining (the ability to receive an instruction before the previous instruction has been fully resolved). This allows more instructions to be completed simultaneously. By contrast, plain DRAM is asynchronous, which means it responds immediately to input.
CPU caches have a hierarchy that is divided into (from fastest to slowest) L1, L2, L3, and sometimes even L4. The CPU will check the L1 cache first to see if there is a hit (if there is, then data is retrieved extremely fast), then the L2, and so on.
- L1: 8-64 KB storage typically (but there are exceptions, e.g. the Apple M1 chip has a 192 KB L1 cache)
- L2: 256KB-8MB storage
- L3: 10-64MB storage (and sometimes up to 256MB for server chips)
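
The lookup order described above (L1 first, then L2, then L3, then main memory) can be sketched as a toy Python model. The cache contents and the relative access costs are invented for illustration and do not describe any real CPU; real caches are managed by hardware.

```python
# Hypothetical cache contents and relative access costs (not real measurements).
LEVELS = [
    ("L1", {"a": 1}, 1),
    ("L2", {"b": 2}, 4),
    ("L3", {"c": 3}, 20),
]
MAIN_MEMORY = {"a": 1, "b": 2, "c": 3, "d": 4}
MEMORY_COST = 100


def load(key):
    """Return (value, where it was found, total cost), checking L1, then L2, then L3."""
    cost = 0
    for name, contents, latency in LEVELS:
        cost += latency  # every level checked adds its latency
        if key in contents:
            return contents[key], name, cost  # cache hit
    return MAIN_MEMORY[key], "RAM", cost + MEMORY_COST  # missed every cache


for key in ("a", "c", "d"):
    print(key, load(key))
```

A hit in L1 costs a single unit here, while a miss at every level pays for all three checks plus the trip to RAM.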

To check the status of your RAM on your computer, refer to the following commands.

Windows CMD	MacOS Terminal	Task
wmic MEMORYCHIP (get BankLabel, DeviceLocator, MemoryType, TypeDetail, Capacity, Speed)		Outputs relevant information about the RAM
wmic memorychip list full		Outputs a list of all specifications of each memory stick.
systeminfo | findstr /C:"Total Physical Memory"		Outputs the total RAM memory of your computer.
systeminfo | find "Available Physical Memory"		Outputs the available RAM memory of your computer.

For my computer, the outputs are as such. It shows two memory sticks, each with 8GB of memory, a memory type of 0 (which means the type is reported as unknown, as commonly happens with DDR4; type 24 means DDR3), speeds of 3200 MHz, and a TypeDetail of 128, which means the RAM is synchronous (SDRAM).

C:\Users\bahng>wmic MEMORYCHIP get BankLabel, DeviceLocator, MemoryType, TypeDetail, Capacity, Speed
BankLabel  Capacity    DeviceLocator  MemoryType  Speed  TypeDetail
            8589934592  DIMM A         0           3200   128
            8589934592  DIMM B         0           3200   128
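
The raw numbers in this output can be decoded with a few lines of Python. This is a sketch under stated assumptions: the rows are typed in by hand from the output above rather than read from wmic, and only the TypeDetail flag the text mentions (128, synchronous) is checked.

```python
# Rows copied by hand from the wmic output above.
ROWS = [
    {"DeviceLocator": "DIMM A", "Capacity": 8589934592, "Speed": 3200, "TypeDetail": 128},
    {"DeviceLocator": "DIMM B", "Capacity": 8589934592, "Speed": 3200, "TypeDetail": 128},
]
SYNCHRONOUS = 128  # TypeDetail bit that the text associates with SDRAM


def capacity_gib(capacity_bytes: int) -> float:
    """Convert a capacity in bytes to GiB (2**30 bytes)."""
    return capacity_bytes / 2 ** 30


total_gib = sum(capacity_gib(row["Capacity"]) for row in ROWS)
all_synchronous = all(row["TypeDetail"] & SYNCHRONOUS for row in ROWS)
print(f"{total_gib:.0f} GiB total, synchronous: {all_synchronous}")
```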

The status of your CPU can be checked with the following commands.

Windows CMD	MacOS Terminal	Task
wmic cpu list full		Outputs a list of all specifications of the CPU
wmic cpu (get caption, deviceid, name, numberofcores, maxclockspeed, status)		Outputs relevant information about the CPU

Program Lifecycle Phases

First, we review some definitions. More on program lifecycle phases here.

Programming languages are broadly classified into two types. High-level languages are the familiar programming languages that we work with today (that allow much more abstraction), while low-level languages are very close to the hardware, such as machine language and assembly language.

Programmers write programs in source code (usually high-level languages), which are then inputted into language processors that translate them into object code (usually machine code consisting of binary). The duration in which the source code of the program is being edited is called the edit time, while the compile time is when the source code is translated into machine code by a language processor. There are three types of language processors.

A compiler is a language processor that reads the complete source program written in high-level language as a whole in one go and translates it into an equivalent program in machine language. The source code is translated to object code successfully if it is free of errors. The compiler specifies the errors at the end of the compilation with line numbers when there are any errors in the source code. The errors must be removed before the compiler can successfully recompile the source code again. (e.g. C, C++, C#, Java)
An assembler is used to translate a program written in assembly language (a low-level language with a very strong correspondence between its instructions and the machine code instructions) into machine code. The assembler is essentially the first interface that lets humans communicate with the machine; it fills the gap between human and machine. Code written in assembly language consists of mnemonics (instructions) like ADD, MUL, MUX, SUB, DIV, MOV and so on, and the assembler converts these mnemonics into binary code.
An interpreter translates a single statement of the source program into machine code and executes it immediately before moving on to the next line. If there is an error in the statement, the interpreter stops translating at that statement and displays an error message. The interpreter moves on to the next line for execution only after the error has been removed. An interpreter directly executes instructions written in source code without previously converting them to object code or machine code. (e.g. Python, Perl, JavaScript, Ruby)

A quick compare and contrast.

Compiler	Interpreter
Takes more time to analyze source code but execution time is faster.	Takes less time to analyze source code but execution time is slower.
Debugging is harder since compiler generates error message after entire scan.	Debugging is easier since interpreter continues translating the program until error is met.
Requires a lot of memory for generating object codes.	Requires less memory because no object code is generated
Generates intermediate object code.	No intermediate object code is generated.
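
The contrast in the table can be observed from inside Python itself. The sketch below uses the built-in compile() to mimic a whole-program translation step, which rejects a syntax error before anything runs, and exec() to show execution proceeding until an error is met. The two source strings and function names are made up for illustration, and this is only an analogy: Python translates to bytecode, not machine code.

```python
SOURCE_WITH_SYNTAX_ERROR = "x = 1\ny = = 2\n"
SOURCE_WITH_RUNTIME_ERROR = (
    "steps.append('line 1 ran')\n"
    "undefined_name\n"
    "steps.append('line 3 ran')\n"
)


def translate_whole_program(source: str) -> str:
    """Compiler-like step: the entire source is checked before any of it runs."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return f"rejected before running (line {err.lineno})"
    return "translated"


def run_until_error(source: str):
    """Interpreter-like behavior: statements run in order until one fails."""
    steps = []
    try:
        exec(compile(source, "<example>", "exec"), {"steps": steps})
    except NameError as err:
        return steps, type(err).__name__
    return steps, None


print(translate_whole_program(SOURCE_WITH_SYNTAX_ERROR))
print(run_until_error(SOURCE_WITH_RUNTIME_ERROR))
```

In the second case the first line has already executed by the time the error on the second line is reported, and the third line never runs.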

The result of a successful compilation is an executable, which is a program in the form of a file containing millions of lines of very simple machine code instructions (e.g. add 2 numbers or compare 2 numbers), also called processor instructions. This executable can be stored on the computer's drive for future use, or it may be copied immediately into faster memory, such as RAM. The load time is when the OS takes the program's executable from storage and puts it into active memory (e.g. RAM) in order to begin execution.

The CPU understands only a low level machine code language (aka native code), which is contained within the executable. The language of the machine code is hardwired into the design of the CPU hardware; it is not something that can be changed at will. Each family of compatible CPUs (e.g. the popular Intel x86 family) has its own, idiosyncratic machine code which is not compatible with the machine code of other CPU families. More information here.

Once the instruction bytes are copied from storage to RAM, the CPU can run through the steps/lines at the rate of about 2 billion lines/steps per second. This execution phase, when the CPU executes the instructions until normal termination or a crash, is called the runtime.

More on Executables

More specifically, an executable is a file that contains a list of instructions and data that cause a computer's CPU to perform indicated tasks, as opposed to data files, which are fundamentally strings of data that must be interpreted (parsed) by a program to be meaningful. On Windows, executables usually have the extension .exe or .bat, and they can generally be run (invoked) in two ways:

The executable file can be run by simply double clicking on the file name, opening it, and having the user type commands in an interactive session of an interpreter (like inputting commands in a terminal window or a python shell).
Alternatively, we can start writing a program, complete writing it, and then have this program compiled into an executable to be invoked.

Some common examples of executables are:

python.exe - used to run python scripts that have the .py extension, located at C:\Users\bahng\AppData\Local\Programs\Python\Python39
pythonw.exe - used to run .pyw files for GUI programs
Terminal.app (on MacOS)
cmd.exe (on Windows OS)
py.exe - an executable used to run the python.exe executable like a shortcut, located at C:\windows\py.exe

Some commands for running python scripts.

Windows CMD	MacOS Terminal	Task
python (py) -V		Checks the version of python.exe (py.exe)
python.exe (py.exe) <script>.py		Runs a python script with the executable (assuming python and py are in PATH)
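
The same two commands can be issued from Python, which also shows exactly which executable is being used. This is a minimal sketch: the helper name run_python and the script name hello.py are assumptions for illustration, and the script is written to a temporary directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python(*args: str) -> str:
    """Run the current interpreter's executable with `args` and return its output."""
    result = subprocess.run(
        [sys.executable, *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


print(sys.executable)           # full path of the running python executable
version_text = run_python("-V")  # same as typing: python -V
print(version_text)

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "hello.py"
    script.write_text('print("hello from a script")\n')
    script_output = run_python(str(script))  # same as: python hello.py
    print(script_output)
```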

Statically Typed vs Dynamically Typed Languages

Type-checking is the process of checking and verifying the type of a construct (constant, variable, array, list, object) and its usage context. It helps in minimizing the possibility of type errors in the program, and type checking may occur either at compile-time (static checking) or at run-time (dynamic checking).

Statically-Typed Languages: Since we type-check during compilation, every detail about the variables and all the data types must be known before compiling. Once a variable is declared with a type, it cannot later be assigned a value of a different type, so the data type of a declared variable is fixed. This is why in Java, C, C++, etc., the programmer must specify the data type of each variable by writing something like int myNum = 15;.
Dynamically-Typed Languages: Since we type-check during runtime, there is no need to specify the data type of each variable while writing code, which improves writing speed. These languages identify the type of each variable during run-time, so we do not need to declare the data types of variables. In these languages, variables are bound to objects at run-time using assignment statements, and many popular languages (e.g. JavaScript, Python, PHP, etc.) are dynamically typed.
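
A short Python sketch of dynamic typing in practice: the same name is rebound to objects of different types, and a type error only surfaces when the offending line actually runs. The variable and function names are illustrative.

```python
my_num = 15                      # no type declaration needed
kinds = [type(my_num).__name__]

my_num = "fifteen"               # rebinding the same name to a str is allowed
kinds.append(type(my_num).__name__)


def add_one(value):
    return value + 1


try:
    add_one(my_num)              # the type check happens here, at run-time
    type_error_raised = False
except TypeError:
    type_error_raised = True

print(kinds, type_error_raised)
```

In a statically typed language the equivalent of the second assignment would be rejected at compile-time, before the program ever ran.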

Navigation Shell Commands

I work in both the Mac and Windows operating systems, which requires me to know two sets of commands when accessing the command line.

Windows CMD	MacOS Terminal	Task
dir	ls	Lists files and folders in current directory
dir /a	ls -a	Lists ALL files and folders (including hidden ones) in current directory
cd	pwd	Full path of current folder/directory
cd < path to directory >	cd < path to directory >	Change folder/directory
cd ..	cd ..	One directory up in directory tree
cd \	cd /	Move to root directory
mkdir (rmdir) newFolder	mkdir (rmdir) newFolder	Create (delete) new directory in current directory
echo < text > > filename(.txt)	cat > fileName(.txt)	Create new file with <text> written inside. If no file extension is specified, the file is created without one.
cls	clear	Clear the terminal screen
type	cat	Concatenate and print a file
ren oldName newName	mv oldName newName	Rename a file or directory
robocopy (move) FileOrFolderPath <path to destination directory>	cp -r (mv) myFolder <path to destination>	Copy (move) a directory to destination directory
wmic LOGICALDISK LIST BRIEF		Lists all the drives on your computer
<letter>: (d:)		Changes the drive you are working in
	nano fileName	Open file in the nano text editor, which displays the file's bytes as text. This is especially fun since you can open non-text files and view them as text, even if the result is complete nonsense. Deleting a few lines in, say, an mp3 file opened in nano can corrupt the file, leading to distortions in the original mp3 audio.
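
For scripts that must run on both operating systems, several of the commands in this table have portable equivalents in Python's standard library. The sketch below works inside a temporary directory so nothing on your machine is touched; the file and folder names are placeholders.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    folder = root / "newFolder"
    folder.mkdir()                                   # mkdir newFolder

    note = folder / "note.txt"
    note.write_text("hello\n")                       # echo hello > note.txt

    note.rename(folder / "renamed.txt")              # ren / mv

    shutil.copytree(folder, root / "copy")           # robocopy / cp -r

    listing = sorted(p.name for p in root.iterdir())         # dir / ls
    contents = (root / "copy" / "renamed.txt").read_text()   # type / cat

    shutil.rmtree(folder)                            # delete a non-empty directory
    remaining = sorted(p.name for p in root.iterdir())

print(listing, repr(contents), remaining)
```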

Network & Connectivity Commands

When your device connects to a router in a LAN, it has a local IP address (usually starting with 10., 172.16. through 172.31., or 192.168.). Packets of data leave your computer's address, from a port (out of the 2¹⁶=65,536 ports), to the router's local IP address, known as the default gateway. This router then connects to the internet, where it has another, public address (which can be Googled), and forwards the packets through there.

In addition to the IP address, there is the MAC (Media Access Control) address, or physical address, which uniquely identifies the network hardware of a computer, while the IP address identifies the connection of the device on the network. The MAC address of a computer does not normally change with time and environment, while the IP address does. More info on the difference between them is found here.
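
Whether an address is a local (private) one can be checked with Python's standard ipaddress module. This is a small sketch; the sample addresses are arbitrary examples, and the helper name is made up.

```python
import ipaddress

PORT_COUNT = 2 ** 16  # number of ports, as stated above


def is_local_address(text: str) -> bool:
    """True if the address lies in a range reserved for private networks."""
    return ipaddress.ip_address(text).is_private


for address in ("192.168.0.1", "172.30.1.20", "8.8.8.8"):
    print(address, "local" if is_local_address(address) else "public")
```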

Windows CMD	MacOS Terminal	Task
ping (-t) www.google.com	ping www.google.com	Checks the ping time to www.google.com by sending small multiple-byte packets of data. Measures in milliseconds. The "-t" pings the address forever until manually stopped.
ping 192.168.0.1	ping 192.168.0.1	Checks the ping time to an IP address in your local network by sending small multiple-byte packets of data. Make sure that they are both on the same network, with the same subnet mask, and firewalls are turned off.
ipconfig (/all)		Shows your IP address and related information about your device within the network.
ipconfig /displaydns		Displays the DNS cache of your system
ipconfig /flushdns		Flushes the DNS cache of your system.
tracert www.google.com		Traces the route it takes for a packet to reach a destination and shows information about each hop along that route. If you’re having issues connecting to a website, tracert can show you where the problem is occurring.
netstat -an	nettop	Displays a list of all open network connections on the computer, along with the port they're using and the foreign IP address they're connected to.
nslookup www.google.com	nslookup www.google.com	Find the IP address associated with a domain
ipconfig /release		Forces your network adapter (internal computer hardware that connects to network) to drop its assigned IP address. Note that this disconnects your wifi.
ipconfig /renew		Renews the network adapter's IP address
arp -a	arp -a	Shows the devices on your local network that your computer has recently communicated with (the ARP cache).

Environment Variables

When in an environment (whether it be a base or a virtual one), the environment has certain characteristics that are stored in what we call the environment variables. A list of them can be outputted with the commands below.

Windows CMD	MacOS Terminal	Task
set		Lists all the environment variables
echo %<VARIABLE>%		Returns the value of the environment variable

A few of the first variables that are outputted on my system are shown here.

C:\Users\bahng>set
ALLUSERSPROFILE=C:\ProgramData
APPDATA=C:\Users\bahng\AppData\Roaming
CommonProgramFiles=C:\Program Files\Common Files
CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
CommonProgramW6432=C:\Program Files\Common Files

They are all in the form

   NAME=VALUE

where the NAMEs are conventionally written in uppercase (referenced as $VARIABLE on MacOS and %VARIABLE% on Windows), and the VALUEs are strings. The point of these variables is to communicate to programs how the machine is set up and sometimes to control the behavior of programs (e.g. where the home directory is, what user is logged in, etc.). One notable environment variable is PATH, which is a list of directories where executable programs are located. More specifically, it specifies the directories whose executables can be started from the command line without knowing and typing the whole path to the file. Note that the shell does not check subdirectories of the directories in PATH, only those directories themselves. On Windows, we can also modify which file extensions (other than the common .exe, .bat, etc.) are supported by editing %PATHEXT%. One can check this set of directories with the following commands:

Windows CMD	MacOS Terminal	Task
echo %PATH%	echo $PATH	Specifies the directories in which executables are located on the machine
echo %PATHEXT%		Specifies the extensions that are supported by PATH and can therefore be called directly in cmd (Windows only; MacOS has no PATHEXT variable).

For example, calling echo %PATHEXT% on my Windows laptop returns the following supported executable extensions:

C:\Users\bahng>echo %PATHEXT%
.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC

On MacOS, PATH usually holds all bin and sbin directories relevant for the current user. On Windows, it contains at least the C:\Windows and C:\Windows\system32 directories. This is why you can run calc.exe or notepad.exe from the command line (you actually don't need to include the .exe, just calc will do), but not chrome.exe since it is located in C:\Program Files\Google\Chrome\Application.
Calling the PATH environment variable on my computer returns

C:\Users\bahng>echo %PATH%
C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Users\bahng\anaconda3;C:\Users\bahng\anaconda3\Library\mingw-w64\bin;C:\Users\bahng\anaconda3\Library\usr\bin;C:\Users\bahng\anaconda3\Library\bin;C:\Users\bahng\anaconda3\Scripts;C:\Users\bahng\AppData\Local\Microsoft\WindowsApps;C:\Users\bahng\AppData\Local\Programs\Microsoft VS Code\bin;C:\Users\bahng\AppData\Local\Programs\Python\Python39;C:\Users\bahng\AppData\Local\Programs\Python\Python39\Scripts

and since py.exe is installed in C:\Windows (which is in PATH), I can call it immediately in cmd. The python.exe executable, on the other hand, is located in C:\Users\bahng\AppData\Local\Programs\Python\Python39; if that directory were not on the PATH variable, python.exe could not be called without specifying its entire path. In the output above this directory has already been added to PATH for convenience. Adding a directory to your PATH expands the number of directories that are searched when, from any directory, you enter a command in the shell. Unfortunately, modifying the PATH with the shell can be very dangerous, but there is an easy way to do it using a GUI, explained very well through this link. All environment variables can be changed as explained in the link using the GUI (in the Control Panel).
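
The PATH lookup described here can be reproduced from Python on either operating system. A minimal sketch using only the standard library; the two helper names are made up for illustration, and the printed results depend on your own machine.

```python
import os
import shutil


def path_directories():
    """Split PATH into its directories (';' separates on Windows, ':' on MacOS)."""
    return [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]


def find_on_path(name: str):
    """Return the full path of the executable the shell would run, or None."""
    return shutil.which(name)


for directory in path_directories()[:5]:
    print(directory)

print(find_on_path("python"))  # None if no directory on PATH contains it
```

shutil.which searches only the listed directories themselves, not their subdirectories, which matches the behavior of the shell described above.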

Git and Github

Here is the general list of commands that you will use when working with git from the command line.

When you are starting a new git repository, whether it's creating a new repo from scratch or from an existing project, go into the directory and run git init. This can be done in two ways: (1) go into the directory through terminal/cmd (using the cd command) and run git init, or (2) go into the directory through the graphical user interface, right-click, open Git Bash, and then run git init.
Do git add (git add filename.ext for a specific file or git add . for everything in the directory) to add all the relevant files in the staging area.
Do git status to check which files have been added and which changes are being tracked.
You should create a .gitignore file right away to indicate all of the files you don't want to track. You should add this too with git add .gitignore.
Do git commit -m "Your comment" to commit your changes (i.e. have git take a "snapshot" of your work in this timeline).
Connect your local git repository to Github. Go to Github, log in to your account, and click the new repository button on the top-right. Then follow the instructions on the screen, which should tell you to type in a command like this:
```
git remote add origin https://github.com/username/new_repo
```
Push your changes into your remote repo by doing git push origin master (or sometimes, git push -u origin master). Sometimes, the branch may be called main, so you would do git push origin main.
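
The local part of this workflow (init, add, commit, log) can be scripted from Python with subprocess. This is a sketch under stated assumptions: git must be installed and on PATH (the demo returns None otherwise), the repository lives in a temporary directory, the user name and email are placeholders, and the remote and push steps are left out because they need a real Github repository.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Placeholder identity passed per command, so no global git config is required.
IDENTITY = ["-c", "user.name=Example User", "-c", "user.email=user@example.com"]


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *IDENTITY, *args],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout


def first_commit(repo: Path) -> str:
    """git init -> git add . -> git commit, then return the one-line log."""
    git(repo, "init")
    (repo / ".gitignore").write_text("venv/\n")
    (repo / "hello.py").write_text('print("hello")\n')
    git(repo, "add", ".")
    git(repo, "commit", "-m", "Initial commit")
    return git(repo, "log", "--oneline")


def demo() -> Optional[str]:
    """Run the workflow in a throwaway directory; None if git is unavailable."""
    if shutil.which("git") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        return first_commit(Path(tmp))


log = demo()
print(log)
```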

To see the log of your changes in git, do git log or git log --oneline, which should show you a list of all the commits that have occurred. Press enter to go down, and press q to exit the log. Now let us talk about working with branches.

If we want to create a new branch, we simply type git branch myBranch
We can see all of our branches with git branch -a. The branch with the asterisk next to it is the one you are currently on.
To move branches, do git checkout myBranch. Within this new branch, we can edit our files, do git add, git commit, and everything else completely separately from the master branch.
To merge the myBranch branch to the master branch, first go into the master branch by doing git checkout master and then doing git merge myBranch. If there are no conflicts, the branches will merge successfully, and you can work with your changes in myBranch in the master branch. If there are conflicts, you should go into the files, edit the code accordingly, and then merge it again to the master branch.
To delete the branch myBranch, first go into the master branch and do git branch -d myBranch. If myBranch has changes that have not been merged, git will refuse and print a warning; if you really want to delete it anyway, you can force the deletion by doing git branch -D myBranch.

Python Virtual Environments & FileSystem

Remember that an environment really contains three things:

A Lib\site-packages directory that contains all the packages needed for the project.
- A module is a bunch of code in a file with the extension .py, in which there may be functions, classes, or variables defined.
- A package is a directory of a collection of modules (i.e. the folder that contains all modules).
- A library is an umbrella term referring to a reusable chunk of code, containing a collection of related modules and packages. It is sometimes used interchangeably with package.
A scripts directory that contains various executables that will run your scripts with the help of packages imported.
- python.exe
- activate.bat
- pip.exe
- wheel.exe
Whatever scripts you have written as a part of your program.

Note that the scripts you have written as a part of your program are not within the virtual environment folder; it only contains the executables and packages. Usually, your scripts would sit one level above the virtual environment in the file tree, all within a bigger project folder.

We must also distinguish two types of environments.

A base environment, which handles Python scripts on your entire system. The path for this environment is similar for most systems, but mine is C:\Users\bahng\AppData\Local\Programs\Python\Python39. Note that the Python directory contains the scripts, while the Python39 directory is technically known as the environment.
Multiple virtual environments, which are isolated environments designed for specific projects. The key benefit of virtual environments is that they isolate each of your projects from any other installation of Python on your computer, allowing multiple versions of Python to coexist without stepping on each other. However, they can be decentralized, making them hard to manage, and they must be activated/deactivated on a per-terminal basis. My own virtual environments are spread across different directories; for example, one venv folder has the path: C:\Users\bahng\PycharmProjects\HW_Problem_Generator\venv.

Thankfully, there are multiple package managers that allow us to organize our environments. In addition to the information below, this link is also a great tutorial. It is conventional to use venv somewhere in the environment name, which makes it easy to exclude in ignore files like .gitignore.

Some common virtual environment management programs.

Venv is the default virtual environment module for Python 3 and one of the easiest modules for creating virtual environments. It comes pre-installed with Python 3.3 or newer.

Windows CMD	MacOS Terminal	Task
python -m venv <path_to_venv_directory/NAME>		Creates a new virtual environment directory named NAME in current directory by default or in path_to_venv_directory specified.
activate.bat		Activates the virtual environment. Remember to be in the scripts directory in order to run the executable.
deactivate.bat		Deactivates the virtual environment. Remember to be in the scripts directory in order to run the executable.

Note that once you have activated the virtual environment, the environment name should pop up in parentheses in your shell as such.

C:\Users\bahng>python -m venv testvenv
C:\Users\bahng>cd testvenv\Scripts
C:\Users\bahng\testvenv\Scripts>activate.bat
(testvenv) C:\Users\bahng\testvenv\Scripts> _

Remember that virtual environments are just directories that contain a library of site-packages and a collection of executables. They can be installed anywhere on your computer, even in directories that have nothing to do with Python scripts.
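
To see that a virtual environment really is just a directory, the following Python sketch creates one with the standard venv module and lists what is inside. It assumes a temporary location and skips installing pip to keep it fast; the name testvenv mirrors the example above, and the helper in_virtual_environment is made up for illustration.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def in_virtual_environment() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "testvenv"
    venv.create(env_dir, with_pip=False)   # roughly: python -m venv testvenv

    # The executables live in Scripts on Windows and in bin on MacOS.
    scripts_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")

    top_level = sorted(p.name for p in env_dir.iterdir())
    has_config = (env_dir / "pyvenv.cfg").exists()
    has_scripts = scripts_dir.is_dir()

print(top_level)
print("currently inside a virtual environment:", in_virtual_environment())
```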

Virtualenv is a third-party dependency manager used for creating and managing Python projects, but used mainly for Python 2 so we will not elaborate here. Note that virtual environments do not come as a native feature for Python 2.

Conda is another package manager that specializes in data science. You can download it by installing Anaconda or Miniconda (a mini-version of Anaconda that contains only Conda and its dependencies). Again, for ease of use, make sure you add conda to the PATH variable. Note that when you install conda, a directory called anaconda3 is made which contains a new base environment and additional virtual environment directories. Listing all the environments I have returns

C:\Users\bahng>conda env list
# conda environments:
#
base                     C:\Users\bahng\anaconda3
testenv               *  C:\Users\bahng\anaconda3\envs\testenv

The asterisk just means that I am currently in (i.e. have activated) the testenv environment.

Windows CMD	MacOS Terminal	Task
conda info		Outputs information on conda
conda -V		Outputs version of conda
conda update conda		Updates conda
conda install (update) PACKAGE		Install (update) PACKAGE with conda
conda create --name NAME		Create a virtual environment called NAME. Note that regardless of which directory you are in, conda will create all virtual environments in anaconda3\envs.
conda activate (deactivate) ENV		Activate (deactivate) the environment named ENV
conda env list		Get a list of all conda environments, active environment shown with asterisk.
conda create --clone ENV --name ENV2		Clones an environment
conda list		List all packages & versions installed in active environment
conda env remove --name ENV		Delete an environment and everything in it
conda install --name ENV2 PKG		Install new package PKG in a different environment ENV2
conda remove --name ENV PKG1 PKG2		Remove one or more packages from environment ENV

A conda cheatsheet.

Pip is a package manager used to install, uninstall, and organize different Python packages on your computer.

Windows CMD	MacOS Terminal	Task
pip install (uninstall) PKG		Install (uninstall) a package from PyPI into the current active environment using pip. However, the pip found on PATH may belong to a different Python installation than the one you intend, so the command below is recommended.
python -m pip install (uninstall) PKG		Runs pip with the given python interpreter, installing the package and all of its dependencies into that interpreter's environment. The uninstall command does not uninstall the dependencies however.
python -m pip install PKG --upgrade		Upgrades package to its latest version
python -m pip install --upgrade pip		Update pip to its latest version
python -m pip list		List all packages & versions installed in active environment
pip show PKG		Shows information about package PKG
python -m pip search PKG		Searches PyPI for the PKG package. Note that PyPI has disabled the search API this command relies on, so it may return an error.
pip --help		Returns the full list of pip options.

A final reminder that each instance of python, pip, and all the other packages installed is only updated within its environment! Updating pip (for instance) in the base environment will not update pip in any of the virtual environments.
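
One way to check which environment you are actually working in is to ask the interpreter itself. The sketch below is a rough stand-in for pip list, using the standard importlib.metadata module; the function name installed_packages is an assumption for illustration, and the output depends entirely on your active environment.

```python
import sys
from importlib import metadata


def installed_packages():
    """Map package name -> version for the environment of the running interpreter."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken metadata
            packages[name] = dist.version
    return dict(sorted(packages.items(), key=lambda item: item[0].lower()))


print("environment:", sys.prefix)
for name, version in installed_packages().items():
    print(f"{name}=={version}")
```

Running this once with the base python.exe and once with a virtual environment's python.exe will generally print different prefixes and different package lists.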