Computer Systems & Basics
Non-Volatile Drive Storage
A drive is a computer component used to store data. It may be a static storage device (e.g. an HDD or SSD) or may use removable media (e.g. thumb drive, disk, CD). All drives store nonvolatile data (also called nonvolatile memory, NVM), meaning that the data is not erased when the power is turned off.
- A floppy disk is a portable, flexible circular plastic disk coated with iron oxide or other magnetic material, which is read and written by a floppy disk drive. Floppy disks came in 8-inch, 5.25-inch, and 3.5-inch sizes, with the most common capacity being 1.44MB (3.5-inch). When the disk is inserted into the drive, a read/write head uses a magnet to polarize the iron particles in one of two directions, each representing a 0 or 1 in binary data. The head can also read these polarities in order to retrieve the data stored on the disk. Note that the head reads the disk "circularly" as the disk rotates: each disk is divided into concentric tracks, and each track into equal sectors. Early 5.25-inch disks typically had 40 tracks with 8 or 9 sectors each, while the 1.44MB format has 80 tracks per side with 18 sectors per track.
- A hard disk drive (HDD) is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage and one or more rigid (hence the name hard), rapidly rotating platters coated with magnetic material. Data is accessed in a random-access manner, meaning that individual blocks of data can be stored and retrieved in any order. HDDs usually come inside a metal case enclosing the entire drive (3.5-inch for desktop computers and 2.5-inch for laptops). Since the data on the HDD is determined by the polarities of the magnetic material on the disks, it is sensitive to external magnetic fields that may corrupt the data. Furthermore, because the drive heads must align over an area of the disk in order to read or write data, and the disk is constantly spinning, there is a delay before data can be accessed. The drive may need to read from multiple locations in order to launch a program or load a file, which means it may have to wait for the platters to spin into the proper position multiple times before it can complete the command. If a drive is asleep or in a low-power state, it can take several seconds more for the disk to spin up to full speed and begin operating. Drive speeds are measured in RPM, with desktop HDDs normally running at 5400-7200 RPM. As a rough guide, 5400 RPM drives offer an average read speed of about 100MB/s and 7200 RPM drives about 120MB/s.
- A Solid State Drive (SSD) is a step up from the HDD. From the very beginning, it was clear that hard drives couldn’t possibly match the speeds at which CPUs could operate. Latency in HDDs is measured in milliseconds, compared with nanoseconds for your typical CPU. One millisecond is 1,000,000 nanoseconds, and it typically takes a hard drive 10-15 milliseconds to find data on the drive and begin reading it. The hard drive industry introduced smaller platters, on-disk memory caches, and faster spindle speeds to counteract this trend, but there’s only so fast drives can spin. Western Digital’s 10,000 RPM VelociRaptor family is the fastest set of drives ever built for the consumer market, while some enterprise drives spun as quickly as 15,000 RPM. The problem is that even the fastest-spinning drive with the largest caches and smallest platters is still achingly slow as far as your CPU is concerned.
Unlike HDDs, solid state drives have no moving parts or spinning disks (hence their name). Instead, they use NAND flash memory, a type of non-volatile storage that reads and writes data in units called pages and erases it in larger units called blocks. It retains data for years, regardless of whether the device is powered on or off. It is used not only in SSDs, but also in USB flash drives, SD cards, mobile phones, digital cameras, tablets, and others.
The most fundamental unit of storage is the flash memory cell, which uses the amount of stored electron charge (compared against thresholds) to hold bits of information, usually three bits (TLC, triple-level cell) or four bits (QLC, quad-level cell). If there is no electron charge in the cell, the cell represents 111 (for TLC) or 1111 (for QLC). The cells are arranged in large arrays (millions of cells, stacked on top of each other) that are grouped into blocks, with a typical block holding between 256KB and 4MB.
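To put the HDD latency discussion above into numbers, here is a minimal sketch that estimates only the rotational part of the delay, assuming that on average the platter has to turn half a revolution before the wanted sector reaches the head. The function name is made up for this illustration.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait (in ms) for a sector to rotate under the head.

    Assumption: on average the platter must turn half a revolution.
    """
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2


if __name__ == "__main__":
    for rpm in (5400, 7200, 10_000, 15_000):
        latency_ms = average_rotational_latency_ms(rpm)
        # 1 ms = 1,000,000 ns, the scale on which a CPU operates
        print(f"{rpm:>6} RPM: {latency_ms:.2f} ms = {latency_ms * 1_000_000:,.0f} ns")
```

The time the head needs to move to the right track (seek time) comes on top of this, which is why total access times land in the range of several milliseconds even for fast drives.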
Furthermore, on Windows each drive on a computer is assigned a device/drive letter, a single alphabetic character A through Z. The primary hard drive is assigned the C: drive letter by default, and other drives may be assigned different letters, such as Google Drive being assigned a G: drive letter.
Note that "wmic" is an abbreviation of Windows Management Instrumentation Command-line. Some commands may require you to use an elevated command prompt, which can be opened by running cmd as an administrator. You may also notice that when opening the command prompt on Windows, the leftmost letter of the prompt shows which drive you are currently on.
C:\Users\bahng>
| Windows CMD | MacOS Terminal | Task |
| --- | --- | --- |
| wmic LOGICALDISK LIST BRIEF | | Lists all the drives on your computer. Note that the DeviceID is simply the drive letter, the DriveType has numerical encodings (2: Removable disk, 3: Fixed local disk, 4: Network disk), the FreeSpace and Size are in bytes, and the VolumeName is the name of the disk. |
| wmic diskdrive get status, model | | Outputs the model name of the drive along with its status. If the status is OK the health is good, and if it shows 'Pred Fail' your drive may crash soon. |
| chkdsk c: | | Checks the file system and provides a summary of issues on the drive. If the number of bad disk sectors is not 0, get technical assistance. |
| dfrgui | | Opens the Optimize Drives window, which lists the drives on your computer along with their media type. |
| <letter>: (e.g. d:) | | Changes the drive you are working in. |
We demonstrate some of the commands here.
C:\Users\bahng>wmic LOGICALDISK LIST BRIEF
DeviceID DriveType FreeSpace ProviderName Size VolumeName
C: 3 838628864000 1003327844352 OS
G: 3 71506178048 107374182400 Google Drive
C:\Users\bahng>wmic diskdrive get status, model
Model Status
PM9A1 NVMe Samsung 1024GB OK
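The byte counts in the FreeSpace and Size columns above are hard to read at a glance. As a rough cross-platform sketch (not a replacement for wmic), Python's standard shutil.disk_usage reports the same kind of numbers; the helper names bytes_to_gib and describe_disk are invented for this example.

```python
import os
import shutil


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a raw byte count (as printed by wmic) to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


def describe_disk(path: str) -> dict:
    """Summarize the drive that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_percent = 100 * usage.used / usage.total if usage.total else 0.0
    return {
        "size_gib": bytes_to_gib(usage.total),
        "free_gib": bytes_to_gib(usage.free),
        "used_percent": used_percent,
    }


if __name__ == "__main__":
    root = os.path.abspath(os.sep)  # root of the current drive, e.g. C:\ or /
    print(root, describe_disk(root))
    # The Size reported for G: in the output above, converted:
    print(f"{bytes_to_gib(107374182400):.1f} GiB")
```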
Volatile, Short-Term Storage
Information travels from drives and other stores to the CPU, but the physical distance that the bits must travel across the motherboard also puts an upper limit on the retrieval speed (signals cannot travel faster than electromagnetic waves), especially if the distance must be covered thousands or millions of times back and forth. The resulting delay is known as latency. This is why computers have a hierarchy of stores reserved for information that is accessed more frequently, some closer to the CPU and others even within the CPU itself!
- Random Access Memory, or RAM, is short-term memory that acts as a cache for the CPU and is 50-200 times faster than a regular SSD. It is volatile, meaning that all its contents are erased when the computer shuts down. The generations covered here are, from newest to oldest, DDR4, DDR3, DDR2, DDR, and SDRAM (DDR5 has since succeeded DDR4). Typical RAM speeds are
- DDR4: 2133MHz, 2400MHz, 2666MHz, 3200MHz
- DDR3: 1066MHz, 1333MHz, 1600MHz, 1866MHz
where the MHz value is the rated number of data transfers per second, in millions. However, know that some motherboards have technical limitations on the RAM speed they can handle, in which case the system will run your faster RAM stick at the lower supported speed.
In terms of capacity, the following gives us nice benchmarks.
- 4-8 GB: Laptops for web browsing and light gaming (e.g. my Macbook Air 2019)
- 16-32 GB: Laptops for gaming (possibly heavy) and programming
- 64-128 GB: Crazy stuff.
- 256 GB: This basically means you have a RAM that is pretty much the size of a typical SSD.
Finally, there are broad categories of RAM.
- Static RAM (SRAM) holds its data for as long as power is supplied and therefore doesn't need to be refreshed to keep the data intact (hence the name static). Note that this does not mean that SRAM is nonvolatile. Because it is fast but expensive per bit, SRAM is typically used in CPU caches or video cards.
- Dynamic RAM (DRAM) requires a periodic refresh in order to function. The capacitors that store data in DRAM gradually discharge (and once the charge is gone, the data is lost). DRAM is found in system memory and video graphics memory.
- Synchronous Dynamic RAM (SDRAM) is a DRAM that operates in sync with the CPU clock, which means that it waits for the clock signal before responding to data input. This is advantageous since the CPU can process overlapping instructions in parallel, known as pipelining (the ability to receive an instruction before the previous instruction has been fully resolved). This allows more instructions to be completed simultaneously. By contrast, plain DRAM is asynchronous, which means it responds to its control inputs as soon as they arrive rather than waiting for a clock signal.
- CPU caches have a hierarchy that is divided into (from fastest to slowest) L1, L2, L3, and sometimes even L4. The CPU will check the L1 cache first to see if there is a hit (if there is, then data is retrieved extremely fast), then the L2, and so on.
- L1: 8-64 KB storage typically (but there are exceptions, i.e. the Apple M1 chip has a 192 KB L1 cache)
- L2: 256KB-8MB storage
- L3: 10-64MB storage (and sometimes up to 256MB for server chips)
To check the status of your RAM on your computer, refer to the following commands.
| Windows CMD | MacOS Terminal | Task |
| --- | --- | --- |
| wmic MEMORYCHIP (get BankLabel, DeviceLocator, MemoryType, TypeDetail, Capacity, Speed) | | Outputs relevant information about the RAM. |
| wmic memorychip list full | | Outputs a list of all specifications of each memory stick. |
| systeminfo \| findstr /C:"Total Physical Memory" | | Outputs the total RAM memory of your computer. |
| systeminfo \| find "Available Physical Memory" | | Outputs the available RAM memory of your computer. |
For my computer, the outputs are as follows. They show two memory sticks, each with 8GB of memory, speeds of 3200 MHz, and a TypeDetail of 128, which means the RAM is synchronous (SDRAM). The MemoryType of 0 formally means "unknown"; in practice WMI often reports 0 for DDR4 sticks such as these (type 24 means DDR3).
C:\Users\bahng>wmic MEMORYCHIP get BankLabel, DeviceLocator, MemoryType, TypeDetail, Capacity, Speed
BankLabel Capacity DeviceLocator MemoryType Speed TypeDetail
8589934592 DIMM A 0 3200 128
8589934592 DIMM B 0 3200 128
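To make the Capacity and Speed columns above more concrete, the following sketch converts them into more familiar units. It rests on two assumptions that are not part of the wmic output: that the Speed value can be read as millions of transfers per second, and that each stick has the usual 64-bit data bus. The function names are made up for this example.

```python
def capacity_gib(capacity_bytes: int) -> float:
    """Convert the Capacity column (bytes) to gibibytes."""
    return capacity_bytes / 2**30


def peak_bandwidth_gb_per_s(mega_transfers_per_s: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak transfer rate of one memory stick in GB/s.

    Assumption: a 64-bit data bus, i.e. 8 bytes moved per transfer.
    """
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1_000_000 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    print(f"Capacity: {capacity_gib(8589934592):.0f} GiB per stick")
    print(f"Peak bandwidth at 3200: {peak_bandwidth_gb_per_s(3200):.1f} GB/s per stick")
```

This is an upper bound on paper; real programs see less because of latency, refresh cycles, and access patterns.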
The status of your CPU can be checked with the following commands
| Windows CMD | MacOS Terminal | Task |
| --- | --- | --- |
| wmic cpu list full | | Outputs a list of all specifications of the CPU. |
| wmic cpu (get caption, deviceid, name, numberofcores, maxclockspeed, status) | | Outputs relevant information about the CPU. |
Program Lifecycle Phases
First, we review some definitions. More on program lifecycle phases here.
- Programming languages are broadly classified into two types. High-level languages are the familiar programming languages that we work with today (that allow much more abstraction), while low-level languages are very close to the hardware, such as machine language and assembly language.
- Programmers write programs in source code (usually high-level languages), which are then inputted into language processors that translate them into object code (usually machine code consisting of binary). The duration in which the source code of the program is being edited is called the edit time, while the compile time is when the source code is translated into machine code by a language processor. There are three types of language processors.
- A compiler is a language processor that reads the complete source program written in high-level language as a whole in one go and translates it into an equivalent program in machine language. The source code is translated to object code successfully if it is free of errors. The compiler specifies the errors at the end of the compilation with line numbers when there are any errors in the source code. The errors must be removed before the compiler can successfully recompile the source code again. (e.g. C, C++, C#, Java)
- An assembler is used to translate a program written in assembly language (a low-level language with a very strong correspondence between its instructions and the machine code instructions) into machine code. The assembler is the first interface that lets humans communicate with the machine, filling the gap between the two. Code written in assembly language consists of mnemonics (instructions) like ADD, MUL, SUB, DIV, MOV and so on, and the assembler converts these mnemonics into binary code.
- An interpreter translates a single statement of the source program into machine code and executes it immediately before moving on to the next line. If there is an error in the statement, the interpreter stops translating at that statement and displays an error message. The interpreter moves on to the next line for execution only after the error has been removed. An interpreter directly executes instructions written in source code without previously converting them to object code or machine code. (e.g. Python, Perl, JavaScript, Ruby)
A quick compare and contrast.
| Compiler | Interpreter |
| --- | --- |
| Takes more time to analyze source code but execution time is faster. | Takes less time to analyze source code but execution time is slower. |
| Debugging is harder since compiler generates error message after entire scan. | Debugging is easier since interpreter continues translating the program until error is met. |
| Requires a lot of memory for generating object codes. | Requires less memory because no object code is generated. |
| Generates intermediate object code. | No intermediate object code is generated. |
The result of a successful compilation is an executable, which is a program in the form of a file containing millions of lines of very simple machine code instructions (e.g. add 2 numbers or compare 2 numbers), also called processor instructions. This executable can be stored somewhere on the computer's drive for future use or it may be copied immediately into a faster memory, such as the RAM. The load time is when the OS takes the program's executable from storage and puts it into an active memory (e.g. RAM) in order to begin execution.
The CPU understands only a low-level machine code language (aka native code), which is contained within the executable. The language of the machine code is hardwired into the design of the CPU hardware; it is not something that can be changed at will. Each family of compatible CPUs (e.g. the popular Intel x86 family) has its own, idiosyncratic machine code which is not compatible with the machine code of other CPU families. More information here.
Once the instruction bytes are copied from storage to RAM, the CPU can run through the steps/lines at a rate on the order of 2 billion steps per second. This execution phase, when the CPU executes the instructions until normal termination or a crash, is called the runtime.
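The split between compile time and runtime described above can be observed even in Python. Python is usually called interpreted, but CPython first translates source code into bytecode, so the built-in compile() can stand in for the compile step and exec() for the runtime. This is only an illustrative sketch; try_compile and try_run are hypothetical helper names.

```python
def try_compile(source: str):
    """Compile-time phase: translate source into a code object without running it."""
    try:
        return compile(source, "<example>", "exec")
    except SyntaxError as err:
        print(f"compile-time error: {err.msg}")
        return None


def try_run(code) -> str:
    """Runtime phase: execute already-compiled code and report how it ended."""
    try:
        exec(code, {})
        return "normal termination"
    except Exception as err:
        return f"runtime error: {type(err).__name__}"


broken = try_compile("total = (1 + ")    # rejected before anything runs
crashing = try_compile("total = 1 / 0")  # translates fine, fails only when run
working = try_compile("total = 1 + 2")

if __name__ == "__main__":
    print(broken)
    print(try_run(crashing))
    print(try_run(working))
```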
More on Executables
More specifically, an executable is a file that contains a list of instructions and data that cause a computer's CPU to perform indicated tasks, as opposed to data files, which are fundamentally strings of data that must be interpreted (parsed) by a program to be meaningful. On Windows, executables usually have the extension .exe or .bat, and they can generally be run (invoked) in two ways:
- The executable file can be run by simply double-clicking on the file name, which opens it and lets the user type commands in an interactive session of an interpreter (like inputting commands in a terminal window or a Python shell).
- Alternatively, we can start writing a program, complete writing it, and then have this program compiled into an executable to be invoked.
Some common examples of executables are:
- python.exe - used to run Python scripts that have the .py extension, located at C:\Users\bahng\AppData\Local\Programs\Python\Python39
- pythonw.exe - used to run .pyw files for GUI programs
- Terminal.app (on MacOS)
- cmd.exe (on Windows OS)
- py.exe - the Python launcher, an executable used to run the python.exe executable like a shortcut, located at C:\windows\py.exe
Some commands for running python scripts.
| Windows CMD | MacOS Terminal | Task |
| --- | --- | --- |
| python (py) -V | | Checks the version of python.exe (py.exe). |
| python.exe (py.exe) <script>.py | | Runs a Python script with the executable (assuming python and py are in PATH). |
Statically Typed vs Dynamically Typed Languages
Type-checking is the process of checking and verifying the type of a construct (constant, variable, array, list, object) and its usage context. It helps in minimizing the possibility of type errors in the program, and type checking may occur either at compile-time (static checking) or at run-time (dynamic checking).
- Statically-Typed Languages: Since we type-check during compilation, every detail about the variables and all the data types must be known before compiling. Once a variable is declared with a type, it cannot later hold a value of a different type, and so the data type of a declared variable is fixed. This is why in Java, C, C++, etc., the programmer must specify the data type of each variable by writing something like int myNum = 15;.
- Dynamically-Typed Languages: Since we type-check during runtime, there is no need to specify the data type of each variable while writing code, which improves writing speed. These languages identify the type of each value during run-time, so we do not need to declare the data types of variables. In these languages, variables are bound to objects at run-time using assignment statements, and many popular languages (e.g. JavaScript, Python, PHP, etc.) are dynamically typed.
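Since Python is one of the dynamically typed languages mentioned, a short sketch can show both sides of the trade-off: a name may be rebound to a value of another type, and a type error only surfaces when the offending line actually runs. The names add_one and call_safely are made up for this illustration.

```python
value = 15                       # the name is bound to an int
first_type = type(value).__name__
value = "fifteen"                # same name, now bound to a str: allowed
second_type = type(value).__name__


def add_one(x):
    # No declared parameter type: whether this works is decided at run time.
    return x + 1


def call_safely(arg):
    try:
        return add_one(arg)
    except TypeError as err:
        return f"TypeError at run time: {err}"


if __name__ == "__main__":
    print(first_type, second_type)
    print(call_safely(15))
    print(call_safely("fifteen"))
```

In a statically typed language, the equivalent of call_safely("fifteen") would be rejected by the compiler before the program ever ran.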
Navigation Shell Commands
I work in both the Mac and Windows operating systems, which requires me to know two sets of commands when accessing the command line.
| Windows CMD | MacOS Terminal | Task |
| --- | --- | --- |
| dir | ls | Lists files and folders in current directory. |
| dir /a | ls -a | Lists ALL files and folders (including hidden ones) in current directory. |
| cd | pwd | Full path of current folder/directory. |
| cd <path to directory> | cd <path to directory> | Change folder/directory. |
| cd .. | cd .. | One directory up in directory tree. |
| cd \ | cd / | Move to root directory. |
| mkdir (rmdir) newFolder | mkdir (rmdir) newFolder | Create (delete) new directory in current directory. |
| echo <text> > filename(.txt) | cat > fileName(.txt) | Create new file with <text> written inside. The extension is optional; the file is created with exactly the name given. |
| cls | clear | Clear the terminal screen. |
| type | cat | Concatenate and print a file. |
| ren oldName newName | mv oldName newName | Rename a file or directory. |
| robocopy (move) FileOrFolderPath <path to destination directory> | cp -r (mv) myFolder <path to destination> | Copy (move) a directory to destination directory. |
| wmic LOGICALDISK LIST BRIEF | | Lists all the drives on your computer. |
| <letter>: (e.g. d:) | | Changes the drive you are working in. |
| | nano fileName | Open file in nano, which displays the file's binary data as text. This is especially fun since you can open non-text files and view them as text, even if the result is complete nonsense. Deleting a few lines in, say, an mp3 file opened in nano can corrupt the file, leading to distortions in the original mp3 audio. |
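Since the two shells use different commands for the same tasks, it can help to see the operating-system-independent equivalents. The sketch below mirrors several rows of the table with Python's standard library and works inside a throwaway temporary directory; demo_navigation and the file names in it are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def demo_navigation() -> list:
    """Run a few navigation steps in a scratch directory and list the result."""
    start = Path.cwd()                                 # cd (Windows) / pwd (MacOS)
    workspace = Path(tempfile.mkdtemp())               # scratch directory for the demo
    try:
        os.chdir(workspace)                            # cd <path to directory>
        Path("newFolder").mkdir()                      # mkdir newFolder
        Path("notes.txt").write_text("hello\n")        # echo hello > notes.txt
        Path("notes.txt").rename("renamed.txt")        # ren / mv
        shutil.copytree("newFolder", "copyOfFolder")   # robocopy / cp -r
        return sorted(p.name for p in Path.cwd().iterdir())  # dir / ls
    finally:
        os.chdir(start)                                # go back before cleaning up
        shutil.rmtree(workspace)                       # remove the scratch directory


if __name__ == "__main__":
    print(demo_navigation())
```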
Network & Connectivity Commands
When your device connects to a router in a LAN, it has a local IP address (usually starting with 172.30. or 192.168., both of which fall in the address ranges reserved for private networks). Packets of data leave your computer's address from a port (one of the 2^16 = 65,536 ports) and go to the router's local IP address, known as the default gateway. The router is in turn connected to the internet under another, public address (which can be Googled) and forwards the packets through there.
In addition to the IP address, the MAC (Media Access Control)/physical address is used to uniquely identify the network hardware of a computer, while the IP address identifies the connection of the device on the network. The MAC address is tied to the hardware and does not normally change with time and environment, while the IP address does. More info on the difference between them is found here.
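The distinction between local and public addresses, and the size of the port range, can be checked with Python's standard ipaddress module. This is a small sketch; the helper names are invented, and the sample addresses are only examples.

```python
import ipaddress

PORT_COUNT = 2**16  # ports are numbered 0 through 65535


def is_local_address(address: str) -> bool:
    """True if the address lies in a range reserved for private networks."""
    return ipaddress.ip_address(address).is_private


def is_valid_port(port: int) -> bool:
    return 0 <= port < PORT_COUNT


if __name__ == "__main__":
    for address in ("192.168.0.1", "172.30.1.7", "8.8.8.8"):
        kind = "local" if is_local_address(address) else "public"
        print(f"{address}: {kind}")
    print(f"{PORT_COUNT} ports in total")
```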
| Windows CMD | MacOS Terminal | Task |
| --- | --- | --- |
| ping (-t) www.google.com | ping www.google.com | Checks the ping time to www.google.com by sending small multiple-byte packets of data. Measures in milliseconds. The "-t" pings the address forever until manually stopped. |
| ping 192.168.0.1 | ping 192.168.0.1 | Checks the ping time to an IP address in your local network by sending small multiple-byte packets of data. Make sure that both devices are on the same network, with the same subnet mask, and that firewalls are turned off. |
| ipconfig (/all) | | Shows your IP address and related information about your device within the network. |
| ipconfig /displaydns | | Displays the DNS cache of your system. |
| ipconfig /flushdns | | Flushes the DNS cache of your system. |
| tracert www.google.com | | Traces the route it takes for a packet to reach a destination and shows information about each hop along that route. If you’re having issues connecting to a website, tracert can show you where the problem is occurring. |
| netstat -an | nettop | Displays a list of all open network connections on the computer, along with the port they're using and the foreign IP address they're connected to. |
| nslookup www.google.com | nslookup www.google.com | Finds the IP address associated with a domain. |
| ipconfig /release | | Forces your network adapter (internal computer hardware that connects to network) to drop its assigned IP address. Note that this disconnects your wifi. |
| ipconfig /renew | | Renews the network adapter's IP address. |
| arp -a | arp -a | Shows the ARP cache: the IP and MAC addresses of devices on your local network that your computer has recently communicated with. |
When in an environment (whether it be a base or a virtual one), the environment has certain characteristics that are stored in what we call the environment variables. A list of them can be outputted with the commands below.
| Windows CMD | MacOS Terminal | Task |
| --- | --- | --- |
| set | | Lists all the environment variables. |
| echo %<VARIABLE>% | | Returns the value of the environment variable. |
The first few variables that are outputted on my system are shown here.
C:\Users\bahng>set
ALLUSERSPROFILE=C:\ProgramData
APPDATA=C:\Users\bahng\AppData\Roaming
CommonProgramFiles=C:\Program Files\Common Files
CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
CommonProgramW6432=C:\Program Files\Common Files
They are all in the form NAME=VALUE, where the NAMEs are conventionally written in uppercase (and referenced as $VARIABLE on MacOS and %VARIABLE% on Windows), and the VALUEs are strings. The point of these variables is to communicate to programs how the machine is set up and sometimes to control the behavior of programs (e.g. where the home directory is, what user is logged in, etc.).
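Programs read these NAME=VALUE pairs through their runtime rather than through the shell syntax. As a minimal sketch, Python exposes them as the dictionary-like os.environ; the variable DEMO_GREETING and the helper names are made up for this example.

```python
import os


def first_variables(count: int = 5) -> list:
    """Return the first few (NAME, VALUE) pairs, sorted by name, like `set` does."""
    return sorted(os.environ.items())[:count]


def lookup(name: str, default: str = "<not set>") -> str:
    """Equivalent of `echo %NAME%` / `echo $NAME`, with a fallback."""
    return os.environ.get(name, default)


# Hypothetical variable: setting it here affects only this process and its children.
os.environ["DEMO_GREETING"] = "hello"

if __name__ == "__main__":
    for name, value in first_variables():
        print(f"{name}={value}")
    print(lookup("DEMO_GREETING"))
```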
One notable environment variable is the PATH environment variable, which is a list of directories where executable programs are located. More specifically, it specifies the directories whose executables can be started from the command line without knowing and typing the whole path to the file. Note that the shell does not check subdirectories of the directories in PATH, only those directories themselves. On Windows, we can also modify which file extensions (other than the common .exe, .bat, etc.) are supported by editing %PATHEXT%. One can check this set of directories with the following commands:
| Windows CMD | MacOS Terminal | Task |
| --- | --- | --- |
| echo %PATH% | echo $PATH | Specifies the directories in which executables are located on the machine. |
| echo %PATHEXT% | | Specifies the extensions that are supported by PATH and can therefore be called directly in cmd (PATHEXT is a Windows-only variable). |
For example, calling echo %PATHEXT% on my Windows laptop returns the following supported executable extensions:
C:\Users\bahng>echo %PATHEXT%
.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
On MacOS, PATH usually holds all bin and sbin directories relevant for the current user. On Windows, it contains at least the C:\Windows and C:\Windows\system32 directories. This is why you can run calc.exe or notepad.exe from the command line (you actually don't need to include the .exe, just calc will do), but not chrome.exe since it is located in C:\Program Files\Google\Chrome\Application.
Calling the PATH environment variable on my computer returns
C:\Users\bahng>echo %PATH%
C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Users\bahng\anaconda3;C:\Users\bahng\anaconda3\Library\mingw-w64\bin;C:\Users\bahng\anaconda3\Library\usr\bin;C:\Users\bahng\anaconda3\Library\bin;C:\Users\bahng\anaconda3\Scripts;C:\Users\bahng\AppData\Local\Microsoft\WindowsApps;C:\Users\bahng\AppData\Local\Programs\Microsoft VS Code\bin;C:\Users\bahng\AppData\Local\Programs\Python\Python39;C:\Users\bahng\AppData\Local\Programs\Python\Python39\Scripts
and since I have py.exe installed in C:\Windows\system32 (which is in PATH), I can call it immediately in cmd. The python.exe, however, is located in C:\Users\bahng\AppData\Local\Programs\Python\Python39, and as long as that directory is not on the PATH variable, python.exe cannot be called without specifying its entire path. For convenience, we should modify the PATH variable to include the directory that python.exe is located in (in the output above, it has already been added and appears among the last entries). Adding a directory to your PATH expands the number of directories that are searched when, from any directory, you enter a command in the shell. Unfortunately, modifying the PATH with the shell can be very dangerous, but there is an easy way to do it using a GUI, explained very well through this link. All environment variables can be changed as explained in the link using the GUI (in the control panel).
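The PATH lookup described above can be sketched in a few lines. The function below is a simplified stand-in for what the shell does: it walks the listed directories in order and checks only each directory itself, never its subdirectories. It ignores PATHEXT and execute permissions, and the names find_on_path and demo are invented for this illustration; for real use, the standard shutil.which performs the full search.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def find_on_path(command: str, path_value: str) -> Optional[str]:
    """Return the first file named `command` directly inside a PATH directory."""
    for directory in path_value.split(os.pathsep):  # ';' on Windows, ':' on MacOS
        if not directory:
            continue
        candidate = Path(directory) / command
        if candidate.is_file():
            return str(candidate)
    return None  # not found: the shell would report an unknown command


def demo() -> dict:
    """Build a scratch directory and search it as if it were the whole PATH."""
    with tempfile.TemporaryDirectory() as top:
        (Path(top) / "sub").mkdir()
        (Path(top) / "tool").write_text("")
        (Path(top) / "sub" / "hidden").write_text("")
        return {
            "tool": find_on_path("tool", top) is not None,      # directly in the directory
            "hidden": find_on_path("hidden", top) is not None,  # only in a subdirectory
        }


if __name__ == "__main__":
    print(demo())
    print(shutil.which("python"))  # the real lookup against your actual PATH
```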
Here is the general list of commands that you will use when working with git from the command line.
- When you are starting a new git repository, whether it's creating a new repo from scratch or from an existing project, go into the directory and run git init. This can be done in two ways: (1) go into the directory through terminal/cmd (using the cd command) and run git init, or (2) go into the directory through the graphical user interface, right-click, and open the git bash, then run git init.
- Do git add (git add filename.ext for a specific file or git add . for everything in the directory) to add all the relevant files to the staging area.
- Do git status to check which files have been added and which changes are being tracked.
- You should create a .gitignore file right away to indicate all of the files you don't want to track. You should add this too with git add .gitignore.
- Do git commit -m "Your comment" to commit your changes (i.e. have git take a "snapshot" of your work in this timeline).
- Connect your local git repository to GitHub. Go to GitHub, log in to your account, and click the new repository button on the top-right. Then follow the instructions on the screen, which should tell you to type in a command like this:
>>> git remote add origin https://github.com/username/new_repo
- Push your changes into your remote repo by doing git push origin master (or sometimes, git push -u origin master). Sometimes, the branch may be called main, so you would do git push origin main.
To see the log of your changes in git, do git log or git log --oneline, which should show you a list of all the commits that have occurred. Press enter to go down, and press q to exit the log. Now let us talk about working with branches.
- If we want to create a new branch, we simply type git branch myBranch.
- We can see all of our branches with git branch -a. The branch with the asterisk next to it is the one you are currently on.
- To move branches, do git checkout myBranch. Within this new branch, we can edit our files, do git add, git commit, and everything else completely separately from the master branch.
- To merge the myBranch branch into the master branch, first go into the master branch by doing git checkout master and then do git merge myBranch. If there are no conflicts, the branches will merge successfully, and you can work with your changes from myBranch in the master branch. If there are conflicts, you should go into the files, edit the code accordingly, and then commit the result to complete the merge.
- To delete the branch myBranch, first go into the master branch and do git branch -d myBranch. If the branch has changes that have not been merged, git will refuse and warn you; if you really want to delete it anyway, you can force it by doing git branch -D myBranch.
Python Virtual Environments & FileSystem
Remember that an environment really contains three things:
- A Lib\site-packages directory that contains all the packages needed for the project.
  - A module is a bunch of code in a file with the extension .py, in which there may be functions, classes, or variables defined.
  - A package is a directory of a collection of modules (i.e. the folder that contains all modules).
  - A library is an umbrella term referring to a reusable chunk of code, containing a collection of related modules and packages. It is sometimes used interchangeably with packages.
- A scripts directory that contains various executables that will run your scripts with the help of the imported packages.
  - python.exe
  - activate.bat
  - pip.exe
  - wheel.exe
- Whatever scripts you have written as a part of your program.
Note that the scripts you have written as a part of your program are not within the virtual environment folder; it only contains the executables and packages. Usually, your scripts sit one level above the virtual environment in the file tree, all within a bigger project folder.
We must also distinguish two types of environments.
- A base environment, which handles Python scripts on your entire system. The path for this environment is similar for most systems, but mine is C:\Users\bahng\AppData\Local\Programs\Python\Python39. Note that the Python directory contains the scripts, while the Python39 directory is technically known as the environment.
- Multiple virtual environments, which are isolated environments designed for specific projects. The key benefit of virtual environments is that they isolate each of your projects from any other installation of Python on your computer, allowing multiple versions of Python to coexist without stepping on each other. However, they can be decentralized, making them hard to manage and requiring activation/deactivation on a per-terminal basis. My own virtual environments are spread across different directories; for example, one venv folder has the path C:\Users\bahng\PycharmProjects\HW_Problem_Generator\venv.
Thankfully, there are multiple package managers that allow us to organize our environments. In addition to the information below, this link is also a great tutorial. It is conventional to use venv somewhere in the environment name, which also makes it easy to exclude through ignore files like .gitignore.
Some common virtual environment management programs.
- Venv is the default virtual environment module for Python 3 and one of the easiest modules for creating virtual environments. It comes pre-installed with Python 3.3 or newer.
| Windows CMD | MacOS Terminal | Task |
| --- | --- | --- |
| python -m venv <path_to_venv_directory/NAME> | | Creates a new virtual environment directory named NAME in current directory by default or in path_to_venv_directory specified. |
| activate.bat | | Activates the virtual environment. Remember to be in the scripts directory in order to run the executable. |
| deactivate.bat | | Deactivates the virtual environment. Remember to be in the scripts directory in order to run the executable. |
Note that once you have activated the virtual environment, the environment name should pop up in parentheses in your shell as such.
C:\Users\bahng>python -m venv testvenv
C:\Users\bahng>cd testvenv\Scripts
C:\Users\bahng\testvenv\Scripts>activate.bat
(testvenv) C:\Users\bahng\testvenv\Scripts> _
Remember that virtual environments are just directories that contain a library of site-packages and a collection of executables. They can be installed anywhere on your computer, even in directories that have nothing to do with Python scripts.
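To see that a virtual environment really is just a directory, one can create one from Python itself and list what ends up inside. This sketch uses the standard venv module in a temporary folder; the helper name create_and_inspect is made up, and with_pip=False is chosen here only to keep the example quick (the command-line python -m venv installs pip by default).

```python
import tempfile
import venv
from pathlib import Path


def create_and_inspect(parent: str) -> list:
    """Create a virtual environment named testvenv under `parent` and list it."""
    env_dir = Path(parent) / "testvenv"
    venv.create(env_dir, with_pip=False)  # skip bootstrapping pip for speed
    return sorted(entry.name for entry in env_dir.iterdir())


with tempfile.TemporaryDirectory() as scratch:
    contents = create_and_inspect(scratch)

if __name__ == "__main__":
    # Expect a pyvenv.cfg file plus a Scripts (Windows) or bin (MacOS) directory
    # and a directory tree that holds site-packages.
    print(contents)
```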
- Virtualenv is a third-party dependency manager used for creating and managing Python projects, but used mainly for Python 2 so we will not elaborate here. Note that virtual environments do not come as a native feature for Python 2.
- Conda is another package manager that specializes in data science. You can download it by installing Anaconda or Miniconda (a mini-version of Anaconda that contains only Conda and its dependencies). Again, for ease of use, make sure you add conda to the PATH variable. Note that when you install conda, a directory called anaconda3 is made which contains a new base environment and additional virtual environment directories. Listing all the environments I have returns
C:\Users\bahng>conda env list
# conda environments:
#
base C:\Users\bahng\anaconda3
testenv * C:\Users\bahng\anaconda3\envs\testenv
The asterisk just means that I am currently in (i.e. have activated) the testenv environment.
| Windows CMD / MacOS Terminal | Task |
| --- | --- |
| conda info | Outputs information on conda. |
| conda -V | Outputs version of conda. |
| conda update conda | Updates conda. |
| conda install (update) PACKAGE | Install (update) PACKAGE with conda. |
| conda create --name NAME | Create a virtual environment called NAME. Note that regardless of which directory you are in, conda will create all virtual environments in anaconda3\envs. |
| conda activate (deactivate) ENV | Activate (deactivate) the environment named ENV. |
| conda env list | Get a list of all conda environments, active environment shown with asterisk. |
| conda create --clone ENV --name ENV2 | Clones an environment. |
| conda list | List all packages & versions installed in active environment. |
| conda env remove --name ENV | Delete an environment and everything in it. |
| conda install --name ENV2 PKG | Install new package PKG in a different environment ENV2. |
| conda remove --name ENV PKG1 PKG2 | Remove one or more packages from environment ENV. |
A conda cheatsheet.
- Pip is a package manager used to install, uninstall, and organize different Python packages on your computer.
| Windows CMD | MacOS Terminal | Task |
| --- | --- | --- |
| pip install (uninstall) PKG | | Install (uninstall) a package directly from PyPI into the current active environment using pip, together with its dependencies. Which pip gets run depends on your PATH, so the command below is recommended. |
| python -m pip install (uninstall) PKG | | Same as above, but guaranteed to use the pip that belongs to the python being run. The uninstall command does not uninstall the dependencies, however. |
| python -m pip install PKG --upgrade | | Upgrades package to its latest version. |
| python -m pip install --upgrade pip | | Update pip to its latest version. |
| python -m pip list | | List all packages & versions installed in active environment. |
| pip show PKG | | Shows information about package PKG. |
| python -m pip search PKG | | Searches for the PKG package. Note that PyPI has disabled the search interface this command relies on, so it may fail; searching on the PyPI website works instead. |
| pip --help | | Returns the full list of pip options. |
A final reminder that each instance of python, pip, and all the other packages installed is only updated within its environment! Updating pip (for instance) in the base environment will not update pip in any one of the virtual environments.
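Because every environment carries its own interpreter and packages, it is worth checking which one a script is actually running in. The following sketch reports the interpreter path, whether a venv-style environment is active, and the pip version visible from it. The function names are hypothetical, and the sys.prefix comparison is an assumption that holds for venv-style environments but does not detect conda environments.

```python
import sys
from importlib import metadata


def in_virtual_environment() -> bool:
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def installed_version(package: str):
    """Version of `package` in THIS environment, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_environment() -> dict:
    return {
        "interpreter": sys.executable,
        "environment": sys.prefix,
        "is_virtual": in_virtual_environment(),
        "pip": installed_version("pip"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running the same script with the base python and with a virtual environment's python should print different interpreter paths and possibly different pip versions.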