Summary article of what I have learned in my NUS modules

LiveRunGrow
33 min read · Aug 3, 2020


Content:

  • CS3223 Database Systems Implementation + CS2102 Introduction to DB
  • CS2105 Computer Network
  • CS2106 Operating System

CS3223 (Storage Management, Query Processing, Transaction Management)

Week 1

  • A DBMS stores relations (the actual data), indexes (data structures to speed up access to relations), the system catalog (metadata about relations), and log files (information maintained for data recovery).
  • A DBMS stores information on non-volatile disks but processes data in main memory (RAM). This means we need to keep transferring data between disk and RAM, and vice versa.
  • Disks are secondary storage devices; data is transferred in units of disk pages (blocks). The time to retrieve a page depends on its relative location on the disk and consists of 3 components: seek time + rotational delay + transfer time. To reduce I/O cost, we can only reduce the seek and rotational delays.
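To make the access-time formula concrete, here is a minimal back-of-the-envelope calculation in Python. The drive numbers (9 ms average seek, 7200 RPM, 150 MB/s transfer, 4 KB blocks) are made up for illustration, not figures from the module.

# Access time for one random block I/O = seek + rotational delay + transfer.
seek_ms = 9                                   # assumed average seek time
rpm = 7200
rotational_delay_ms = 0.5 * (60_000 / rpm)    # on average, half a revolution
transfer_rate_mb_s = 150
block_kb = 4
transfer_ms = block_kb / 1024 / transfer_rate_mb_s * 1000

print(seek_ms + rotational_delay_ms + transfer_ms)   # ~13.2 ms, dominated by seek + rotation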

Week 2

  • Types of queries: Single Record retrievals & Range Searches
  • An index is a data structure on a file that speeds up retrievals/selections based on some search key (a subset of the fields of a relation).
  • An index is stored as a file. Records in an index file are referred to as data entries. Each data entry points to a data record.
  • There are two main types of indexes. Tree-based index and Hash-based index.
  • B+ Tree: Leaf nodes store sorted data entries.
  • Properties of a B+ tree: height-balanced, grows and shrinks dynamically, minimum 50% node occupancy, which ensures storage efficiency
  • Operations on a B+ tree: (1) Search (2) Insert (start from a leaf node) (3) Delete (a search sketch follows this list)
  • An index is a clustered index if the order of its data entries is the same as or close to the order of the data records; otherwise, it is an unclustered index.
  • To build a clustered index, first sort the data file.
  • Dense index: There is at least one data entry per search key value (in some data record).
  • Every sparse index is clustered. A sparse index cannot answer an existence (exists) search from the index alone.
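As a rough illustration of the search operation referenced above, here is a minimal B+-tree search sketch in Python. The Node class and its fields are my own simplification (no fan-out limits, no leaf sibling pointers), not code from the module.

import bisect

class Node:
    def __init__(self, keys, children=None, entries=None):
        self.keys = keys            # sorted search-key values
        self.children = children    # child nodes (internal node) or None
        self.entries = entries      # data entries (leaf node) or None

def bplus_search(root, key):
    node = root
    while node.children is not None:                 # descend internal nodes
        idx = bisect.bisect_right(node.keys, key)    # child whose key range covers key
        node = node.children[idx]
    idx = bisect.bisect_left(node.keys, key)         # leaf: look for the data entry
    if idx < len(node.keys) and node.keys[idx] == key:
        return node.entries[idx]
    return None

leaf1 = Node(keys=[5, 9], entries=["rid5", "rid9"])
leaf2 = Node(keys=[13, 20], entries=["rid13", "rid20"])
root = Node(keys=[13], children=[leaf1, leaf2])
print(bplus_search(root, 13))                        # rid13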

Week 3

  • Hash-based indexes are best for equality selections, but inefficient for range searches.
  • Static hashing | Dynamic hashing
  • Dynamic hashing: Linear hashing | Extendible hashing
  • Linear hashing is a dynamic scheme where the hash file grows linearly, one bucket at a time by systematic splitting of buckets. It prevents long overflow chains.
  • Linear hashing performance: about 1.2 disk I/Os per operation for uniform or mildly skewed data distributions (see the bucket-selection sketch after this list).
  • Extendible hashing: handles file growth by doubling the # of buckets.
  • Idea: use a directory of pointers to buckets, double the # of buckets by doubling the directory, splitting just the bucket that overflows.
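A minimal sketch of how linear hashing picks a bucket, assuming a split pointer (next_split), a round number (level) and n_initial starting buckets. The variable names and the use of Python's built-in hash() are my own choices.

def linear_hash_bucket(key, level, next_split, n_initial):
    # h_level(key): the hash function used in the current round
    bucket = hash(key) % (n_initial * 2 ** level)
    if bucket < next_split:
        # this bucket has already been split this round, so use h_{level+1}
        bucket = hash(key) % (n_initial * 2 ** (level + 1))
    return bucket

print(linear_hash_bucket(42, level=1, next_split=2, n_initial=4))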

Week 4

  • Given k sorted files (runs), we can merge them into larger sorted runs, and eventually produce one single sorted file.
  • To sort a very large unsorted file, we can do it in 2 phases:

Phase 1: Generate sorted runs

Phase 2: Merge sorted runs (we already know this)

  • So how do we do Phase 1? We read in as many records as will fit into memory and perform an in-memory sort. Then, we write out the sorted records as a sorted run. Repeat until all records in the unsorted file have been read.

For the internal (in-memory) sort, we can consider quicksort or replacement selection.
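A minimal two-phase external merge sort sketch in Python, pretending a list is the file on disk and that memory holds buffer_size records at a time (heapq.merge stands in for the k-way merge of Phase 2).

import heapq

def external_sort(records, buffer_size):
    # Phase 1: read buffer_size records at a time, sort in memory, emit sorted runs
    runs = [sorted(records[i:i + buffer_size])
            for i in range(0, len(records), buffer_size)]
    # Phase 2: k-way merge of the sorted runs into one sorted output
    return list(heapq.merge(*runs))

print(external_sort([9, 3, 7, 1, 8, 2, 6], buffer_size=3))   # [1, 2, 3, 6, 7, 8, 9]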

Week 5

Implementing Relational Operations such as selection, projection, join, set-difference etc.

(1) Join Algorithms

  • Iteration-based -> Block nested loop

For each tuple (or page/block) of the outer relation R, we scan the entire inner relation S. There are 3 variants of the nested-loop join (a sketch of the block variant follows this list):

  1. Simple nested loop
  2. Page Nested loop
  3. Block nested loop
  • Index-based -> Index nested loop: B+ Tree, Hash, Clustered, Unclustered
  • Sort-based -> Sort merge join
  • Partition-based -> Hash join
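As mentioned above, here is a minimal block nested-loop join sketch in Python, treating each relation as a list of dicts and a "block" as simply block_size tuples of the outer relation R (the attribute names are made up).

def block_nested_loop_join(R, S, join_attr, block_size):
    result = []
    for i in range(0, len(R), block_size):          # load one block of R
        block = R[i:i + block_size]
        for s in S:                                  # scan S once per R-block
            for r in block:
                if r[join_attr] == s[join_attr]:
                    result.append({**r, **s})
    return result

R = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
S = [{"id": 2, "grade": "A"}, {"id": 3, "grade": "B"}]
print(block_nested_loop_join(R, S, "id", block_size=2))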

(2) The Projection Operation

  • Sort-based approach
  • Hash-based approach

Week 8

Query Optimisation -> Queries that require multiple operations (join, distinct, etc.) may be composed in different ways, so optimisation is needed to come up with a good plan.

Each strategy of composition can be represented as a query evaluation plan. The goal is to find the best plan that computes the same answer.

(Left) what needs to be implemented; (right) how to implement it.
  • Estimate the sizes of (intermediate) tables to calculate the cost of operations
  • Use of histograms to estimate selectivity
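A minimal sketch of how a histogram can be used to estimate selectivity, assuming an equi-width histogram and a uniform distribution within each bucket (the bucket boundaries and row counts below are made up).

buckets = [((0, 10), 200), ((10, 20), 50), ((20, 30), 750)]   # (value range, row count)

def estimate_leq(value):
    # Estimate how many rows satisfy attr <= value.
    total = 0
    for (lo, hi), count in buckets:
        if value >= hi:
            total += count                               # the whole bucket qualifies
        elif value > lo:
            total += count * (value - lo) / (hi - lo)    # a fraction of the bucket
    return total

print(estimate_leq(15))   # 200 + 50 * 0.5 = 225.0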

Week 9

Transactions

  • Ends in one of two ways: commit or abort
  • Two important properties of a transaction: Atomicity (either all of its actions are executed, or none of them) and Durability (the effects of a committed xact must survive failures).
  • The DBMS ensures the above by logging all actions and performing recovery accordingly. It undoes the actions of aborted/failed transactions and redoes the actions of committed transactions that had not yet been propagated to disk when the system crashed.
  • A transaction: sequence of r(x), w(x) actions.
  • Conflicting actions on the same item A:
  • R1(A), W2(A)
  • W1(A), R2(A)
  • W1(A), W2(A)
  • Serial schedule: No interleaving of actions across transactions
  • Serialisable schedule: A Schedule whose effect on any consistent database instance is guaranteed to be identical to that of some complete serial schedule.
  • Anomalies with interleaved Xact executions
  • (1) Dirty read: Reading a value written by an uncommitted transaction
  • (2) Unrepeatable read: Reading the same object twice gives different values.
  • (3) Lost Update: A write on an object has no effect because it is overwritten by another transaction.
  • Conflict Equivalent schedules: S1 can be transformed into S2 by swapping a series of adjacent non-conflicting actions.
  • Conflict serializable: If it is conflict equivalent to some serial schedule.
  • Note that some serializable schedules are not conflict serializable.
  • Draw a precedence graph to detect conflict serializability (see the sketch at the end of this list).
  • View Equivalent: …
  • View serializable: If S is view equivalent to some serial schedule over the same set of Xacts.
  • ….TO BE CONTINUED.
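A minimal sketch of the precedence-graph test mentioned above: add an edge Ti -> Tj for every pair of conflicting actions where Ti's action comes first, then check for a cycle. The encoding of a schedule as (xact, action, item) triples is my own.

def precedence_graph(schedule):
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # conflict: same item, different xacts, at least one write
            if xi == xj and ti != tj and "w" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def reachable(start, target, seen=frozenset()):
        return any(v == target or (v not in seen and reachable(v, target, seen | {v}))
                   for v in graph.get(start, ()))
    return any(reachable(u, u) for u in graph)

# r1(A) w2(A) w1(A): edges T1->T2 and T2->T1, so this schedule is not conflict serializable
edges = precedence_graph([("T1", "r", "A"), ("T2", "w", "A"), ("T1", "w", "A")])
print(edges, has_cycle(edges))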

CS2105 Introduction to Computer Networks

  • The internet is a network of connected computing devices. It is the infrastructure that connects hosts/end systems together (like highways).
  • Hosts run network applications. These applications communicate using protocols.
  • The internet is a packet switching network. (The opposite would be circuit switching).
  • Hosts connect to the Internet via access ISPs. These ISPs in turn must be interconnected.
  • Note that cryptography -> transport layer. Usually, for a transport service, we are concerned with data integrity, throughput, timing and security.
  • A host can have several processes (apps). The IP address is a 32-bit integer that identifies the host. The port number is a 16-bit integer that identifies the process.
  • IPv4 uses 32-bit IP addresses whereas IPv6 uses 128-bit IP addresses.

Protocols

Application Layer protocol

  • Defines types of messages exchanged eg, request, response

Transport protocol

  • TCP: Connection-oriented, flow controlled (sender won’t overwhelm receiver), congestion controlled, reliable (guarantees transfer of data). However, TCP does not provide timing, minimum throughput guarantees or security.
  • UDP: Unreliable data transfer. It provides none of TCP’s guarantees: no reliability, flow control, congestion control, timing, throughput guarantee or security.

HTTP (application protocol)

  • HyperText Transfer Protocol: Web’s application layer protocol.
  • The Web is an application and HTTP is its protocol. It uses TCP for its transport.
  • HTTP/1.0 (Non persistent)
  • HTTP/1.1 (Persistent) (Pipelining)
  • HTTP/2 (Multiplexing)
  • HTTP/1.1 is also called persistent HTTP -> the TCP socket is kept open after sending a response
  • HTTP was designed to be stateless: the server maintains no information about past client requests. Cookies are used to maintain state; they are carried in HTTP messages.

Cookie Technology

  1. Cookie header in the HTTP response and request message.
  2. A cookie file kept on the user’s end system and managed by the user’s browsers.
  3. A back-end database at the Web site.

When a client first sends a request, the server creates a cookie ID for the user, which is stored in the back-end database. It then sends a response back to the client with the cookie ID in a header. The cookie is then kept by the user’s browser, and the next time the browser sends a request, it attaches the cookie.

Others: https://www.tutorialspoint.com/What-is-the-difference-between-session-and-cookies

Cookies and Sessions are used to store information. Cookies are only stored on the client-side machine, while sessions get stored on the client as well as a server.

Cookie

A cookie is a small file, with a maximum size of 4KB, that the web server stores on the client browser. Cookies hold data in key=value pairs. Once a cookie has been set, all page requests that follow return the cookie name and value.

Cookies are considered highly insecure because the user can easily manipulate their content. That’s why you should always validate cookie data.

Cookies are usually used to preserve login state, where a username and a special hash are sent from the browser, and the server checks them against the database to approve access.

Cookies are also often used in sessions creation. (Note: they are not the same)

Session

A session is a global variable stored on the server. Each user gets a unique session ID, which is sent back to the server for validation either via a cookie or a GET variable.

Sessions are usually short-lived, which makes them ideal in saving temporary state between applications. Sessions also expire once the user closes the browser.

Sessions are considered more secure than cookies because the variables themselves are kept on the server.

  1. Server opens a session (sets a cookie via HTTP header).
  2. Server sets a session variable.
  3. Client changes page.
  4. Client sends all cookies, along with the session ID from step 1.
  5. Server reads session ID from cookie.
  6. Server matches session ID from a list in a database (or memory etc).
  7. Server finds a match, reads variables which are now available on $_SESSION superglobal.

Update (below is from ChatGPT, which is quite clear…):

A cookie is a small piece of data that a website stores on a user’s computer or mobile device. Cookies are used to remember a user’s preferences, login information, and browsing activity on the website. The website can access and read the cookie when the user visits the website again. Cookies can also be set to expire after a certain time or when the user closes their browser.

A session, on the other hand, is a way for a website to keep track of a user’s activity during a particular browsing session. A session is typically created when a user logs into a website and ends when the user logs out or closes their browser. During a session, the website can store information about the user’s activity, such as the items they have added to a shopping cart, their preferences, or their login status. This information is stored on the server and can be accessed by the website during the user’s session.

The main difference between cookies and sessions is that cookies are stored on the user’s device, while sessions are stored on the server. Cookies can be accessed by the website even after the user has closed their browser or logged out of the website, while sessions are destroyed when the user ends their session. Cookies can be used to remember a user’s preferences or login information even after they have left the website, while sessions are typically used to store information about a user’s current activity on the website.

Accessing Cookies: When a user visits a website, the website can create a cookie and store it on the user’s device using a response header in HTTP. When the user visits the website again, the website can read the cookie using a request header in HTTP. The website can then use the information stored in the cookie to personalize the user’s experience or remember their preferences. Cookies can be accessed and managed by the user through their browser settings.

Creating Cookies: Cookies can be created by a website using server-side scripting languages such as PHP, ASP, or JavaScript. When a user performs a specific action on the website, such as submitting a form or logging in, the server can create a cookie and store it on the user’s device. The cookie can contain information such as the user’s login credentials, preferences, or browsing history.

Accessing Sessions: When a user logs into a website, the website can create a session and store information about the user’s activity on the server. The website can access the session information using a session ID that is stored in a cookie on the user’s device. The session ID allows the website to associate the user’s activity with the session information stored on the server. When the user logs out or ends their session, the session information is destroyed.

Creating Sessions: Sessions can be created by a website using server-side scripting languages such as PHP, ASP, or Java. When a user logs in or performs a specific action on the website, the server can create a session and store information about the user’s activity. The session information can include data such as the user’s login credentials, shopping cart contents, or preferences. The session ID is then stored in a cookie on the user’s device, allowing the website to associate the user’s activity with the session information stored on the server.

Why store SessionID in a Cookie?

The session ID is typically stored in a cookie on the user’s device to allow the website to identify and associate the user’s activity with the session information stored on the server.

When a user logs into a website, the website creates a unique session ID for that user’s session. This session ID is then stored in a cookie on the user’s device, usually as a session cookie that is deleted when the user closes their browser.

The website can then use the session ID to retrieve the session information stored on the server for that user’s session. This allows the website to personalize the user’s experience and provide them with customized content based on their activity on the website during that session.

By storing the session ID in a cookie on the user’s device, the website can ensure that the user is identified correctly even if they navigate away from the website and return later. Without the session ID stored in a cookie, the website would have no way to associate the user’s activity with their session information on the server, and the user would need to log in again every time they returned to the website.

How does a URL identify a host?

  • Domain Name System (DNS): translates between host names (e.g. www.comp.nus.edu.sg) and IP addresses
  • DNS resource record: a record that maps a host name to IP addresses
  • These records are kept by DNS servers. DNS stores resource records in distributed databases implemented as a hierarchy of many name servers.
  • There are 13 root name servers worldwide. Root servers answer requests for records in the root zone by returning a list of the authoritative name servers for the appropriate top-level domain (TLD).
  • Top-level domain servers -> responsible for .com, .edu and top-level country domains like .sg, .my
  • Authoritative servers -> an organisation’s own DNS server. It provides authoritative hostname-to-IP mappings.
  • Local DNS server (default name server) -> does not belong to the hierarchy. Each ISP has a local DNS server. It serves name-to-address translations from a local cache; cache entries expire after a Time To Live (TTL) countdown.
  • DNS runs over UDP to be fast: it avoids one RTT induced by the TCP handshake (see the resolution example after this list).
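From an application's point of view, name resolution is just one library call. A minimal Python sketch (it actually queries your configured local DNS server, so it needs network access):

import socket

print(socket.gethostbyname("www.comp.nus.edu.sg"))   # prints an IPv4 address for the host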

Addressing processes

  • Note that the IP address of a host only identifies a host device but does not identify a process running inside that host. Port number identifies the process.
  • Some standard port numbers: HTTP Server: 80…
  • Socket is the software interface between app processes and transport layer protocol.

Reliable Data Transfer Model

Network layer service is unreliable and packets may be lost or corrupted during transmission. Therefore, a reliable transport protocol should ensure that packets are received by receiver in good order.

  • rdt 3.0 stinks in performance, given its stop-and-wait nature.
  • Hence, we introduce Pipelined Protocols. (1) Go-Back-N (2) Selective Repeat

Go-Back-N -> Cumulative ACK. ACK n means all packets ≤ n have been received.

Selective Repeat -> One timer per packet; the receiver needs a buffer

Transport: TCP, UDP (Week 5)

  • Connectionless transport: UDP
  • Connection-oriented transport: TCP
  • UDP uses checksum to detect errors.
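A minimal sketch of the Internet checksum idea UDP uses: sum the data as 16-bit words in one's-complement arithmetic and take the complement. Real UDP also covers a pseudo-header, which this toy version omits.

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                               # pad to a whole 16-bit word
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:                                # fold any carry back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

segment = b"hello"
print(hex(internet_checksum(segment)))
# Receiver check: the sum of all 16-bit words plus the checksum should give 0xFFFF if intact.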

To summarise, TCP connection establishment uses a three-way handshake:

  • Step 1 (SYN): In the first step, the client wants to establish a connection with the server, so it sends a segment with SYN (Synchronize Sequence Number) set, which informs the server that the client intends to start communication, and with what sequence number.
  • Step 2 (SYN + ACK): The server responds to the client’s request with the SYN and ACK bits set. The acknowledgement (ACK) signifies the response to the segment it received, and the SYN signifies with what sequence number the server will start its segments.
  • Step 3 (ACK): In the final step, the client acknowledges the server’s response, and both sides establish a reliable connection on which the actual data transfer will start.

TCP Connection Release

  1. Step 1 (FIN from client) — Suppose the client application decides it wants to close the connection. (Note that the server could also choose to close the connection.) The client sends a TCP segment with the FIN bit set to 1 to the server and enters the FIN_WAIT_1 state. At this point, the client can no longer send application data but can still receive data.
  2. Step 2 (ACK from server) — When the server receives the FIN segment from the client, it immediately sends an acknowledgement (ACK) segment back to the client.
  3. Step 3 (client waiting) — While in the FIN_WAIT_1 state, the client waits for a TCP segment from the server with an acknowledgment. When it receives this segment, the client enters the FIN_WAIT_2 state. While in FIN_WAIT_2, the client waits for another segment from the server with the FIN bit set to 1.
  4. Step 4 (FIN from server) — Some time after sending the ACK segment (because of its own closing process), the server sends a FIN segment to the client. At this point, the server can no longer send application data.
  5. Step 5 (ACK from client) — When the client receives the FIN segment from the server, it acknowledges the server’s segment and enters the TIME_WAIT state. The TIME_WAIT state lets the client resend the final acknowledgment in case that ACK is lost. The time spent by the client in TIME_WAIT depends on the implementation; typical values are 30 seconds, 1 minute and 2 minutes. After the wait, the connection formally closes and all resources on the client side (including port numbers and buffer data) are released.

The TIME_WAIT state is a mechanism in TCP/IP stacks that keeps a socket open after an application has shut down the socket. Its purpose is twofold: it stops delayed packets from being accepted by another socket using the same (source address, source port, destination address, destination port) combination, and it ensures the remote end of the connection has closed the connection. This is especially needed when the last ACK packet is lost.

UDP preserves message boundaries but TCP does not. A message boundary in this context is simply the start and end of a message/packet. With TCP connections, all messages/packets are combined into a continuous stream of data, whereas with UDP the messages are delivered in their original form, with an exact size in bytes.

Network Layer (Week 6)

  • The network layer is responsible for delivering packets to receiving hosts. Routers examine the header fields of IP datagrams passing through them and direct them to the right destination.
  • It is responsible for host-to-host communication and takes charge of forwarding and routing.
  • A router is a device that forwards packets between networks and runs routing protocols.
  • An IP address is associated with an interface. A router has multiple interfaces and hence multiple IP addresses.
  • IP addresses can be configured automatically by the Dynamic Host Configuration Protocol (DHCP).
  • DHCP runs over UDP. DHCP server port number: 67, DHCP client port number: 68.
  • A router uses the longest prefix match in its forwarding table when determining to which next hop an IP datagram is sent.

Your ISP only gives you one IP address, but you have several hosts.

Let’s look at how a packet travels within its local network

  1. First, the local machine checks whether the destination IP address is in the same subnet by looking at its subnet mask.
  2. When packets are sent, they need to have a source MAC address, destination MAC address, source IP address and destination IP address; at this point we do not know the destination MAC address.
  3. To get to the destination host, we use ARP to broadcast a request on the local network to find the MAC address of the destination host.
  4. Now the packet can be successfully sent!

Let’s see how a packet travels outside its network

  1. First, the local machine compares the destination IP address; since it’s outside our network, it does not know the MAC address of the destination host. And we can’t use ARP, because an ARP request is a broadcast to locally connected hosts only.
  2. So our packet now consults the routing table. It doesn’t have a route for the destination IP, so it is sent out to the default gateway (a router). At this point our packet contains our source IP, destination IP and source MAC, but we don’t have a destination MAC: remember, MAC addresses are only reachable within the same network. So what do we do? We send an ARP request to get the MAC address of the default gateway.
  3. The router looks at the packet and confirms the destination MAC address is its own, but it is not the final destination IP address, so it keeps looking at its routing table to forward the packet to another hop that can help the packet move along to its destination. Every time the packet moves, the old source and destination MAC addresses are stripped and the packet is updated with new source and destination MAC addresses.
  4. Once the packet gets forwarded to the destination’s network, we use ARP to find the final destination’s MAC address.
  5. During this process, our packet doesn’t change the source or destination IP address.

Network Security

Link Layer

  • The network layer provides a communication service between any two hosts, but there can be multiple links along the path between the hosts.

An IP address is kind of like your postal address. Anyone who knows your postal address can send you a letter. That letter may travel a simple or complex route to get to you, but you don’t care, as long as it makes it.

The same is true of packets of data traveling on a network like the internet. The IP address indicates the computer to which a packet is destined, and the system takes care of getting it there. A letter may or may not also have a return address so you know who to write back to, but a TCP/IP packet always carries a return (source) IP address.

A router can perhaps be thought of as a company’s mail clerk. You may send a letter to “Complaint Department, Some Big Company, Some Big Company’s Address”. The postal service will get that letter to the company. The company’s mail clerk then notes that the letter needs to go to the complaint department, and routes it there using inter-office mail. And of course, all your outgoing mail is picked up by the clerk and routed to the external postal service as needed.

When you’re behind a router, the same thing sort of happens. All of the packets destined for you are actually addressed to your router. The router then determines which of your computers that packet is meant for, and routes the packet appropriately (hence the name router).

Whether it is a corporate mail room or a networking router, neither the actual physical location of your office within your company’s building nor the actual local IP address of your computer on your local, private network is visible to the outside world.

A MAC Address is kind of like the color, size, and shape of your physical mailbox. It’s enough that the mail clerk (your network router) can identify it, but it’s unique to you. There’s no reason that anyone other than your postal carrier might care what it is, and you can change it by getting a new mailbox (network card) at any time and slapping your name (IP address) on it, without affecting your delivery.

H -> Hub, S -> Switch, R -> Router

A hub simply sends messages to all ports on the hub. Only the computer of the intended recipient will respond. It works at the physical layer.

A switch knows which address to send a message to and sends it directly to that computer. It is connected to many computers. It works at the link layer.

A router is a small computer which can be programmed to handle and route network traffic. It usually connects at least two networks together and calculates the best route. It works at the network layer.

A LAN (local area network) is a group of computers and network devices connected together, usually within the same building.

A MAN (metropolitan area network) is a larger network that usually spans several buildings in the same city or town. The IUB network is an example of a MAN.

A WAN (wide area network), in comparison to a MAN, is not restricted to a geographical location, although it might be confined within the bounds of a state or country. A WAN connects several LANs, and may be limited to an enterprise (a corporation or an organization) or accessible to the public.

Extra

HTTPS is HTTP with encryption. The only difference between the two protocols is that HTTPS uses TLS (SSL) to encrypt normal HTTP requests and responses. As a result, HTTPS is far more secure than HTTP. A website that uses HTTP has http:// in its URL, while a website that uses HTTPS has https://.

What is an HTTP request? What is an HTTP response?

HTTP requests are generated by a user’s browser as the user interacts with web properties. For example, if a user clicks on a hyperlink, the browser will send a series of “HTTP GET” requests for the content that appears on that page. If someone Googles “What is HTTP?” and this article shows up in the search results, when they click on the link, their browser will create and send a series of HTTP requests in order to get the information necessary to render the page.

These HTTP requests all go to either an origin server or a proxy caching server, and that server will generate an HTTP response. HTTP responses are answers to HTTP requests.

What does a typical HTTP request look like?

An HTTP request is just a series of lines of text that follow the HTTP protocol. A GET request might look like this:

GET /hello.txt HTTP/1.1
User-Agent: curl/7.63.0 libcurl/7.63.0 OpenSSL/1.1.l zlib/1.2.11
Host: www.example.com
Accept-Language: en

This section of text, generated by the user’s browser, gets sent across the Internet. The problem is, it’s sent just like this, in plaintext that anyone monitoring the connection can read. (Those who are unfamiliar with the HTTP protocol may find this text hard to understand, but anyone with a baseline knowledge of the protocol’s commands and syntax can read it easily.)

This is especially an issue when users submit sensitive data via a website or a web application. This could be a password, a credit card number, or any other data entered into a form, and in HTTP all this data is sent in plaintext for anyone to read. (When a user submits a form, the browser translates this into an HTTP POST request instead of an HTTP GET request.)

PUT is a method of modifying resource where the client sends data that updates the entire resource. It is used to set an entity’s information completely. PUT is similar to POST in that it can create resources, but it does so when there is a defined URI. PUT overwrites the entire entity if it already exists, and creates a new resource if it doesn’t exist.

Unlike PUT, PATCH applies a partial update to the resource.

This means that you are only required to send the data that you want to update, and it won’t affect or change anything else. So if you want to update just the first name in a database record, a PATCH request only needs to carry that one parameter.

By contrast, when you want to change the first name of a person in a database with a PUT request, you need to send the entire resource.
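To make the PUT vs PATCH difference concrete, here is a small sketch using the third-party Python requests library. The endpoint and field names are hypothetical, and the requests are only built, not sent.

import requests

# PUT carries a full representation of the resource at a known URI (hypothetical endpoint)
put_req = requests.Request("PUT", "https://api.example.com/users/42",
                           json={"first_name": "Ada", "last_name": "Lovelace"}).prepare()
# PATCH carries only the fields being changed
patch_req = requests.Request("PATCH", "https://api.example.com/users/42",
                             json={"first_name": "Ada"}).prepare()

print(put_req.method, put_req.body)      # the entire entity
print(patch_req.method, patch_req.body)  # just the partial update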

When an origin server receives an HTTP request, it sends an HTTP response, which is similar:

HTTP/1.1 200 OK
Date: Wed, 30 Jan 2019 12:14:39 GMT
Server: Apache
Last-Modified: Mon, 28 Jan 2019 11:17:01 GMT
Accept-Ranges: bytes
Content-Length: 12
Vary: Accept-Encoding
Content-Type: text/plain

Hello World!

If a website uses HTTP instead of HTTPS, all requests and responses can be read by anyone who is monitoring the session. Essentially, a malicious actor can just read the text in the request or the response and know exactly what information someone is asking for, sending, or receiving.

What is HTTPS?

The S in HTTPS stands for “secure.” HTTPS uses TLS (or SSL) to encrypt HTTP requests and responses, so in the example above, instead of the text, an attacker would see a bunch of seemingly random characters.

Instead of:

GET /hello.txt HTTP/1.1
User-Agent: curl/7.63.0 libcurl/7.63.0 OpenSSL/1.1.l zlib/1.2.11
Host: www.example.com
Accept-Language: en

The attacker sees something like:

t8Fw6T8UV81pQfyhDkhebbz7+oiwldr1j2gHBB3L3RFTRsQCpaSnSBZ78Vme+DpDVJPvZdZUZHpzbbcqmSW1+3xXGsERHg9YDmpYk0VVDiRvw1H5miNieJeJ/FNUjgH0BmVRWII6+T4MnDwmCMZUI/orxP3HGwYCSIvyzS3MpmmSe4iaWKCOHQ==

In HTTPS, how does TLS/SSL encrypt HTTP requests and responses?

  • SSL stands for Secure Sockets Layer. It is the standard technology for keeping an internet connection secure and safeguarding any sensitive data being transmitted. It uses encryption algorithms to scramble the data in transit.
  • TLS (Transport Layer Security) is just an updated, more secure version of SSL.
  • HTTPS (Hyper Text Transfer Protocol Secure) appears in the URL when a website is secured by an SSL certificate. The presence of an SSL certificate on a website triggers the SSL (TLS) protocol.
  • SSL operates on top of the transmission control protocol (TCP). It starts to work when the TCP connection has been established, initiating the SSL handshake.

During the TLS handshake, asymmetric encryption is used. Once both the client and the server have obtained the client random, the server random and the premaster secret, each side uses the same algorithm to calculate the master secret.

This master secret is then used to derive several session keys (four of them) for use in that session only.

A set of 4 completely new session keys gets created with every new communication session and new TLS handshake. There will be a different client write key, server write key, and so on, but those 4 types of keys get created every time.


CS2106 (Operating System)

Why do we need OS?

  1. Abstraction over the hardware
  2. Resource Allocator
  3. Control execution of programs

The OS is software that has the privilege to run in kernel mode and has complete access to all hardware resources, while other software executes only in user mode, with limited access to hardware resources.

We need information to describe a running program. This information includes the memory context, hardware context and OS context.

Memory context:

  • Text section: Store instructions
  • Data section: Store global variables
  • Stack: Store information about function invocation described by stack frames
  • Heap: Support dynamic allocation of memory

Hardware context:

  • General-purpose registers, program counter, stack pointer, stack frame pointer

OS context:

  • To distinguish processes from each other
  • Process ID, Process State (Is the process running?)
  • Each process is stored in a process control block. The process table contains process control blocks of all processes.

Process Scheduling

  • We hope to achieve the illusion of parallelism between processes; therefore, time slicing is used to share CPU time between processes, with the OS taking over the CPU between processes to perform the context switch.
  • There are 2 types of process behaviour: CPU activity, involving computation, and IO activity, such as printing to the screen.
  • There are 2 types of scheduling policies, defined by when scheduling is triggered.
  1. Non-preemptive: a process stays scheduled until it blocks or voluntarily gives up the CPU.
  2. Preemptive: a process is given a fixed time quota to run.

Scheduling for batch processing (no interaction)

  1. First Come First Served => Guaranteed to have no starvation
  2. Shortest Job First => Starvation is possible
  3. Shortest remaining time

Scheduling for interactive system

Preemptive scheduling algorithms are needed. We need a timer interrupt, which triggers OS scheduler at each interval of timer interrupt.

  1. Round robin -> Tasks are stored in a FIFO queue. The first task in the queue is picked to run until a fixed quantum elapses, it gives up the CPU voluntarily, or it blocks. It is then placed at the end of the queue. => Offers a response time guarantee (a sketch follows this list).
  2. Priority scheduling -> Each process is assigned a priority value, and at each context switch we select the task with the highest priority to run. We decrease the priority of the currently running process after it fully uses its time quantum. => Might lead to priority inversion (remember the cleaner/office example).
  3. Multi-level feedback queue -> Minimises both the response time for IO-bound processes and the turnaround time for CPU-bound processes. Each process is allocated a priority value, and the process with the highest priority runs first. The priority value is updated as follows: (1) new jobs are assigned the highest priority; (2) if a job uses its full time quantum, its priority is reduced; (3) if it gives up the CPU or blocks before its time slice ends, it retains its current priority.
  4. Lottery scheduling -> each task holds some lottery tickets; a random draw decides who runs (luck).
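As referenced above, a minimal round-robin sketch in Python: each task is a (name, remaining_time) pair and the time quantum is fixed (the task list is made up).

from collections import deque

def round_robin(tasks, quantum):
    queue = deque(tasks)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))                 # the task runs for this slice
        if remaining > run:
            queue.append((name, remaining - run))    # unfinished: back of the queue
    return timeline

print(round_robin([("A", 5), ("B", 2), ("C", 4)], quantum=2))
# [('A', 2), ('B', 2), ('C', 2), ('A', 2), ('C', 2), ('A', 1)]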

Alternative to processes:

Disadvantages of processes:

  • Expensive, because creating a new process duplicates the memory space and the process context.
  • Hard to do inter-process communication

Threads:

  • Share memory context, OS context.
  • Context switching only involves hardware context (registers) and stack.
  • Need additional unique information -> thread ID, general-purpose registers and stack.

Two kinds of threads:

  1. User thread: Implemented as a user library. Operations are handled by the runtime system. Kernel will not be aware of the existence of threads in the process.
  • (Advantages)
  • Can have multithreaded process on any OS
  • Thread operations are just library calls
  • Generally more configurable and flexible where the scheduling algorithm can be customised.
  • (Disadvantages)
  • The OS is not aware of threads, and scheduling is performed at the process level. One blocked thread causes the whole process, and hence all its other threads, to be blocked.
  • It cannot exploit multiple CPUs.

2. Kernel thread: Implemented in the OS and thread operation is handled as system calls.

  • (Advantages)
  • Thread level scheduling is possible
  • Kernel can schedule on thread levels, therefore more than 1 thread in the same process can run simultaneously on multiple CPUs.
  • (Disadvantages)
  • Thread operations are now system calls, which makes them more resource intensive.
  • If implemented with many features, kernel threads can be expensive and an overkill.

Synchronisation

Deadlock: two or more processes (or transactions) are each waiting for locks held by the other to be released.

Livelock/starvation: a transaction is left in a wait state indefinitely even though the DBMS is not deadlocked. To avoid this, increase the priority of a transaction as the time it spends waiting increases.

Memory Abstraction

Logical Address == how the process views its memory space. Logical address !== physical address. Instead, a mapping between logical and physical addresses is needed.

Logical memory is limited by the size of the page table.

Continuous Memory Allocation

Assume: each process occupies a contiguous memory region, and the physical memory is large enough to contain one or more processes with their complete memory space.

How to allocate memory to a single process:

  1. Physical memory is split into a fixed number of partitions of fixed size. Cons: leads to internal fragmentation (unused space within a partition).
  2. A partition is created based on the actual size of the process. Cons: the OS needs to maintain more information, and it takes more time to locate an appropriate free region.
  3. Algorithms to allocate space to processes, e.g. the buddy system.

Disjoint Memory Allocation

  • Each logical page is the same size as a physical frame.
  • Page table -> mapping between logical pages and physical frames (see the translation sketch after this list)
  • Translation lookaside buffer (TLB): a cache of a few page table entries. We search for the corresponding frame number by looking up the page number in the TLB. On a TLB hit, the frame number is retrieved and used for the translation. On a TLB miss, a memory access is made to read the full page table, the TLB is updated, and the translation is done.
  • When a context switch occurs, the TLB entries are flushed.
  • Access right bits are attached to each page table entry to indicate whether the page is writable, readable or executable.
  • Every memory access is checked against the access right bits.
  • A valid bit is attached to each page table entry to indicate whether the page is valid for the process to access. This bit is set by the OS so that out-of-range accesses are caught.
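As referenced in the page table bullet above, a minimal logical-to-physical address translation sketch, assuming a 4 KiB page size and a toy page table (the mappings are made up; a real MMU would consult the TLB first).

PAGE_SIZE = 4096
page_table = {0: 7, 1: 3, 2: 11}        # logical page number -> physical frame number

def translate(logical_addr):
    page, offset = divmod(logical_addr, PAGE_SIZE)   # split into page number + offset
    frame = page_table[page]                         # a miss here would be a page fault
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1234)))    # page 1, offset 0x234 -> frame 3 -> 0x3234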

Virtual Memory Allocation

  • We allow the logical memory space of a process to be larger than physical memory. Pages of the logical address space can reside either in physical memory or in secondary storage.
  • We use a page table to translate virtual addresses to physical addresses. In the page table, we add an additional field to distinguish between memory-resident and non-memory-resident pages.
  • If the CPU finds that a page is not memory resident, it triggers a page fault, which is handled by the OS by bringing the non-resident page into physical memory.

Page Table Structure

  • Two-Level Page Table
  • Inverted Page Table

The inverted page table keeps a single table that maps each physical frame to <pid, page number>.

Page Replacement

  • FIFO page replacement => suffers from Belady’s anomaly, where the number of page faults can increase when the number of physical frames increases (see the sketch after this list).
  • Least Recently Used (LRU) => searching for the page with the earliest last-use time to evict takes time.
  • Second-chance page replacement => modified FIFO that gives a second chance to pages that have been accessed…
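A minimal sketch comparing FIFO and LRU on a reference string, counting page faults for a given number of frames. The reference string below is the classic one used to show Belady's anomaly.

from collections import OrderedDict, deque

def fifo_faults(refs, n_frames):
    frames, order, faults = set(), deque(), 0
    for p in refs:
        if p not in frames:
            faults += 1
            if len(frames) == n_frames:
                frames.discard(order.popleft())   # evict the page loaded earliest
            frames.add(p)
            order.append(p)
    return faults

def lru_faults(refs, n_frames):
    frames, faults = OrderedDict(), 0
    for p in refs:
        if p in frames:
            frames.move_to_end(p)                 # mark as most recently used
        else:
            faults += 1
            if len(frames) == n_frames:
                frames.popitem(last=False)        # evict the least recently used page
            frames[p] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3), fifo_faults(refs, 4))   # 9 vs 10 faults: Belady's anomaly
print(lru_faults(refs, 3), lru_faults(refs, 4))     # LRU only improves with more frames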

Interprocess Communication

https://www.tutorialspoint.com/what-is-interprocess-communication

Interprocess communication (IPC) is the mechanism provided by the operating system that allows processes to communicate with each other.

Synchronisation

Semaphore: A semaphore is a variable that controls the access to a common resource by multiple processes. The two types of semaphores are binary semaphores and counting semaphores.

Mutual exclusion with a critical section: mutual exclusion requires that only one process or thread can enter the critical section at a time. This is useful for synchronisation and also prevents race conditions.

Barrier: A barrier does not allow individual processes to proceed until all the processes reach it. Many parallel languages and collective routines impose barriers.

Lock: The processes trying to acquire this lock wait in a loop while checking if the lock is available or not. This is known as busy waiting because the process is not doing any useful operation even though it is active.
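A minimal sketch of the primitives above using Python's threading module: a Lock guards the critical section, a counting Semaphore caps how many workers use a resource at once, and a Barrier makes every worker wait until all have arrived.

import threading

counter = 0
lock = threading.Lock()            # mutual exclusion for the critical section
pool = threading.Semaphore(2)      # counting semaphore: at most 2 workers at a time
barrier = threading.Barrier(4)     # all 4 workers must arrive before any proceeds

def worker():
    global counter
    with pool:                     # acquire (P) the semaphore, release (V) on exit
        with lock:                 # critical section: no race condition on counter
            counter += 1
    barrier.wait()                 # block here until every worker reaches this point

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                     # always 4: no lost updates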

Approaches:

Pipe

  • A pipe is a data channel that is unidirectional. Two pipes can be used to create a two-way data channel between two processes. This uses standard input and output methods. Pipes are used in all POSIX systems as well as Windows operating systems.

Socket

  • The socket is the endpoint for sending or receiving data in a network. This is true for data sent between processes on the same computer or data sent between different computers on the same network. Most of the operating systems use sockets for interprocess communication.

Signal

  • Signals are useful in interprocess communication in a limited way. They are system messages that are sent from one process to another. Normally, signals are not used to transfer data but are used for remote commands between processes.

Shared Memory

  • Shared memory is the memory that can be simultaneously accessed by multiple processes. This is done so that the processes can communicate with each other. All POSIX systems, as well as Windows operating systems use shared memory.

Message Queue

  • Multiple processes can read and write data to the message queue without being connected to each other. Messages are stored in the queue until their recipient retrieves them. Message queues are quite useful for interprocess communication and are used by most operating systems.

The End :)
