Linux Systems Programming


Kernel

Kernel functions

User mode vs Kernel mode

Modern processor architectures typically allow the CPU to operate in at least two different modes: user mode and kernel mode (sometimes also referred to as supervisor mode). Hardware instructions allow switching from one mode to the other. Correspondingly, areas of virtual memory can be marked as being part of user space or kernel space. When running in user mode, the CPU can access only memory that is marked as being in user space; attempts to access memory in kernel space result in a hardware exception. When running in kernel mode, the CPU can access both user and kernel memory space.

Certain operations can be performed only while the processor is operating in kernel mode. Examples include executing the halt instruction to stop the system, accessing the memory-management hardware, and initiating device I/O operations. By taking advantage of this hardware design to place the operating system in kernel space, operating system implementers can ensure that user processes are not able to access the instructions and data structures of the kernel, or to perform operations that would adversely affect the operation of the system.

Process

A process is started by the kernel, and it is also the kernel that can end it; all inputs into a process come through the kernel, and all outputs from a process pass through the kernel.

ps -e

Show all processes

If A and B are processes, they can't talk to each other directly; all communication must pass through the kernel.

In a sense this is like a client-server API: if the client (process) wants to access any resources, it asks the server (kernel).
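
As a sketch of this client-server relationship, spawning a child process and reading its output both go through the kernel: process creation, the pipe carrying the child's output, and the exit status. A minimal Rust example, assuming the echo program is available (the run_echo helper name is made up for this note):

```rust
use std::process::Command;

/// Run `echo <msg>` as a child process and return its trimmed stdout.
/// Everything here is kernel-mediated: creating the process, the pipe
/// that carries the child's output, and the reported exit status.
fn run_echo(msg: &str) -> String {
    let output = Command::new("echo")
        .arg(msg)
        .output()
        .expect("failed to spawn child process");
    assert!(output.status.success());
    String::from_utf8_lossy(&output.stdout).trim().to_string()
}

fn main() {
    println!("{}", run_echo("hello"));
}
```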

cat /proc/1/limits | grep processes

Get maximum number of processes on your system

  1. descriptor 0 is standard input
  2. descriptor 1 is standard output
  3. descriptor 2 is standard error
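
The standard streams in Rust expose these descriptor numbers directly; a small sketch confirming the mapping (Linux/Unix-specific, via the AsRawFd trait; the std_descriptors helper name is made up):

```rust
use std::io;
use std::os::unix::io::AsRawFd;

/// Return the raw file descriptor numbers of (stdin, stdout, stderr).
fn std_descriptors() -> (i32, i32, i32) {
    (
        io::stdin().as_raw_fd(),  // standard input
        io::stdout().as_raw_fd(), // standard output
        io::stderr().as_raw_fd(), // standard error
    )
}

fn main() {
    let (fd_in, fd_out, fd_err) = std_descriptors();
    println!("stdin={} stdout={} stderr={}", fd_in, fd_out, fd_err);
}
```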

Memory Mappings (mmap())

//TODO:

init

daemon

Interprocess Communication and Synchronization

Signals

Kernel signals

Process Time

  1. system CPU time: the time spent executing system calls and performing other kernel services on behalf of the process
  2. user CPU time: the time spent executing code in user mode (program code)
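
On Linux, both times can be read for the current process from /proc/self/stat, where utime and stime are fields 14 and 15 (in clock ticks). A hedged sketch (the cpu_times helper name is made up for illustration):

```rust
use std::fs;

/// Read this process's user and system CPU time (in clock ticks)
/// from /proc/self/stat: utime is field 14, stime is field 15.
fn cpu_times() -> Option<(u64, u64)> {
    let stat = fs::read_to_string("/proc/self/stat").ok()?;
    // The command name (field 2) is parenthesised and may itself
    // contain spaces, so split after the last closing parenthesis.
    let rest = stat.rsplit(')').next()?;
    let fields: Vec<&str> = rest.split_whitespace().collect();
    let utime = fields.get(11)?.parse().ok()?; // field 14: user time
    let stime = fields.get(12)?.parse().ok()?; // field 15: system time
    Some((utime, stime))
}

fn main() {
    let (utime, stime) = cpu_times().expect("could not read /proc/self/stat");
    println!("user CPU time: {} ticks, system CPU time: {} ticks", utime, stime);
}
```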

User Profile

Users are just another type of client to the server (kernel).

cat /etc/passwd | grep <USERNAME>

Show user profile

The above outputs something like:

username:password:UID:GID:comment:home:shell

The fields, separated by colons, are: the username; the password field (an x means the hashed password is stored in /etc/shadow); the user ID (UID); the group ID (GID); a comment describing the user account; the user's home directory; and the shell launched on user login.
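
As an illustration of this layout, here is a small parser for one such line (the parse_passwd_line helper is hypothetical, written for this note):

```rust
/// Split one /etc/passwd line into its seven colon-separated fields.
fn parse_passwd_line(line: &str) -> Option<(String, String, u32, u32, String, String, String)> {
    let fields: Vec<&str> = line.split(':').collect();
    if fields.len() != 7 {
        return None; // malformed entry
    }
    Some((
        fields[0].to_string(),   // username
        fields[1].to_string(),   // password field ("x" => /etc/shadow)
        fields[2].parse().ok()?, // UID
        fields[3].parse().ok()?, // GID
        fields[4].to_string(),   // comment (GECOS)
        fields[5].to_string(),   // home directory
        fields[6].to_string(),   // login shell
    ))
}

fn main() {
    let line = "root:x:0:0:root:/root:/bin/bash";
    let user = parse_passwd_line(line).expect("malformed passwd line");
    println!("user {} has UID {} and shell {}", user.0, user.2, user.6);
}
```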

Superuser

cat /etc/passwd | grep root

Show superuser profile

User groups

Users can be grouped together for administrative purposes; imagine some files that can only be accessed by users that are part of a specific group.

cat /etc/group | grep <GROUP_NAME>

Show a specific group's entry

The above outputs something like:

group_name:group_password:group_id:group_members

Filesystem

ls /

List the files and directories in the root directory

Permissions

User Permissions

Each file has an associated user ID (UID) and group ID (GID) that define the owner of the file and the group it belongs to. These properties are also the building blocks of file permissions.

In the context of permissions, there are 3 types of entities within the system; entities can interact with files depending on the file's permissions.

Here are the entities:

  1. Owner of the file
  2. Group members of the file
  3. Other users

There are 3 types of permissions:

  1. read allows an entity to read the file (for directories, read allows an entity to list the contents of the directory)
  2. write allows an entity to modify the file (for directories, write allows the contents of the directory to be changed)
  3. execute allows an entity to execute the file (for directories, execute allows access to files within the directory)
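
A short sketch of setting and reading these permission bits from Rust (Unix-specific; the /tmp path and helper name are made up for illustration):

```rust
use std::fs;
use std::os::unix::fs::PermissionsExt;

/// Create `path`, set its permission bits to `mode`, and return the
/// bits the filesystem stores (masked to the rwxrwxrwx bits).
fn set_and_read_mode(path: &str, mode: u32) -> std::io::Result<u32> {
    fs::write(path, b"demo")?;
    fs::set_permissions(path, fs::Permissions::from_mode(mode))?;
    // metadata().permissions().mode() also carries file-type bits,
    // so mask down to the owner/group/other permission bits.
    let stored = fs::metadata(path)?.permissions().mode() & 0o777;
    fs::remove_file(path)?;
    Ok(stored)
}

fn main() -> std::io::Result<()> {
    // rw-r--r--: owner read+write, group read, others read
    let mode = set_and_read_mode("/tmp/perm_demo.txt", 0o644)?;
    println!("stored mode: {:o}", mode);
    Ok(())
}
```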

Process Permissions

// TODO:

Syscalls

   +-----------------+   |              ...
   | Program/Process |   |      .-----> [ ]
   +-||----------||--+   |      |       [ ]
    \      libc     /    | .----'       [ ]
     '-------------'     | |            [ ]
           |             | |            [ ]
           |             | |            [ ]
           '---------------'            ...
                         |
       [User Space]      |     [Kernel Space]
                         |
                         |

IO (syscall) Buffering

This section has been adapted from: https://era.co/blog/unbuffered-io-slows-rust-programs

Programming languages have access to OS syscalls; these are used for things such as I/O.

Syscalls are slow to call, so when designing high-performance code all syscall usage should be analyzed.

No buffering, slow:

use std::fs;
use std::io::{self, Write};

fn main() -> io::Result<()> {
    let mut f = fs::File::create("/tmp/unbuffered.txt")?;
    f.write(b"foo")?;
    f.write(b"\n")?;
    f.write(b"bar\nbaz\n")?;
    Ok(())
}

We can use the strace program to see the syscalls used in a program:

$ strace --trace=write ./target/release/01_unbuffered
write(3, "foo", 3)                      = 3
write(3, "\n", 1)                       = 1
write(3, "bar\nbaz\n", 8)               = 8

We should rather use buffered I/O; BufWriter collects writes in memory and only issues a write syscall when its buffer fills, when flush is called, or when it is dropped:

use std::fs;
use std::io::{self, BufWriter, Write};

fn main() -> io::Result<()> {
    let mut f = BufWriter::new(fs::File::create("/tmp/buffered.txt")?);
    f.write(b"foo")?;
    f.write(b"\n")?;
    f.write(b"bar\nbaz\n")?;
    Ok(())
}
$ strace --trace=write ./target/release/02_buffered
write(3, "foo\nbar\nbaz\n", 12)         = 12

fsync

This section has been adapted from: https://bonsaidb.io/blog/durable-writes/#What%20are%20%27durable%20writes%27%3F

When writing data to a file, the data is cached in RAM by the OS; it is not immediately written to the filesystem. Writing to the disk is slow, so buffered I/O is used. But if power is suddenly cut, any data still sitting in RAM that had not yet reached the file is lost forever.

To prevent the loss of buffered/cached data, we need to flush, or sync, the data.

Rust uses the correct APIs for each platform when File::sync_all or File::sync_data is called, providing durable writes. For platform-specific behaviour beyond these, the standard library does not expose the underlying system calls directly; thankfully, the libc crate makes them easy to call.
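
A minimal sketch of a durable write using File::sync_all (the path and the durable_write function name are made up for illustration):

```rust
use std::fs::File;
use std::io::Write;

/// Write `data` to `path` and force it to stable storage before returning.
fn durable_write(path: &str, data: &[u8]) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(data)?;
    // sync_all maps to fsync(2) on Linux: it flushes both file data and
    // metadata to the storage device. (sync_data maps to fdatasync(2)
    // and may skip non-essential metadata.)
    f.sync_all()
}

fn main() -> std::io::Result<()> {
    durable_write("/tmp/durable.txt", b"important data\n")
}
```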

SQLite

Transactions

In programs that use SQLite, there can be various actions for the database to perform. These actions can be grouped together in what's called a Transaction.

A transaction is a sequence of actions on data items.

Transactions help prevent problems that could arise, such as loss of durability when a program crashes, an unexpected power failure, or subtle bugs in complex concurrent code. (These restrictions/guarantees are basically ACID; more info below.)

Programs can start a transaction and execute operations as part of it. But for a transaction's changes to take effect, the transaction must be committed. To commit simply means to instruct the database to permanently update its state according to the operations contained within the transaction.

Transactions can be considered logical units of work for a database system. If a transaction fails, the database must remove its effects and revert to the state it was in before the transaction occurred.

Not only are transactions units of work that move the database state forward, they are also a database abstraction with the following guarantees (aka ACID):

  1. Atomicity: either all of a transaction's operations take effect, or none do
  2. Consistency: a transaction moves the database from one valid state to another
  3. Isolation: concurrent transactions do not see each other's intermediate states
  4. Durability: once committed, a transaction's effects survive crashes and power failures

To get a better view of Transactions, let us see them at work using Rust and the rusqlite crate:

use rusqlite::{params, Connection, Result};

/// A helper function for connecting the database
fn connect_db() -> Result<Connection> {
    let conn = Connection::open("/tmp/TEST_DB.db")?;

    conn.execute(
        "CREATE TABLE IF NOT EXISTS vals(
            v  INTEGER NOT NULL
        )",
        [],
    )?;

    Ok(conn)
}

/// A slow way to insert rows
fn slow_insert(conn: &Connection) -> Result<()> {
    for count in 1..=1000 {
        conn.execute("INSERT INTO vals (v) VALUES (?1)", params![count])?;
    }

    Ok(())
}

/// A fast way to insert rows
fn fast_insert(conn: &mut Connection) -> Result<()> {
    let tx = conn.transaction()?;

    for count in 0..1000 {
        tx.execute("INSERT INTO vals (v) VALUES (?1)", params![count])?;
    }
    tx.commit()?;
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_slow_insert() {
        let conn = connect_db().unwrap();
        slow_insert(&conn).unwrap();
    }

    // #[test]
    // fn test_fast_insert() {
    //     let mut conn = connect_db().unwrap();

    //     fast_insert(&mut conn).unwrap();
    // }
}

In the above code we try out two ways to insert 1000 rows into an SQLite database. The code has three functions: connect_db (a helper that opens the database and creates the vals table), slow_insert and fast_insert.

The code also has two test functions, test_slow_insert and test_fast_insert; the latter is commented out because we only want to test the slow one first by running:

cargo test

We see that it is quite slow; the test output on my machine:

running 1 test
test tests::test_slow_insert has been running for over 60 seconds
test tests::test_slow_insert ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 165.22s

   Doc-tests st
   
running 0 tests
   
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Let us try out the fast version by commenting out the test_slow_insert unit test and uncommenting the test_fast_insert unit test. Then after we run cargo test we get:

running 1 test
test tests::test_fast_insert ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.18s

   Doc-tests st
   
running 0 tests
   
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s 

This time the test completes almost instantly.

So why is the first one slow? Specifically, why is this slow:

// A slow way to insert rows
fn slow_insert(conn: &Connection) -> Result<()> {
    for count in 1..=1000 {
        conn.execute("INSERT INTO vals (v) VALUES (?1)", params![count])?;
    }

    Ok(())
}

It is slow because it uses the connection's execute method, which results in a new transaction being created and committed to insert each and every row. This might be acceptable if the database were held in memory, but in this case the database lives on the filesystem, on a spinning disk drive. Interacting with the filesystem is slow: usually several syscalls have to be made. For example, for durability reasons (a key requirement of ACID) databases often make use of the fsync system call. All this means that creating 1000 transactions and committing each of them is very slow; it is much better to batch the database operations into a single transaction and commit it once, like this:

// A fast way to insert rows
fn fast_insert(conn: &mut Connection) -> Result<()> {
    let tx = conn.transaction()?;

    for count in 0..1000 {
        tx.execute("INSERT INTO vals (v) VALUES (?1)", params![count])?;
    }
    tx.commit()?;
    Ok(())
}

Tools

