Process

2026-02-19 Thu

As was earlier stated, a process is just a running program. On a computer, there’s a limited amount of CPUs that seemingly run hundreds or thousands of processes at the same time. This illusion of limitless CPUs is created by the OS by virtualizing the CPU. It’s done by the time sharing mechanism of the CPU, in which the OS runs one process, stops it and switches to a different one providing the illusion that multiple programs are running simultaneously at the cost of performance. On top of this mechanism resides a policy which is an algorithm for making decisions within the OS.

A mechanism answers a how question (lower-level mechanism) about a system, whereas the policy provides the answer to a which question (high-level). For example: how does an OS perform a context switch v/s which process should it run right now.

It is necessary to understand the machine state of a process to understand what constitutes it. One of the obvious components of the state is the memory that the process can address (aka address space). Also part of the state are the registers: special ones being program counter (PC aka instruction pointer IP), stack pointer, frame pointer etc.

Process API

  • Create: New processes.
  • Destory: Said processes forcefully.
  • Wait: For graceful exit/completion of processes.
  • Miscellaenous Control: Process suspension, resuming etc.
  • Status: How long did a process run for, what’s its current state.

Process Creation

The first things the OS must do is to load the program code and static data (initialized variables) in memory, into the address space of the process. In early OSes, loading was done eagerly i.e. the entire program was loaded all at once but modern OS do lazy loading where only the pieces of data that’s needed is loaded during execution (done via paging/swapping).

Once the data’s loaded, some memory must be allocated for program’s stack (used for local variables, function parameters, return addresses). The OS also initializes the stack with arguments i.e. fill parameters to main() function. Then it must also allocate some heap memory, used for explicitly requested dynamically-allocated data.

The OS must also do I/O related initialization, i.e. setup the three file descriptors for the process to read input from terminal and print output/error to the screen. Once all the setup is done, program execution can begin by jumping to the main() routine, and the control is shifted from CPU to the newly-created process.

Process States

A process can be in Running, Ready or Blocked state. Moving from Ready to Running means the process has been scheduled, if a running process gets blocked due to I/O, the OS doesn’t move it to ready unless the blocking event finishes.

stateDiagram-v2
    Running --> Ready: Descheduled
    Ready --> Running: Scheduled

    Running --> Blocked: I/O initiated
    Blocked --> Ready: I/O done

Process Data Structures

OS must have a process list to keep track of state of all processes, which ones are ready/running or blocked. For blocked processes, the I/O events must also be tracked somehow. The register context holds the contents of registers for a stopped process, which can then be used to restore the values in physical registers when running of the process resumes.

struct context {
  int eip;
  int esp;
  ...
  int ebp;
};

enum proc_state { UNUSED, EMBRYO, SLEEPING,
                  RUNNABLE, RUNNING, ZOMBIE };

struct proc {
  char *mem;                  // Start of proc mem
  uint sz;                    // Size of proc mem
  char *kstack;               // Process kernel stack bottom
  enum proc_state state;      // Proc state
  int pid;
  struct proc *parent;        // Parent process
  void *chan;                 // If !zero, sleeping on chan
  int killed;                 // If !zero, has been killed
  struct file *ofile[NOFILE]; // Open files
  struct inode *cwd;
  struct context context;
  struct trapframe *tf;       // Trap frame for current interrupt
};

Homework

The below program process-run.py allows for observing how process states change as programs run

process-run.py
#! /usr/bin/env python

from __future__ import print_function
import sys
from optparse import OptionParser
import random

# to make Python2 and Python3 act the same -- how dumb
def random_seed(seed):
    try:
        random.seed(seed, version=1)
    except:
        random.seed(seed)
    return

# process switch behavior
SCHED_SWITCH_ON_IO = 'SWITCH_ON_IO'
SCHED_SWITCH_ON_END = 'SWITCH_ON_END'

# io finished behavior
IO_RUN_LATER = 'IO_RUN_LATER'
IO_RUN_IMMEDIATE = 'IO_RUN_IMMEDIATE'

# process states
STATE_RUNNING = 'RUNNING'
STATE_READY = 'READY'
STATE_DONE = 'DONE'
STATE_WAIT = 'BLOCKED'

# members of process structure
PROC_CODE = 'code_'
PROC_PC = 'pc_'
PROC_ID = 'pid_'
PROC_STATE = 'proc_state_'

# things a process can do
DO_COMPUTE = 'cpu'
DO_IO = 'io'
DO_IO_DONE = 'io_done'


class scheduler:
    def __init__(self, process_switch_behavior, io_done_behavior, io_length):
        # keep set of instructions for each of the processes
        self.proc_info = {}
        self.process_switch_behavior = process_switch_behavior
        self.io_done_behavior = io_done_behavior
        self.io_length = io_length
        return

    def new_process(self):
        proc_id = len(self.proc_info)
        self.proc_info[proc_id] = {}
        self.proc_info[proc_id][PROC_PC] = 0
        self.proc_info[proc_id][PROC_ID] = proc_id
        self.proc_info[proc_id][PROC_CODE] = []
        self.proc_info[proc_id][PROC_STATE] = STATE_READY
        return proc_id

    # program looks like this:
    #   c7,i,c1,i
    # which means
    #   compute for 7, then i/o, then compute for 1, then i/o
    def load_program(self, program):
        proc_id = self.new_process()
        for line in program.split(','):
            opcode = line[0]
            if opcode == 'c': # compute
                num = int(line[1:])
                for i in range(num):
                    self.proc_info[proc_id][PROC_CODE].append(DO_COMPUTE)
            elif opcode == 'i':
                self.proc_info[proc_id][PROC_CODE].append(DO_IO)
                # add one compute to HANDLE the I/O completion
                self.proc_info[proc_id][PROC_CODE].append(DO_IO_DONE)
            else:
                print('bad opcode %s (should be c or i)' % opcode)
                exit(1)
        return

    def load(self, program_description):
        proc_id = self.new_process()
        tmp = program_description.split(':')
        if len(tmp) != 2:
            print('Bad description (%s): Must be number <x:y>' % program_description)
            print('  where X is the number of instructions')
            print('  and Y is the percent change that an instruction is CPU not IO')
            exit(1)

        num_instructions, chance_cpu = int(tmp[0]), float(tmp[1])/100.0
        for i in range(num_instructions):
            if random.random() < chance_cpu:
                self.proc_info[proc_id][PROC_CODE].append(DO_COMPUTE)
            else:
                self.proc_info[proc_id][PROC_CODE].append(DO_IO)
                # add one compute to HANDLE the I/O completion
                self.proc_info[proc_id][PROC_CODE].append(DO_IO_DONE)
        return

    def move_to_ready(self, expected, pid=-1):
        if pid == -1:
            pid = self.curr_proc
        assert(self.proc_info[pid][PROC_STATE] == expected)
        self.proc_info[pid][PROC_STATE] = STATE_READY
        return

    def move_to_wait(self, expected):
        assert(self.proc_info[self.curr_proc][PROC_STATE] == expected)
        self.proc_info[self.curr_proc][PROC_STATE] = STATE_WAIT
        return

    def move_to_running(self, expected):
        assert(self.proc_info[self.curr_proc][PROC_STATE] == expected)
        self.proc_info[self.curr_proc][PROC_STATE] = STATE_RUNNING
        return

    def move_to_done(self, expected):
        assert(self.proc_info[self.curr_proc][PROC_STATE] == expected)
        self.proc_info[self.curr_proc][PROC_STATE] = STATE_DONE
        return

    def next_proc(self, pid=-1):
        if pid != -1:
            self.curr_proc = pid
            self.move_to_running(STATE_READY)
            return
        for pid in range(self.curr_proc + 1, len(self.proc_info)):
            if self.proc_info[pid][PROC_STATE] == STATE_READY:
                self.curr_proc = pid
                self.move_to_running(STATE_READY)
                return
        for pid in range(0, self.curr_proc + 1):
            if self.proc_info[pid][PROC_STATE] == STATE_READY:
                self.curr_proc = pid
                self.move_to_running(STATE_READY)
                return
        return

    def get_num_processes(self):
        return len(self.proc_info)

    def get_num_instructions(self, pid):
        return len(self.proc_info[pid][PROC_CODE])

    def get_instruction(self, pid, index):
        return self.proc_info[pid][PROC_CODE][index]

    def get_num_active(self):
        num_active = 0
        for pid in range(len(self.proc_info)):
            if self.proc_info[pid][PROC_STATE] != STATE_DONE:
                num_active += 1
        return num_active

    def get_num_runnable(self):
        num_active = 0
        for pid in range(len(self.proc_info)):
            if self.proc_info[pid][PROC_STATE] == STATE_READY or \
                   self.proc_info[pid][PROC_STATE] == STATE_RUNNING:
                num_active += 1
        return num_active

    def get_ios_in_flight(self, current_time):
        num_in_flight = 0
        for pid in range(len(self.proc_info)):
            for t in self.io_finish_times[pid]:
                if t > current_time:
                    num_in_flight += 1
        return num_in_flight

    def check_for_switch(self):
        return

    def space(self, num_columns):
        for i in range(num_columns):
            print('%10s' % ' ', end='')

    def check_if_done(self):
        if len(self.proc_info[self.curr_proc][PROC_CODE]) == 0:
            if self.proc_info[self.curr_proc][PROC_STATE] == STATE_RUNNING:
                self.move_to_done(STATE_RUNNING)
                self.next_proc()
        return

    def run(self):
        clock_tick = 0

        if len(self.proc_info) == 0:
            return

        # track outstanding IOs, per process
        self.io_finish_times = {}
        for pid in range(len(self.proc_info)):
            self.io_finish_times[pid] = []

        # make first one active
        self.curr_proc = 0
        self.move_to_running(STATE_READY)

        # OUTPUT: headers for each column
        print('%s' % 'Time', end='') 
        for pid in range(len(self.proc_info)):
            print('%14s' % ('PID:%2d' % (pid)), end='')
        print('%14s' % 'CPU', end='')
        print('%14s' % 'IOs', end='')
        print('')

        # init statistics
        io_busy = 0
        cpu_busy = 0

        while self.get_num_active() > 0:
            clock_tick += 1

            # check for io finish
            io_done = False
            for pid in range(len(self.proc_info)):
                if clock_tick in self.io_finish_times[pid]:
                    io_done = True
                    self.move_to_ready(STATE_WAIT, pid)
                    if self.io_done_behavior == IO_RUN_IMMEDIATE:
                        # IO_RUN_IMMEDIATE
                        if self.curr_proc != pid:
                            if self.proc_info[self.curr_proc][PROC_STATE] == STATE_RUNNING:
                                self.move_to_ready(STATE_RUNNING)
                        self.next_proc(pid)
                    else:
                        # IO_RUN_LATER
                        if self.process_switch_behavior == SCHED_SWITCH_ON_END and self.get_num_runnable() > 1:
                            # this means the process that issued the io should be run
                            self.next_proc(pid)
                        if self.get_num_runnable() == 1:
                            # this is the only thing to run: so run it
                            self.next_proc(pid)
                    self.check_if_done()

            # if current proc is RUNNING and has an instruction, execute it
            instruction_to_execute = ''
            if self.proc_info[self.curr_proc][PROC_STATE] == STATE_RUNNING and \
                   len(self.proc_info[self.curr_proc][PROC_CODE]) > 0:
                instruction_to_execute = self.proc_info[self.curr_proc][PROC_CODE].pop(0)
                cpu_busy += 1

            # OUTPUT: print what everyone is up to
            if io_done:
                print('%3d*' % clock_tick, end='')
            else:
                print('%3d ' % clock_tick, end='')
            for pid in range(len(self.proc_info)):
                if pid == self.curr_proc and instruction_to_execute != '':
                    print('%14s' % ('RUN:'+instruction_to_execute), end='')
                else:
                    print('%14s' % (self.proc_info[pid][PROC_STATE]), end='')

            # CPU output here: if no instruction executes, output a space, otherwise a 1 
            if instruction_to_execute == '':
                print('%14s' % ' ', end='')
            else:
                print('%14s' % '1', end='')

            # IO output here:
            num_outstanding = self.get_ios_in_flight(clock_tick)
            if num_outstanding > 0:
                print('%14s' % str(num_outstanding), end='')
                io_busy += 1
            else:
                print('%10s' % ' ', end='')
            print('')

            # if this is an IO start instruction, switch to waiting state
            # and add an io completion in the future
            if instruction_to_execute == DO_IO:
                self.move_to_wait(STATE_RUNNING)
                self.io_finish_times[self.curr_proc].append(clock_tick + self.io_length + 1)
                if self.process_switch_behavior == SCHED_SWITCH_ON_IO:
                    self.next_proc()

            # ENDCASE: check if currently running thing is out of instructions
            self.check_if_done()
        return (cpu_busy, io_busy, clock_tick)

#
# PARSE ARGUMENTS
#

parser = OptionParser()
parser.add_option('-s', '--seed', default=0, help='the random seed', action='store', type='int', dest='seed')
parser.add_option('-P', '--program', default='', help='more specific controls over programs', action='store', type='string', dest='program')
parser.add_option('-l', '--processlist', default='', help='a comma-separated list of processes to run, in the form X1:Y1,X2:Y2,... where X is the number of instructions that process should run, and Y the chances (from 0 to 100) that an instruction will use the CPU or issue an IO (i.e., if Y is 100, a process will ONLY use the CPU and issue no I/Os; if Y is 0, a process will only issue I/Os)', action='store', type='string', dest='process_list')
parser.add_option('-L', '--iolength', default=5, help='how long an IO takes', action='store', type='int', dest='io_length')
parser.add_option('-S', '--switch', default='SWITCH_ON_IO', help='when to switch between processes: SWITCH_ON_IO, SWITCH_ON_END', action='store', type='string', dest='process_switch_behavior')
parser.add_option('-I', '--iodone', default='IO_RUN_LATER', help='type of behavior when IO ends: IO_RUN_LATER, IO_RUN_IMMEDIATE', action='store', type='string', dest='io_done_behavior')
parser.add_option('-c', help='compute answers for me', action='store_true', default=False, dest='solve')
parser.add_option('-p', '--printstats', help='print statistics at end; only useful with -c flag (otherwise stats are not printed)', action='store_true', default=False, dest='print_stats')
(options, args) = parser.parse_args()

random_seed(options.seed)

assert(options.process_switch_behavior == SCHED_SWITCH_ON_IO or options.process_switch_behavior == SCHED_SWITCH_ON_END)
assert(options.io_done_behavior == IO_RUN_IMMEDIATE or options.io_done_behavior == IO_RUN_LATER)

s = scheduler(options.process_switch_behavior, options.io_done_behavior, options.io_length)

if options.program != '':
    for p in options.program.split(':'):
        s.load_program(p)
else:
    # example process description (10:100,10:100)
    for p in options.process_list.split(','):
        s.load(p)

assert(options.io_length >= 0)

if options.solve == False:
    print('Produce a trace of what would happen when you run these processes:')
    for pid in range(s.get_num_processes()):
        print('Process %d' % pid)
        for inst in range(s.get_num_instructions(pid)):
            print('  %s' % s.get_instruction(pid, inst))
        print('')
    print('Important behaviors:')
    print('  System will switch when ', end='')
    if options.process_switch_behavior == SCHED_SWITCH_ON_IO:
        print('the current process is FINISHED or ISSUES AN IO')
    else:
        print('the current process is FINISHED')
    print('  After IOs, the process issuing the IO will ', end='')
    if options.io_done_behavior == IO_RUN_IMMEDIATE:
        print('run IMMEDIATELY')
    else:
        print('run LATER (when it is its turn)')
    print('')
    exit(0)

(cpu_busy, io_busy, clock_tick) = s.run()

if options.print_stats:
    print('')
    print('Stats: Total Time %d' % clock_tick)
    print('Stats: CPU Busy %d (%.2f%%)' % (cpu_busy, 100.0 * float(cpu_busy)/clock_tick))
    print('Stats: IO Busy  %d (%.2f%%)' % (io_busy, 100.0 * float(io_busy)/clock_tick))
    print('')
# '-l': 'a comma-separated list of processes to run, in the form X1:Y1,X2:Y2,...
# where X is the number of instructions that process should run
# Y the chances (from 0 to 100) that an instruction will use the CPU or issue an IO
# i.e., if Y is 100, a process will ONLY use the CPU and issue no I/Os;
# if Y is 0, a process will only issue I/Os

python process-run.py -l 5:100,5:100 -c -p
Output
Time        PID: 0        PID: 1           CPU           IOs
  1        RUN:cpu         READY             1          
  2        RUN:cpu         READY             1          
  3        RUN:cpu         READY             1          
  4        RUN:cpu         READY             1          
  5        RUN:cpu         READY             1          
  6           DONE       RUN:cpu             1          
  7           DONE       RUN:cpu             1          
  8           DONE       RUN:cpu             1          
  9           DONE       RUN:cpu             1          
 10           DONE       RUN:cpu             1          

Stats: Total Time 10
Stats: CPU Busy 10 (100.00%)
Stats: IO Busy  0 (0.00%)

python process-run.py -l 4:100,1:0 -c -p
Output
Time        PID: 0        PID: 1           CPU           IOs
  1        RUN:cpu         READY             1          
  2        RUN:cpu         READY             1          
  3        RUN:cpu         READY             1          
  4        RUN:cpu         READY             1          
  5           DONE        RUN:io             1          
  6           DONE       BLOCKED                           1
  7           DONE       BLOCKED                           1
  8           DONE       BLOCKED                           1
  9           DONE       BLOCKED                           1
 10           DONE       BLOCKED                           1
 11*          DONE   RUN:io_done             1          

Stats: Total Time 11
Stats: CPU Busy 6 (54.55%)
Stats: IO Busy  5 (45.45%)

# switching the order changes the total time taken

python process-run.py -l 1:0,4:100 -c -p
Output
Time        PID: 0        PID: 1           CPU           IOs
  1         RUN:io         READY             1          
  2        BLOCKED       RUN:cpu             1             1
  3        BLOCKED       RUN:cpu             1             1
  4        BLOCKED       RUN:cpu             1             1
  5        BLOCKED       RUN:cpu             1             1
  6        BLOCKED          DONE                           1
  7*   RUN:io_done          DONE             1          

Stats: Total Time 7
Stats: CPU Busy 6 (85.71%)
Stats: IO Busy  5 (71.43%)

# -S determines how system reacts when process issues I/O

python process-run.py -l 1:0,4:100 -c -S SWITCH_ON_END
Output
Time        PID: 0        PID: 1           CPU           IOs
  1         RUN:io         READY             1          
  2        BLOCKED         READY                           1
  3        BLOCKED         READY                           1
  4        BLOCKED         READY                           1
  5        BLOCKED         READY                           1
  6        BLOCKED         READY                           1
  7*   RUN:io_done         READY             1          
  8           DONE       RUN:cpu             1          
  9           DONE       RUN:cpu             1          
 10           DONE       RUN:cpu             1          
 11           DONE       RUN:cpu             1          
python process-run.py -l 1:0,4:100 -c -S SWITCH_ON_IO
Output
Time        PID: 0        PID: 1           CPU           IOs
  1         RUN:io         READY             1          
  2        BLOCKED       RUN:cpu             1             1
  3        BLOCKED       RUN:cpu             1             1
  4        BLOCKED       RUN:cpu             1             1
  5        BLOCKED       RUN:cpu             1             1
  6        BLOCKED          DONE                           1
  7*   RUN:io_done          DONE             1          
# -I defines the type of behavior when IO ends

python process-run.py -l 3:0,5:100,5:100,5:100 -S SWITCH_ON_IO -I IO_RUN_LATER -c -p
Output
Time        PID: 0        PID: 1        PID: 2        PID: 3           CPU           IOs
  1         RUN:io         READY         READY         READY             1          
  2        BLOCKED       RUN:cpu         READY         READY             1             1
  3        BLOCKED       RUN:cpu         READY         READY             1             1
  4        BLOCKED       RUN:cpu         READY         READY             1             1
  5        BLOCKED       RUN:cpu         READY         READY             1             1
  6        BLOCKED       RUN:cpu         READY         READY             1             1
  7*         READY          DONE       RUN:cpu         READY             1          
  8          READY          DONE       RUN:cpu         READY             1          
  9          READY          DONE       RUN:cpu         READY             1          
 10          READY          DONE       RUN:cpu         READY             1          
 11          READY          DONE       RUN:cpu         READY             1          
 12          READY          DONE          DONE       RUN:cpu             1          
 13          READY          DONE          DONE       RUN:cpu             1          
 14          READY          DONE          DONE       RUN:cpu             1          
 15          READY          DONE          DONE       RUN:cpu             1          
 16          READY          DONE          DONE       RUN:cpu             1          
 17    RUN:io_done          DONE          DONE          DONE             1          
 18         RUN:io          DONE          DONE          DONE             1          
 19        BLOCKED          DONE          DONE          DONE                           1
 20        BLOCKED          DONE          DONE          DONE                           1
 21        BLOCKED          DONE          DONE          DONE                           1
 22        BLOCKED          DONE          DONE          DONE                           1
 23        BLOCKED          DONE          DONE          DONE                           1
 24*   RUN:io_done          DONE          DONE          DONE             1          
 25         RUN:io          DONE          DONE          DONE             1          
 26        BLOCKED          DONE          DONE          DONE                           1
 27        BLOCKED          DONE          DONE          DONE                           1
 28        BLOCKED          DONE          DONE          DONE                           1
 29        BLOCKED          DONE          DONE          DONE                           1
 30        BLOCKED          DONE          DONE          DONE                           1
 31*   RUN:io_done          DONE          DONE          DONE             1          

Stats: Total Time 31
Stats: CPU Busy 21 (67.74%)
Stats: IO Busy  15 (48.39%)

python process-run.py -l 3:0,5:100,5:100,5:100 -S SWITCH_ON_IO -I IO_RUN_IMMEDIATE -c -p
Output
Time        PID: 0        PID: 1        PID: 2        PID: 3           CPU           IOs
  1         RUN:io         READY         READY         READY             1          
  2        BLOCKED       RUN:cpu         READY         READY             1             1
  3        BLOCKED       RUN:cpu         READY         READY             1             1
  4        BLOCKED       RUN:cpu         READY         READY             1             1
  5        BLOCKED       RUN:cpu         READY         READY             1             1
  6        BLOCKED       RUN:cpu         READY         READY             1             1
  7*   RUN:io_done          DONE         READY         READY             1          
  8         RUN:io          DONE         READY         READY             1          
  9        BLOCKED          DONE       RUN:cpu         READY             1             1
 10        BLOCKED          DONE       RUN:cpu         READY             1             1
 11        BLOCKED          DONE       RUN:cpu         READY             1             1
 12        BLOCKED          DONE       RUN:cpu         READY             1             1
 13        BLOCKED          DONE       RUN:cpu         READY             1             1
 14*   RUN:io_done          DONE          DONE         READY             1          
 15         RUN:io          DONE          DONE         READY             1          
 16        BLOCKED          DONE          DONE       RUN:cpu             1             1
 17        BLOCKED          DONE          DONE       RUN:cpu             1             1
 18        BLOCKED          DONE          DONE       RUN:cpu             1             1
 19        BLOCKED          DONE          DONE       RUN:cpu             1             1
 20        BLOCKED          DONE          DONE       RUN:cpu             1             1
 21*   RUN:io_done          DONE          DONE          DONE             1          

Stats: Total Time 21
Stats: CPU Busy 21 (100.00%)
Stats: IO Busy  15 (71.43%)

# -s is the seed

python process-run.py -s 1 -l 3:50,3:50 -c -p
python process-run.py -s 2 -l 3:50,3:50 -c -p
python process-run.py -s 3 -l 3:50,3:50 -c -p
Output
Time        PID: 0        PID: 1           CPU           IOs
  1        RUN:cpu         READY             1          
  2         RUN:io         READY             1          
  3        BLOCKED       RUN:cpu             1             1
  4        BLOCKED       RUN:cpu             1             1
  5        BLOCKED       RUN:cpu             1             1
  6        BLOCKED          DONE                           1
  7        BLOCKED          DONE                           1
  8*   RUN:io_done          DONE             1          
  9         RUN:io          DONE             1          
 10        BLOCKED          DONE                           1
 11        BLOCKED          DONE                           1
 12        BLOCKED          DONE                           1
 13        BLOCKED          DONE                           1
 14        BLOCKED          DONE                           1
 15*   RUN:io_done          DONE             1          

Stats: Total Time 15
Stats: CPU Busy 8 (53.33%)
Stats: IO Busy  10 (66.67%)

Time        PID: 0        PID: 1           CPU           IOs
  1         RUN:io         READY             1          
  2        BLOCKED       RUN:cpu             1             1
  3        BLOCKED        RUN:io             1             1
  4        BLOCKED       BLOCKED                           2
  5        BLOCKED       BLOCKED                           2
  6        BLOCKED       BLOCKED                           2
  7*   RUN:io_done       BLOCKED             1             1
  8         RUN:io       BLOCKED             1             1
  9*       BLOCKED   RUN:io_done             1             1
 10        BLOCKED        RUN:io             1             1
 11        BLOCKED       BLOCKED                           2
 12        BLOCKED       BLOCKED                           2
 13        BLOCKED       BLOCKED                           2
 14*   RUN:io_done       BLOCKED             1             1
 15        RUN:cpu       BLOCKED             1             1
 16*          DONE   RUN:io_done             1          

Stats: Total Time 16
Stats: CPU Busy 10 (62.50%)
Stats: IO Busy  14 (87.50%)

Time        PID: 0        PID: 1           CPU           IOs
  1        RUN:cpu         READY             1          
  2         RUN:io         READY             1          
  3        BLOCKED        RUN:io             1             1
  4        BLOCKED       BLOCKED                           2
  5        BLOCKED       BLOCKED                           2
  6        BLOCKED       BLOCKED                           2
  7        BLOCKED       BLOCKED                           2
  8*   RUN:io_done       BLOCKED             1             1
  9*       RUN:cpu         READY             1          
 10           DONE   RUN:io_done             1          
 11           DONE        RUN:io             1          
 12           DONE       BLOCKED                           1
 13           DONE       BLOCKED                           1
 14           DONE       BLOCKED                           1
 15           DONE       BLOCKED                           1
 16           DONE       BLOCKED                           1
 17*          DONE   RUN:io_done             1          
 18           DONE       RUN:cpu             1          

Stats: Total Time 18
Stats: CPU Busy 9 (50.00%)
Stats: IO Busy  11 (61.11%)

System Calls

Upon fork() ing, the child isn’t an exact copy so it has its own address space and the value it returns to the caller is different. (Mandatory read: A fork() in the road: Baumann et al.)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    printf("hello world (pid: %d)\n", (int) getpid());
    fflush(stdout); // Without flushing, the child was getting copy of stdout buffer
    int rc = fork();
    if (rc < 0) {
        // fork failed
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {
        // child (new process)
        printf("hello, I am child (pid: %d)\n", (int) getpid());
    } else {
        // parent goes down this path (main)
        printf("hello, I am parent of %d (pid: %d)\n", rc, (int) getpid());
    }
    return 0;
}
hello world (pid: 86839)
hello, I am parent of 86840 (pid: 86839)
hello, I am child (pid: 86840)

While the parent receives PID of newly-created child, the child receives rc 0. (Why does the second hello world also refer to the parent pid in its “hello world”? Fixed by flushing the output, without this the child process was getting a copy of the buffered stdout)

man posix_spawn | col -b
Output
POSIX_SPAWN(2)		      System Calls Manual		POSIX_SPAWN(2)

NAME
     posix_spawn posix_spawnp – spawn a process

SYNOPSIS
     #include <spawn.h>

     int
     posix_spawn(pid_t *restrict pid, const char *restrict path,
	 const posix_spawn_file_actions_t *file_actions,
	 const posix_spawnattr_t *restrict attrp, char *const argv[restrict],
	 char *const envp[restrict]);

     int
     posix_spawnp(pid_t *restrict pid, const char *restrict file,
	 const posix_spawn_file_actions_t *file_actions,
	 const posix_spawnattr_t *restrict attrp, char *const argv[restrict],
	 char *const envp[restrict]);

DESCRIPTION
     The posix_spawn() function creates a new process from the executable
     file, called the new process file, specified by path, which is an
     absolute or relative path to the file.  The posix_spawnp() function is
     identical to the posix_spawn() function if the file specified contains a
     slash character; otherwise, the file parameter is used to construct a
     pathname, with its path prefix being obtained by a search of the path
     specified in the environment by the “PATH variable”.  If this variable
     isn't specified, the default path is set according to the _PATH_DEFPATH
     definition in <paths.h>, which is set to “/usr/bin:/bin”.	This pathname
     either refers to an executable object file, or a file of data for an
     interpreter; execve(2) for more details.

     The argument pid is a pointer to a pid_t variable to receive the pid of
     the spawned process; if this is NULL, then the pid of the spawned process
     is not returned.  If this pointer is non-NULL, then on successful
     completion, the variable will be modified to contain the pid of the
     spawned process.  The value is undefined in the case of a failure.

     The argument file_actions is either NULL, or it is a pointer to a file
     actions object that was initialized by a call to
     posix_spawn_file_actions_init(3) and represents zero or more file
     actions.

     File descriptors open in the calling process image remain open in the new
     process image, except for those for which the close-on-exec flag is set
     (see close(2) and fcntl(2)).  Descriptors that remain open are unaffected
     by posix_spawn() unless their behaviour is modified by particular spawn
     flags or a file action; see posix_spawnattr_setflags(3) and
     posix_spawn_file_actions_init(3) for additional information.

     The argument attrp is either NULL, or it is a pointer to an attributes
     object that was initialized by a call to posix_spawnattr_init(3) and
     represents a set of spawn attributes to apply.  If NULL, then the default
     attributes are applied; otherwise, these attributes can control various
     aspects of the spawned process, and are applied prior to the spawned
     process beginning execution; see posix_spawnattr_init(3) for more
     information.

     The argument argv is a pointer to a null-terminated array of character
     pointers to null-terminated character strings.  These strings construct
     the argument list to be made available to the new process.  At least
     argv[0] must be present in the array, and should contain the file name of
     the program being spawned, e.g. the last component of the path or file
     argument.

     The argument envp is a pointer to a null-terminated array of character
     pointers to null-terminated strings.  A pointer to this array is normally
     stored in the global variable environ. These strings pass information to
     the new process that is not directly an argument to the command (see
     environ(7)).

     Signals set to be ignored in the calling process are set to be ignored in
     the new process, unless the behaviour is modified by user specified spawn
     attributes.  Signals which are set to be caught in the calling process
     image are set to default action in the new process image.	Blocked
     signals remain blocked regardless of changes to the signal action, unless
     the mask is overridden by user specified spawn attributes.  The signal
     stack is reset to be undefined (see sigaction(2) for more information).

     By default, the effective user ID and group ID will be the same as those
     of the calling process image; however, this may be overridden to force
     them to be the real user ID and group ID of the parent process by user
     specified spawn attributes (see posix_spawnattr_init(3) for more
     information).

     If the set-user-ID mode bit of the new process image file is set (see
     chmod(2)), the effective user ID of the new process image is set to the
     owner ID of the new process image file.  If the set-group-ID mode bit of
     the new process image file is set, the effective group ID of the new
     process image is set to the group ID of the new process image file.  (The
     effective group ID is the first element of the group list.)  The real
     user ID, real group ID and supplementary group IDs of the new process
     image remain the same as the calling process image.  After any set-user-
     ID and set-group-ID processing, the effective user ID is recorded as the
     saved set-user-ID, and the effective group ID is recorded as the saved
     set-group-ID.  These values may be used in changing the effective IDs
     later (see setuid(2)).

     The new process also inherits the following attributes from the calling
     process:

	   parent process ID	see getppid(2)
	   process group ID	see getpgrp(2), posix_spawnattr_init(3)
	   access groups	see getgroups(2)
	   working directory	see chdir(2)
	   root directory	see chroot(2)
	   control terminal	see termios(4)
	   resource usages	see getrusage(2)
	   interval timers	see getitimer(2)
	   resource limits	see getrlimit(2)
	   file mode mask	see umask(2)
	   signal mask		see sigaction(2), sigsetmask(2),
				posix_spawnattr_init(3)

     When a program is executed as a result of a posix_spawn() or
     posix_spawnp() call, it is entered as follows:

	   int
	   main(int argc, char **argv, char **envp);

     where argc is the number of elements in argv (the ``arg count'') and argv
     points to the array of character pointers to the arguments themselves.

RETURN VALUES
     If the pid argument is NULL, no pid is returned to the calling process;
     if it is non-NULL, then posix_spawn() and posix_spawnp() functions return
     the process ID of the child process into the pid_t variable pointed to by
     the pid argument and return a 0 on success.  If an error occurs, they
     return a non-zero error code as the function return value, and no child
     process is created.

ERRORS
     The posix_spawn() and posix_spawnp() functions will fail and return to
     the calling process if:

     [EINVAL]		The value specified by file_actions or attrp is
			invalid.

     [E2BIG]		The number of bytes in the new process's argument list
			is larger than the system-imposed limit.  This limit
			is specified by the sysctl(3) MIB variable
			KERN_ARGMAX.

     [EACCES]		Search permission is denied for a component of the
			path prefix.

     [EACCES]		The new process file is not an ordinary file.

     [EACCES]		The new process file mode denies execute permission.

     [EACCES]		The new process file is on a filesystem mounted with
			execution disabled (MNT_NOEXEC in ⟨sys/mount.h⟩).

     [EFAULT]		The new process file is not as long as indicated by
			the size values in its header.

     [EFAULT]		Path, argv, or envp point to an illegal address.

     [EIO]		An I/O error occurred while reading from the file
			system.

     [ELOOP]		Too many symbolic links were encountered in
			translating the pathname.  This is taken to be
			indicative of a looping symbolic link.

     [ENAMETOOLONG]	A component of a pathname exceeded {NAME_MAX}
			characters, or an entire path name exceeded {PATH_MAX}
			characters.

     [ENOENT]		The new process file does not exist.

     [ENOEXEC]		The new process file has the appropriate access
			permission, but has an unrecognized format (e.g., an
			invalid magic number in its header).

     [ENOMEM]		The new process requires more virtual memory than is
			allowed by the imposed maximum (getrlimit(2)).

     [ENOTDIR]		A component of the path prefix is not a directory.

     [ETXTBSY]		The new process file is a pure procedure (shared text)
			file that is currently open for writing or reading by
			some process.

     [EBADARCH] 	The new process file has no architectures appropriate
			for the current system.

     [EBADF]		Bad file descriptor for one or more file_actions.

     Additionally, they may fail for any of the reasons listed in fork(2) or
     exec(3).

CAVEAT
     If a program is setuid to a non-super-user, but is executed when the real
     uid is ``root'', then the program has some of the powers of a super-user
     as well.

SEE ALSO
     exit(2), fork(2), execl(3), sysctl(3), environ(7),
     posix_spawnattr_init(3), posix_spawn_file_actions_init(3),

STANDARDS
     Version 3 of the Single UNIX Specification (“SUSv3”) [SPN]

HISTORY
     The posix_spawn() and posix_spawnp() function calls appeared in Version 3
     of the Single UNIX Specification (“SUSv3”) [SPN].

Mac OS X		       November 2, 2010 		      Mac OS X

Sometimes its useful for parent to wait() for child process to finish. In the below program, the child will print first, thus introducing a bit of determinism.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    printf("hello world (pid: %d)\n", (int) getpid());
    fflush(stdout); // Without flushing, the child was getting copy of stdout buffer
    int rc = fork();
    if (rc < 0) { // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) { // child (new process)
        printf("hello, I am child (pid: %d)\n", (int) getpid());
    } else { // parent goes down this path (main)
        int rc_wait = wait(NULL);
        printf("hello, I am parent of %d (rc_wait: %d) (pid: %d)\n",
               rc, rc_wait, (int) getpid());
    }
    return 0;
}
hello world (pid: 86584)
hello, I am child (pid: 86586)
hello, I am parent of 86586 (rc_wait: 86586) (pid: 86584)

The exec() syscall if useful when one wants to run a program different from the calling program. Here execvp() runs wc which shows how many lines, words and bytes are in process-run.py.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    printf("hello world (pid: %d)\n", (int) getpid());
    fflush(stdout); // Without flushing, the child was getting copy of stdout buffer
    int rc = fork();
    if (rc < 0) { // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) { // child (new process)
        printf("hello, I am child (pid: %d)\n", (int) getpid());
        fflush(stdout); // Always flush!
        char *myargs[3];
        myargs[0] = strdup("wc");
        myargs[1] = strdup("process-run.py");
        myargs[2] = NULL; // marks end of array
        execvp(myargs[0], myargs); // runs `wc`
    } else { // parent goes down this path (main)
        int rc_wait = wait(NULL);
        printf("hello, I am parent of %d (rc_wait: %d) (pid: %d)\n",
               rc, rc_wait, (int) getpid());
    }
    return 0;
}
hello world (pid: 2019)
hello, I am child (pid: 2020)
  342  1174 13351 process-run.py
hello, I am parent of 2020 (rc_wait: 2020) (pid: 2019)

Here the exec() call, loads code (and static data) from the executable passed to it and overwrites its current code segment (and current static data) with the loaded one; the heap/stack and other parts of memory space of program are re-initialized and the OS just runs the new program. It doesn’t even create a new process, just transforms the existing one (fork_wait_exec_p3) into the exec’d program (wc).

man execve | col -b
Output
EXECVE(2)		      System Calls Manual		     EXECVE(2)

NAME
     execve – execute a file

SYNOPSIS
     #include <unistd.h>

     int
     execve(const char *path, char *const argv[], char *const envp[]);

DESCRIPTION
     execve() transforms the calling process into a new process.  The new
     process is constructed from an ordinary file, whose name is pointed to by
     path, called the new process file.  This file is either an executable
     object file, or a file of data for an interpreter.  An executable object
     file consists of an identifying header, followed by pages of data
     representing the initial program (text) and initialized data pages.
     Additional pages may be specified by the header to be initialized with
     zero data;  see a.out(5).

     An interpreter file begins with a line of the form:

	   #! interpreter [arg ...]

     When an interpreter file is execve()'d, the system runs the specified
     interpreter.  If any optional args are specified, they become the first
     (second, ...) argument to the interpreter. The name of the originally
     execve()'d file becomes the subsequent argument; otherwise, the name of
     the originally execve()'d file is the first argument.  The original
     arguments to the invocation of the interpreter are shifted over to become
     the final arguments.  The zeroth argument, normally the name of the
     execve()'d file, is left unchanged.

     The argument argv is a pointer to a null-terminated array of character
     pointers to null-terminated character strings.  These strings construct
     the argument list to be made available to the new process.  At least one
     argument must be present in the array; by custom, the first element
     should be the name of the executed program (for example, the last
     component of path).

     The argument envp is also a pointer to a null-terminated array of
     character pointers to null-terminated strings.  A pointer to this array
     is normally stored in the global variable environ. These strings pass
     information to the new process that is not directly an argument to the
     command (see environ(7)).

     File descriptors open in the calling process image remain open in the new
     process image, except for those for which the close-on-exec flag is set
     (see close(2) and fcntl(2)).  Descriptors that remain open are unaffected
     by execve().

     Signals set to be ignored in the calling process are set to be ignored in
     the new process. Signals which are set to be caught in the calling
     process image are set to default action in the new process image.
     Blocked signals remain blocked regardless of changes to the signal
     action.  The signal stack is reset to be undefined (see sigaction(2) for
     more information).

     If the set-user-ID mode bit of the new process image file is set (see
     chmod(2)), the effective user ID of the new process image is set to the
     owner ID of the new process image file.  If the set-group-ID mode bit of
     the new process image file is set, the effective group ID of the new
     process image is set to the group ID of the new process image file.  (The
     effective group ID is the first element of the group list.)  The real
     user ID, real group ID and other group IDs of the new process image
     remain the same as the calling process image.  After any set-user-ID and
     set-group-ID processing, the effective user ID is recorded as the saved
     set-user-ID, and the effective group ID is recorded as the saved set-
     group-ID.	These values may be used in changing the effective IDs later
     (see setuid(2)).

     The new process also inherits the following attributes from the calling
     process:

	   process ID		see getpid(2)
	   parent process ID	see getppid(2)
	   process group ID	see getpgrp(2)
	   access groups	see getgroups(2)
	   working directory	see chdir(2)
	   root directory	see chroot(2)
	   control terminal	see termios(4)
	   resource usages	see getrusage(2)
	   interval timers	see getitimer(2)
	   resource limits	see getrlimit(2)
	   file mode mask	see umask(2)
	   signal mask		see sigaction(2), sigsetmask(2)

     When a program is executed as a result of an execve() call, it is entered
     as follows:

	   int
	   main(int argc, char **argv, char **envp);

     where argc is the number of elements in argv (the ``arg count'') and argv
     points to the array of character pointers to the arguments themselves.

RETURN VALUES
     As the execve() function overlays the current process image  with a new
     process image, the successful call has no process to return to.  If
     execve() does return to the calling process, an error has occurred; the
     return value will be -1 and the global variable errno is set to indicate
     the error.

ERRORS
     execve() will fail and return to the calling process if:

     [E2BIG]		The number of bytes in the new process's argument list
			is larger than the system-imposed limit.  This limit
			is specified by the sysctl(3) MIB variable
			KERN_ARGMAX.

     [EACCES]		Search permission is denied for a component of the
			path prefix.

     [EACCES]		The new process file is not an ordinary file.

     [EACCES]		The new process file mode denies execute permission.

     [EACCES]		The new process file is on a filesystem mounted with
			execution disabled (MNT_NOEXEC in ⟨sys/mount.h⟩).

     [EFAULT]		The new process file is not as long as indicated by
			the size values in its header.

     [EFAULT]		Path, argv, or envp point to an illegal address.

     [EIO]		An I/O error occurred while reading from the file
			system.

     [ELOOP]		Too many symbolic links were encountered in
			translating the pathname.  This is taken to be
			indicative of a looping symbolic link.

     [ENAMETOOLONG]	A component of a pathname exceeded {NAME_MAX}
			characters, or an entire path name exceeded {PATH_MAX}
			characters.

     [ENOENT]		The new process file does not exist.

     [ENOEXEC]		The new process file has the appropriate access
			permission, but has an unrecognized format (e.g., an
			invalid magic number in its header).

     [ENOMEM]		The new process requires more virtual memory than is
			allowed by the imposed maximum (getrlimit(2)).

     [ENOTDIR]		A component of the path prefix is not a directory.

     [ETXTBSY]		The new process file is a pure procedure (shared text)
			file that is currently open for writing or reading by
			some process.

CAVEAT
     If a program is setuid to a non-super-user, but is executed when the real
     uid is ``root'', then the program has some of the powers of a super-user
     as well.

SEE ALSO
     exit(2), fork(2), execl(3), sysctl(3), environ(7)

HISTORY
     The execve() function call appeared in 4.2BSD.

BSD 4			       January 24, 1994 			 BSD 4

The separation and interface of fork() and exec() allows the shell to run code after forking, but before execing; giving it the ability to alter the environment of the program that is about to run. What happens in a shell is that after getting the command, it finds out where the executable resides, fork() s a new child process to run the command, runs exec() for the command and then wait() s for it to return. To illustrate: in wc redir.c > newfile.txt, the output redirection happens by the shell closing stdout and opens newfile.txt before execing.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    int rc = fork();
    if (rc < 0) { // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) { // child (new process)
        // child: redirect standard output to a file
        close(STDOUT_FILENO);
        open("./newfile.txt", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);

        char *myargs[3];
        myargs[0] = strdup("wc");
        myargs[1] = strdup("redir.c");
        myargs[2] = NULL; // marks end of array
        execvp(myargs[0], myargs); // runs `wc`
    } else { // parent goes down this path (main)
        int rc_wait = wait(NULL);
    }
    return 0;
}
gcc -o redir redir.c
./redir
wait
cat newfile.txt
rm redir newfile.txt
27  97 786 redir.c

Other syscalls, such as kill() is used to signal to a process to pause, die etc. Some shell keys also sends these signals, for example: C-c sends SIGINT (interrupt, normal termination), C-z sends SIGSTP (stop, can resume with fg). This signal() subsystem allows for sending external events to processes. The process must “catch” these signals, suspend normal execution and run some code in response.

Homework

fork.py
#! /usr/bin/env python3

# from this sort of thing
# a forks b
# a forks c
# a forks d
# b forks e
# b forks f
# d forks g

# to a process tree
#a --- b --- e
#   |     |
#   |     |- f
#   |- c
#   |
#   |- d --- g


from __future__ import print_function
import random
import string
from optparse import OptionParser

#
# to make Python2 and Python3 act the same -- how dumb
# 
def random_seed(seed):
    try:
        random.seed(seed, version=1)
    except:
        random.seed(seed)
    return

def random_randint(low, hi):
    return int(low + random.random() * (hi - low + 1))

def random_choice(L):
    return L[random_randint(0, len(L)-1)]

#
# class Forker does all the work
#
class Forker:
    def __init__(self, fork_percentage, actions, action_list, show_tree, just_final,
                 leaf_only, local_reparent, print_style, solve):
        self.fork_percentage = fork_percentage
        self.max_actions = actions
        self.action_list = action_list
        self.show_tree = show_tree
        self.just_final = just_final
        self.leaf_only = leaf_only
        self.local_reparent = local_reparent
        self.print_style = print_style
        self.solve = solve

        # root process is always this
        self.root_name = 'a'

        # process list: names of all active processes
        self.process_list = [self.root_name]

        # for each process, it's children
        self.children = {}
        self.children[self.root_name] = []

        # and track parents
        self.parents = {}

        # this is for pretty process names...
        self.name_length = 1
        self.base_names = string.ascii_lowercase + string.ascii_uppercase
        self.curr_names = self.base_names
        self.curr_index = 1

        return

    def grow_names(self):
        new_names = []
        for b1 in self.curr_names:
            for b2 in self.base_names:
                new_names.append(b1 + b2)
        self.curr_names = new_names
        self.curr_index = 0
        return

    def get_name(self):
        if self.curr_index == len(self.curr_names):
            self.grow_names()

        name = self.curr_names[self.curr_index]
        self.curr_index += 1
        return name

    def walk(self, p, level, pmask, is_last):
        print('                               ', end='')
        if self.print_style == 'basic':
            for i in range(level):
                print('   ', end='')
            print('%2s' % p)
            for child in self.children[p]:
                self.walk(child, level + 1, {}, False)
            return
        elif self.print_style == 'line1':
            chars = ('|', '-', '+', '|')
        elif self.print_style == 'line2':
            chars = ('|', '_', '|', '|')
        elif self.print_style == 'fancy':
            # these characters taken from 'treelib', a fun printing package for trees
            # https://github.com/caesar0301/treelib
            # chars = ('\u2502', '\u2500', '\u251c', '\u2514')
            chars = (u'\u2502', u'\u2500', u'\u251c', u'\u2514')
        else:
            print('bad style %s' % self.print_style)
            exit(1)

        # print stuff before node
        if level > 0:
            # main printing
            for i in range(level-1):
                if pmask[i]:
                    # '|  '
                    print('%s   ' % chars[0], end='')
                else:
                    print('    ', end='')
            if pmask[level-1]:
                # '|__'
                if is_last:
                    print('%s%s%s ' % (chars[3], chars[1], chars[1]), end='')
                else:
                    print('%s%s%s ' % (chars[2], chars[1], chars[1]), end='')
            else:
                # '___' 
                print(' %s%s%s ' % (chars[1], chars[1], chars[1]), end='')

        # print node
        print('%s' % p)

        # undo parent verticals
        if is_last:
            pmask[level-1] = False

        # recurse
        pmask[level] = True
        for child in self.children[p][:-1]:
            self.walk(child, level + 1, pmask, False)
        for child in self.children[p][-1:]:
            self.walk(child, level + 1, pmask, True)
        return

    def print_tree(self):
        return self.walk(self.root_name, 0, {}, False)

    def do_fork(self, p, c):
        self.process_list.append(c)
        self.children[c] = []
        self.children[p].append(c)
        self.parents[c] = p
        return '%s forks %s' % (p, c)

    def collect_children(self, p):
        if self.children[p] == []:
            return [p]
        else:
            L = [p]
            for c in self.children[p]:
                L += self.collect_children(c)
            return L

    def do_exit(self, p):
        # remove the process from the process list
        if p == self.root_name:
            print('root process: cannot exit')
            exit(1)
        exit_parent = self.parents[p]
        self.process_list.remove(p)

        # for each orphan, set its parent to exiting proc's parent or root
        if self.local_reparent:
            for orphan in self.children[p]:
                self.parents[orphan] = exit_parent
                self.children[exit_parent].append(orphan)
        else:
            # should set ALL descendants to be child of ROOT 
            descendents = self.collect_children(p)
            descendents.remove(p)
            for d in descendents:
                self.children[d] = []
                self.parents[d] = self.root_name
                self.children[self.root_name].append(d)

        # remove the entry from its parent child list
        self.children[exit_parent].remove(p)
        self.children[p] = -1 # should never be used again
        self.parents[p] = -1  # should never be used again

        # remove the entry for this proc from children
        return '%s EXITS' % p

    def bad_action(self, action):
        print('bad action (%s), must be X+Y or X- where X and Y are processes' % action)
        exit(1)
        return

    def check_legal(self, action):
        if '+' in action:
            tmp = action.split('+')
            if len(tmp) != 2:
                self.bad_action(action)
            return [tmp[0], tmp[1]]
        elif '-' in action:
            tmp = action.split('-')
            if len(tmp) != 2:
                self.bad_action(action)
            return [tmp[0]]
        else:
            self.bad_action(action)
        return 

    def run(self):
        print('                           Process Tree:')
        self.print_tree()
        print('')

        if self.action_list != '':
            # use specific action list
            action_list = self.action_list.split(',')
        else:
            action_list = []
            actions = 0
            temp_process_list = [self.root_name]
            level_list = {}
            level_list[self.root_name] = 1

            # this got a little yucky, too much re-doing of the work
            while actions < self.max_actions:
                if random.random() < self.fork_percentage:
                    # FORK:: pick random parent, add child to it
                    fork_choice = random_choice(temp_process_list)
                    new_child = self.get_name()
                    action_list.append('%s+%s' % (fork_choice, new_child))
                    temp_process_list.append(new_child)
                else:
                    # EXIT:: pick random child, remove it
                    #        exception: no killing root process, sorry
                    exit_choice = random_choice(temp_process_list)
                    if exit_choice == self.root_name:
                        continue
                    temp_process_list.remove(exit_choice)
                    action_list.append('%s-' % exit_choice)
                actions += 1

        for a in action_list:
            tmp = self.check_legal(a)
            if len(tmp) == 2:
                fork_choice, new_child = tmp[0], tmp[1]
                if fork_choice not in self.process_list:
                    self.bad_action(a)
                action = self.do_fork(fork_choice, new_child)
            else:
                exit_choice = tmp[0]
                if exit_choice not in self.process_list:
                    self.bad_action(a)
                if self.leaf_only and len(self.children[exit_choice]) > 0:
                    action = '%s EXITS (failed: has children)' % exit_choice
                else:
                    action = self.do_exit(exit_choice)

            # if we got here, we actually did an action...
            if self.show_tree:
                # SHOW TREES (guess actions)
                if self.solve:
                    print('Action:', action)
                else:
                    print('Action?')
                # print('Process Tree:')
                if not self.just_final:
                    self.print_tree()
            else:
                # SHOW ACTIONS (guess tree)
                print('Action:', action)
                if not self.just_final:
                    if self.solve:
                        # print('Process Tree:')
                        self.print_tree()
                    else:
                        print('Process Tree?')

        if self.just_final:
            if self.show_tree:
                print('\n                        Final Process Tree:')
                self.print_tree()
                print('')
            else:
                if self.solve:
                    print('\n                        Final Process Tree:')
                    self.print_tree()
                    print('')
                else:
                    print('\n                        Final Process Tree?\n')

        return


#
# main
#

parser = OptionParser()
parser.add_option('-s', '--seed', default=-1, help='the random seed', action='store', type='int', dest='seed')
parser.add_option('-f', '--forks', default=0.7, help='percent of actions that are forks (not exits)', action='store', type='float', dest='fork_percentage')
parser.add_option('-A', '--action_list', default='', help='action list, instead of randomly generated ones (format: a+b,b+c,b- means a fork b, b fork c, b exit)', action='store', type='string', dest='action_list')
parser.add_option('-a', '--actions', default=5, help='number of forks/exits to do', action='store', type='int', dest='actions')
parser.add_option('-t', '--show_tree', help='show tree (not actions)', action='store_true', default=False, dest='show_tree')
parser.add_option('-P', '--print_style', help='tree print style (basic, line1, line2, fancy)', action='store', type='string', default='fancy', dest='print_style')
parser.add_option('-F', '--final_only', help='just show final state', action='store_true', default=False, dest='just_final')
parser.add_option('-L', '--leaf_only', help='only leaf processes exit', action='store_true', default=False, dest='leaf_only')
parser.add_option('-R', '--local_reparent', help='reparent to local parent', action='store_true', default=False, dest='local_reparent')
parser.add_option('-c', '--compute', help='compute answers for me', action='store_true', default=False, dest='solve')

(options, args) = parser.parse_args()

if options.seed != -1:
    random_seed(options.seed)

if options.fork_percentage <= 0.001:
    print('fork_percentage must be > 0.001')
    exit(1)

print('')
print('ARG seed', options.seed)
print('ARG fork_percentage', options.fork_percentage)
print('ARG actions', options.actions)
print('ARG action_list', options.action_list)
print('ARG show_tree', options.show_tree)
print('ARG just_final', options.just_final)
print('ARG leaf_only', options.leaf_only)
print('ARG local_reparent', options.local_reparent)
print('ARG print_style', options.print_style)
print('ARG solve', options.solve)
print('')

f = Forker(options.fork_percentage, options.actions, options.action_list, options.show_tree, options.just_final, options.leaf_only, options.local_reparent, options.print_style, options.solve)
f.run()
# -s is for specifying the seed
python fork.py -s 10 -c
Output

ARG seed 10
ARG fork_percentage 0.7
ARG actions 5
ARG action_list 
ARG show_tree False
ARG just_final False
ARG leaf_only False
ARG local_reparent False
ARG print_style fancy
ARG solve True

                           Process Tree:
                               a

Action: a forks b
                               a
                               └── b
Action: a forks c
                               a
                               ├── b
                               └── c
Action: c EXITS
                               a
                               └── b
Action: a forks d
                               a
                               ├── b
                               └── d
Action: a forks e
                               a
                               ├── b
                               ├── d
                               └── e
# -f is for controlling the fork percentage
python fork.py -a 20 -f 0.1 -c
echo "------------------------"
python fork.py -a 20 -f 0.9 -c
Output

ARG seed -1
ARG fork_percentage 0.1
ARG actions 20
ARG action_list 
ARG show_tree False
ARG just_final False
ARG leaf_only False
ARG local_reparent False
ARG print_style fancy
ARG solve True

                           Process Tree:
                               a

Action: a forks b
                               a
                               └── b
Action: b EXITS
                               a
Action: a forks c
                               a
                               └── c
Action: c forks d
                               a
                               └── c
                                   └── d
Action: d EXITS
                               a
                               └── c
Action: c EXITS
                               a
Action: a forks e
                               a
                               └── e
Action: e EXITS
                               a
Action: a forks f
                               a
                               └── f
Action: f EXITS
                               a
Action: a forks g
                               a
                               └── g
Action: g EXITS
                               a
Action: a forks h
                               a
                               └── h
Action: h EXITS
                               a
Action: a forks i
                               a
                               └── i
Action: i EXITS
                               a
Action: a forks j
                               a
                               └── j
Action: j EXITS
                               a
Action: a forks k
                               a
                               └── k
Action: k EXITS
                               a
------------------------

ARG seed -1
ARG fork_percentage 0.9
ARG actions 20
ARG action_list 
ARG show_tree False
ARG just_final False
ARG leaf_only False
ARG local_reparent False
ARG print_style fancy
ARG solve True

                           Process Tree:
                               a

Action: a forks b
                               a
                               └── b
Action: b forks c
                               a
                               └── b
                                   └── c
Action: a forks d
                               a
                               ├── b
                               │   └── c
                               └── d
Action: c forks e
                               a
                               ├── b
                               │   └── c
                               │       └── e
                               └── d
Action: c forks f
                               a
                               ├── b
                               │   └── c
                               │       ├── e
                               │       └── f
                               └── d
Action: c forks g
                               a
                               ├── b
                               │   └── c
                               │       ├── e
                               │       ├── f
                               │       └── g
                               └── d
Action: b forks h
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   ├── f
                               │   │   └── g
                               │   └── h
                               └── d
Action: a forks i
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   ├── f
                               │   │   └── g
                               │   └── h
                               ├── d
                               └── i
Action: c forks j
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   ├── f
                               │   │   ├── g
                               │   │   └── j
                               │   └── h
                               ├── d
                               └── i
Action: j forks k
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   ├── f
                               │   │   ├── g
                               │   │   └── j
                               │   │       └── k
                               │   └── h
                               ├── d
                               └── i
Action: e forks l
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   │   └── l
                               │   │   ├── f
                               │   │   ├── g
                               │   │   └── j
                               │   │       └── k
                               │   └── h
                               ├── d
                               └── i
Action: j EXITS
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   │   └── l
                               │   │   ├── f
                               │   │   └── g
                               │   └── h
                               ├── d
                               ├── i
                               └── k
Action: i forks m
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   │   └── l
                               │   │   ├── f
                               │   │   └── g
                               │   └── h
                               ├── d
                               ├── i
                               │   └── m
                               └── k
Action: g forks n
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   │   └── l
                               │   │   ├── f
                               │   │   └── g
                               │   │       └── n
                               │   └── h
                               ├── d
                               ├── i
                               │   └── m
                               └── k
Action: h forks o
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   │   └── l
                               │   │   ├── f
                               │   │   └── g
                               │   │       └── n
                               │   └── h
                               │       └── o
                               ├── d
                               ├── i
                               │   └── m
                               └── k
Action: i forks p
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   │   └── l
                               │   │   ├── f
                               │   │   └── g
                               │   │       └── n
                               │   └── h
                               │       └── o
                               ├── d
                               ├── i
                               │   ├── m
                               │   └── p
                               └── k
Action: g forks q
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   │   └── l
                               │   │   ├── f
                               │   │   └── g
                               │   │       ├── n
                               │   │       └── q
                               │   └── h
                               │       └── o
                               ├── d
                               ├── i
                               │   ├── m
                               │   └── p
                               └── k
Action: a forks r
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   │   └── l
                               │   │   ├── f
                               │   │   └── g
                               │   │       ├── n
                               │   │       └── q
                               │   └── h
                               │       └── o
                               ├── d
                               ├── i
                               │   ├── m
                               │   └── p
                               ├── k
                               └── r
Action: b forks s
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   │   └── l
                               │   │   ├── f
                               │   │   └── g
                               │   │       ├── n
                               │   │       └── q
                               │   ├── h
                               │   │   └── o
                               │   └── s
                               ├── d
                               ├── i
                               │   ├── m
                               │   └── p
                               ├── k
                               └── r
Action: n forks t
                               a
                               ├── b
                               │   ├── c
                               │   │   ├── e
                               │   │   │   └── l
                               │   │   ├── f
                               │   │   └── g
                               │   │       ├── n
                               │   │       │   └── t
                               │   │       └── q
                               │   ├── h
                               │   │   └── o
                               │   └── s
                               ├── d
                               ├── i
                               │   ├── m
                               │   └── p
                               ├── k
                               └── r
# -t is for showing the process tree and not actions
python fork.py -t
Output

ARG seed -1
ARG fork_percentage 0.7
ARG actions 5
ARG action_list 
ARG show_tree True
ARG just_final False
ARG leaf_only False
ARG local_reparent False
ARG print_style fancy
ARG solve False

                           Process Tree:
                               a

Action?
                               a
                               └── b
Action?
                               a
                               ├── b
                               └── c
Action?
                               a
                               ├── b
                               └── c
                                   └── d
Action?
                               a
                               ├── b
                               └── c
Action?
                               a
                               └── c
# -A action list: a+b => a fork b; b- => b exit
python fork.py -A a+b,b+c,c+d,c+e,c- -c

# even though c exits, d & e are running with a as parent

# -R reparents to local parent i.e. b
python fork.py -A a+b,b+c,c+d,c+e,c- -R -c
Output

ARG seed -1
ARG fork_percentage 0.7
ARG actions 5
ARG action_list a+b,b+c,c+d,c+e,c-
ARG show_tree False
ARG just_final False
ARG leaf_only False
ARG local_reparent False
ARG print_style fancy
ARG solve True

                           Process Tree:
                               a

Action: a forks b
                               a
                               └── b
Action: b forks c
                               a
                               └── b
                                   └── c
Action: c forks d
                               a
                               └── b
                                   └── c
                                       └── d
Action: c forks e
                               a
                               └── b
                                   └── c
                                       ├── d
                                       └── e
Action: c EXITS
                               a
                               ├── b
                               ├── d
                               └── e

ARG seed -1
ARG fork_percentage 0.7
ARG actions 5
ARG action_list a+b,b+c,c+d,c+e,c-
ARG show_tree False
ARG just_final False
ARG leaf_only False
ARG local_reparent True
ARG print_style fancy
ARG solve True

                           Process Tree:
                               a

Action: a forks b
                               a
                               └── b
Action: b forks c
                               a
                               └── b
                                   └── c
Action: c forks d
                               a
                               └── b
                                   └── c
                                       └── d
Action: c forks e
                               a
                               └── b
                                   └── c
                                       ├── d
                                       └── e
Action: c EXITS
                               a
                               └── b
                                   ├── d
                                   └── e
# -F skip intermediate steps, show final tree
python fork.py -t -F
Output

ARG seed -1
ARG fork_percentage 0.7
ARG actions 5
ARG action_list 
ARG show_tree True
ARG just_final True
ARG leaf_only False
ARG local_reparent False
ARG print_style fancy
ARG solve False

                           Process Tree:
                               a

Action?
Action?
Action?
Action?
Action?

                        Final Process Tree:
                               a
                               ├── b
                               └── d
                                   └── e

Code Homework

Write a program that calls fork while setting a variable value, what happens to variable when both child and parent change this value?

p1.C
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    int x = 100;
    printf("hello, x is '%d' (pid: %d)\n", x, (int) getpid());
    fflush(stdout);
    int rc = fork();

    x += 5;
    if (rc < 0) { // fork failed
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) { // child (new process)
        x -= 10;
        printf("hello, I am child, x is '%d' (pid: %d)\n", x, (int) getpid());
    } else { // parent goes down this path (main)
        printf("hello, I am parent of %d, x is '%d' (pid: %d)\n", rc, x, (int) getpid());
    }
    return 0;
}
hello, x is '100' (pid: 90319)
hello, I am parent of 90320, x is '105' (pid: 90319)
hello, I am child, x is '95' (pid: 90320)

Write a program that opens a file and then forks. Can child and parent access fd returned by open()? What happens when both write to file concurrently? How to ensure child process prints before parent, can it be done without using wait()?

p2-p3.C
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <assert.h>

int main(int argc, char *argv[]) {
    int fd = open("/tmp/file_ostep", O_WRONLY|O_CREAT|O_TRUNC, S_IRWXU);
    assert(fd > -1);

    printf("hello, fd is '%d' (pid: %d)\n", fd, (int) getpid());
    fflush(stdout);

    int rc = fork();

    if (rc < 0) {
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {
        int rcw = write(fd, "hello from child\n", 17); // 17 = nbyte
        assert(rcw == 17);
        printf("hello, I am child, rc is '%d', rcw is '%d' (pid: %d)\n",
               rc, rcw, (int) getpid());
    } else {
        // wait(NULL);
        sleep(2); // To ensure child writes first without using `wait()`
        int rcw = write(fd, "goodbye from parent\n", 20); // 20 = nbyte
        assert(rcw == 20);
        printf("hello, I am parent of %d, rcw is '%d', (pid: %d)\n",
               rc, rcw, (int) getpid());
    }
    close(fd);
    return 0;
}
hello, fd is '3' (pid: 18508)
hello, I am child, rc is '0', rcw is '17' (pid: 18509)
hello, I am parent of 18509, rcw is '20', (pid: 18508)
cat /tmp/file_ostep
hello from child
goodbye from parent

Write program that forks, then calls various variants of exec() to run /bin/ls.

p4.C
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    int rc = fork();

    if (rc < 0) {
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {
        printf("hello, I am child, rc is '%d' (pid: %d)\n", rc, (int) getpid());

        char * args[] = {"ls", "-l", NULL}; // List of pointers to null-terminated strings
        char * env[] = {NULL}; // Environment of exectured process
        execl("/bin/ls", "ls", "-l", (char *) 0);
        execv("/bin/ls", args);
        execle("/bin/ls", "ls", "-l", (char *) 0, env);
        execve("/bin/ls", args, env);
        execlp("ls", "ls", "-l", (char *) 0);
        execvp("ls", args);
    } else {
        printf("hello, I am parent of %d, (pid: %d)\n", rc, (int) getpid());
    }
    return 0;
}
hello, I am parent of 29461, (pid: 29460)
total 352
-rw-r--r--@ 1 IABDCEF  staff  12109  2 Mar 23:35 fork.py
-rw-r--r--@ 1 IABDCEF  staff  14194 19 Feb 01:52 intro.org
-rwx------@ 1 IABDCEF  staff     20  2 Mar 17:59 newfile.txt
-rw-r--r--@ 1 IABDCEF  staff  13351  2 Mar 12:14 process-run.py
-rw-r--r--@ 1 IABDCEF  staff  88866  3 Mar 02:33 process.org
-rwxr-xr-x@ 1 IABDCEF  staff  33896  2 Mar 17:59 redir
-rw-r--r--@ 1 IABDCEF  staff    786  2 Mar 17:59 redir.c