yongk's dev space: Linux Kernel Development (리눅스 커널 심층 분석)

Processes

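task_struct is roughly 1.8 KB in size.
On 32-bit systems with the usual 3G/1G split, the kernel uses the virtual address range 0xC0000000 to 0xFFFFFFFF (3 GB to 4 GB).
Classically a process had a single thread, but in multi-core environments a process often consists of several threads.
A thread is a slightly special kind of process. See 'Linux thread implementation' below for details.
A process is a program in execution together with its associated resources.
It is also the basic unit of scheduling.
Two or more processes can run the same program, and of course each can have several threads.
fork() - creates a new process by duplicating an existing one.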

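The classic process concept was extended with the notion of a lightweight process.
A lightweight process does not get a full copy of everything behind its parent's task_struct; it shares resources such as filesystem information and the memory address space. This is what clone() with sharing flags does, whereas a plain fork() gives the child its own copy-on-write copy.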

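As the following example shows, fork() creates the child as a copy of its parent.

fork() --> swi (software interrupt) --> exception taken at vector address PC 0x08 --> the syscall number derived from the swi number is looked up in the syscall table, which yields sys_fork() --> do_fork() runs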
<<01.myfork.c>>
exit.c > do_exit() - terminates the program. See 'Process termination' below for details.
Important processes
- Process 0:
  - The swapper process, a kernel thread created early on by start_kernel().
  - Once process 1 has been created, it repeatedly executes hlt with interrupts enabled.
  - It is scheduled only when no other process is in the TASK_RUNNING state.
- Process 1:
  - The init process, a kernel thread that start_kernel() creates after enabling interrupts.
  - It shares kernel data structures with process 0.
- Others
  - keventd: runs the tasks in the tq_context task queue
  - kapm: handles events related to Advanced Power Management (APM)
  - kswapd: reclaims memory
  - kflushd (or bdflush): flushes dirty buffers to disk and reclaims memory
  - kupdated: writes old dirty buffers to disk to reduce the risk of filesystem inconsistency
  - ksoftirqd: runs tasklets; one thread per CPU

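struct task_struct

struct task_struct is allocated by the slab allocator; struct thread_info is declared in <asm/thread_info.h>.
Before 2.6, task_struct was stored at the end of each process's kernel stack.
Since then it is created dynamically by the slab allocator, and a thread_info is placed instead at the bottom of the stack (if the stack grows down) or at the top (if the stack grows up).
thread_info exists for fast access: it holds a small part of the information that would otherwise sit in task_struct, and its address can be computed directly from the stack pointer.
- There is one thread_info per task, not one for the whole kernel; it sits at the bottom of that task's kernel stack.
- Each task_struct therefore has its own thread_info. It lives in kernel memory at the base of the task's kernel stack, not in the process's user-space layout (text, data, bss, libraries, user stack).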

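task->need_resched:
- Tells the kernel that a reschedule is needed (since 2.6 this is the TIF_NEED_RESCHED flag in thread_info).
- It is set by scheduler_tick() when the running process should be preempted, or by try_to_wake_up() when a process that has just woken up has a higher priority than the current one.
- When the kernel is about to return to user space, it checks the need_resched flag and calls schedule() if it is set; this is user preemption.
Parents and children
- The member struct task_struct *parent points to the parent's task_struct, and struct list_head children is the list of the task's children.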
- p_opptr: original parent
- p_pptr: parent
- p_cptr: child
- p_ysptr: younger sibling
- p_osptr: older sibling
Getting the parent process:

    struct task_struct *my_parent = current->parent;

Getting the child processes:

    struct task_struct *task;
    struct list_head *list;

    list_for_each(list, &current->children) {
        /* list_entry() is a wrapper around container_of() */
        task = list_entry(list, struct task_struct, sibling);
        /* children is a circular list; on each pass, task points to one child of current */
    }

Finding a process via list_head

    /* get the next task */
    list_entry(task->tasks.next, struct task_struct, tasks);

or

    next_task(task);

    /* get the previous task */
    list_entry(task->tasks.prev, struct task_struct, tasks);

or

    prev_task(task);

Walking all tasks

    for_each_process(task) {
        printk("%s[%d]\n", task->comm, task->pid);
    } /* same role as for_each_task in older kernels */

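Looking up a task_struct by PID by walking the task list could mean checking as many as 26,800 entries, which is inefficient. The pidhash table makes the lookup fast.
- The chain links are kept in task_struct->pidhash_next and task_struct->pidhash_pprev (2.4 kernels).
- hash_pid() / unhash_pid(): add to / remove from the hash table
- find_task_by_pid(pid) does the lookup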

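Process descriptor: struct thread_info

The kernel stack is 8 KB, i.e. two 4 KB pages.
Each task's struct thread_info is allocated at the very end of that process's kernel stack.
Its member struct task_struct *task points to the task_struct; the kernel uses it to reach the task.
struct thread_info address: current_thread_info(), on which the current macro is built, computes the location of thread_info by clearing the low 13 bits of the stack pointer.
For any address inside the kernel stack, addr &= ~0x1fff yields the address of thread_info. The following assembly does the same:

    movl $0xffffe000, %ecx
    andl %esp, %ecx
    movl %ecx, p    /* p now holds the thread_info address (in 2.4, where task_struct sat there, this was current itself) */

cf. esp is the stack pointer.

struct task_struct address: current_thread_info()->task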
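Processes are identified by their PID (pid_t). The maximum PID is in effect the maximum number of processes that can exist on the system at the same time.
Default maximum PID: 32,767
- $ cat /proc/sys/kernel/pid_max
- Once the maximum is passed, PID allocation wraps around to the low end of the range.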

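struct task_struct->state: the state of the process

It is one of the following five:
- TASK_RUNNING
- TASK_INTERRUPTIBLE: can be woken by a signal even if the awaited condition has not occurred.
- TASK_UNINTERRUPTIBLE: ignores signals.
  - This does not mean hardware cannot wake it. Hardware interrupts occur in every case!
  - It means the task does not respond to signals (software notifications delivered by the kernel) while it sleeps.
- __TASK_TRACED
- __TASK_STOPPED
Changing the state: set_task_state(task, state); or set_current_state(state);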
TASK_RUNNING process list
- Not every TASK_RUNNING process is actually running; every process on a runqueue is TASK_RUNNING. It is best read as "runnable".
- Only one of them (per CPU) is really running.
- The runqueue links them through struct list_head run_list.
  - add_to_runqueue() - called by wake_up_process().
  - del_from_runqueue()
  - move_first_runqueue()
  - move_last_runqueue()
  - task_on_runqueue()
A sleeping process sits on a wait queue and moves to the runqueue when its event occurs. Details omitted!
- wait_queue_head_t
  - DECLARE_WAIT_QUEUE_HEAD(): static creation
  - init_waitqueue_head(): dynamic initialization
task->rlim: resource limits

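Process context

Process context: a program's code normally runs in user space,
but when it makes a system call or triggers an exception, execution enters kernel space. In that situation the kernel is said to be in process context, executing on behalf of the process. When the kernel finishes, execution resumes in user space.
The current macro is valid in process context.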

Process creation

Process creation consists of fork() and exec().
- fork(): duplicates the current task to create a child process. Its work is to copy the parent's page tables and to create a task_struct for the child.
- exec(): loads a new executable into the address space and runs it.
fork() does not copy memory eagerly; it uses copy-on-write (pages are copied only when they are modified).
call stack: fork() --> clone() (system call) --> do_fork() --> copy_process()

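Linux thread implementation --> pthread_create() (the p in pthread stands for POSIX)

Threads have nothing to do with COW: they share one address space, so there is nothing to copy.
$ man pthread_create
<<01.hellothread.c>>
Unlike many other operating systems, Linux treats a thread as a process that happens to share resources. For example, a program that runs 4 threads runs as 4 processes, each with its own task_struct, that merely share resources. Each one also has its own PID.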

thread creation: clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);
normal fork(): clone(SIGCHLD, 0); // SIGCHLD is the signal sent to the parent when the child terminates

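Each thread has a different PID, but task->tgid is the same for all of them.
- The task->pid of the first process (the thread group leader) is stored in task->tgid (thread group ID) of every task created afterwards in the group, and getpid() returns task->tgid.
The clone flags are defined in the kernel headers.
Because of this design, threads are scheduled alongside ordinary processes with no special handling.
Threads of one process have different pid values, but they form one thread group, so getpid() returns current->tgid rather than current->pid.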
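Windows and Solaris, by contrast, support threads as a separate kernel construct.
"Sharing resources" means this: after fork() the child gets its own copy of .data, .bss, and .text, whereas threads share the same memory.
If the main thread exits (returns from main() or calls exit()), the other threads are terminated with it; if it calls pthread_exit() instead, the process ends only after all remaining pthreads have finished.
A thread is a process with cheap context switches.
Threads of one process can share data directly, which processes cannot do by default. IPC is what makes sharing between processes possible.
In the figure above, each thread gets its own stack, but heap, bss, data, and text are not created anew. That is the key point!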

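kernel thread --> kthread_create()

A kernel thread is the kernel's own thread: it does ongoing background work (such as continuous monitoring) on its own initiative rather than running passively at an application's request.
In $ ps -ef output, entries such as [xxxxx/0:1] are kernel threads. xxxxx is the thread name; for per-CPU threads the number after the slash is the CPU (for a kworker, 0:1 means CPU 0, worker 1).
They exist only in kernel space and never context-switch to user space.
task_struct->mm is NULL.
To list kernel threads: ps -el
The kthreadd process creates and manages all kernel threads through the following interface:
Call kthread_create() and then wake_up_process(),
or simply call kthread_run().
A kthread ends when the thread itself calls do_exit() or when another part of the kernel calls kthread_stop() on it.
If a kthread started by an insmod'ed module is still running at rmmod time, it may never terminate. The thread should therefore check kthread_should_stop() ("should I die?") in its loop, and the module's exit path should call kthread_stop().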

Process termination

do_exit() in exit.c
set PF_EXITING in task->flags --> remove kernel timers --> write BSD accounting information --> exit_mm() --> exit_sem() --> exit_files() --> exit_fs() --> task->exit_code = exit code --> exit_notify() notifies the parent process --> schedule() switches to a new process --> done.
A terminated process keeps its thread_info, task_struct, and kernel stack, because they hold information its parent needs; it stays in the zombie state until the parent calls release_task(), which frees the rest.
- The parent collects a zombie's information with wait().
- release_task() calls put_task_struct(), which frees the pages holding the kernel stack and thread_info and returns the task_struct to its slab cache.
If the parent exits first and leaves a child orphaned:
- find_new_reaper() finds it a new parent!
- It picks another process in the same thread group as the parent, or
- if that is not possible, the init process.
- do_exit() --> exit_notify() --> forget_original_parent() --> find_new_reaper()

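<<Reference>>

fork()

The exec family of functions is normally used after fork(): fork() creates a new process, and exec then replaces that process's memory image with the new program. Once exec succeeds, the image inherited from the parent can no longer be reached.
fork() is implemented with COW (copy-on-write): no memory is copied at fork() time, and reads do not trigger a copy either. A page is copied only when it is written to.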

Monday, October 6, 2014

Linux Kernel Development - Chapter 3, Process Management
