Efficient Software-Based Fault Isolation

Myoungsoo Jung

Problems, Solutions and Summaries:

The key idea of isolating faults in software is very simple, and it makes fault isolation cheap enough to be practical. The paper begins with Robert Wahbe et al. identifying the problem with the traditional fault isolation method. With such a scheme (i.e., hardware-based fault isolation), the performance cost is high: preventing code in one address space from corrupting the contents of another requires cross-address-space context switches, which incur prohibitive overhead and additional operations such as trapping, copying arguments, saving and restoring registers, and flushing the translation look-aside buffer (TLB). To overcome this challenge, the authors propose a software approach implemented within a single address space. It allocates a separate fault domain into which the code and data of a distrusted module are loaded, and it modifies the module's object code to prevent it from writing to or jumping to addresses outside that fault domain.

To achieve this goal, the authors propose software encapsulation, which transforms a distrusted module so that it cannot escape its fault domain. A fault domain consists of two segments: one for the distrusted module's code, and another for its static data, heap, and stack. Software encapsulation relies on two key mechanisms to isolate a distrusted module. The first, called segment matching, prevents the use of illegal addresses and can pinpoint the actual location of a fault within a module. Segment matching inserts checking code before every unsafe instruction, that is, every jump or store whose target address cannot be statically verified to lie within the correct segment. If the check fails, the inserted code traps to a system error routine outside the distrusted module's fault domain.
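A minimal Python sketch of the segment-matching check follows, assuming a hypothetical layout in which the address bits above `SEGMENT_SHIFT` hold the segment identifier. The function names and the exception that stands in for the trap are illustrative, not the paper's code; on real hardware the check is a handful of inserted instructions operating on dedicated registers.

```python
SEGMENT_SHIFT = 20  # assumption: the low 20 bits are the offset inside a segment


class FaultDomainViolation(Exception):
    """Stands in for the trap to the system error routine outside the fault domain."""


def segment_id_of(addr: int) -> int:
    # The segment identifier is the group of upper address bits.
    return addr >> SEGMENT_SHIFT


def checked_store(memory: dict, addr: int, value: int, segment_id: int) -> None:
    # Inserted check: the target's segment must match the module's data segment.
    if segment_id_of(addr) != segment_id:
        raise FaultDomainViolation(f"store to {addr:#x} outside segment {segment_id:#x}")
    memory[addr] = value  # the original store, now known to be in bounds


data_segment = 0x3
memory: dict = {}
checked_store(memory, (data_segment << SEGMENT_SHIFT) | 0x10, 42, data_segment)
try:
    checked_store(memory, 0x4 << SEGMENT_SHIFT, 7, data_segment)
except FaultDomainViolation as err:
    print("trapped:", err)
```

Because the check fails before the store executes, the trap reports exactly which instruction tried to leave the segment.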

The other key mechanism of software encapsulation is address sandboxing. Sandboxing inserts code before each unsafe instruction that sets the upper bits of the target address to the correct segment identifier. As with segment matching, unsafe store and jump instructions are rewritten to use a dedicated register, which guarantees that the distrusted module's code cannot produce an illegal address. Sandboxing requires two inserted instructions: one that clears the segment-id bits of the target address and stores the result in a dedicated register, and another that sets the segment-id bits to the correct value.
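The two inserted instructions can be sketched in Python as a mask followed by an OR, again assuming a hypothetical 20-bit in-segment offset. The `sandbox` function is illustrative; it models the value left in the dedicated register that the rewritten store or jump then uses.

```python
SEGMENT_SHIFT = 20                      # assumption: low 20 bits are the in-segment offset
OFFSET_MASK = (1 << SEGMENT_SHIFT) - 1


def sandbox(addr: int, segment_id: int) -> int:
    dedicated = addr & OFFSET_MASK              # 1st inserted instruction: clear segment-id bits
    dedicated |= segment_id << SEGMENT_SHIFT    # 2nd inserted instruction: set the correct segment id
    return dedicated                            # the unsafe store/jump then uses this register


code_segment = 0x2
print(hex(sandbox(0x700010, code_segment)))    # a wild target is forced into the segment
print(hex(sandbox(0x20ABCD, code_segment)))    # a legal target is left unchanged
```

Unlike segment matching, sandboxing never traps: a bad address is silently redirected into the module's own segment, so it cannot corrupt other domains, but it also does not report where the fault occurred.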

The remaining issues for efficient software encapsulation are optimization, preventing corruption of process resources, and data sharing among domains. For optimization, the authors reduce the overhead of computing target addresses. RISC load and store instructions typically address memory as a register plus an offset. The sandboxing mechanism, however, sandboxes only the register and handles the offset by surrounding each segment with guard zones, which are unmapped areas; any access that the offset pushes past the segment boundary lands in a guard zone and faults. For process resources, the authors require distrusted modules to access resources through cross-fault-domain RPC. For instance, if a distrusted module's object code performs a direct system call, the call is transformed into the appropriate RPC call. Finally, consider data sharing. Because software encapsulation does not alter load instructions, fault domains can read any memory mapped in the application's address space. However, because stores are confined to a domain's own segment, domains cannot directly share writable data. The authors therefore provide lazy pointer swizzling, which aliases a shared region into multiple locations in the virtual address space by modifying the hardware page table.
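The guard-zone optimization can be checked with a small Python sketch: only the base register is sandboxed, the immediate offset is added afterwards, and every resulting address lands either in the segment or in an adjacent guard zone. The segment layout and the guard size (chosen to cover a signed 16-bit immediate) are assumptions for illustration, not values taken from the paper.

```python
SEGMENT_SHIFT = 20
SEGMENT_SIZE = 1 << SEGMENT_SHIFT
OFFSET_MASK = SEGMENT_SIZE - 1
GUARD_SIZE = 1 << 15   # assumption: covers any signed 16-bit immediate offset


def sandboxed_access(base_reg: int, offset: int, segment_id: int) -> int:
    """Sandbox only the base register; the immediate offset is added afterwards."""
    if not -GUARD_SIZE <= offset < GUARD_SIZE:
        raise ValueError("offset does not fit in the immediate field")
    base = (base_reg & OFFSET_MASK) | (segment_id << SEGMENT_SHIFT)
    return base + offset


def region(addr: int, segment_id: int) -> str:
    lo = segment_id << SEGMENT_SHIFT
    hi = lo + SEGMENT_SIZE
    if lo <= addr < hi:
        return "segment"
    if lo - GUARD_SIZE <= addr < lo or hi <= addr < hi + GUARD_SIZE:
        return "guard zone (unmapped, so the access faults)"
    return "outside"


print(region(sandboxed_access(0xFFFFFFFF, 16, 3), 3))
print(region(sandboxed_access(0x1234, 8, 3), 3))
```

The saving is that the rewritten instruction keeps its native register-plus-offset form, so no extra instruction is needed to compute the full target address before sandboxing it.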

Critiques:

Although this paper strives to reduce the cost of fault isolation by using software rather than hardware, I am afraid that its mechanism may break the pipeline. Computer architecture researchers have long worked to reduce pipeline disruptions because they cause severe performance degradation. Software encapsulation inserts extra code for segment matching and sandboxing, and this code may disrupt the pipeline. The authors should therefore provide clearer evidence that the proposed approach does not hurt pipeline performance. Second, I could not find a realistic way to supply the "correct" value used to implement segment matching. If the location at which a loadable distrusted module resides is fixed, the segment-id can easily be determined at compile time; if not, the segment-id cannot be known at compile time. For this reason, the authors should clarify how the correct value is obtained and show how to find the segment-id at compile time without relying on the operating system's module-loading mechanism.

Unmodified Device Driver Reuse and Improved System Dependability via Virtual Machines

Myoungsoo Jung

 

Problems, Solutions and Summaries:

This paper proposes a way to reuse unmodified device drivers. At the same time, Joshua LeVasseur et al. show how to improve system dependability by using virtual machines. Full reuse of an existing driver base has generally been considered unachievable, because introducing device drivers into a new operating system raises elusive problems such as unavailable source code, undocumented features, and programming errors. The authors suggest a pragmatic approach that reuses legacy device drivers and strongly isolates them by running them in virtual machines, which preserves the drivers' semantics and limits incompatibilities. The scheme also keeps resource overhead minimal through cooperation among the VMs.

The authors divide traditional approaches to driver reuse into two categories. The first is co-hosting, which enables binary driver reuse. The second is transplanting drivers, which offers independence from the donor OS. However, with co-hosting the new OS and the driver OS can interfere with each other because both run with full privileges. Transplanting, for its part, has trouble reusing device drivers successfully because it relies on glue code, which raises conflicts. The authors identify three issues in driver reuse: semantic resource conflicts, which can cause accidental denial of service; sharing conflicts, which arise because fair sharing of the address space may be unachievable; and engineering effort, since writing glue code requires substantial, hard-to-acquire knowledge of the donor OS.

The paper observes that drivers, unlike applications, are closely knit with the kernel. Orthogonal drivers should therefore follow these principles: 1. resource delegation, where a driver receives only bulk resources; 2. separation of name spaces, where each driver has its own address space; and 3. separation of privilege, where each driver executes in unprivileged mode. Finally, the authors state secure isolation and a common API as basic principles for satisfying the requirements of orthogonal device drivers. Architecturally, to reuse and isolate each device driver, the system executes the driver together with its native OS within a VM, called a device driver OS (DD/OS). In effect, each driver can run in a separate VM, which allows the system to use drivers from incompatible OSes simultaneously. To let clients communicate with drivers located in separate VMs, a translation module is added to each DD/OS; it interfaces with clients and maps their requests onto the driver's abstractions.

The authors also claim low-overhead communication through message notification and request completion. To further support their reuse approach, they provide low-overhead memory sharing by registering memory areas of one VM into another VM's physical memory space. Beyond these claims, the authors consider virtualization issues such as DMA operations, additional resource requirements, and special timing needs. For DMA, the problem is that a DD/OS can perform DMA to physical memory that the memory protection system would not otherwise allow it to access; for instance, it could use DMA to overwrite hypervisor code and data. I discuss these considerations in more detail in the critique section below. The research contributes unmodified reuse of existing device drivers, strong isolation, and fault containment.
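The request/completion pattern can be sketched in Python as a pair of queues living in memory shared by a client VM and a DD/OS. Everything here is hypothetical: the `RequestChannel` class, the counters that stand in for virtual interrupts, and the choice to notify only on an empty-to-non-empty transition (a common way to batch notifications) are assumptions, not the paper's interface.

```python
from collections import deque


class RequestChannel:
    """Hypothetical client <-> DD/OS channel living in memory shared by both VMs."""

    def __init__(self):
        self.requests = deque()
        self.completions = deque()
        self.notifications_to_ddos = 0    # stand-in for virtual interrupts to the DD/OS
        self.notifications_to_client = 0  # stand-in for virtual interrupts to the client

    def submit(self, request):
        # Assumption: notify only when the queue goes from empty to non-empty,
        # so a burst of requests costs a single notification.
        was_empty = not self.requests
        self.requests.append(request)
        if was_empty:
            self.notifications_to_ddos += 1

    def ddos_service(self, driver):
        # The DD/OS translation module drains the queue and hands each request
        # to the reused, unmodified driver (modeled here as a plain function).
        handled = 0
        while self.requests:
            self.completions.append(driver(self.requests.popleft()))
            handled += 1
        if handled:
            self.notifications_to_client += 1  # one completion signal per batch
        return handled

    def reap(self):
        done = list(self.completions)
        self.completions.clear()
        return done


channel = RequestChannel()
for block in (1, 2, 3):
    channel.submit(block)
channel.ddos_service(lambda block: f"read block {block}")
print(channel.reap(), channel.notifications_to_ddos, channel.notifications_to_client)
```

Three requests cost one notification in each direction here, which is the kind of saving a low-overhead notification scheme aims for.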

 

Critiques:

In the DMA and trust section of the paper, it is unclear where the memory that cannot be reclaimed actually resides when this approach performs DMA operations. The authors should resolve this ambiguity. In addition, the authors claim that a client must not reuse its pinned memory until the faulted DD/OS has completed its reboot. However, I am not convinced this mechanism is sound. For example, if a DMA operation completes before the faulted VM restarts, the system may fall into an ambiguous state, and it is unclear what happens if the VM fails to reboot. Moreover, if a single function requires several drivers, as with a PCI device, it is unclear whether the proposed approach still distributes those drivers across separate VMs to provide that one function. To bolster their claims, the authors should present the tradeoff between performance and isolation. A more crucial problem is resource consumption. The authors address it by swapping memory; to choose what to evict, the system must determine which pages are cold. However, most device drivers are managed as black boxes. The authors should therefore suggest a way to distinguish cold pages from hot pages even when each device driver is encapsulated and hides its own information.

Scheduling I/O in Virtual Machine Monitors

Myoungsoo Jung

 

Problems, Solutions and Summaries:

Diego Ongaro et al. evaluate the impact of the VMM scheduler on performance using multiple guest domains that concurrently run different types of applications. To do so, the authors implement several scheduler variants in Xen. The evaluation reveals the relationship between domain scheduling and I/O performance in a Virtual Machine Monitor (VMM) and exposes the major problems in the VMM's I/O scheduling. In other words, the contributions of Ongaro et al. are identifying I/O performance degradation, overcoming unfairness toward I/O-intensive domains, and removing the side effects of scheduler tickling in the Xen scheduler.

The authors find that traditional fair scheduling yields poor and unpredictable I/O performance when latency-sensitive and bandwidth-intensive applications run under VMM schedulers. Xen's default scheduler is the Credit scheduler, which uses a credit/debit system to share processor resources fairly. To substantiate this I/O degradation, the authors present experimental results showing that the Credit scheduler achieves mixed performance depending on the particular configuration of bandwidth-intensive and latency-sensitive applications. The authors also evaluate the Simple Earliest Deadline First (SEDF) scheduler. Its I/O performance is similar to that of the Credit scheduler, even though it shares processor resources fairly when running CPU-intensive applications.
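The credit/debit idea can be sketched in Python as follows. This is a simplified model, not Xen's code: the weights, the `credits_per_tick` cost, and the rule that non-negative credits mean UNDER priority are illustrative assumptions.

```python
from dataclasses import dataclass

UNDER, OVER = 0, 1  # smaller value = higher scheduling priority


@dataclass
class Domain:
    name: str
    weight: int
    credits: int = 0

    @property
    def priority(self) -> int:
        # Assumption: a domain with non-negative credits is UNDER its fair share.
        return UNDER if self.credits >= 0 else OVER


def replenish(domains, credits_per_period):
    # Periodic accounting: hand out credits in proportion to each domain's weight.
    total_weight = sum(d.weight for d in domains)
    for d in domains:
        d.credits += credits_per_period * d.weight // total_weight


def debit(domain, ticks, credits_per_tick=100):
    # The running domain pays for the CPU time it consumed.
    domain.credits -= ticks * credits_per_tick


def pick_next(run_queue):
    # UNDER domains run before OVER domains; FIFO order within a priority.
    for prio in (UNDER, OVER):
        for i, d in enumerate(run_queue):
            if d.priority == prio:
                return run_queue.pop(i)
    return None


web, batch = Domain("web", weight=256), Domain("batch", weight=512)
replenish([web, batch], credits_per_period=300)
debit(web, ticks=2)           # web has now overspent its share
queue = [web, batch]
print(pick_next(queue).name)  # batch runs first despite being queued second
```

Note that nothing in this accounting considers when a domain needs the CPU, only how much it has used, which is why a latency-sensitive domain can wait behind others that still hold credits.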

Turning to the problems of Xen's scheduling: first, scheduling other domains introduces latency. This is because event channels are used to deliver virtual interrupts as well as inter-domain notifications. In the network interface card (NIC) case in particular, delivering a network packet gives the scheduler at least two opportunities to run other domains, which adds latency. A more serious problem is unfairness in I/O performance: the scheduler shares processor resources fairly only approximately. Although the trend is toward completely fair schedulers, fairness in processor share alone may not be effective when low response latency is desired. The Credit scheduler also does not provide fair latency because it does not consider when each domain should receive its allocated fraction of processor resources. Finally, a problem arises when the scheduler is tickled too frequently: a running domain is preempted too early.

To mitigate these problems, the paper proposes scheduling optimizations: boosting idle domains, re-ordering the run queue according to credits, and adjusting when the scheduler is tickled, guided by an understanding of I/O domains' requirements. These requirements fall into three categories: low latency, high bandwidth, and independence from other domains' workloads. Boosting a waking I/O domain gives latency-sensitive applications more opportunities to perform their I/O. Reordering the run queue so that I/O domains are reinserted near its head resolves unfairness when Xen runs a heavy workload. The authors also try to minimize preemption: tickling is disabled altogether to avoid preempting the driver domain. Finally, fixing event channel notification processing with a 2-level hierarchy of bit vectors guarantees that no port is processed a second time before all other pending ports have been processed once.
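Boosting and credit-ordered run queues can be illustrated together in a small Python model. It is a sketch under stated assumptions, not the Xen patch: the BOOST/UNDER/OVER encoding, the wake-up rule, and a boost that lasts for a single dispatch are simplifications chosen for clarity.

```python
from dataclasses import dataclass

BOOST, UNDER, OVER = 0, 1, 2  # smaller value = higher priority


@dataclass
class Domain:
    name: str
    credits: int
    boosted: bool = False

    def priority(self) -> int:
        if self.boosted:
            return BOOST
        return UNDER if self.credits >= 0 else OVER


def sort_run_queue(run_queue):
    # Re-ordering optimization: within a priority, more remaining credits go first.
    run_queue.sort(key=lambda d: (d.priority(), -d.credits))


def wake_up(domain, run_queue, boost=True):
    # Assumed boost rule: a domain that wakes (e.g. on an I/O event) while
    # still UNDER its share is boosted so it can be scheduled promptly.
    if boost and domain.credits >= 0:
        domain.boosted = True
    run_queue.append(domain)
    sort_run_queue(run_queue)


def pick_next(run_queue):
    domain = run_queue.pop(0)
    domain.boosted = False  # in this sketch a boost lasts for one dispatch
    return domain


def dispatch_order(boost):
    queue = [Domain("cpu-bound", credits=50), Domain("hog", credits=-20)]
    sort_run_queue(queue)
    wake_up(Domain("io", credits=10), queue, boost=boost)
    return [pick_next(queue).name for _ in range(len(queue))]


print(dispatch_order(boost=True))   # ['io', 'cpu-bound', 'hog']
print(dispatch_order(boost=False))  # ['cpu-bound', 'io', 'hog']
```

Without the boost, the I/O domain still holds credits but waits behind the CPU-bound domain that has more of them; with it, the I/O domain runs as soon as it wakes.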

Critiques:

As is well known, Xen has suffered from low I/O performance. I believe these approaches are very helpful in giving designers of Xen's architecture a different view of scheduling. However, the authors should classify more carefully the cases in which the boost status is beneficial, because boosting may lose its effectiveness when there are many I/O domains. What can we expect when the run queue holds as many UNDER domains as boosted domains? Moreover, I believe the approach is insufficient for I/O domains under heavy load. In the case of run-queue reordering, how can we compensate for low CPU utilization when boosting is combined with the ordered run queue? If the authors addressed these considerations, their scheduler optimizations would be more useful for enhancing Xen's I/O performance.

Microsoftware Nov. 2009

from For craft/Chat 2009/10/24 13:34

I have submitted the draft of the second column in the NT Virtual Memory Manager section, "NT Virtual Address Translation (Virtual Address Translation with considering MMU and TLB)". The column covers the following topics:

1. Virtual address translation
2. Data structure design for 32-bit virtual addresses
3. Shared Memory and Memory Mapped File
4. Prototype Page Table
5. Considerations in page table design
6. Sections and views
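As a rough illustration of topics 1 and 2, here is a minimal Python sketch of two-level address translation, assuming a 32-bit x86 without PAE, where a virtual address splits into a 10-bit page directory index, a 10-bit page table index, and a 12-bit byte offset. The nested dictionaries standing in for the hardware tables and the `PageFault` exception are hypothetical; the TLB, PTE flag bits, and NT's prototype PTEs are left out.

```python
PAGE_SHIFT = 12          # 4 KB pages
INDEX_BITS = 10          # 1024 entries per page directory and per page table
INDEX_MASK = (1 << INDEX_BITS) - 1


class PageFault(Exception):
    pass


def split_va(va: int):
    """Split a 32-bit virtual address into (directory index, table index, byte offset)."""
    pd_index = (va >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    pt_index = (va >> PAGE_SHIFT) & INDEX_MASK
    offset = va & ((1 << PAGE_SHIFT) - 1)
    return pd_index, pt_index, offset


def translate(va: int, page_directory: dict) -> int:
    # page_directory: {pd_index: {pt_index: page frame number}}, a stand-in for
    # the hardware page directory and page tables (present entries only).
    pd_index, pt_index, offset = split_va(va)
    page_table = page_directory.get(pd_index)
    if page_table is None or pt_index not in page_table:
        raise PageFault(hex(va))
    return (page_table[pt_index] << PAGE_SHIFT) | offset


directory = {0x48: {0x345: 0xABCD}}
print([hex(x) for x in split_va(0x12345678)])
print(hex(translate(0x12345678, directory)))
```

A missing directory or table entry raises `PageFault`, which is where the memory manager would take over to resolve the fault.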

If you are not familiar with the concepts of virtual memory, I recommend going through the following posts before reading the column.

2008/11/16 - [Fundamental Notes/Operating Systems] - Virtual Memory #1, Introduce Memory Managements
2008/11/16 - [Fundamental Notes/Operating Systems] - Virtual Memory #2, Paging
2008/11/16 - [Fundamental Notes/Operating Systems] - Virtual Memory #3, Demand Paging and Page Tables
2008/11/16 - [Fundamental Notes/Operating Systems] - Virtual Memory #4, Cache Replacement Policies
2008/11/16 - [Fundamental Notes/Operating Systems] - Virtual Memory #5, Working Set Model


Attribution, NonCommercial, No Derivatives

Other posts in the 'For craft > Chat' category

Microsoftware Nov. 2009  (0) 2009/10/24
L4 and micro kernel history.  (2) 2009/10/20
Everything will be fine.  (0) 2009/10/14
Oct, Draft about VMM  (0) 2009/09/02
A next research topic  (4) 2009/07/12
Extended NT Cache Interface and communication issues  (0) 2009/05/17

The realization of drawbacks in the design and performance of the first-generation Mach microkernel led a number of developers to re-examine the entire microkernel concept in the mid-1990s. The asynchronous, in-kernel-buffered inter-process communication used in Mach turned out to be one of the main reasons for its poor performance. This induced some Mach developers to move time-critical components, such as file systems and drivers, back into the kernel, which, of course, conflicted with the minimality principle of a true microkernel.

Detailed analysis of the Mach bottleneck indicated that, among other things, its working set was too big: there were too many cache misses, and most of them occurred in the kernel. In other words, code locality was poor. This led to the idea that an efficient microkernel should be small enough to fit the majority of its critical sections into the instruction cache.


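It is interesting that, on the road to L4, many ideas were applied to solve the IPC overhead and working set size problems. The fact that it was written entirely in assembler does not seem very meaningful academically, but it would help a lot to know what the L4 family did to reduce the working set size...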

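Something I have been feeling lately: you can only see problems as far as your knowledge reaches. Since I can only see problems up to what I know, I can only see improvements up to what I know, and I can only make academic contributions up to what I know. So next year I need to seriously fix my weakness of concentrating only on the things that suit my taste.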


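I am writing my third paper this year, and the simulation results are very different from what I expected.
Being squeezed by the schedule, I keep thinking that this is not much different from company work, where the rush to meet deadlines ends in decisions made by rolling dice (not all company work is like that, though...). It is simply draining.
I should allow myself this draining way of attacking problems only until the end of this year...
This kind of work seems to eat away at my opportunities to improve my abilities and grow.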