Introduction to File:
File is a collection of related data. The data stored in a computer is in the form of files, & different types of data need different types of files to store them. Data can take different forms: numbers, characters, images, or even symbols. Management of the file system & memory is a very important role played by the operating system.
A file name has two parts, i.e. a name & its extension. The extension is used to identify the application in which the file was created. The following are different types of files with different extensions:
File.ppt = PowerPoint presentation, File.pps = PowerPoint slide show, File.doc = Microsoft Word file,
File.pdf = Portable Document Format file, File.c = C program, etc.
- File Attributes:
When a file is named, it becomes independent of the process, the user, and even the system that created it. For instance, one user might create the file example.c, and another user might edit that file by specifying its name. A file’s attributes vary from one OS to another but typically consist of these:
- Name: The symbolic file name, a string of characters in human-readable form.
- Identifier: This unique tag, usually a number, identifies the file within the file system; it is the non-human-readable name for the file.
- Type: This information is needed by systems that support different types of files.
- Location: This information is a pointer to a device and to the location of the file on that device.
- Size: The current size of the file (in bytes, words, or blocks) and possibly the maximum allowed size are included in this attribute.
- Protection: Access-control information determines who can do reading, writing, executing, and so on.
- Time, date, and user identification: This information may be kept for creation, last modification, and last use. These data can be useful for protection, security & usage monitoring.
Operations on Files:
Files can be implemented on several different devices like tapes, disks, drums & other more unusual storage devices. A file is a logical entity which represents a named piece of information & is mapped onto a physical device. The following are the operations on files:
- Creating a File: Two steps are necessary to create a file. First, space in the file system must be found for the file &, second, an entry for the new file must be made in the directory.
- Writing a File: To write a file, we make a system call specifying both the name of the file & the information to be written to the file. Given the name of the file, the system searches the directory to find the file's location. The system must keep a write pointer to the location in the file where the next write is to take place. The write pointer must be updated whenever a write occurs.
- Reading a File: To read from a file, we use a system call that specifies the name of the file & where the next block of the file should be put. Again, the directory is searched for the associated entry, & the system needs to keep a read pointer to the location in the file where the next read is to take place. Once the read has taken place, the read pointer is updated.
- Repositioning Within a File: To access the file contents, it is necessary to specify the location of the data. A seek system call is used that repositions the file pointer to a specific location in the file. After executing this call, data can be read from or written to that position.
- Deleting a File: To delete a file, we search the directory for the named file. The file is then deleted using a system call, & this deletion frees the disk space.
- Truncating a File: File truncation is used to erase the contents of a file while keeping its attributes as they are. That is, file truncation avoids the deletion & re-creation of a file by keeping the file attributes. Only the size attribute of the file is changed by truncation.
- Rename: It is used to give a new name to the file.
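These operations map onto system calls provided by the OS. As a rough sketch, Python's standard file API exposes the same operations; the file names used here are arbitrary examples, not anything from the text:

```python
import os

# Create & write: the OS finds space and adds a directory entry.
with open("example.txt", "w") as f:
    f.write("hello file system\n")   # write pointer advances automatically

# Read & reposition: seek() moves the file pointer to a byte offset.
with open("example.txt", "r+") as f:
    data = f.read(5)                 # read pointer now at offset 5
    f.seek(0)                        # reposition to the beginning
    first = f.read(5)

# Truncate: erase the contents while keeping the file (and its attributes).
with open("example.txt", "r+") as f:
    f.truncate(0)                    # only the size attribute changes

# Rename & delete.
os.rename("example.txt", "renamed.txt")
os.remove("renamed.txt")             # frees the directory entry & disk space
```

Each call above corresponds to one of the operations listed: create/write, read, seek, truncate, rename, delete.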
File Access Methods:
File is a collection of related information, & the information in the file can be accessed in several ways. When a file is used, this information must be accessed & read into computer memory. The following are the ways to access a file:
- Sequential Access
- Direct Access
- Sequential Access Method:
The simplest access method is sequential access. Information in the file is processed in order, one record after the other. The most common operations performed on a file are read & write. A read operation reads the next portion of the file & the pointer automatically advances. In a write operation, the file pointer automatically goes to the end of the file, & the contents added by the user or system are appended there.
If you want to read a piece of data that is stored at the very end of the file, you have to read all of the data that comes before it; you cannot jump directly to the desired data. This is similar to the way cassette tape players work. If you want to listen to the last song on a cassette tape, you have to either fast-forward over all of the songs that come before it or listen to them. There is no way to jump directly to a specific song.
- Direct Access Method:
A file is made up of fixed-length logical records that allow programs to read and write records rapidly in no particular order. Thus, we may read block 14, then read block 53, and then write block 7. There are no restrictions on the order of reading or writing for a direct-access file.
The direct-access method is based on a disk model of a file, since disks allow random access to any file block. Direct-access files are of great use for immediate access to large amounts of information. Databases are often of this type. For the direct-access method, the file operations must be modified to include the block number as a parameter.
The block number provided by the user to the OS is normally a relative block number.
- A relative block number is an index relative to the beginning of the file.
- Thus, the first relative block of the file is 0, the next is 1, and so on, even though the actual absolute disk address of the block may be 14703 for the first block and 3192 for the second.
The use of relative block numbers allows the OS to decide where the file should be placed (called the allocation problem) and helps to prevent the user from accessing portions of the file system that may not be part of her file.
When you work with a direct access file (which is also known as a random access file), you can jump directly to any piece of data in the file without reading the data that comes before it. This is similar to the way a CD player or an MP3 player works. You can jump directly to any song that you want to listen to. Sequential access files are easy to work with, and you can use them to gain an understanding of basic file operations.
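The idea of reading block 53 without touching blocks 0 through 52 can be sketched with fixed-length records; the record size (64 bytes) & file name are assumptions made for illustration, and the record numbers echo the text's example:

```python
RECORD_SIZE = 64          # assumed fixed record length in bytes

def record_offset(block_number: int) -> int:
    """Relative block number -> byte offset from the start of the file."""
    return block_number * RECORD_SIZE

# Build a file of 100 fixed-size records (record i is 64 copies of byte i).
with open("records.bin", "wb") as f:
    for i in range(100):
        f.write(i.to_bytes(1, "little") * RECORD_SIZE)

# Direct access: jump straight to record 53 without reading records 0-52.
with open("records.bin", "rb") as f:
    f.seek(record_offset(53))          # reposition directly
    record = f.read(RECORD_SIZE)
```

Because every record has the same length, the relative block number alone determines the byte offset, which is exactly what makes random access possible.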
Swapping:
Swapping is a simple memory/process management technique used by the operating system (OS) to increase the utilization of the processor by moving blocked processes from main memory to secondary memory (hard disk), thus forming a queue of temporarily suspended processes while execution continues with the newly arrived process. After performing the swapping, the operating system has two options in selecting a process for execution.
Assume a multiprogramming environment with a round-robin CPU scheduling algorithm. When a quantum expires, the memory manager starts to swap out the process that has just finished & swaps another process into the memory space that has been freed. In the meantime, the CPU scheduler allocates a time slice to some other process in memory. When each process finishes its quantum, it will be swapped out for another process. Ideally, the memory manager can swap processes fast enough that there are always processes in memory, ready to execute, when the CPU scheduler wants to reschedule the CPU. The quantum must also be sufficiently large that reasonable amounts of computing are done between swaps. Swapping can be implemented in various ways; for example, swapping can be priority based.
Fig: Swapping of two processes using a disk as a Backing Store
- File System Implementation Strategies:
A file structure consists of logical storage units holding collections of related information. The file system manages the files & directories in an operating system & is stored on secondary storage devices such as disks. A file system is actually organized into different layers, & files are laid out on disk using an allocation method. There are several file system implementation strategies, as follows:
- Contiguous Allocation
- Linked Allocation
- Indexed Allocation
- Contiguous Allocation:
Fig: Contiguous Allocation
- Linked Memory Allocation: (Dynamic Memory Allocation)
Fig: Linked Allocation
- Indexed Memory Allocation:
Fig: Indexed Allocation
Directory Structure:
Some systems have millions of files stored on terabytes of disk. To find a required file easily, it is necessary to manage the files stored on the disks, & to manage these files we need to organize them into directories. The most common schemes for representing the directory structure are:
- Single-Level Directory
- Two-Level Directory
- Tree-Structured Directory
Eg. C:\Windows\System32\system.in → Drive (Root) \ Parent Directory \ Current Subdirectory \ File.
- Single-Level Directory:
The single-level directory structure is the simplest structure. All files are contained in the same directory, which is easy to support & understand.
Since all files are stored in the same directory, a unique name must be assigned to each file. If two different users want to store their data files under the same name, the unique-name rule is violated. An advantage of this structure is that less time is required to search for a file, because all files are in the same directory.
Fig: Single-Level Directory
Limitations: 1) Naming problems arise when the number of files increases or when the system has more than one user, since all files in the same directory must have unique names. 2) It is difficult to remember the names of all the files as their number increases. 3) There is no way to group related files. 4) File names are limited in length.
- Two-Level Directory:
A separate directory for each user, known as a user file directory (UFD), is maintained. All these directories have a similar structure, but each contains only the files of a single user. The index of all UFDs is stored in the master file directory (MFD). When a user job starts or a user logs in, the system's MFD is searched.
Fig: Two-Level Directory
Advantages:
- i) Path names.
- ii) Different users can have files with the same name.
- iii) Efficient searching.
Disadvantages: No grouping capability
- Tree-Structured Directory:
A tree is the most common directory structure. The tree has a root directory, & every file in the system has a unique path name.
Fig: Tree-Structured Directory
A directory contains a set of files or subdirectories. In this structure, each directory entry has a special one-bit field that defines whether the entry is a file or a subdirectory: 0 represents a file & 1 represents a subdirectory. Special system calls are used to create & delete directories.
Advantages: 1) Efficient searching. 2) Grouping Capability.
Disadvantages: Search time may become unnecessarily long.
- File Protection:
Protection of files is a very important issue in an operating system. Protection should be provided against physical damage as well as other kinds of damage; that is, both reliability & protection must be maintained for the information in a computer system.
Some access methods are used to access files & directories, & these are provided by the operating system. It is also necessary to control this access. The following operations are controlled by the operating system:
- Read: This reads the data from the file.
- Write: This operation writes the data to the file.
- Execute: This allows the file to be loaded into memory & executed.
- Append: Append is used to add new information at the end of the file.
- Delete: Used to delete the file.
- List: This operation is useful to see the attributes of the file & for copying, moving, renaming, editing, etc.
The three most popular implementations of file protection, or file security, are as follows:
- File Naming: It is important that no two users create files with the same name. Certain rules are also followed when naming a file; e.g. a file name generally starts with a letter rather than a number.
- Password: This scheme associates a password with each file. A user who does not know the password associated with a file cannot access it. This is a very effective way of protecting a file from unauthorized access.
- Access Control: An access list is associated to each file or directory. The access list contains information on the type of users & accesses that they can do on a directory or file.
Basic Memory Partitioning:
Another method used for memory management is called memory partitioning. This method divides the memory into several small parts, called partitions. A new magnetic disk is a blank slate: it is just a platter of magnetic recording material. Before a disk can store data, it must be divided into sectors that the disk controller can read & write; this process is called low-level formatting or physical formatting. Before it can use the disk to hold files, the operating system still needs one more formatting step, called logical or high-level formatting, in which the operating system stores the initial file-system data structures onto the disk.
Advantages of Partitioning:
- Useful for multiprogramming.
- It is easy to access data from partitions.
- Easy to access the partitions.
- Partitions can be dynamic or fixed. This provides flexibility.
Mainly two types of methods are used to make partitions in the memory:
- Static Memory Partitions (Fixed Partitions)
- Dynamic Memory Partitions (Variable Partitions)
Static Memory Partitions (Fixed Partitions):
When memory is divided into a number of fixed-size partitions, it is called static partitioning. Each partition contains exactly one process. If the process assigned to a partition is smaller than the partition size, the remaining space cannot be used by any other process.
Eg. Partition 1 is of size 200K. The input queue shown is a ready queue of jobs waiting for memory allocation. As a partition becomes free, the next job or process is selected & loaded into it.
- Partition sizes are fixed, & memory is easily allocated to processes that match a partition size.
- It is good partition scheme if the operating system is not very complicated.
- The main drawback of fixed partition is memory wastage.
Dynamic Memory Allocation: (Variable Partitioning)
Partitioning memory according to the size of each process is called dynamic memory partitioning. When a process arrives in the ready queue, the OS checks whether a partition of sufficient size is available.
If the available partition is very large, the operating system divides that partition into parts. The process is then placed in the appropriate part of memory, & the remaining portion is returned to the list of available partitions. The following example shows dynamic memory allocation:
- Partitions are made as per the process length.
- Memory wastage is very less.
- Maximum part of memory is utilized.
- The main drawback is external fragmentation, i.e. some small partitions remain unutilized. The technique used to overcome external fragmentation is compaction.
- That is, all free memory space is compacted together to form one new block or partition for utilization.
Compaction shuffles the free memory locations together into one large block. Compaction is not always possible. The following figure shows compaction: jobs 3, 4 & 5 are rearranged, & the free memory blocks of 10K, 30K & 36K are clubbed together to form one large block of 76K.
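A minimal sketch of compaction, using the hole sizes from the figure (10K, 30K & 36K); the job sizes chosen for the allocated blocks are made up, since the figure gives only the hole sizes:

```python
# Each entry is (owner, size_kb); owner None marks a free hole.
# The allocated sizes are illustrative; only the holes match the figure.
memory = [("job3", 40), (None, 10), ("job4", 60),
          (None, 30), ("job5", 50), (None, 36)]

def compact(blocks):
    """Slide allocated blocks together & merge all holes into one."""
    allocated = [b for b in blocks if b[0] is not None]
    free_kb = sum(size for owner, size in blocks if owner is None)
    return allocated + [(None, free_kb)]

compacted = compact(memory)
# The 10K, 30K & 36K holes are clubbed into a single 76K block.
```

Note that a real OS must also relocate the moved processes' addresses, which is why compaction is not always possible.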
There are three methods used for searching available partition from set of available memory partitions:
- First-fit: Allocate the first hole that is big enough. There may be many holes in memory, so to reduce the time spent analyzing the available spaces, the operating system begins at the start of primary memory & allocates memory from the first hole it encounters that is large enough to satisfy the request. For example, suppose a process requests 12KB & the unallocated blocks are 6KB, 14KB, 19KB, 11KB & 13KB; first fit will allocate 12KB of the 14KB block to the process.
- Best-fit: Allocate the smallest hole that is big enough; the entire list must be searched, unless it is ordered by size. This produces the smallest leftover hole. The allocator places a process in the smallest block of unallocated memory in which it will fit. For example, suppose a process requests 12KB of memory & the memory manager currently has a list of unallocated blocks of 6KB, 14KB, 19KB, 11KB & 13KB. The best-fit strategy will allocate 12KB of the 13KB block to the process.
- Worst-fit: Allocate the largest hole; the entire list must also be searched. This produces the largest leftover hole. The memory manager places a process in the largest block of unallocated memory available. The idea is that this placement creates the largest hole after the allocation, increasing the possibility, compared to best fit, that another process can use the remaining space. Using the same example, worst fit will allocate 12KB of the 19KB block to the process, leaving a 7KB block for future use. First-fit & best-fit are better than worst-fit in terms of speed & storage utilization.
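The three placement strategies can be sketched in a few lines; the hole sizes below are the 6KB/14KB/19KB/11KB/13KB list from the example above, & a real allocator would of course also split the chosen hole & track addresses:

```python
def allocate(holes, request, strategy):
    """Return the index of the hole chosen for `request` KB, or None."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    if not candidates:
        return None
    if strategy == "first":
        return min(candidates, key=lambda c: c[1])[1]   # earliest hole
    if strategy == "best":
        return min(candidates)[1]                       # smallest fit
    if strategy == "worst":
        return max(candidates)[1]                       # largest hole

holes = [6, 14, 19, 11, 13]   # free blocks in KB, as in the text's example
# A 12KB request: first-fit -> the 14KB block, best-fit -> the 13KB block,
# worst-fit -> the 19KB block.
```

First-fit stops at the earliest large-enough hole, which is why it is fast; best-fit & worst-fit both scan the whole list.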
Free Space Management:
When memory allocation takes place dynamically, it is the responsibility of the operating system to manage it properly. Two methods are used in free-space management:
- Memory Management with bitmap
- Memory Management with linked list
- Memory Management with Bitmap:
Frequently, the free-space list is implemented as a bit map or bit vector. Each block is represented by 1 bit: if the block is free, the bit is 1; if the block is allocated, the bit is 0.
For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25 & 26 are free, & the rest of the blocks are allocated. The free-space bit map would be 001111001111110001100000011000000…
The main advantage of this approach is that it is relatively simple & efficient to find the first free block, or n consecutive free blocks, on the disk. Indeed, many computers supply bit-manipulation instructions that can be used effectively for this purpose.
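A sketch of the bit-map scheme using the free-block list from the example; the 32-block disk size is an assumption for illustration:

```python
blocks_free = {2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26}
DISK_BLOCKS = 32                     # assumed disk size in blocks

# Build the bit vector: 1 = free, 0 = allocated (as in the text).
bitmap = [1 if b in blocks_free else 0 for b in range(DISK_BLOCKS)]

def first_free(bits):
    """Index of the first free block, or None if no block is free."""
    for i, bit in enumerate(bits):
        if bit == 1:
            return i
    return None

def first_free_run(bits, n):
    """Start of the first run of n consecutive free blocks, or None."""
    run = 0
    for i, bit in enumerate(bits):
        run = run + 1 if bit == 1 else 0
        if run == n:
            return i - n + 1
    return None
```

On this bit map, the first free block is 2, & the first run of six consecutive free blocks starts at block 8; real hardware can answer these queries with single bit-scan instructions.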
Memory Management with linked list:
In the linked-list approach, all free disk blocks are linked together, keeping a pointer to the first free block in a special location on the disk & caching it in memory. The first free block contains a pointer to the next free block, & so on. This method has negligible space overhead because there is no need for a disk allocation table, merely for a pointer to the beginning of the chain. This method is suited to all the file allocation methods.
Consider the last example: blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25 & 26 are free & the rest of the blocks are allocated. In this situation, we would keep a pointer to block 2 as the first free block. Block 2 would contain a pointer to block 3, which would point to block 4, which would point to block 5, which would point to block 8, & so on. This scheme is not efficient: to traverse the list, we must read each block, which requires substantial I/O time.
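The chain from that example can be modeled with a dictionary standing in for on-disk pointers; each lookup stands in for one disk-block read, which makes the I/O cost of traversal visible:

```python
# Free-block chain from the text's example: each free block stores the
# number of the next free block (None marks the end of the chain).
next_free = {2: 3, 3: 4, 4: 5, 5: 8, 8: 9, 9: 10, 10: 11, 11: 12,
             12: 13, 13: 17, 17: 18, 18: 25, 25: 26, 26: None}

def traverse_free_list(first_free_block):
    """Walk the chain; each step models one disk-block read."""
    free_blocks, block = [], first_free_block
    while block is not None:
        free_blocks.append(block)    # one block read per free block
        block = next_free[block]
    return free_blocks
```

Starting from the cached pointer to block 2, the walk recovers all 14 free blocks, but only at the cost of 14 reads, which is the inefficiency the text points out.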
Virtual Memory:
In a computer system, problems may occur when programs are larger than the available memory. This problem may be solved by virtual memory. Virtual memory is used when the computer lacks the random access memory (RAM) needed to run a program or operation: when RAM runs low, virtual memory moves data from RAM to a space called the paging file. Moving data to & from the paging file frees up RAM to complete its work.
Virtual memory involves the separation of the logical memory perceived by the users from physical memory. This separation allows an extremely large virtual memory to be provided for programmers when only a smaller physical memory is available. Virtual memory makes the task of programming much easier, because the programmer no longer needs to worry about the amount of physical memory available. The basic idea behind virtual memory is that each program has its own address space. This address space is divided into a number of units called pages.
Fig: Virtual Memory
Segmentation:
Segmentation is a memory management scheme that supports the user's view of memory: it is a method of dividing memory into different segments. A program is a collection of segments. A segment is a logical unit such as a main program, procedure, function, method, object, local variables, global variables, common block, stack, symbol table, or array.
- Segmentation provides protection & sharing of code & data.
- The algorithm is relatively easy to implement.
Disadvantages: Difficult to maintain segment table.
Paging:
Paging is a memory management technique in which memory is divided into fixed-size blocks, & pages are transferred from physical memory to disk when needed. Paging allows the physical address space of a process to be noncontiguous; without paging, the whole program has to fit into storage contiguously. The OS stores & retrieves data from secondary storage in same-sized blocks called pages; when a program needs a page, the OS copies it from the storage device into main memory, giving faster access to data. Paging works as follows:
- Divide physical memory into fixed-size blocks called frames.
- Divide logical memory into blocks of the same size called pages.
- Keep track of all free frames.
- To run a program of size n pages, find n free frames & load the program.
- Set up a page table to translate logical to physical addresses.
An address generated by the CPU is divided into: a page number (p), used as an index into the page table, which contains the base address of each page in physical memory; & a page offset (d), which is combined with that base address to define the physical memory address sent to the memory unit.
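The address split can be sketched directly; the 4KB page size & the page-table contents below are hypothetical values chosen for illustration:

```python
PAGE_SIZE = 4096                     # assumed 4KB pages (a power of 2)

def translate(logical_address, page_table):
    """Split a logical address into (p, d) & map it to a physical address."""
    p = logical_address // PAGE_SIZE          # page number: index into table
    d = logical_address % PAGE_SIZE           # offset within the page
    frame = page_table[p]                     # base frame for that page
    return frame * PAGE_SIZE + d

# Hypothetical page table: page 0 -> frame 5, page 1 -> frame 2, page 2 -> frame 7.
page_table = {0: 5, 1: 2, 2: 7}
physical = translate(1 * PAGE_SIZE + 100, page_table)   # page 1, offset 100
```

Because the page size is a power of 2, real hardware extracts p & d with a shift & a mask rather than division; the arithmetic above is equivalent.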
- Demand Paging:
Paging refers to the transfer of memory pages from the physical memory to disk. Virtual memory uses a technique called demand paging for its implementation. When we want to execute a process, it is swapped into memory; however, the pager (a lazy swapper) does not bring the whole process into memory. Only those pages which are needed are brought in; that is, a page is brought into memory only when it is needed. Demand paging has many benefits, such as:
- Less I/O needed
- Less memory needed
- Faster response
- More users
Logical address space of a process can be noncontiguous; process is allocated physical memory whenever that memory is available and the program needs it. Divide physical memory into fixed-sized blocks called frames (size is power of 2, between 512 bytes and 8192 bytes).
Fig: Demand Paging
- Page Table:
The page table is used to map virtual pages onto page frames. Physical memory is broken into fixed-size blocks called frames, & logical memory is broken into blocks of the same size called pages. When a process is to be executed, its pages are loaded into any available memory frames from the backing store, which is divided into fixed-size blocks of the same size as the memory frames. Every address generated by the CPU has two parts: a page number (the higher bits) & a page offset (the lower bits). The page number is used as an index into the page table.
- Page Fault:
When the page (data) requested by a program is not available in memory, a page fault occurs. Page faults arise because the number of frames in physical memory is smaller than the number of pages in logical memory, so not every page can be resident at the same time; the OS must then bring the missing page in from the backing store.
- Page Replacement Algorithm:
Page replacement is basic to demand paging. It completes the separation of logical memory & physical memory: with this mechanism, a very large virtual memory can be provided for programmers on top of a smaller physical memory. A page-replacement algorithm selects which frame's page is to be replaced.
The different page-replacement algorithms are:
- FIFO (First In First Out)
- Optimal Page Replacement
- LRU (Least Recently Used Page Replacement)
- NRU (Not Recently Used Page Replacement)
FIFO (First In First Out):
This is the simplest technique for page replacement. When a page is to be replaced, the oldest page is chosen. For this FIFO strategy, a queue is created to hold all pages in memory; the page that entered the queue first is taken first for replacement.
E.g. consider the reference string given below & three initially empty frames are available.
4 5 6 7 4 5 8 4 5 6 7 8
- Easy to understand & execute.
- It is not very effective
- System needs to keep track of each frame.
- Sometimes it behaves abnormally. This behavior is called Belady's anomaly: for some page-replacement algorithms, the page-fault rate may increase as the number of allocated frames increases.
- A bad replacement choice increases the page-fault rate & slows process execution.
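A minimal FIFO simulation for the reference string above, using three initially empty frames:

```python
from collections import deque

def fifo_faults(reference_string, frame_count):
    """Count page faults under FIFO replacement."""
    frames = deque()                 # oldest page at the left
    faults = 0
    for page in reference_string:
        if page not in frames:
            faults += 1
            if len(frames) == frame_count:
                frames.popleft()     # evict the page loaded earliest
            frames.append(page)
    return faults

refs = [4, 5, 6, 7, 4, 5, 8, 4, 5, 6, 7, 8]
# With 3 frames this reference string produces 9 page faults.
```

The same function also demonstrates Belady's anomaly: the classic reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 yields 9 faults with 3 frames but 10 faults with 4 frames.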
Compare between Paging & Segmentation:

| Paging | Segmentation |
| --- | --- |
| 1) Paging divides the computer's primary memory into fixed-size units called page frames, & the program's address space into pages of the same size. | 1) Segmentation is the only memory management technique that does not provide the user's program with a linear & contiguous address space. |
| 2) The hardware memory management unit maps pages to frames. | 2) Segments are areas of memory that usually correspond to a logical grouping of information such as a code procedure or a data array. |
| 3) The physical memory can be allocated on a page basis while the address space appears contiguous. | 3) Segments require hardware support in the form of a segment table, which usually contains the physical address of the segment in memory, its size, & other data such as access protection bits & status. |
| 4) Pages are used for swapping or managing memory. | 4) Small pieces called segments are used for memory management. |
| 5) A page is indicated by its number & offset. | 5) A segment is indicated by its segment number & offset. |
| 6) A page table is formed. | 6) A segment table is formed. |
| 7) Does not support the user's view of memory. | 7) Supports the user's view of memory. |
Optimal Page Replacement Algorithm:
In this algorithm, the page that will not be used for the longest period of time is replaced. This algorithm has the lowest page fault rate of all algorithms.
- Lowest page fault rate.
- It can perform roughly twice as well as the FIFO page replacement algorithm.
- This algorithm is difficult to implement.
- It requires future knowledge of reference string.
String: 4 5 6 7 4 5 8 4 5 6 7 8
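A sketch of the optimal algorithm on the string above: at each fault it evicts the resident page whose next use lies farthest in the future (pages never used again count as infinitely far). The full future reference string is needed as input, which is exactly why the algorithm is impractical:

```python
def optimal_faults(reference_string, frame_count):
    """Count page faults under optimal (farthest-future-use) replacement."""
    frames = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue
        faults += 1
        if len(frames) < frame_count:
            frames.append(page)
            continue
        future = reference_string[i + 1:]
        def next_use(p):
            # Pages never referenced again are "infinitely far" in the future.
            return future.index(p) if p in future else float("inf")
        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults

refs = [4, 5, 6, 7, 4, 5, 8, 4, 5, 6, 7, 8]
# With 3 frames the optimal algorithm incurs 7 faults on this string,
# versus 9 for FIFO.
```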
- Least Recently Used (LRU):
In this algorithm, the page that has not been used for the longest period of time is replaced. The algorithm associates with each page the time of that page's last use; LRU replaces the page that has gone unused for the longest time.
String: 4 5 6 7 4 5 8 4 5 6 7 8
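A minimal LRU simulation for the string above, keeping the frames ordered by recency of use instead of timestamping each page:

```python
def lru_faults(reference_string, frame_count):
    """Count page faults under LRU replacement."""
    frames = []                      # least recently used page at the front
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.remove(page)      # refresh: move to the most-recent end
        else:
            faults += 1
            if len(frames) == frame_count:
                frames.pop(0)        # evict the least recently used page
        frames.append(page)
    return faults

refs = [4, 5, 6, 7, 4, 5, 8, 4, 5, 6, 7, 8]
# With 3 frames this reference string produces 10 faults under LRU,
# between optimal (7) & never worse than optimal's lower bound.
```

Unlike FIFO, LRU must update its bookkeeping on every reference, not just on faults; that is the price of tracking recency.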