NxOS - File system specifications
General design
Structure
Metadata
Each file's metadata is put just before the file's data, on its first page on the flash. It consists in 10 32bits words of data containing the essential information on the file itself.
The first U32 contains basic file information:
| 0-7 | 8-11 | 12-32 |
| 0x42 | Permissions | File size |
The second U32 is currently not used, but may be in the future for some kind of file hierarchy. The last 8 U32 words contains the filename, stored in a packed form (4 chars per U32 word, thus 32 bytes long filenames, including the terminating NUL char).
Defragmentation
To be designed. Note that this is not a showstopper, as we can just return No space left on device where a defragmentation could have freed contiguous space for a file.
A block is a set of contiguous pages full or empty pages. A Hole depicts a set of empty contiguous blocks.
Simple
This algorithm of simple defragmentation will try to move any non contiguous blocks, in the first most appropriate free space available, thus finally ending up with all the free spaces at the end of the flash memory.
Algorithm:
Foreach hole
size of hole;
travel from the end of the hole to the end of the flash
search for the block which is the closest to the size of the hole
if a block has been found then
move the block into the hole
else
move the next contiguous block to fill up the hole
For a file
This second type of defragmentation aims at defragmenting the FS for a given file. The aim of this particular is to defragment the FS, placing the given file at the end of the FS, if possible, allowing the user to write to the file afterwards.
Algorithm:
(1) if the file is the last on the flash then
defrag_simple to the file->origin
move the file to the beginning of the free blocks
(2) else if the file can be moved to the the end of the flash then
move it
apply condtion1;
(3) else if the file is swappable with the last file;
swap it
apply condition1
(4)else
defrag from file to end of flash
try 2 or 3
if not
defrag from begining of flash to the file
retry 2 or 3
else return not doable
Best overall (Swiss cheese technique)
This is more of a fragmentation than an actual defragmentation. The aim is to allow the maximum number of files to be writable, by adding free space after each file.
algorithm:
// get the free pages and the number of pages
nx_fs_get_occupation(U16 *files, U32 *used, U32 *free_pages, U32 *wasted);
mean_space_per_file = free_pages/files;
if mean_space per_file <1
defrag_simple
else
for each file
current_loation =file->origin + nx__fs_get_file_page_count(file->size)
// calculate the amount of free pages after the file
freebloc = nx__fs_find_next_origin(current_location) - current_location
if freebloc > mean_space_per_file
move next file to (current_location+mean_space_per_file)
else if freebloc < mean_space_per_file
move next block to (current_location+mean_space_per_file)
Research
Brick context
| Total RAM | Kernel size in memory | Available RAM | Max desirable footprint |
| 64 kB | ~ 15 kB | ~ 50 kB | 5kB |
Flash specifications
- 1024 pages of 256 bytes each
- 6 ms erase + program time
- 16 lock regions, each controlling 64 pages of the flash
Finding a solution for NxOS
Goals
- "Good enough" wear-leveling
- Direct execution of files must remain possible
- Minimum memory footprint
- No "at-halt-time" operations -> FS must be synchronous
Secondary objectives
- Minimize mount time
- Good crash resistance
Design
The main idea is to get rid of any centralized file information stored on the flash in the favor of a reserved space at the beginning of each file that contains the metadata for this file. Because of the flash read speed, this has no immediate impact on the file access time (for open operations), and has a lot of advantages.
- Since this metadata is only stored at the very beginning of the file, before it's actual data, it does not break the direct execution of the stored files.
- A file's metadata is only changed when needed, without impact on the metadata of the other files. This provides a good overall crash resistance for the file system, as at most one's metadata will not get rewritten correctly or get lost, leading to the lost of only this file and not all the information the FS contains.
- The metadata kept in memory can be therefore restricted to a simple cache of the position of the most frequently used files on the flash. Performance analysis must be performed to determine the real benefit of such a cache, especially since the file system would still perform well without it.
With primary objectives 2 to 4 fulfilled, only the wear-leveling issue remains, which can be addressed with clever file placement through time. Choosing a file origin will happen only at two moments:
- On file creation
- During an internal file move when the file grows
It is during these phases that the file system must issue a clever decision on the file's origin page number to avoid concentrating the flash usage to the lower sections. Given the greatest file origin and the size of this file, it is easy to compute the foremost first available page on the flash and start from here, looping around the flash back to its beginning if required.
Functional overview
Note: it only takes approx. 1ms to read the first U32 of all flash pages.
- Open: when the user requests a file by its name, the flash is walked through, looking for start-of-a-file magic markers (at FLASH_BASE_PTR[page*64] for each needed page), reading files metadata to find the corresponding file, based on the provided file name. Two cases arise:
- File exists: if the file exists, it is found on the flash and the open command returns the file description structure.
- Non-existent file: if file does not exist, it must be created, and although this will not be actually performed until the first page write for this file, its origin page should be determined (see below File relocation for how this is accomplished).
- Read: file reading is done sequentially, byte by byte, using a read buffer. When the end of this buffer is reached, the next page of the flash is loaded into the buffer. End-of-file check is performed here.
- Write: writing to a file is always done in append mode (Note: next versions of the FS should figure out a simple and performant way to support other write modes). Appending to the end of a file may happen in one of the following two situations:
- Free space is available after the file in the flash, and pages are written one after another when needed, using a write buffer (operations are synchronous at the page level, not at the byte level)
- No free space is available except what remains from the last page of the file, and the file must be relocated when we reach the end of this page. A new origin position is found, the file contents are moved, and the write can continue.
- Delete: the metadata of this file is erased from the flash, along with any magic-marker matching page inside this file (to avoid false detections of a file beginning after the file has been removed).
File relocation
Finding the origin page number for a new file or for a candidate to relocation is the most critical part of the design of this filesystem. Indeed, it is up to this process to find intelligent placement for the files on the flash medium to reduce flash damage and simulate a "good-enough" wear-leveling.
This process is critical in the sense that it has to find a trade-off between flash fragmentation and wear-leveling. The file placement algorithm is somewhat similar to Lejos' but with the difference that the space freed by a file relocation is not reused immediately. Indeed, doing this would reduce the first flash blocks lifetime, putting to much stress on (statistically) the first hundreds pages of the flash.
The basic idea is that files are stored linearly on the flash, one after another. When creating a file, the first available page after the last file is used to store the file data, growing up towards the end of the flash, even if free space has been made available before that.
When a file requires more space, it must be relocated before being written to. The file is moved to the end of the data on the flash, just like creating a new file. Of course, if the end of the flash does not have enough free space, the file system will look for free space from the beginning of the flash that can accomodate the file. If this still fails, a defragmentation must happen (see below).
Example:
| F1.1 | F1.2 | F1.3 | F2.1 | F2.2 | F2.3 | F2.4 | F3.1 | F3.2 | F3.3 | F3.4 | F3.5 | F3.6 | F3.7 | | | | | | |
In this flash setup, three files have been written one after another, of sizes 3, 4 and 7 pages each. The user opens file 2 and writes data that requires a new page to be allocated for this file. File 2 must thus be relocated. Free space (at least 5 pages) must be found in the flash, which happen to be found at the end of the flash. The data of the file is moved after file 3 and new data is appended (F2.5). The old pages (3-6) are now made available.
| F1.1 | F1.2 | F1.3 | | | | | F3.1 | F3.2 | F3.3 | F3.4 | F3.5 | F3.6 | F3.7 | F2.1 | F2.2 | F2.3 | F2.4 | F2.5 | |
File 1 now has more space available in front of him for appending some data. Now, let's consider this situation:
| F1.1 | F1.2 | F1.3 | F1.4 | | | | F3.1 | F3.2 | F3.3 | F3.4 | F3.5 | F3.6 | F3.7 | F2.1 | F2.2 | F2.3 | F2.4 | F2.5 | |
The user wants to append data to file 3. It has to be relocated. Not enough space is available from the end of flash, obviously, and looking at the beginning of the flash does not give us a matching spot as well. But it's still not required to defragment the flash : in fact, because the file requesting space is number 3, the space it uses on the flash can be leveraged if the data of file 3 is moved :
| F1.1 | F1.2 | F1.3 | F1.4 | F3.1 | F3.2 | F3.3 | F3.4 | F3.5 | F3.6 | F3.7 | F3.8 | | | F2.1 | F2.2 | F2.3 | F2.4 | F2.5 | |
State of the art
Lejos File System
- Paradigms
- Tries to write files at the first available page, moving the whole file if space is lacking in front of it
- The file table is stored on the last page of the flash
- Advantages
- As files are stored contiguously, direct execution is possible
- Drawbacks
- Since the last page is rewritten on every file change, it is used more than the other pages, almost to frequently.
Sprite LFS
- Paradigm: log and segment structures
- 65-75% of the disk can be used for writing, the rest for cleaning
- buffering changes in the file cache and then write all the changes on the disk sequentially in a single write operation
- problem: retrieve the information from the logs and how to manage free space on the disk to let large extends of free space available for new data
- comparing to UNIX FFS, more compact writing and one sequential large write than several small writes + inode map for Sprite LFS
- the read operation remains the same
- reorganize log for compacting live data and freeing up large empty space, at the next writing, this data will be copied in a more compact way and then new contiguous files will be possible
- Advantages
- efficient writing
- Drawbacks
- logs stored in the cache memory
- no direct execution
EFS-like log structured file system
- Paradigms
- Metadata at the beginning of each page:
- In-file page number?
- Next page number
- Dirty bit
- Tick increment
- Files are not stored contiguously
- Try to use all the empty blocs before to rewrite on a "dirty" one again, rotating around the flash space
- Metadata at the beginning of each page:
- Advantages
- The flash memory is equally used, wear-levelling is achieved
- Drawbacks
- Direct execution is not possible anymore, because file data is not linear
