Ok, the title is tongue-in-cheek, but there's very little thought put into files in most languages. File handling always feels a bit out of place... except in C. In fact, what you get is usually a worse version of C.
In C, files can be accessed in the same way as memory:
#include <sys/mman.h>
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    // Create/open a file containing 1000 unsigned integers,
    // initialized to all zeros.
    int len = 1000 * sizeof(uint32_t);
    int file = open("numbers.u32", O_RDWR | O_CREAT, 0600);
    ftruncate(file, len);

    // Map it into memory.
    uint32_t* numbers = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_SHARED, file, 0);

    // Do something:
    printf("%u\n", numbers[42]);
    numbers[42] = numbers[42] + 1;

    // Clean up.
    munmap(numbers, len);
    close(file);
    return 0;
}
Memory mapping isn't the same as loading a file into memory: It still works if the file doesn't fit in RAM. Data is loaded as needed, so it won't take all day to open a terabyte file.
It works with all datatypes and is automatically cached. This cache is cleared if the system needs memory for something else.
mmap() is actually an OS feature, so many other languages have it. However, it's almost always limited to byte arrays: you have to grab a chunk of data, parse it, process it, and finally serialize it before writing it back to disk. It's nicer than manually calling read() and write(), but not by much.
These languages have all these nice features for manipulating data in memory, but nothing for manipulating data on disk. In memory, you get dynamically sized strings and vectors, enumerated types, objects, etc, etc. On disk, you get... a bunch of bytes.
Considering that most already support custom allocators and the like, adding a better way to access files seems very doable, but no one's actually done it. It's very weird to me that C — a language known for being unergonomic — actually does this the best.
C's implementation isn't even very good: Memory mapping comes with some overhead (page faults, TLB flushes) and C does nothing to handle endianness or errors... but it doesn't take much to beat nothing.
Sure, you might want to do some parsing and validation, but it shouldn't be required every time data leaves the disk. RAM is much smaller than the disk, so it's often impossible to just parse everything into memory. Besides, a lot of files aren't untrusted data in the first place.
In the case of binary files, parsing is usually redundant. There's no reason code can't directly manipulate the on-disk representation, or, for "scratchpad" temporary files, simply save the data as it exists in RAM. Sure, you wouldn't want to directly manipulate JSON, but there's no reason to do a bunch of work to save some integers.
File manipulation is similarly neglected. The filesystem is the original NoSQL database, but you seldom get more than a wrapper around C's readdir().
This usually results in people running another database, such as SQLite, on top of the filesystem, but relational databases never quite fit your program.
... and SQL integrates even worse than files: On top of having to serialize all your data, you have to write code in a whole separate language just to access it!
Most programmers will use it as a key-value store and implement their own indexing, creating a bizarre triple-nested database.
