This project implements a commit log, an append-only sequence of records, intended as the foundation of a distributed log system for reliable, high-performance data storage and retrieval. It organizes data into segments, each backed by a store and an index, behind a high-level log abstraction.
The goal is to design a log that supports a distributed architecture, enabling seamless scaling, fault tolerance, and high availability.
The log handles persisting, reading, and managing records through a layered design with the following components:
- Record: The raw data entries stored in the log
- Store: Manages the storage file where records are written and read
- Index: Tracks offsets and positions for efficient record lookup
- Segment: Wraps the store and index, serving as a unit of storage
- Log: The primary interface that manages segments, appending, reading, and more
Layering the system this way keeps the distributed design simple: data is organized into manageable segments, which keeps records durable and quick to retrieve by offset.
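To make the store layer concrete, here is a minimal sketch of an append-only store. It is not the project's store.go: it assumes each record is written as an 8-byte big-endian length prefix followed by the payload, and it uses an in-memory bytes.Buffer in place of the store file. The type `memStore` and its methods are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// lenWidth is the size of each record's length prefix.
// Assumption: an 8-byte big-endian length prefix.
const lenWidth = 8

// memStore is a hypothetical in-memory stand-in for the store file.
type memStore struct {
	buf bytes.Buffer
}

// Append writes a length-prefixed record and returns the position
// (byte offset within the store) where the record begins.
func (s *memStore) Append(p []byte) (uint64, error) {
	pos := uint64(s.buf.Len())
	if err := binary.Write(&s.buf, binary.BigEndian, uint64(len(p))); err != nil {
		return 0, err
	}
	if _, err := s.buf.Write(p); err != nil {
		return 0, err
	}
	return pos, nil
}

// Read decodes the record that begins at the given position.
func (s *memStore) Read(pos uint64) ([]byte, error) {
	data := s.buf.Bytes()
	if pos+lenWidth > uint64(len(data)) {
		return nil, fmt.Errorf("position %d out of range", pos)
	}
	size := binary.BigEndian.Uint64(data[pos : pos+lenWidth])
	start := pos + lenWidth
	if start+size > uint64(len(data)) {
		return nil, fmt.Errorf("record at position %d is truncated", pos)
	}
	out := make([]byte, size)
	copy(out, data[start:start+size])
	return out, nil
}

func main() {
	var s memStore
	pos1, _ := s.Append([]byte("hello"))
	pos2, _ := s.Append([]byte("world"))
	first, _ := s.Read(pos1)
	second, _ := s.Read(pos2)
	fmt.Println(pos1, string(first), pos2, string(second)) // 0 hello 13 world
}
```

The second record starts at position 13 because the first one occupies 8 bytes of length prefix plus 5 bytes of data. The store only knows positions; mapping record offsets to those positions is the index's job.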
This document uses the following terms:
- Record: A data entry in the log
- Store: The file where records are saved
- Index: The file where index entries for records are saved
- Segment: The unit combining both a store and an index
- Log: The overarching system managing multiple segments
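An index entry maps a record's offset to its position in the store. The sketch below assumes fixed-width 12-byte entries (a 4-byte offset relative to the segment's base offset, then an 8-byte store position) and keeps them in a byte slice instead of the index file; `memIndex` and its error value are hypothetical names, not the project's API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Assumed entry layout: 4-byte relative offset, then 8-byte store position.
const (
	offWidth   = 4
	posWidth   = 8
	entryWidth = offWidth + posWidth
)

var errEmptyOrOutOfRange = errors.New("index: entry out of range")

// memIndex is a hypothetical in-memory stand-in for the index file.
type memIndex struct {
	data []byte
}

// Write appends an entry mapping a relative offset to a store position.
func (i *memIndex) Write(off uint32, pos uint64) {
	var entry [entryWidth]byte
	binary.BigEndian.PutUint32(entry[:offWidth], off)
	binary.BigEndian.PutUint64(entry[offWidth:], pos)
	i.data = append(i.data, entry[:]...)
}

// Read returns entry n. Because entries are fixed-width, entry n starts
// at byte n*entryWidth, so a lookup needs no scanning.
func (i *memIndex) Read(n uint32) (uint32, uint64, error) {
	start := uint64(n) * entryWidth
	if start+entryWidth > uint64(len(i.data)) {
		return 0, 0, errEmptyOrOutOfRange
	}
	off := binary.BigEndian.Uint32(i.data[start : start+offWidth])
	pos := binary.BigEndian.Uint64(i.data[start+offWidth : start+entryWidth])
	return off, pos, nil
}

func main() {
	var idx memIndex
	idx.Write(0, 0)  // record 0 starts at store position 0
	idx.Write(1, 13) // record 1 starts at store position 13
	off, pos, _ := idx.Read(1)
	fmt.Println(off, pos) // 1 13
}
```

Storing offsets relative to the segment's base keeps each entry small: 4 bytes is enough for the number of records in one segment, while the absolute offset can be recovered by adding the base.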
internal/log/: Contains the primary log package
- store.go: Manages the underlying storage file
- index.go: Maps record offsets to their positions in the store
- segment.go: Manages segments, combining stores and indexes
- log.go: The main log that ties all segments together
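The role of segment.go, combining a store and an index, can be illustrated with a small in-memory sketch. It is an assumption-laden simplification, not the project's code: the store is a byte slice of length-prefixed records (8-byte big-endian length), the index is a slice of store positions addressed by relative offset, and the names `segment`, `newSegment`, and `maxStoreBytes` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// segment is a hypothetical in-memory sketch of how a segment pairs a store
// with an index. Callers see absolute offsets; the index is addressed by
// offsets relative to baseOffset.
type segment struct {
	baseOffset, nextOffset uint64
	store                  []byte   // length-prefixed records (8-byte big-endian length, then data)
	index                  []uint64 // index[rel] is the store position of record baseOffset+rel
	maxStoreBytes          uint64   // hypothetical size limit
}

func newSegment(baseOffset, maxStoreBytes uint64) *segment {
	return &segment{baseOffset: baseOffset, nextOffset: baseOffset, maxStoreBytes: maxStoreBytes}
}

// Append writes the record to the store, records its position in the
// index, and returns the record's absolute offset.
func (s *segment) Append(value []byte) uint64 {
	pos := uint64(len(s.store))
	var lenBuf [8]byte
	binary.BigEndian.PutUint64(lenBuf[:], uint64(len(value)))
	s.store = append(s.store, lenBuf[:]...)
	s.store = append(s.store, value...)
	s.index = append(s.index, pos)
	off := s.nextOffset
	s.nextOffset++
	return off
}

// Read converts an absolute offset to a relative one, looks up the store
// position in the index, and decodes the record stored there.
func (s *segment) Read(off uint64) ([]byte, error) {
	if off < s.baseOffset || off >= s.nextOffset {
		return nil, fmt.Errorf("offset %d not in segment [%d, %d)", off, s.baseOffset, s.nextOffset)
	}
	pos := s.index[off-s.baseOffset]
	size := binary.BigEndian.Uint64(s.store[pos : pos+8])
	return s.store[pos+8 : pos+8+size], nil
}

// IsMaxed reports whether the store has reached its size limit, the point
// at which the log would start a new segment.
func (s *segment) IsMaxed() bool {
	return uint64(len(s.store)) >= s.maxStoreBytes
}

func main() {
	s := newSegment(16, 1024)
	off := s.Append([]byte("hello"))
	got, _ := s.Read(off)
	fmt.Println(off, string(got), s.nextOffset) // 16 hello 17
}
```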
git clone <repository-url>
cd <repository-directory>
This project is written in Go. Ensure that you have Go installed and then fetch the dependencies:
go mod download
Verify that each component works by running the tests:
go test ./internal/log
This log system exposes methods for basic operations like appending and reading records, truncating old data, and handling log segments. Here's a quick guide:
To add data, use the Append method on the log. This method writes data to the current active segment and updates the index.
// Name the variable something other than log so it does not shadow the
// standard library log package used for error handling below.
commitLog, err := NewLog("<directory-path>", config)
if err != nil {
	log.Fatalf("failed to create log: %v", err)
}
// Append returns the offset assigned to the new record.
offset, err := commitLog.Append(&api.Record{Value: []byte("your data")})
if err != nil {
	log.Fatalf("failed to append record: %v", err)
}
Retrieve records using the Read method, which accesses records by their offset.
record, err := commitLog.Read(offset)
if err != nil {
	log.Fatalf("failed to read record: %v", err)
}
fmt.Println("Record value:", string(record.Value))
To remove old data and free up space, use the Truncate method to delete segments with offsets below a specified threshold.
if err := commitLog.Truncate(lowestOffset); err != nil {
	log.Fatalf("failed to truncate log: %v", err)
}
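The Append, Read, and Truncate operations above can be sketched together to show how a log might manage its segments. This is a simplified illustration under stated assumptions, not the project's log.go: segments keep records in a slice rather than a store/index pair, the size limit is a hypothetical record count (`maxRecordsPerSeg`), and `memLog`, `seg`, and `ErrOffsetOutOfRange` are hypothetical names. It also assumes Truncate should always keep the active segment.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOffsetOutOfRange is a hypothetical sentinel for offsets the log does not hold.
var ErrOffsetOutOfRange = errors.New("offset out of range")

// seg is a simplified segment holding offsets in [baseOffset, nextOffset).
type seg struct {
	baseOffset, nextOffset uint64
	records                [][]byte
}

// memLog is a hypothetical in-memory sketch of the log's segment handling.
type memLog struct {
	segments         []*seg // ordered by baseOffset; the last one is active
	maxRecordsPerSeg int    // hypothetical stand-in for the configured size limit
}

func newMemLog(maxRecordsPerSeg int) *memLog {
	return &memLog{segments: []*seg{{}}, maxRecordsPerSeg: maxRecordsPerSeg}
}

func (l *memLog) active() *seg { return l.segments[len(l.segments)-1] }

// Append writes to the active segment and, once it is full, starts a new
// segment whose base offset continues the sequence.
func (l *memLog) Append(value []byte) uint64 {
	s := l.active()
	off := s.nextOffset
	s.records = append(s.records, value)
	s.nextOffset++
	if len(s.records) >= l.maxRecordsPerSeg {
		l.segments = append(l.segments, &seg{baseOffset: s.nextOffset, nextOffset: s.nextOffset})
	}
	return off
}

// Read finds the segment whose range contains off and indexes into it
// with the offset relative to that segment's base.
func (l *memLog) Read(off uint64) ([]byte, error) {
	for _, s := range l.segments {
		if s.baseOffset <= off && off < s.nextOffset {
			return s.records[off-s.baseOffset], nil
		}
	}
	return nil, fmt.Errorf("%w: %d", ErrOffsetOutOfRange, off)
}

// Truncate drops every segment whose offsets are all below lowest, but
// always keeps the active segment so later appends have somewhere to go.
func (l *memLog) Truncate(lowest uint64) {
	var kept []*seg
	for i, s := range l.segments {
		isActive := i == len(l.segments)-1
		if s.nextOffset <= lowest && !isActive {
			continue
		}
		kept = append(kept, s)
	}
	l.segments = kept
}

func main() {
	l := newMemLog(2)
	for _, v := range []string{"a", "b", "c"} {
		l.Append([]byte(v))
	}
	v, _ := l.Read(2)
	fmt.Println(string(v), len(l.segments)) // c 2
	l.Truncate(2)
	_, err := l.Read(0)
	fmt.Println(errors.Is(err, ErrOffsetOutOfRange)) // true
}
```

Read scans segments linearly for clarity; because segments are ordered by base offset, a binary search would also work once there are many segments.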
The test suite checks that each component (store, index, segment, and log) behaves as expected. Run all tests to validate the setup:
go test ./internal/log
The tests cover:
- Appending and reading records
- Handling offsets and out-of-range errors
- Segment initialization and restoration
- Log truncation and cleanup
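Segment restoration means rebuilding a segment's state when it is reopened, for example after a restart. One way to do this, sketched below and not necessarily how segment.go does it, is to read the last complete index entry and set the next offset to one past it. The sketch assumes the same 12-byte entry layout (4-byte relative offset, 8-byte store position); `restoreNextOffset` and `encodeEntry` are hypothetical helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Assumed index entry layout: 4-byte relative offset + 8-byte store position.
const (
	offWidth   = 4
	entryWidth = 12
)

// encodeEntry builds one index entry (hypothetical helper for the demo).
func encodeEntry(rel uint32, pos uint64) []byte {
	e := make([]byte, entryWidth)
	binary.BigEndian.PutUint32(e[:offWidth], rel)
	binary.BigEndian.PutUint64(e[offWidth:], pos)
	return e
}

// restoreNextOffset recovers a reopened segment's next offset from its
// index bytes: one past the last entry's offset, or baseOffset if the
// index is empty. A partial trailing entry (e.g. from a crash mid-write)
// is ignored.
func restoreNextOffset(baseOffset uint64, index []byte) uint64 {
	n := len(index) / entryWidth
	if n == 0 {
		return baseOffset
	}
	last := index[(n-1)*entryWidth:]
	rel := binary.BigEndian.Uint32(last[:offWidth])
	return baseOffset + uint64(rel) + 1
}

func main() {
	index := append(encodeEntry(0, 0), encodeEntry(1, 13)...)
	fmt.Println(restoreNextOffset(100, index)) // 102
	fmt.Println(restoreNextOffset(100, nil))   // 100
}
```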
This log system is designed to grow into a distributed architecture. Future goals include:
- Distributed Log Replication: Implementing consensus algorithms for fault-tolerant data replication across nodes
- Snapshot and Restore: Adding capabilities for snapshotting the log state and restoring from snapshots
- Optimized Indexing: Enhancing indexing strategies to improve retrieval times for high-volume records
- Graceful and Ungraceful Shutdown Handling: Expanding functionality to handle recovery from crashes and data corruption