oak = builder.build();
```
## API
### OakMap Methods
OakMap's API implements the ConcurrentNavigableMap interface. For improved performance, it offers additional non-standard zero-copy API methods that are discussed below.
See OakMap's [full API](https://github.com/yahoo/Oak/wiki/Full-API) for the complete list of methods.
For a more comprehensive code example please refer to the [usage](#usage) section.
### Oak Buffers
Oak uses dedicated buffer objects to access off-heap memory.
These buffers cannot be instantiated by the user and are always supplied to the user by Oak.
Their interfaces are:

| Buffer                   | Access     | Usage                          |
| ------------------------ | ---------- | ------------------------------ |
| OakBuffer                | read-only  | base class for all the buffers |
| ├── OakScopedReadBuffer  | read-only  | attached to a specific scope   |
| ├── OakScopedWriteBuffer | read/write | attached to a specific scope   |
| └── OakUnscopedBuffer    | read-only  | can be used in any scope       |

These buffers may represent either a key or a value.
They mimic the standard interface of Java's `ByteBuffer`, for example, `int getInt(int index)`, `char getChar(int index)`, `capacity()`, etc.
The scoped buffers (`OakScopedReadBuffer` and `OakScopedWriteBuffer`) are attached to the scope of the callback method in which they were first passed to the user. The behavior of these buffers outside their attached scope is undefined.
Such a callback method might be the application's serializer or comparator, or a lambda function that reads, stores, or updates the data.
This access pattern avoids unnecessary copies and deserialization of the underlying data.
Within their intended context, the user does not need to worry about concurrent access or memory management.
Using these buffers outside their intended context may yield unpredictable results, e.g., reading inconsistent and/or irrelevant data.
The un-scoped buffer (`OakUnscopedBuffer`) is detached from any specific scope, i.e., it may be stored for future use.
The zero-copy methods of `OakMap` return this buffer to avoid copying the data; instead, the user accesses the underlying memory directly (lazy evaluation).
While data accesses through the scoped buffers are synchronized, the memory behind an `OakUnscopedBuffer` might be accessed by concurrent update operations.
Thus, the reader may encounter different values, and even value deletions, when accessing an `OakUnscopedBuffer` multiple times.
Specifically, accessing a deleted mapping via an `OakUnscopedBuffer` throws a `ConcurrentModificationException`.
This is of course normal behavior for a concurrent map that avoids copying.
To allow complex, multi-value atomic operations on the data, `OakUnscopedBuffer` provides a `transform()` method that allows the user to apply a transformation function atomically on a read-only, scoped version of the buffer (`OakScopedReadBuffer`).
See the [Data Retrieval](#data-retrieval) section for more information.
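To make the `transform()` semantics described above concrete, here is a minimal, self-contained model in plain Java. It is a sketch under stated assumptions, not Oak's implementation: the class `UnscopedHandleSketch` and its methods are hypothetical, a heap `ByteBuffer` stands in for off-heap memory, and a `ReentrantReadWriteLock` stands in for whatever synchronization Oak actually uses. It shows that a multi-read function runs without interleaved writers, that later calls may observe newer values, and that access after deletion throws `ConcurrentModificationException`.

```java
import java.nio.ByteBuffer;
import java.util.ConcurrentModificationException;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical model of an unscoped handle; NOT Oak's actual implementation.
public class UnscopedHandleSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final ByteBuffer value;   // stands in for the off-heap value bytes
    private boolean deleted = false;  // guarded by lock

    public UnscopedHandleSketch(int initial) {
        value = ByteBuffer.allocate(Integer.BYTES);
        value.putInt(0, initial);
    }

    // Apply f to a read-only view while holding the read lock,
    // so no writer can interleave between the reads that f performs.
    public <T> T transform(Function<ByteBuffer, T> f) {
        lock.readLock().lock();
        try {
            if (deleted) {
                throw new ConcurrentModificationException("mapping was deleted");
            }
            return f.apply(value.asReadOnlyBuffer());
        } finally {
            lock.readLock().unlock();
        }
    }

    // A concurrent update takes the write lock.
    public void update(int newValue) {
        lock.writeLock().lock();
        try {
            value.putInt(0, newValue);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public void delete() {
        lock.writeLock().lock();
        try {
            deleted = true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        UnscopedHandleSketch h = new UnscopedHandleSketch(7);
        System.out.println(h.transform(b -> String.valueOf(b.getInt(0)))); // 7
        h.update(8);
        System.out.println(h.transform(b -> String.valueOf(b.getInt(0)))); // 8
        h.delete();
        try {
            h.transform(b -> b.getInt(0));
        } catch (ConcurrentModificationException e) {
            System.out.println("deleted");
        }
    }
}
```

The same handle returns 7 and then 8, which mirrors why repeated reads through an `OakUnscopedBuffer` may differ.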
For performance and backward compatibility with applications that are already based on the use of `ByteBuffer`, Oak's buffers also implement a dedicated unsafe interface `OakUnsafeDirectBuffer`.
This interface allows high-performance access to the underlying data of Oak.
To achieve that, it sacrifices safety, so it should be used only if you know what you are doing.
Misuse of this interface might result in corrupted data, a crash or a deadlock.
Specifically, the developer should be concerned with two issues:
1. _Concurrency_: using this interface inside the context of `serialize()`, `compute()`, `compare()`, and `transform()` is thread-safe.
In other contexts (e.g., the output of `get()`), the developer must ensure that there is no concurrent access to the data. Failing to do so might result in corrupted data.
2. _Data boundaries_: when using this interface, Oak does not alert the developer to any out-of-bounds access.
Thus, the developer should use `getOffset()` and `getLength()` to obtain the data boundaries and access the data carefully. Writing data outside these boundaries might result in corrupted data, a crash, or a deadlock.
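Since Oak performs no bounds checking on this path, one option is to wrap raw reads in your own check. The following is a sketch in plain Java: `BoundsCheckSketch.getIntChecked` is a hypothetical helper, and the `offset`/`length` arguments stand in for the values you would obtain from `getOffset()` and `getLength()`. It rejects any read that would spill outside `[offset, offset + length)` of the shared buffer.

```java
import java.nio.ByteBuffer;

public class BoundsCheckSketch {
    // Hypothetical helper: read an int at relativeIndex inside [offset, offset + length)
    // of a shared ByteBuffer, rejecting reads that would cross the data boundary.
    static int getIntChecked(ByteBuffer bb, int offset, int length, int relativeIndex) {
        if (relativeIndex < 0 || relativeIndex > length - Integer.BYTES) {
            throw new IndexOutOfBoundsException(
                "index " + relativeIndex + " outside data of length " + length);
        }
        return bb.getInt(offset + relativeIndex); // absolute get: position is untouched
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.allocate(16); // stands in for a shared memory block
        int offset = 4, length = 8;                  // one record holding two ints
        shared.putInt(offset, 10);
        shared.putInt(offset + 4, 20);
        System.out.println(getIntChecked(shared, offset, length, 4)); // 20
        // getIntChecked(shared, offset, length, 8) would throw: it reads past the record
    }
}
```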
To use this interface, the developer should cast Oak's buffer (`OakScopedReadBuffer` or `OakScopedWriteBuffer`) to this interface,
similarly to how Java's internal DirectBuffer is used. For example:
```java
int foo(OakScopedReadBuffer b) {
    OakUnsafeDirectBuffer ub = (OakUnsafeDirectBuffer) b;
    ByteBuffer bb = ub.getByteBuffer();
    return bb.getInt(ub.getOffset()); // absolute read at the start of this buffer's data
}
```
*Note 1*: in the above example, the following will throw a `ReadOnlyBufferException` because the buffer mode is read-only:
```java
bb.putInt(ub.getOffset(), someInteger);
```
*Note 2*: the user should never change the buffer's state, namely the position and limit (`bb.limit(i)` or `bb.position(i)`).
Changing the buffer's state will make some data inaccessible to the user in the future.
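If relative (position-based) reads are convenient, standard Java offers `ByteBuffer.duplicate()`, which shares the same bytes but has its own position and limit. The sketch below is plain Java and the helper `sumInts` is hypothetical; whether this pattern is appropriate for a buffer obtained from Oak is an assumption on our part, not Oak guidance. Note that `duplicate()` allocates a small object per call.

```java
import java.nio.ByteBuffer;

public class DuplicateSketch {
    // Hypothetical helper: read `count` consecutive ints with relative gets,
    // without touching the position/limit of the buffer we were handed.
    static int sumInts(ByteBuffer shared, int offset, int count) {
        ByteBuffer view = shared.duplicate(); // same content, independent position/limit
        view.position(offset);
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += view.getInt();             // advances only the duplicate's position
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.allocate(12);
        shared.putInt(0, 1).putInt(4, 2).putInt(8, 3);
        System.out.println(sumInts(shared, 0, 3)); // 6
        System.out.println(shared.position());     // 0: original state untouched
    }
}
```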
### Data Retrieval
1. For the best data-retrieval performance, `OakMap` provides a `ZeroCopyMap` view of itself via:
`ZeroCopyMap<K, V> zc()`
The `ZeroCopyMap` interface provides the following seven methods for data retrieval, whose results are presented as `OakUnscopedBuffer` objects:
- `OakUnscopedBuffer get(K key)`
- `Collection<OakUnscopedBuffer> values()`
- `Set<Map.Entry<OakUnscopedBuffer, OakUnscopedBuffer>> entrySet()`
- `Set<OakUnscopedBuffer> keySet()`
- `Set<OakUnscopedBuffer> keyStreamSet()`
- `Collection<OakUnscopedBuffer> valuesStream()`
- `Set<Map.Entry<OakUnscopedBuffer, OakUnscopedBuffer>> entryStreamSet()`
Note that in addition to the `ConcurrentNavigableMap` style sets, we introduce a new type of stream sets.
When a stream-set is iterated it gives a "stream" view of the elements, meaning only one element can be observed at a time.
Prefer the stream iterators when possible: they instantiate significantly fewer objects, which improves performance.
2. Without `ZeroCopyMap`, `OakMap`'s data can be retrieved directly via the following four methods:
- `V get(Object key)`
- `Collection<V> values()`
- `Set<Map.Entry<K, V>> entrySet()`
- `NavigableSet<K> keySet()`
However, these direct methods return keys and/or values as Java objects by deserializing (copying) them. This is costly, so we strongly advise using `ZeroCopyMap` to operate directly on the internal data representation.
3. For examples of direct data manipulations, please refer to the [usage](#usage) section.
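The "stream" view of the stream sets listed above can be illustrated with a small model: the iterator repositions one reusable view object instead of allocating a new one per element. This is a plain-Java sketch; `StreamIteratorSketch`, `IntView`, and `streamIterator` are hypothetical names and do not reflect Oak's internals. It also shows the consequence for callers: a reference kept from an earlier `next()` call observes the current element, so copy out anything you need before advancing.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class StreamIteratorSketch {
    // Hypothetical mutable view over one element; reused across iterations.
    static final class IntView {
        private int[] data;
        private int index;

        int get() { return data[index]; }
    }

    // "Stream" iterator: next() repositions one shared IntView
    // instead of allocating a new object per element.
    static Iterator<IntView> streamIterator(int[] data) {
        IntView view = new IntView();
        view.data = data;
        return new Iterator<IntView>() {
            private int next = 0;

            @Override public boolean hasNext() { return next < data.length; }

            @Override public IntView next() {
                if (!hasNext()) throw new NoSuchElementException();
                view.index = next++;
                return view;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<IntView> it = streamIterator(new int[] {3, 1, 4});
        IntView first = it.next();
        int firstValue = first.get();          // copy out what you need now
        IntView second = it.next();
        System.out.println(first == second);   // true: same object
        System.out.println(firstValue + " " + first.get()); // 3 1
    }
}
```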
### Data Ingestion
1. Data can be ingested via the standard `ConcurrentNavigableMap` API.
2. For improved performance, data can also be ingested and updated via the following five methods of the `ZeroCopyMap` interface:
- `void put(K key, V value)`
- `boolean putIfAbsent(K key, V value)`
- `void remove(K key)`
- `boolean computeIfPresent(K key, Consumer<OakScopedWriteBuffer> computer)`
- `boolean putIfAbsentComputeIfPresent(K key, V value, Consumer<OakScopedWriteBuffer> computer)`
3. In contrast to the `ConcurrentNavigableMap` API, the zero-copy method `void put(K key, V value)` does not return the value previously associated with the key (if any). Likewise, `void remove(K key)` does not indicate whether the key was actually removed.
4. `boolean computeIfPresent(K key, Consumer<OakScopedWriteBuffer> computer)` takes a user-defined computer function, which is invoked only if the key exists.
The computer is provided with a mutable `OakScopedWriteBuffer` representing the serialized value associated with the key. The computer's effect is atomic: concurrent readers see either all of its updates or none of them.
The `compute()` functionality offers `OakMap` users an efficient, zero-copy, in-place update, letting them focus on business logic rather than on the hard problems of data layout and concurrency control.
5. Additionally, `OakMap` supports an atomic `boolean putIfAbsentComputeIfPresent(K key, V value, Consumer<OakScopedWriteBuffer> computer)` method, which is not part of `ConcurrentNavigableMap`.
This method looks up the key. If the key does not exist, it adds a new mapping from the serialized key to the serialized value. Otherwise, the value associated with the key is updated with `computer(old value)`. It works concurrently with other updates and requires only one search traversal. It returns `true` if a new key was added, and `false` otherwise.
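The insert-or-update contract of `putIfAbsentComputeIfPresent` (and the in-place update of `computeIfPresent`) can be modeled with the JDK's `ConcurrentHashMap.compute`, which performs each per-key invocation atomically. This is a sketch of the *semantics* only: `UpsertSketch` is hypothetical, heap `ByteBuffer`s stand in for serialized values, and unlike Oak, an unsynchronized reader of the `ByteBuffer` here could observe a partially applied update.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class UpsertSketch {
    private final ConcurrentHashMap<Integer, ByteBuffer> map = new ConcurrentHashMap<>();

    // Returns true if a new key was added, false if the existing value was updated in place.
    boolean putIfAbsentComputeIfPresent(int key, int value, Consumer<ByteBuffer> computer) {
        boolean[] added = {false};
        map.compute(key, (k, old) -> {
            if (old == null) {
                added[0] = true;
                ByteBuffer fresh = ByteBuffer.allocate(Integer.BYTES);
                fresh.putInt(0, value); // "serialize" the new value
                return fresh;
            }
            computer.accept(old);       // mutate the existing bytes in place
            return old;
        });
        return added[0];
    }

    int get(int key) { return map.get(key).getInt(0); }

    public static void main(String[] args) {
        UpsertSketch m = new UpsertSketch();
        Consumer<ByteBuffer> increment = b -> b.putInt(0, b.getInt(0) + 1);
        System.out.println(m.putIfAbsentComputeIfPresent(10, 100, increment)); // true
        System.out.println(m.putIfAbsentComputeIfPresent(10, 100, increment)); // false
        System.out.println(m.get(10)); // 101
    }
}
```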
## Memory Management
As explained above, when constructing off-heap `OakMap`, the memory capacity (per `OakMap` instance) needs to be specified. `OakMap` allocates the off-heap memory with the requested capacity at construction time, and later manages this memory.
This memory (the entire given capacity) must be released later; thus, `OakMap` implements `AutoCloseable`. Be sure to use it in a try-with-resources statement, or explicitly invoke `OakMap.close()` when the `OakMap` is no longer in use.
Note that multiple Oak sub-maps can reference the same underlying memory of an `OakMap`. The memory is released only when the last of those sub-maps is closed.
However, note that each sub-map is in particular an `OakMap` and thus `AutoCloseable` and needs to be closed (explicitly or implicitly). Again, `close()` can be invoked on different objects referring to the same underlying memory, but the final release will happen only once.
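The "close every view, release once" rule above behaves like reference counting. The sketch below models it in plain Java; `SharedMemorySketch`, `SharedBlock`, and `MapView` are hypothetical names, and this is not a description of how Oak tracks its memory. Each view holds one reference, closing a view twice is harmless, and the block is freed exactly once, when the last reference goes away.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedMemorySketch {
    // Hypothetical stand-in for the memory block shared by a map and its sub-maps.
    static final class SharedBlock {
        final AtomicInteger refs = new AtomicInteger(1);    // the root map's reference
        final AtomicInteger releases = new AtomicInteger(); // counts actual frees

        void retain() { refs.incrementAndGet(); }

        void release() {
            if (refs.decrementAndGet() == 0) {
                releases.incrementAndGet(); // "free" the memory exactly once
            }
        }
    }

    // Hypothetical map view: each view holds one reference and closes idempotently.
    static final class MapView implements AutoCloseable {
        final SharedBlock block;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        private MapView(SharedBlock block) { this.block = block; }

        static MapView root() { return new MapView(new SharedBlock()); }

        // Like subMap()/descendingMap(): a new view over the same memory.
        MapView subView() {
            block.retain();
            return new MapView(block);
        }

        @Override public void close() {
            if (closed.compareAndSet(false, true)) { // closing twice is a no-op
                block.release();
            }
        }
    }

    public static void main(String[] args) {
        MapView root = MapView.root();
        SharedBlock block = root.block;
        try (MapView sub = root.subView()) {
            root.close();                              // memory still referenced by sub
            System.out.println(block.releases.get()); // 0
        }                                              // sub closed here: last reference
        System.out.println(block.releases.get());     // 1
    }
}
```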
## Usage
An Integer to Integer build example can be seen in [Code Examples](https://github.com/yahoo/Oak/wiki/Code-Examples). Here we illustrate individual operations.
### Code Examples
Below are some examples of `OakMap` usage, including its `ZeroCopyMap` interface. These examples assume `OakMap<Integer, Integer> oak` is defined and constructed as described in the [Builder](#builder) section.
##### Simple Put and Get
```java
oak.put(10,100);
Integer i = oak.get(10);
```
##### Remove
```java
oak.zc().remove(11);
```
##### Get OakUnscopedBuffer
```java
OakUnscopedBuffer buffer = oak.zc().get(10);
if (buffer != null) {
    try {
        int value = buffer.getInt(0); // reads the current value; a later read may differ
    } catch (ConcurrentModificationException e) {
        // the mapping was deleted concurrently
    }
}
```
##### Scan & Copy
```java
Integer[] targetBuffer = new Integer[oak.size()]; // size() is a snapshot; may be stale under concurrent updates
Iterator<Integer> iter = oak.values().iterator();
int i = 0;
while (iter.hasNext() && i < targetBuffer.length) { // guard against concurrent inserts
    targetBuffer[i++] = iter.next();
}
```
##### Compute
```java
Consumer<OakScopedWriteBuffer> func = buf -> {
    int cnt = buf.getInt(0);   // read the integer at index 0
    buf.putInt(0, cnt + 1);    // write the incremented counter back at index 0
};
oak.zc().computeIfPresent(10, func);
```
##### Conditional Compute
```java
Consumer<OakScopedWriteBuffer> func = buf -> {
    if (buf.getInt(0) == 0) {  // check the integer at index 0
        buf.putInt(0, 1);      // absolute write; an Integer value occupies only 4 bytes
    }
};
oak.zc().computeIfPresent(10, func);
```
##### Simple Iterator
```java
Iterator<Integer> iter = oak.keySet().iterator();
while (iter.hasNext()) {
    Integer i = iter.next();
}
```
##### Simple Descending Iterator
```java
try (OakMap<Integer, Integer> oakDesc = oak.descendingMap()) {
    Iterator<Map.Entry<Integer, Integer>> iter = oakDesc.entrySet().iterator();
    while (iter.hasNext()) {
        Map.Entry<Integer, Integer> e = iter.next();
    }
}
```
##### Simple Range Iterator
```java
Integer from = 4;
Integer to = 6;
try (OakMap<Integer, Integer> sub = oak.subMap(from, false, to, true)) { // keys in (4, 6]
    Iterator<Integer> iter = sub.values().iterator();
    while (iter.hasNext()) {
        Integer i = iter.next();
    }
}
```
##### Transformations
```java
Function<OakScopedReadBuffer, String> intToStrings = e -> String.valueOf(e.getInt(0));
Iterator<String> iter = oak.zc().values().stream().map(v -> v.transform(intToStrings)).iterator();
while (iter.hasNext()) {
    String s = iter.next();
}
```
##### Unsafe buffer access
```java
Function<OakScopedReadBuffer, String> intToStringsDirect = b -> {
    OakUnsafeDirectBuffer ub = (OakUnsafeDirectBuffer) b;
    ByteBuffer bb = ub.getByteBuffer();
    return String.valueOf(bb.getInt(ub.getOffset())); // convert the int to a String
};
Iterator<String> iter = oak.zc().values().stream().map(v -> v.transform(intToStringsDirect)).iterator();
while (iter.hasNext()) {
    String s = iter.next();
}
```
##### Unsafe direct buffer access (address)
Oak supports accessing its keys and values via direct memory addresses.
`DirectUtils` can be used to access the data at such an address.
```java
Function<OakScopedReadBuffer, String> intToStringsDirect = b -> {
    OakUnsafeDirectBuffer ub = (OakUnsafeDirectBuffer) b;
    return String.valueOf(DirectUtils.getInt(ub.getAddress())); // convert the int to a String
};
Iterator<String> iter = oak.zc().values().stream().map(v -> v.transform(intToStringsDirect)).iterator();
while (iter.hasNext()) {
    String s = iter.next();
}
```
Note: in the above example, the following will not throw any exception even though the buffer mode is read-only, so it can silently overwrite the data:
```java
DirectUtils.putInt(ub.getAddress(), someInteger);
```
## Contribute
Please refer to the [contributing file](./CONTRIBUTING.md) for information about how to get involved. We welcome issues, questions, and pull requests.
## License
This project is licensed under the terms of the [Apache 2.0](LICENSE-Apache-2.0) open source license.