j.ohnson.com

Building a Text File Database Part 2: Domain Concerns

After figuring out how to read data asynchronously from my reading list I worked on separating domain concerns from file access.

After some refactors my main method for querying the file looks like this:

    // FileDb.java
    public Stream<Book> stream(Criteria criteria) {
        ByteBuffer bufferReader = ByteBuffer.allocate(recordLength);
        long fileSize;
        try {
            fileSize = fileChannel.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

        return Stream.iterate(
                        recordLength,
                        n -> n < fileSize,
                        n -> n + recordLength)
                .map(beginningOfRecord -> {
                    bufferReader.clear();
                    Future<Integer> op = fileChannel.read(bufferReader, beginningOfRecord);
                    try {
                        op.get();
                    } catch (InterruptedException | ExecutionException e) {
                        throw new RuntimeException(e);
                    }
                    return bufferReader;
                })
                .map(buffer -> Arrays
                        .stream(new String(buffer.array()).split("\\|"))
                        .map(String::trim)
                        .toList())
                .filter(record -> {
                    if (criteria.equals.isEmpty()) {
                        return true;
                    }

                    // Every equality criterion must hold, not only the first one.
                    return criteria.equals
                            .stream()
                            .allMatch(eq -> record.get(columnIndex.get(eq.getKey())).equals(eq.getValue()));
                })
                .filter(record -> {
                    if (criteria.isEmpty.isEmpty()) {
                        return true;
                    }

                    // Every listed column must pass the check, not only the first one.
                    return criteria
                            .isEmpty
                            .stream()
                            .allMatch(columnName -> !record.get(columnIndex.get(columnName)).isEmpty());
                })
                .map(record -> new Book(
                        record.get(columnIndex.get("id")),
                        record.get(columnIndex.get("title")),
                        record.get(columnIndex.get("author"))));
    }

Straightforward, but I want the Book concerns separated out, at which point this class could handle any file in the format outlined fairly generically. I started by requiring a "mapping function" as a constructor argument. The first pass took in a line-as-array record and the columnIndex, so that a field name could be passed in as a string and resolved via those two structures. But this exposed too many internals: someone constructing the object would not know how to handle them without opening the class and figuring it out. For the second pass I used a function that takes the field name as a string and returns the field value as a string. Not perfect, but good enough:

    target = new BookRepository(
            new FileDb<>(
                    tempFile,
                    mapper -> new Book(mapper.apply("id"), mapper.apply("title"), mapper.apply("author"))
            )
    );
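The shape of that constructor argument can be sketched as follows. This is a minimal illustration rather than the actual FileDb: the class name `MappedDb`, the already-split in-memory row, and the `columnIndex` map passed to the constructor are assumptions standing in for the file-backed version.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical domain object for the sketch.
record Book(String id, String title, String author) {}

// Hypothetical stand-in for FileDb<T>: it owns the column layout and hands the
// caller only a "field name -> field value" lookup.
class MappedDb<T> {
    private final Map<String, Integer> columnIndex;
    private final Function<Function<String, String>, T> mappingFunction;

    MappedDb(Map<String, Integer> columnIndex,
             Function<Function<String, String>, T> mappingFunction) {
        this.columnIndex = columnIndex;
        this.mappingFunction = mappingFunction;
    }

    // Maps one already-split row; the caller never sees the list or the index.
    T map(List<String> row) {
        return mappingFunction.apply(columnName -> row.get(columnIndex.get(columnName)));
    }
}
```

The caller only ever writes `mapper.apply("title")`; the list-of-strings row and the name-to-position index stay private to the class.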

Querying and mapping now looks like:

    public Book findById(String id) {
        return fileDb
                .stream(new Criteria().andEquals("id", id))
                .map(record -> this.mappingFunction.apply(columnName -> record.get(columnIndex.get(columnName))))
                .findFirst()
                .orElseThrow(() -> new BookNotFoundException("Book not found for id '%s'".formatted(id)));
    }

So far this provided a clean separation of concerns but there were two things that bugged me:

  1. The mapping function would have to be constructed and provided at least twice: once for the FileDb instance in the actual application and a second time for the test. In practice, though, this was something that could be worked around and was not as big of a deal as:
  2. Two different parts of the code now knew about the columns. The BookRepository knew about the column names for the books file, which makes sense, but now the configuration container also knew about them. Not a deal breaker, but something to keep an eye on, as a change in column name would require attention in two places.

I decided to let these go for now while I tackled the next item on my list: writing data! I hadn't faced that yet as I'd been doing TDD based on my initial use cases: what books am I currently reading, what books are next up, how many books total are on the list.

For writing I'd need to map back from a Book to some internal representation that could be serialized. I played with the idea for a bit.

The first solution leaked more internals by needing to be aware of field order. The second solution was more palatable but would require the class to know even more about fields at instantiation time and handle more logic like date formatting which could end up missed by tests leaving more places for bugs to hide. And while I hadn't given a ton of thought to it yet I was starting to have concerns about updating records.
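Whichever mapping is used, the write side eventually has to turn field values back into a fixed-width line. Here is a minimal sketch of that step alone, assuming the pipe-delimited, space-padded layout implied by the read code. `ColumnDef` and `FixedWidthWriter` are hypothetical names, and this is not either of the two solutions described above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical column definition: a name plus a fixed width in characters.
record ColumnDef(String name, int size) {}

class FixedWidthWriter {
    // Pads each value to its column width and joins the fields with '|',
    // mirroring the split-and-trim that happens on the read side.
    static String serialize(List<ColumnDef> columns, Map<String, String> values) {
        return columns.stream()
                .map(column -> pad(values.getOrDefault(column.name(), ""), column.size()))
                .collect(Collectors.joining("|"));
    }

    // Right-pads with spaces; rejects values that would overflow the column.
    private static String pad(String value, int size) {
        if (value.length() > size) {
            throw new IllegalArgumentException(
                    "Value '%s' does not fit in %d characters".formatted(value, size));
        }
        return value + " ".repeat(size - value.length());
    }
}
```

Even this small version has to know the field order and the column widths, which is exactly the kind of internal detail that is awkward to expose to whoever maps a Book.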

Playing with use cases for updating

I had deliberately put off the problem of updating fields, as that seemed more complex than adding new records to the end of the file. But I wanted to work through some use cases as an exercise in fleshing out potential sticky spots with the way I was handling the mapping of domain objects. First, I realized that I wouldn't use a BookRepository in real life and renamed it to BookShelf.

In practice, "adding a book I want to read" and "starting a book" happen in two steps. Books are typically added when I come across them or someone recommends them, and they aren't started until days, weeks, or months later. The "classic" approach:

    @Test
    public void startBook() {
        Book book = bookShelf.wantToRead("The Man in the High Castle", "Philip K. Dick");
        bookShelf.start(book, Instant.now());
    }

I don't like this API because it looks like I can go to the bookshelf and start reading any old Book, e.g. bookShelf.start(new Book("The Man in the High Castle", "Philip K. Dick"), Instant.now()), when in reality the book MUST be an instance that originally came from the bookShelf. I could allow this through the API and just create the new instance along with the start date, but that's not a use case that exists for me: the chances of me learning about the existence of a book and starting it immediately are almost non-existent, so I don't want to support that code. And in reality I don't think of myself as going to the bookshelf to start a book; I go there to add a book I "want to read" and later to find that book again:

    @Test
    public void startBook() {
        bookShelf.wantToRead("The Man in the High Castle", "Philip K. Dick");
        // Time passes...
        Book book = bookShelf.findByTitleAndAuthor("The Man in the High Castle", "Philip K. Dick")
            .orElseThrow();
        bookShelf.start(book, Instant.now());
    }

Fleshing the use case out has made the awkwardness of the API more apparent; I go to the bookshelf to add a book to read and then go back to the bookshelf to find it and then back again to start it. How about this: In practice I think of myself as starting the book:

    @Test
    public void startBook() {
        bookShelf.wantToRead("The Man in the High Castle", "Philip K. Dick");
        // Time passes...
        Book book = bookShelf.findByTitleAndAuthor("The Man in the High Castle", "Philip K. Dick")
            .orElseThrow();
        book.start(Instant.now());
    }

I like how that reads. I go to the bookShelf to add a Book I want to read, later I go back to find the Book, then I start the Book. But I see my plans for how I'm mapping data from the file to Books getting more complex. Getting this to work will require either a proxy object (harder to do with my generic approach) or some function provided to the constructor that says what to do with start() (also complex and leaking more details to deal with at instantiation time).
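To make the second option concrete, here is a minimal sketch of a Book that delegates start() to a function handed to its constructor. Everything in it is an assumption for illustration: the `onStart` callback, the in-memory `log` standing in for a file write, and this simplified `BookShelf` are hypothetical and are not the classes from this project.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical Book that delegates start() to a callback supplied by its creator.
class Book {
    final String title;
    final String author;
    private final BiConsumer<Book, Instant> onStart;

    Book(String title, String author, BiConsumer<Book, Instant> onStart) {
        this.title = title;
        this.author = author;
        this.onStart = onStart;
    }

    void start(Instant startedAt) {
        onStart.accept(this, startedAt);
    }
}

// Hypothetical in-memory BookShelf; a real one would write the start date to the file.
class BookShelf {
    final List<String> log = new ArrayList<>();

    Book wantToRead(String title, String author) {
        return new Book(title, author,
                (book, startedAt) -> log.add("started " + book.title + " at " + startedAt));
    }
}
```

The API reads the way I want, but the callback has to be wired in wherever a Book is created, which is the extra instantiation-time detail that makes this unattractive for a generic mapper.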

I think I've jumped ahead to such a generalized solution without sufficient reason. The complexity is more than I'm willing to bear at this point. I decide to have the API return a Record object to hold each row, with each field wrapped in an Item. I also create a Column to store the column definitions when the file is opened in the constructor. Each Item holds a reference to its Column, since it will probably need to know its own size at some point. Even this is getting a little ahead of myself, but when I get to updating fields I'll need to know the position of each Item within the Record, and it's easy enough to get for now. I extract a method for reading one record into a buffer given a position in the file:

    private Record readRecord(ByteBuffer bufferReader, long beginningOfRecord) {
        bufferReader.clear();
        Future<Integer> op = fileChannel.read(bufferReader, beginningOfRecord);
        try {
            op.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }

        String rowAsString = new String(bufferReader.array());

        return new Record(field -> {
            Column column = null;
            int startPosition = 0;
            for (int i = 0; i < columns.size(); i++) {
                Column c = columns.get(i);
                if (c.name.equals(field)) {
                    column = c;
                    break;
                }

                startPosition = startPosition + c.size + 1;
            }

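            // Fail with a clear message instead of a NullPointerException below.
            if (column == null) {
                throw new IllegalArgumentException("Unknown column '%s'".formatted(field));
            }

            return new Item(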
                    Optional.of(rowAsString.substring(startPosition, startPosition + column.size).trim())
                            .filter(Predicate.not(String::isEmpty))
                            .orElse(null),
                    beginningOfRecord + startPosition,
                    column
            );
        },
                beginningOfRecord);
    }
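For readers following along, the three supporting types might look roughly like this. It is a minimal sketch inferred from how readRecord and getRaw are used in this post; the field names `value` and `position` and the exact constructors are assumptions.

```java
import java.util.function.Function;

// Hypothetical column definition: a name and a fixed width in characters.
class Column {
    final String name;
    final int size;

    Column(String name, int size) {
        this.name = name;
        this.size = size;
    }
}

// Hypothetical wrapper around one field of one row.
class Item {
    final String value;   // trimmed field value, or null when the field is blank
    final long position;  // absolute offset of the field within the file
    final Column column;

    Item(String value, long position, Column column) {
        this.value = value;
        this.position = position;
        this.column = column;
    }
}

// Hypothetical row wrapper; named Record as in the post, which shadows java.lang.Record.
class Record {
    private final Function<String, Item> itemLookup;
    final long position;  // absolute offset of the record within the file

    Record(Function<String, Item> itemLookup, long position) {
        this.itemLookup = itemLookup;
        this.position = position;
    }

    // Resolves the field on demand: nothing is parsed until a caller asks for it.
    String getRaw(String field) {
        return itemLookup.apply(field).value;
    }
}
```

Keeping the lookup as a function is what makes the field access lazy, and returning only the raw String from getRaw keeps Item out of the public API.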

Maybe I'll figure out a more declarative solution later, but this gets the job done. I like how this solution lazily loads each field, leaving the design open to optimization in the future. The mapping is now done by the BookShelf:

    public Book findById(String id) {
        return fileDb
                .stream(new Criteria().andEquals("id", id))
                .findFirst()
                .map(record -> new Book(record.getRaw("id"), record.getRaw("title"), record.getRaw("author")))
                .orElseThrow(() -> new BookNotFoundException("Book not found for id '%s'".formatted(id)));
    }

This works for me as the BookShelf already knows about column names so why not know about Records as well? Though I do make it a point not to expose Items since they contain too much internal knowledge about file pointers and column sizes. Up next: writing to the file.