Slides Architecture based on the Git architecture
A summary and lessons learned
Scope
Marcio built a commit-based system modeled on the Git architecture. The system is being tested in a prototype for authoring presentations (slides). The goal of this project is to enable users to collaborate on a given presentation. For now, the immediate goal is to support the launch of FastClip.net 0.1, an MVP. In it, annotations can enhance existing videos, and slides can be produced in sync with videos. This enables a synchronized slides<->video playback experience in which users can like or dislike moments of a video, enrich the data, and more.
Estimated time and cost of achievement (2 years vs. 20h?)
From the perspective of motivation, this project took about 2–3 years to get out the door. In fact, however, our time-tracking system for ongoing projects shows that the current point of implementation was reached with a total of about 20–30h of work, of which about 10h was reading. Now, knowing the right documents or tutorials, the initial reading could be done in about 3h; a professional developer would perhaps reach the same status with a straight 20–30h of work.
The side projects
The learning resulted in https://github.com/taboca/node-version-control-tree and https://github.com/taboca/dag-acyclic-versioning-objects, plus the new repository, which is private. The main difference between the new project and these older ones is the storage mechanism: the new one uses Google Cloud Data Storage.
The architecture
In the prototype, the architecture for making presentations is based on a Presentation entity (similar to a repository, representing the user’s project), a Commit entity (linking to a tree), a Tree entity (linking to a collection of slides), and the Slide entity. When a user changes an existing slide, the hash of that slide changes, which in turn changes the hash of the tree. The system therefore creates new objects: a new slide, a new tree, and a new commit. The existing Presentation is then updated to point to the new commit. To see a prior revision, the user can query the most recent commit and follow its parent commit, which points to the prior tree and therefore to the prior collection of slides.
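The following is a minimal Python sketch of the object model described above. It assumes an in-memory dictionary as a stand-in for the real datastore and SHA-1 over canonical JSON as the key-hash; the helper names (put, save_revision, history) are hypothetical and not taken from the prototype. It shows that editing one slide creates a new slide, tree, and commit while unchanged slides are reused, and that prior revisions are reached by following parent commits.

```python
import hashlib
import json

# Hypothetical in-memory object store standing in for the real datastore:
# maps key-hash -> stored object (a plain dict).
store: dict[str, dict] = {}


def put(obj: dict) -> str:
    """Store an object under the SHA-1 of its canonical JSON and return the key."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    key = hashlib.sha1(data).hexdigest()
    store.setdefault(key, obj)  # identical content is stored only once
    return key


def save_revision(presentation: dict, slides: list[str], message: str) -> str:
    """Create slide, tree and commit objects and move the Presentation's HEAD."""
    slide_keys = [put({"type": "slide", "content": s}) for s in slides]
    tree_key = put({"type": "tree", "slides": slide_keys})
    commit_key = put({
        "type": "commit",
        "tree": tree_key,
        "parent": presentation.get("head"),  # None for the first commit
        "message": message,
    })
    presentation["head"] = commit_key
    return commit_key


def history(presentation: dict) -> list[list[str]]:
    """Walk from HEAD through parent commits, returning each revision's slides."""
    revisions = []
    key = presentation.get("head")
    while key is not None:
        commit = store[key]
        tree = store[commit["tree"]]
        revisions.append([store[k]["content"] for k in tree["slides"]])
        key = commit["parent"]
    return revisions


deck = {"name": "Demo"}  # the Presentation entity (plays the role of a repository)
save_revision(deck, ["Intro", "Agenda"], "first draft")
save_revision(deck, ["Intro", "Agenda v2"], "edit slide 2")
print(history(deck))  # → [['Intro', 'Agenda v2'], ['Intro', 'Agenda']]
```

Only the edited slide is written again: “Intro” is stored once and shared by both trees, so the store ends up with seven objects (three slides, two trees, two commits).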
Learnings
Cache at the top level — 1
Running the system on a development server made clear the importance of a caching mechanism at the top-level entity, Presentation. Such a cache enables quick retrieval of the current (HEAD) presentation: the whole tree, slides included.
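A sketch of such a cache, assuming a simple in-process dictionary keyed by Presentation id (a production system might use a shared cache instead). ObjectStore, HeadCache, and their methods are hypothetical names used only for illustration. Because each cache entry remembers which HEAD commit it was built from, moving the Presentation to a new commit invalidates the entry without extra bookkeeping.

```python
import hashlib
import json


class ObjectStore:
    """Hypothetical content-addressed store that counts reads (stand-in for the datastore)."""

    def __init__(self) -> None:
        self.objects: dict[str, dict] = {}
        self.reads = 0

    def put(self, obj: dict) -> str:
        key = hashlib.sha1(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()
        self.objects.setdefault(key, obj)
        return key

    def get(self, key: str) -> dict:
        self.reads += 1
        return self.objects[key]


class HeadCache:
    """Caches the fully resolved HEAD (commit -> tree -> slides) per Presentation.

    Each entry records the HEAD commit it was built from; when the Presentation
    moves to a new commit, the old entry stops matching and is rebuilt on next access.
    """

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.entries: dict[str, tuple[str, list[str]]] = {}

    def slides(self, presentation_id: str, head: str) -> list[str]:
        cached = self.entries.get(presentation_id)
        if cached is not None and cached[0] == head:
            return cached[1]  # cache hit: no datastore reads
        commit = self.store.get(head)
        tree = self.store.get(commit["tree"])
        slides = [self.store.get(k)["content"] for k in tree["slides"]]
        self.entries[presentation_id] = (head, slides)
        return slides


store = ObjectStore()
slide_keys = [store.put({"type": "slide", "content": c}) for c in ["Intro", "Demo"]]
tree_key = store.put({"type": "tree", "slides": slide_keys})
head = store.put({"type": "commit", "tree": tree_key, "parent": None})

cache = HeadCache(store)
cache.slides("deck-1", head)  # miss: reads the commit, the tree and two slides
cache.slides("deck-1", head)  # hit: served from the cache
print(store.reads)  # → 4
```

Keying the entry by the HEAD commit hash works because commits are immutable: if HEAD has not moved, the resolved tree and slides cannot have changed.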
The reuse of user slides — 2
In Git, a distributed revision control system, two users at different locations are able to create identical structures that can later be represented in a central repository as one. This is possible because tree and blob objects have keys derived from their actual content, not from user-specific data such as date-time or any other information about the user’s system. If user A creates content A and user B creates content A, their remote trees should look the same.
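To illustrate content-derived keys, the sketch below hashes a blob the way Git does (SHA-1 over a "blob <size>" header and a NUL byte, followed by the raw bytes) and deliberately leaves the author and date-time out of the key. The UserEdit record is a hypothetical illustration, not part of the prototype.

```python
import hashlib
from dataclasses import dataclass


def git_blob_hash(content: bytes) -> str:
    """Key a blob the way Git does: SHA-1 over a "blob <size>\\0" header plus the bytes."""
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


@dataclass
class UserEdit:
    """Hypothetical record of a user's edit; only `content` feeds the key."""
    author: str
    edited_at: str  # date-time is user-specific metadata and is NOT hashed
    content: bytes


a = UserEdit("user A", "2018-09-01T10:00", b"Content A\n")
b = UserEdit("user B", "2018-09-03T18:30", b"Content A\n")

# Same content -> same key, regardless of who wrote it or when.
print(git_blob_hash(a.content) == git_blob_hash(b.content))  # → True
```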
If we simply kept all user content blobs as objects available to all users, with hash keys based on the actual content, any user could potentially stumble upon other users’ content. While in theory this suggests content reuse, in practice users would produce slide contents that differ in one detail or another, so a large number of users would end up with a large number of slides that are similar in nature but have different hashes. Therefore, for text, which is subjective and may contain errors and individual writing styles, content-based sharing across users would not solve much.
It’s also important to note that the use case Git is good for reflects programmers’ behavior when they change files in a filesystem. For example, when a programmer moves a file from a subtree to a parent tree, the file’s blob does not change and neither does its key-hash; what mainly changes are the key-hashes of the affected trees (leading to a new commit), since a tree’s hash is computed from the hashes of its entries, which in turn derive from the file contents.
With that in mind, it could be argued that user A should be able to create slide S, put it in a Presentation’s Tree, and later change the order of the slides, which changes the tree but does not require storing a new slide in the datastore. In other words, a single user, or a private group, could benefit from reusing their slides when they move slides within the tree (for now, by reordering them).
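The reordering case can be sketched as follows, assuming (hypothetically) that slides and trees are keyed by SHA-1 over canonical JSON in an in-memory store; put and save_tree are illustrative names, not the prototype's API. Reordering produces a new tree key, but none of the slide objects need to be written again.

```python
import hashlib
import json

store: dict[str, dict] = {}  # hypothetical stand-in for the datastore


def put(obj: dict) -> tuple[str, bool]:
    """Store obj under the SHA-1 of its canonical JSON; report whether it was new."""
    key = hashlib.sha1(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()
    is_new = key not in store
    store.setdefault(key, obj)
    return key, is_new


def save_tree(slides: list[str]) -> tuple[str, int]:
    """Store a tree of slides; return the tree key and how many slides were newly written."""
    new_slides = 0
    keys = []
    for content in slides:
        key, is_new = put({"type": "slide", "content": content})
        keys.append(key)
        if is_new:
            new_slides += 1
    tree_key, _ = put({"type": "tree", "slides": keys})  # order of keys affects the hash
    return tree_key, new_slides


tree_1, written_1 = save_tree(["S", "T", "U"])
tree_2, written_2 = save_tree(["U", "S", "T"])  # same slides, new order

print(written_1, written_2)  # → 3 0
print(tree_1 != tree_2)      # → True
```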
Namespaces and silos for key-hashes — 3
Based on [2], and since we don’t want user A/group A to stumble upon or incorporate a slide from user B/group B, slide key-hashes would need to be computed not only from the “sum” of the content’s bytes but also from a group or presentation key:
- Presentation_HASH + “Slide File content”
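A minimal sketch of the namespaced key, assuming SHA-1 and assuming that Presentation_HASH is a fixed-length hex digest. How the presentation key itself is derived is not specified here, so the two deck hashes below are hypothetical placeholders. The same text yields the same key inside one presentation (so reordering can still reuse slides) but different keys across presentations.

```python
import hashlib


def slide_key(presentation_hash: str, slide_content: str) -> str:
    """Namespaced slide key: SHA-1 over Presentation_HASH + slide content.

    presentation_hash is assumed to be a fixed-length hex digest (40 chars for SHA-1),
    so concatenating it with the content cannot make two different
    (presentation, content) pairs produce the same input string.
    """
    return hashlib.sha1((presentation_hash + slide_content).encode("utf-8")).hexdigest()


deck_a = hashlib.sha1(b"presentation A").hexdigest()  # hypothetical presentation keys
deck_b = hashlib.sha1(b"presentation B").hexdigest()

same_text = "Quarterly results"
print(slide_key(deck_a, same_text) == slide_key(deck_a, same_text))  # → True (reuse inside A)
print(slide_key(deck_a, same_text) == slide_key(deck_b, same_text))  # → False (A and B are siloed)
```

The fixed-length prefix is the design choice that matters: with a variable-length namespace, a separator or length prefix would be needed to keep keys unambiguous.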
Behavior Lessons Learned
- The initial 20–30h of reading and learning work is important;
- If Marcio had followed his intuition and simply done the work back then, he would have achieved more by now, but he found himself stuck overthinking and afraid of going in this direction. Depending on whom you ask (developers, for example), people may encourage you not to worry about these things and just follow the basic concepts;
This memo relates to Fast Clip index: Slides 92a79c9e-0561-4f3d-81e0-03ca59e955d8. It was initially written by Marcio S Galli in September 2018 and is intended to serve as a reflection on and assessment of the project.