Sean Cribbs - Latest Comments in Modeling a Tree in a Document Database - sean cribbs :: digital renaissance man

Re: Modeling a Tree in a Document Database - sean cribbs :: digital renaissance man

Luca Garulli — Mon, 04 Oct 2010 05:08:37 -0000

OrientDB is a document-graph DBMS. The main difference from MongoDB and CouchDB is that all relationships are stored as direct links. This speeds up loading entire trees and graphs by at least 10x.

robtweed — Sun, 21 Feb 2010 13:22:21 -0000

You ought to also take a look at GT.M: its database is a naturally hierarchical schemaless architecture that makes the creation of trees very simple and natural. The new M/Wire protocol - http://www.mgateway.com/mwi... - is now making GT.M a very accessible database.

adamaig — Sat, 06 Feb 2010 21:12:11 -0000

It seems like this is one of those things that is solved when you start treating your runtime as separate from your datastore, and is actually why I asked @jnunemaker about including an identity map in mongo_mapper a few months back. If you add an id to each node of the tree for the tree it is in, you can index the tree id, and quickly pull all nodes into the IM. When you walk the tree then, you'd only be assembling the graph in memory since you'd always get a hit in the IM. Or am I missing something?

This makes the strong assumptions that you're most frequently accessing the tree from the root node, and that you'd always want the full tree, so if you want the ability to quickly load any subtree this is non-optimal.
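
A minimal sketch of the idea above, in plain JavaScript with no database driver. It assumes each node document carries a `tree_id` and a `parent_id`, and that a single indexed query on `tree_id` has already returned every node of the tree as a flat array. The `Map` plays the role of the identity map; the field names and `assembleTree` are illustrative only, not mongo_mapper API.

```javascript
// Assemble a tree in memory from a flat list of node documents.
// Field names (_id, tree_id, parent_id) are assumptions for illustration.
function assembleTree(nodes) {
  const identityMap = new Map();
  // First pass: register every node so later lookups always hit the map.
  for (const doc of nodes) {
    identityMap.set(doc._id, { ...doc, children: [] });
  }
  // Second pass: link children to parents without touching the datastore.
  let root = null;
  for (const node of identityMap.values()) {
    if (node.parent_id === null) {
      root = node;
    } else {
      identityMap.get(node.parent_id).children.push(node);
    }
  }
  return root;
}

// Stand-in for the result of one indexed query on tree_id.
const fetched = [
  { _id: "a", tree_id: 1, parent_id: null },
  { _id: "b", tree_id: 1, parent_id: "a" },
  { _id: "c", tree_id: 1, parent_id: "a" },
  { _id: "d", tree_id: 1, parent_id: "b" },
];
const tree = assembleTree(fetched);
console.log(tree.children.map((n) => n._id)); // [ 'b', 'c' ]
```

Every parent lookup is a hit in the map, so walking the tree costs no further queries. Loading only a subtree would still pull the whole tree first.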

cemerick — Sat, 23 Jan 2010 22:17:20 -0000

It looks like these sorts of structures are made simpler in couch 0.11:

http://wiki.apache.org/couc...

jamieorc — Wed, 20 Jan 2010 23:17:20 -0000

@jnunemaker: can you give an example? Are the ancestors just a flat array or an array of arrays?

J Chris A — Wed, 20 Jan 2010 23:09:47 -0000

This is a good technique. It's simple and it also will scale (you can fragment the tree if it outgrows a single doc).

A common pattern in CouchDB is to cache the full path to the root on every node. Then a view lookup on the path will just work. The downside is you have to touch a lot of documents to do a high-level rename or move.
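
A rough in-memory sketch of the cached-path pattern, assuming a hypothetical `path` array field on each document. The `subtree` function stands in for a view lookup keyed on the path; it is not CouchDB API. `renameSegment` shows why a high-level rename is expensive: every document under the renamed node has to be rewritten.

```javascript
// Each document caches its full path from the root (hypothetical field `path`).
const docs = [
  { _id: "root", path: ["root"] },
  { _id: "docs", path: ["root", "docs"] },
  { _id: "api", path: ["root", "docs", "api"] },
  { _id: "guide", path: ["root", "docs", "guide"] },
  { _id: "src", path: ["root", "src"] },
];

// True when `path` starts with every element of `prefix`.
function hasPrefix(path, prefix) {
  return prefix.every((part, i) => path[i] === part);
}

// In-memory stand-in for a view lookup keyed on the path:
// everything at or below `prefix`.
function subtree(allDocs, prefix) {
  return allDocs.filter((doc) => hasPrefix(doc.path, prefix));
}

// Renaming a path segment has to rewrite every document under it.
function renameSegment(allDocs, prefix, newName) {
  const touched = subtree(allDocs, prefix);
  for (const doc of touched) {
    doc.path[prefix.length - 1] = newName;
  }
  return touched.length; // number of documents that had to be updated
}

console.log(subtree(docs, ["root", "docs"]).map((d) => d._id)); // [ 'docs', 'api', 'guide' ]
console.log(renameSegment(docs, ["root", "docs"], "manual")); // 3
```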

Marko A. Rodriguez — Sun, 17 Jan 2010 17:30:17 -0000

This is how I modeled a graph in MongoDB: http://bit.ly/6d6LKf ... I rely on indices of the _id property only. However, traversing one step in a graph requires two queries: one to get the edges of the current vertex, and one to get the vertex to traverse to.
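
To make the two-query step concrete, here is a sketch under stated assumptions. The linked post is not reproduced here, so the document shapes below (a vertex holding `outEdges` ids, an edge holding an `inVertex` id) are guesses for illustration. Two `Map`s keyed by `_id` stand in for two collections indexed on `_id` only.

```javascript
// Two Maps keyed by _id stand in for two collections indexed on _id only.
// The document shapes (outEdges, inVertex) are assumptions for illustration.
const vertices = new Map([
  ["v1", { _id: "v1", outEdges: ["e1", "e2"] }],
  ["v2", { _id: "v2", outEdges: [] }],
  ["v3", { _id: "v3", outEdges: [] }],
]);
const edges = new Map([
  ["e1", { _id: "e1", label: "knows", inVertex: "v2" }],
  ["e2", { _id: "e2", label: "knows", inVertex: "v3" }],
]);

let queryCount = 0;

// One traversal step: from a vertex to all of its out-neighbours.
function step(vertex) {
  // Query 1: fetch the edge documents of the current vertex by _id.
  queryCount += 1;
  const outgoing = vertex.outEdges.map((id) => edges.get(id));
  // Query 2: fetch the vertices those edges point to, again by _id.
  queryCount += 1;
  return outgoing.map((edge) => vertices.get(edge.inVertex));
}

const neighbours = step(vertices.get("v1"));
console.log(neighbours.map((v) => v._id), queryCount); // [ 'v2', 'v3' ] 2
```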

Sean Cribbs — Wed, 16 Dec 2009 19:24:10 -0000

Hogan, can you elaborate? You can only have one value per key in a JSON object, so I don't think I understand your solution.

Hogan Long — Wed, 16 Dec 2009 16:51:58 -0000

Make a heap.

{ "id:a","id:b", "id:c", "id:d", "null", "null", "null" }
Working with heaps is easy.
-- For those that don't know, with 1-based indexes, child-right is index*2, child-left is index*2+1, and parent is floor(index/2).
(If it gets too big, make a heap of heaps... say 255 nodes or something.)
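
The index arithmetic can be sketched as follows. This assumes the nodes are stored in a plain array with 1-based positions, and uses JavaScript `null` for empty slots. The helper names are made up for illustration.

```javascript
// The tree from the comment, stored level by level in one array.
// Positions are 1-based, so position p lives at array index p - 1.
const heap = ["id:a", "id:b", "id:c", "id:d", null, null, null];

// Index arithmetic exactly as given in the comment.
const rightChildOf = (p) => p * 2;
const leftChildOf = (p) => p * 2 + 1;
const parentOf = (p) => Math.floor(p / 2);

// Look up the value at a 1-based position; null when the slot is empty
// or falls outside the array.
function at(p) {
  return p >= 1 && p <= heap.length ? heap[p - 1] : null;
}

console.log(at(rightChildOf(1)), at(leftChildOf(1))); // id:b id:c
console.log(at(parentOf(4))); // id:b
console.log(at(leftChildOf(2))); // null
```

No pointers are stored at all: the shape of the tree is implied by position, which is why sparse or deep trees waste slots.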

mlmilleratmit — Fri, 20 Nov 2009 13:41:58 -0000

Exactly, I was a bit too short with my comment. We just build an index using:

doc.mpath.forEach(function (node) {
  emit(node, <something_to_aggregate>);
});

and then you've got a fully traversable index to query with startkey/endkey. With a reduce function added, you can also get summary statistics (e.g. total descendant count, tree level, etc.). We actually do a bunch of stuff in a single view by using complex keys:

emit( ["descendant_count", node], 1);
emit( ["daughters_at_depth", node_level], 1);
...

This is such a natural pattern that we should likely build it into couch as a default tree view.
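
For readers without a CouchDB instance handy, the descendant count can be simulated in plain JavaScript. This is only a sketch: `runView` is a made-up stand-in for the view engine, `emit` is passed in as an argument rather than being a global, and the grouping step approximates what a summing reduce with grouped keys would return.

```javascript
// Edge documents with an mpath of ancestors, nearest parent first.
const edgeDocs = [
  { node: "b", parent: "a", mpath: ["a"] },
  { node: "c", parent: "a", mpath: ["a"] },
  { node: "d", parent: "b", mpath: ["b", "a"] },
  { node: "e", parent: "d", mpath: ["d", "b", "a"] },
];

// Map step: emit one row per ancestor, as the view's map function would.
function mapDescendants(doc, emit) {
  doc.mpath.forEach(function (ancestor) {
    emit(["descendant_count", ancestor], 1);
  });
}

// Run the map over all docs, then sum values per key (a grouped reduce).
function runView(docs, mapFn) {
  const totals = new Map();
  const emit = (key, value) => {
    const k = JSON.stringify(key);
    totals.set(k, (totals.get(k) || 0) + value);
  };
  docs.forEach((doc) => mapFn(doc, emit));
  return totals;
}

const counts = runView(edgeDocs, mapDescendants);
console.log(counts.get(JSON.stringify(["descendant_count", "a"]))); // 4
console.log(counts.get(JSON.stringify(["descendant_count", "b"]))); // 2
```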

jnunemaker — Thu, 19 Nov 2009 22:50:49 -0000

Yep, we are doing something similar. We store parent_id and parent_ids, the latter being an array in mongo of all the ancestors up the tree. The benefit with mongo is that you can index that array and query on it to get all descendants as well.
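
A small in-memory sketch of that layout, with hypothetical documents and helper names. `descendantsOf` mimics what matching a single value against an indexed array field would return; it runs against a plain array, not a MongoDB driver.

```javascript
// Documents store both the direct parent and the full ancestor list.
const pages = [
  { _id: "home", parent_id: null, parent_ids: [] },
  { _id: "blog", parent_id: "home", parent_ids: ["home"] },
  { _id: "post", parent_id: "blog", parent_ids: ["home", "blog"] },
  { _id: "about", parent_id: "home", parent_ids: ["home"] },
];

// Build a child document by extending the parent's ancestor list.
function makeChild(id, parentDoc) {
  return {
    _id: id,
    parent_id: parentDoc._id,
    parent_ids: parentDoc.parent_ids.concat(parentDoc._id),
  };
}

// In-memory equivalent of matching one value against an indexed array field.
function descendantsOf(docs, id) {
  return docs.filter((doc) => doc.parent_ids.includes(id));
}

pages.push(makeChild("comment", pages[2]));
console.log(descendantsOf(pages, "blog").map((d) => d._id)); // [ 'post', 'comment' ]
```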

mlmilleratmit — Tue, 03 Nov 2009 17:00:52 -0000

We've worked this out for customers, and the best solution I've seen in production is for a single document to represent a parent-child edge:

{
  "node": "some_id",
  "parent": "parent_id",
  "mpath": ["parent_id", "that_nodes_parent", ..., "root"]
}

The materialized path is the killer feature: a map/reduce view in couch can emit a row for each node in the path, which gives you aggregate counts for any sub-tree in the graph (this assumes the graph is acyclic).
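
One way such edge documents might be built, sketched in plain JavaScript. `makeEdge` and `isAcyclic` are hypothetical helpers, and the ordering of `mpath` (nearest ancestor first, root last) is taken from the example document above.

```javascript
// Build the edge document for `node` given its parent's id and the parent's
// own edge document (undefined when the parent is the root).
function makeEdge(node, parentId, parentEdge) {
  const inherited = parentEdge ? parentEdge.mpath : [];
  return {
    node: node,
    parent: parentId,
    // Nearest ancestor first, root last, matching the layout above.
    mpath: [parentId].concat(inherited),
  };
}

const edgeB = makeEdge("b", "root", undefined);
const edgeC = makeEdge("c", "b", edgeB);
const edgeD = makeEdge("d", "c", edgeC);
console.log(edgeD.mpath); // [ 'c', 'b', 'root' ]

// A cycle would show up as a node appearing in its own mpath.
function isAcyclic(edge) {
  return !edge.mpath.includes(edge.node);
}
console.log(isAcyclic(edgeD)); // true
```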

Sean Cribbs — Wed, 07 Oct 2009 17:07:50 -0000

I'm just saying that document databases have different notions about how an index should be defined, and I was trying to reason about how to model this problem in the most idiomatic fashion for the datastore. Most of these patterns wouldn't work in CouchDB unless you have a view defined, period. Well... they'd "work", but you'd be loading up a lot of documents just to find a few that you need.

Jim Van Fleet — Wed, 07 Oct 2009 16:52:27 -0000

Since when is requiring an index a con? Trying to do this with one hand tied behind your back? :-D