Data Structures

A while ago, I wrote a couple of posts (http://scienceblogs.com/goodmath/2009/05/finally_finger_trees.php) that claimed to talk about finger trees. Unfortunately, I really botched it. I'd read a bunch of data structure papers, and managed to get myself thoroughly scrambled. What I wrote about was distantly related to finger trees, and it was useful for understanding how finger trees work - but it was not, in any way, shape, or form, actually a description of finger trees. Since then, I've been meaning to write a proper post explaining finger trees - but with the work on my book, and…
From the April Communications of the ACM, the Kode Vicious column is on The Data-Structure Canon. The reader question is: In most areas of science there are a few basic underlying laws that inform the rest of the study of a given subject. Physics, chemistry, and electrical engineering all have these basic equations. What are the basic equations in computer science? Or is computer science baseless? In other words, what's the fundamental intellectual basis of computer science? Well, according to KV, it's data structures! If there were any set of basics that I wanted to hammer into software…
In the Haskell stuff, I was planning on moving on to some monad-related stuff. But I had a reader write in and ask me to write another post on data structures, focusing on a structure called a zipper. A zipper is a remarkably clever idea. It's not really a single data structure, but rather a way of building data structures in functional languages. The first mention of the structure seems to be a paper by Gerard Huet in 1997, but as he says in the paper, it's likely that this was used before his paper in functional code --- but no one thought to formalize it and write it up. (In the…
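To make the idea concrete, here's a minimal sketch of the simplest zipper - a list zipper - in Python. This is my own illustration, not Huet's formulation or the post's code: the structure is just the focused element, the prefix behind it held in reverse, and the suffix ahead of it. In a cons-list representation, moving the focus and editing at the focus are both O(1); plain Python lists are used here for readability, so the costs aren't literally O(1).

```python
# A list zipper as a tuple: (reversed prefix, focused element, suffix).

def from_list(xs):
    """Focus on the first element; returns None for an empty list."""
    if not xs:
        return None
    return ([], xs[0], xs[1:])

def right(z):
    """Shift the focus one element to the right, or None at the end."""
    prefix, focus, suffix = z
    if not suffix:
        return None
    return ([focus] + prefix, suffix[0], suffix[1:])

def left(z):
    """Shift the focus one element to the left, or None at the start."""
    prefix, focus, suffix = z
    if not prefix:
        return None
    return (prefix[1:], prefix[0], [focus] + suffix)

def modify(f, z):
    """Apply f to the focused element - the whole point of a zipper."""
    prefix, focus, suffix = z
    return (prefix, f(focus), suffix)

def to_list(z):
    """Flatten the zipper back into an ordinary list."""
    prefix, focus, suffix = z
    return list(reversed(prefix)) + [focus] + suffix
```

For example, `to_list(modify(lambda x: x * 10, right(from_list([1, 2, 3]))))` edits the second element in place, giving `[1, 20, 3]`.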
As an alert commenter pointed out, I left out one really important thing in my earlier post about finger trees. That's what I get for trying to write when I'm sick :-). Seriously, this is a point that's implied by the post as it stands, but never explicitly stated - and since it's really important, it should be stated explicitly. The monoid-annotated tree structure doesn't replace the original data structure: it's superimposed on it. So, as I said: cons-cell style lists are ubiquitous and beloved by functional and non-functional programmers. In finger trees, you're not getting rid of them…
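The "superimposed, not replaced" point is easier to see in code. Here's a small sketch (names and structure are mine, not from the post): the leaves still hold the original sequence elements, and each internal node merely caches the monoid measure of everything beneath it - the annotation is derived data, never a substitute for the structure itself.

```python
# A monoid-annotated tree: leaves carry the original values; every branch
# caches the combined "measure" of its subtree. Here the monoid combine
# operation is + (which covers measures like size or total length).

class Leaf:
    def __init__(self, value, measure_fn):
        self.value = value
        self.measure = measure_fn(value)

class Branch:
    def __init__(self, left, right):
        self.left, self.right = left, right
        # The annotation is computed from the children, never stored separately.
        self.measure = left.measure + right.measure

def build(values, measure_fn):
    """Build a balanced annotated tree over the values, pairing bottom-up."""
    nodes = [Leaf(v, measure_fn) for v in values]
    while len(nodes) > 1:
        paired = [Branch(a, b) for a, b in zip(nodes[::2], nodes[1::2])]
        if len(nodes) % 2:
            paired.append(nodes[-1])
        nodes = paired
    return nodes[0]

def element_at(tree, i):
    """Find the i-th element by steering with the cached size annotations
    (this assumes the tree was built with measure_fn = lambda v: 1)."""
    if isinstance(tree, Leaf):
        return tree.value
    if i < tree.left.measure:
        return element_at(tree.left, i)
    return element_at(tree.right, i - tree.left.measure)
```

With size as the measure, `build([10, 20, 30], lambda v: 1)` yields a tree whose root reports 3 elements, and `element_at` descends by subtree size - exactly the trick finger trees exploit, layered on top of the underlying sequence.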
For ages, I've been promising to write about finger trees. Finger trees are an incredibly elegant and simple structure for implementing sequence-based data structures. They're primarily used in functional languages, but there's nothing stopping an imperative-language programmer from using them as well. In functional programming languages, lists are incredibly common. They show up everywhere; they're just so easy to use for recursive iteration that they're ubiquitous. I can't think of any non-trivial program that I've written in Haskell, Lisp, or OCaml that doesn't use lists. (Heck, I've…
This post is very delayed, but things have been busy. I'm working my way up to finger trees, which are a wonderful functional data structure. They're based on trees - and like many tree-based structures, their performance relies heavily on the balance of the tree. The more balanced the tree is, the better they perform. In general, my preference when I need a balanced tree is a red-black tree, which I wrote about before. But there are other kinds of balanced trees, and for finger trees, many people prefer a kind of tree called a 2-3 tree. A 2-3 tree is basically a B-tree with a…
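The node shapes are the heart of a 2-3 tree, so here's a minimal sketch of them (the class names are mine, not the post's): every interior node has either one key and two children or two keys and three children, and all leaves sit at the same depth, which is what keeps the tree balanced.

```python
# 2-3 tree node shapes, with search. An empty child slot is None; insertion
# (which splits Three-nodes as they overflow) is omitted to keep this short.

class Two:
    """One key, two children: everything < key on the left, > key on the right."""
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right

class Three:
    """Two keys k1 < k2, three children partitioned around them."""
    def __init__(self, left, k1, middle, k2, right):
        self.left, self.k1, self.middle = left, k1, middle
        self.k2, self.right = k2, right

def member(x, node):
    if node is None:
        return False
    if isinstance(node, Two):
        if x == node.key:
            return True
        return member(x, node.left if x < node.key else node.right)
    if x == node.k1 or x == node.k2:
        return True
    if x < node.k1:
        return member(x, node.left)
    if x < node.k2:
        return member(x, node.middle)
    return member(x, node.right)
```

A hand-built balanced example: `Two(Three(None, 1, None, 3, None), 5, Two(None, 7, None))` holds {1, 3, 5, 7} with every leaf slot at the same depth.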
Last week, I promised to continue my discussion of ropes. I'm going to break that promise. But it's in a good cause. If you're a decent engineer, one of the basic principles you should follow is keeping things as simple as possible. One of the essential skills for keeping things simple is realizing when you're making something overly complicated - even if it's cool, and fun, and interesting. Working out the rope code for more complex operations, I found myself thinking that it really seemed complex for what I was doing. What I'm doing is writing myself a text editor. I'm not writing a string…
It's been a while since I've written about any data structures. But it just so happens that I'm actually really working on implementing a really interesting and broadly useful data structure now, called a Rope. A bit of background, to lead in. I've got this love-hate relationship with some of the development tools that Rob Pike has built. (Rob is one of the Unix guys from Bell Labs, and was one of the principal people involved in both the Plan9 and Inferno operating systems.) Rob has implemented some amazing development tools. The two that fascinate me are called Sam and Acme. The best and…
Another cool, but frequently overlooked, data structure in the tree family is called the B-tree. A B-tree is a search tree, very similar to a BST in concept, but optimized differently. BSTs provide logarithmic time operations, where the performance is fundamentally bounded by the number of comparisons. B-trees also provide logarithmic performance with a logarithmic number of comparisons - but the performance is worse by a constant factor. The difference is that B-trees are designed around a different tradeoff. The B-tree is designed to minimize the number of tree nodes that need to be…
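Here's a small sketch of why one B-tree node access buys you several comparisons (this is an illustration of the search, not the post's code, and it leaves out the balancing rules): each node holds a sorted list of keys, and an internal node has one more child than it has keys. A binary search inside the node resolves many comparisons against a single node fetch, so far fewer nodes are touched overall.

```python
from bisect import bisect_left

class BNode:
    """A B-tree node: sorted keys, plus len(keys) + 1 children if internal."""
    def __init__(self, keys, children=None):
        self.keys = keys          # kept in sorted order
        self.children = children  # None marks a leaf node

def member(x, node):
    # One node access resolves all comparisons against this node's keys.
    i = bisect_left(node.keys, x)
    if i < len(node.keys) and node.keys[i] == x:
        return True
    if node.children is None:
        return False
    # Descend into the unique child whose key range contains x.
    return member(x, node.children[i])
```

For instance, a root `BNode([10, 20], [BNode([5]), BNode([15]), BNode([25])])` finds any of its five keys with at most two node accesses.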
Last post, I described the basic idea of the binary heap data structure. But I didn't talk about how to implement it - and there's a very cool way of implementing it - optimally space efficient, very fast, and with respectable cache locality for moderate sized datasets. The idea is to use a compact, array-based implementation of a binary tree. If we put the nodes of the tree into an array in the order in which they would be visited by a breadth-first traversal, then we get a representation of the tree in which the children of the node at position N in the tree will be…
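The index arithmetic is the whole trick, so here's a sketch of it (using 0-based indexing, one common convention; the post's own indexing may differ): node i's children live at 2i+1 and 2i+2, and its parent at (i-1)//2, so the tree needs no pointers at all.

```python
# Array layout of a binary heap: breadth-first order, 0-based indices.

def parent(i):
    return (i - 1) // 2

def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def is_min_heap(a):
    """Check the heap-order invariant: every node >= its parent."""
    return all(a[parent(i)] <= a[i] for i in range(1, len(a)))

def heap_push(a, x):
    """Insert by appending at the end and sifting up toward the root."""
    a.append(x)
    i = len(a) - 1
    while i > 0 and a[parent(i)] > a[i]:
        a[i], a[parent(i)] = a[parent(i)], a[i]
        i = parent(i)
```

Note how `heap_push` never touches a pointer: the swap path from leaf to root is computed purely from indices, which is where the space efficiency and cache behavior come from.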
One of the most neglected data structures in the CS repertoire is the heap. Unfortunately, the jargon is overloaded, so "heap" can mean two different things - but there is a connection, which I'll explain in a little while. In many programming languages, when you can dynamically create objects in memory, the space from which you allocate the memory to create them is called the heap. That is not what I'm talking about. What I am talking about is an extremely simple but powerful structure that is used for maintaining a collection of values from which you want to be able to quickly remove the…
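As it happens, Python's standard library ships exactly this kind of structure as the `heapq` module - a binary min-heap kept in a plain list - which makes the interface easy to demonstrate (assuming the goal here is quick access to the minimum; a max-heap is the mirror image):

```python
import heapq

# heapq maintains a min-heap invariant over an ordinary Python list:
# heappush and heappop each run in O(log n) time.
items = []
for value in [5, 1, 4, 2]:
    heapq.heappush(items, value)

smallest = heapq.heappop(items)  # removes and returns the minimum: 1
next_smallest = items[0]         # peeking at the minimum is O(1): 2
```

That's the entire appeal of the structure: arbitrary insertion order in, smallest-first out, with only logarithmic work per operation.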