Text 20 Feb 10 notes Explorations in Clojure’s core.logic

I first heard of core.logic three or four months ago, and I’ve been trying to learn more about it ever since. It’s apparently an implementation of miniKanren in Clojure, where miniKanren is a small language implemented in Scheme that’s a stripped-down version of a system called KANREN.
Long story short, core.logic is a library that lets you do logic programming in Clojure. 

Okay, so what’s logic programming?

I’d never heard of logic programming before, probably because I didn’t pay as much attention in school as I should have. Wikipedia says that it’s one of the four main programming paradigms, the other three being object-oriented, imperative, and functional. Here’s a simple logic program:

Let’s look at this program one step at a time. We’re invoking run*, which is a macro that gives you a fresh logic variable (which we’ve bound to the name q), lets you specify a series of goals that might relate to that logic variable, and returns a list containing all of the values that the logic variable can possibly take on in order for those goals to be fulfilled.

Here’s another simple program:

The result of this program is a list containing one value, _0. _0 denotes a fresh logic variable; the fact that our program returns _0 means that q can take on any value at all and the goals in the body of the run* call will still hold true. Neat!

At this point, I’d like to refer you to Ryan Senior’s post "The Magical Island of Kanren". I like the metaphor he uses in that post; he also has some great examples, and he introduces conde, which we’ll be using in a second. You’ll need to have read and understood that post in order to make sense of the rest of this one. It’s short and simple, go read it and come right back!


Okay: now that we have the beginnings of an understanding of core.logic’s vocabulary, let’s get to the reason I wrote this post.

core.logic allows you to ask questions about your data in a declarative way. It’s kind of like SQL in that respect, but it’s embedded in Clojure (so you have all of your favorite functional-programming tools available) and you don’t need to send all of your data over to a database before you can ask questions about it. Ryan Senior’s "Practical core.logic" talk (particularly minutes 7 through 13) really helped crystallize that for me, and right after watching it, I thought to myself - wouldn’t it be fun to write a simple core.logic program that implements document.getElementsByClassName? So I did. Let’s look at it!

To get started, we’ll need to import some libraries, pull down a Web page, and parse its HTML into a DOM-tree representation. Here’s some code that does that:

Now that we’ve got our data, let’s start writing our logic program. core.logic programs are expressed using relations, which is another way of saying “functions that return goals”. We’ll need to write a relation that takes a node and a classname and then returns a goal that says that the node should have the specified classname. Here’s one:

We introduce a relation called secondo, and use it to bind a fresh logic variable named attrs to the second element of the node, since the second element of a node is a map containing the node’s HTML attributes. Once we’ve done that, we can say that attrs should contain a k/v pair like e.g. {:class "user-passport"}. Goal written!

Now that we’ve got a way of saying that a logic variable that represents an HTML node should have a certain class name, we’re going to have to give ourselves a way of applying that goal to every element of the DOM tree. Here’s what that looks like:

One step at a time: to start off, we introduce a relation called childreno. It takes two logic variables, one representing an HTML node, and one representing the node’s children, and then uses resto twice to unify children with the (potentially empty) list of all elements in node after the first two elements, because that’s where the node stores its children.

Next up is nodes-existo, which is the workhorse of this program. This relation takes a relation and a couple of logic variables. It starts by using conso to split the passed-in nodes lvar apart into node and remaining-nodes. Now we use a conde to say that we want our program to check each of three cases. (And when I say “each”, I mean “each and every!” Note that while, as pg says, a cond expression will evaluate “only an L-shaped path of its subexpressions”, a conde doesn’t short-circuit like that - once it’s done checking one of its lines, it’ll backtrack out and check the next, regardless of whether the first line’s goals succeeded or failed.)

The first line’s simple enough: we say that if the goal returned by a call to (condition node) succeeds, then node should be unified with out.

In the next line, we introduce an lvar called children, use childreno to unify it with node's children, and recur on those children.

In the third line, we recur on remaining-nodes, which we used conso earlier to unify with the rest of the nodes in nodes.

And that’s basically it! All that’s left is get-elements-by-class-name, which is a plain old function that takes a node and a classname and uses nodes-existo and has-classnameo to tell run* how to identify the possible values of q. Simple!

So, what has this bought us?

At this point, the program we’ve written doesn’t look that much different from how we would have written it without core.logic. What’s the logic-programming approach bought us so far?

Well, for starters, we haven’t had to worry at all about assembling the list of nodes that we want to return. All we do is tell our program how to identify the sort of values we’re interested in, and tell it where it can look to potentially find more of them, and run* handles the rest. So that’s cool! One other cool property about this program is that you can run it backwards trivially - you can give it a classname and one or more nodes with that classname, and it’ll generate all of the possible DOM trees that could contain that node; but that’s not exactly useful in this particular situation.

So at this point, I’d say that core.logic has let us write our program slightly more declaratively, but this program probably isn’t a lot shorter than the non-core.logic version of it would have been.

Enter matche

I noticed in Ryan Senior’s talk - as well as in his article on appendo the great - that he used a construct called matche, which brings the joy of pattern matching to your core.logic programs. As far as I can tell, it’s only fully documented in Appendix C of William Byrd’s thesis, but Ambrose Bonnaire-Sergeant has a good introduction to it near the end of his great core.logic intro. Let’s take it out for a spin!

As long as we’re making changes, let’s first make has-classnameo a little more general - let’s rename it to has-attrso, and have it take a map of attributes (e.g. {:class "foo" :id "bar"}) that we want matching nodes to have. We can define it using defne, which is just shorthand for defining a function that does a matche on its arguments:

Note that we’re able to completely throw secondo away - we just say: this is what a node looks like, and here is a constraint that we want to apply to the node’s attributes. Neat! Now let’s take a look at the redefinition of big bad nodes-existo:

We’re able to inline the opening conso and fresh calls into the implied matche expression, which is cool. The first two lines of the conde look pretty familiar, but the line concerning the node’s children has changed a lot. We’re now using another matche in order to pull ?node apart and get at its ?children. This saves us a fresh call, a childreno call, and actually lets us delete the definition of childreno entirely.

Here’s our final program in its entirety:

This is about as declarative as it gets, folks. Our program grabs some data, and then very simply and succinctly (the meat of the program is ten lines of code!) tells run*: this is what the nodes we’re interested in look like, and this is where the rest of the nodes you’ll want to look at are; please find the good ones for us. Logic programming and pattern matching are a truly fearsome combination. I’m really excited about core.logic - it seems like a really useful tool.


Here’s an earlier version of nodes-existo:

It has a subtle bug, because out of reflex, I thought I had to define a base case. Can you find it? How do you think the bug will change the behavior of the program? Do you understand why the code works when that line is removed?

Also: our code to get elements by class name won’t work on elements that have multiple class names - that is, if we use this code to search for elements that have the class "foo", elements that have the classes "foo bar" won’t be matched. Enhancing this code to support that behavior is left as an exercise for the reader.


If you want to start diving into this stuff, here’s how I’d recommend doing it:

  • Read the official core.logic primer.
  • Watch this video by the miniKanren guys. It has what I seriously think is the coolest demo of anything ever. You’ll know what I’m talking about when you see it. 
  • Watch this other video by the same guys.
  • Here’s another one - it shares a lot of content with the first two videos, but in the final fifteen minutes they build and demo a working type inferencer from scratch for the hell of it. It’s really a treat to see these guys at work.
  • Read The Reasoned Schemer.

That’s actually about as far as I’ve gotten so far - I bought a copy of “The Art of Prolog” that I’ll read at some point, and I’m working on getting a hard copy of Byrd’s thesis to read through. I also have core.logic’s source code open in a tab and will hopefully read it all one of these days, and the author of core.logic has a recommended reading list. Aside from that, though, I’m not aware of many other resources besides the ones I cited throughout this post - this field is the coolest thing I’ve ever heard of, but it seems relatively small as of yet. Hopefully that’ll change as more and more people are introduced to it via core.logic!

(Acknowledgements: I’d like to thank the miniKanren guys and David Nolen for existing, everyone whose articles and documentation I cited for writing those things, and my roommates’ cat Isis for sitting on me for the entire five hours I spent writing this post.)

  1. tangrammer reblogged this from jrheard
  2. crypsys reblogged this from liamgoodacre and added:
    It’s like Prolog but slightly easier to read!
  3. liamgoodacre reblogged this from technotalk
  4. technotalk reblogged this from jrheard
  5. jrheard posted this

Design crafted by Prashanth Kamalakanthan. Powered by Tumblr.