Guile - Glob patterns

March 26, 2022
Tags:

Guile has built-in support for pattern matching with regular expressions. However, there's no support for globbing. Glob patterns are often use for matching pathnames with common suffixes and are easier to used compared to regexes.

One could use the foreign function interface around the glob(3) family. However, glob patterns are very easy to implement in Scheme itself. Thus in this post, I will show how to implement a simple globbing module for Guile.

Making a DSL

When making a domain specific language, one has to define the primitives, the means of combinations and the means of abstractions of the language (see Structure and Interpretation of Computer Programs).

Primitives

We will represent the glob patterns as a linear state machine. Every state is represented by a node. Nodes are chained together to form a pattern. The first node of the pattern is called the entry point and the last node is the terminal node. A string match a pattern, if when starting from the entry node, the terminal node was executed.

Thus, the primitive of our language is the node. A node is composed of a match procedure and a next node. Making a node from these parameters results in a procedure that accepts a string. This procedure returns the result of calling the next node with the rest to be matched or false.

(define (make-node match next)
  (lambda (string)
    (let ((rest (match string)))
      (if rest
          (next rest)
          #f))))

Using the previous definitions, we can define the terminal node.

(define the-terminal-node
  (make-node
   string-null?
   (const #t)))

Means of combination

The means of combination of our language is the chain procedure. One apply a list of matchers to this procedure in order to get a state machine.

(define (chain . matches)
  (if (null? matches)
      the-terminal-node
      (make-node (car matches)
                 (apply chain (cdr matches)))))

Means of abstraction

Finally, we need a means of abstraction. For this, we will add two functions that will built the different matchers for the nodes.

(define (make-prefix-match prefix)
  (lambda (string)
    (let ((len (string-prefix-length prefix string)))
      (if (= 0 len)
          (if (string-null? prefix) string #f)
          (substring string len)))))

(define (make-contains-match suffix)
  (if (string-null? suffix)
      (const "")
      (lambda (string)
        (let ((k (string-contains string suffix)))
          (if k
              (substring string (+ k (string-length suffix)))
              #f)))))

Using the DSL

We can now use our DSL like so.

(pk ((chain (make-prefix-match "include/")
            (make-contains-match ".h"))
     "include/foo.h"))
;;; (#t)

Compiling the matches

So far so good. However, it would be nice to be able to compile from a string representation our pattern. We won't botter implementing escaping character here for the sake of simplicity.

(define compile-glob-pattern
  (let ((cache (make-hash-table)))
    (lambda (pattern)
      (or (hash-ref cache pattern)
          (hash-set! cache pattern
           (apply chain
                  (let ((lst (string-split pattern (cut char=? <> #\*))))
                    (cons
                     (make-prefix-match (car lst))
                     (map make-contains-match (cdr lst))))))))))

(pk ((compile-glob-pattern "include/*.h") "include/foo.h"))
;;; (#t)

Matching patterns

Finally, we can add a little more abstraction by adding an helpful macro.

(define-syntax match-glob
  (syntax-rules (else)
    ((_ string ((pattern ...) expr expr* ...) ... (else eexpr eexpr* ...))
     (cond
      ((or ((compile-glob-pattern pattern) string) ...) expr expr* ...) ...
      (else eexpr eexpr* ...)))))


(pk
 (match-glob "include/foo.h"
             (("include/*.h") #t)
             (else #f)))
;;; (#t)

Testing

We finally can test our code to see if it works correctly.

(use-modules
 (glob)
 (srfi srfi-64))

(define (glob? pattern string)
  ((compile-glob-pattern pattern) string))

(test-begin "glob")

(test-assert "Empty string"
  (glob? "" ""))

(test-group "Glob any"
  (test-assert
      (and
       (glob? "*" "")
       (glob? "*" "foo"))))

(test-assert "No glob"
  (glob? "foo" "foo"))

(test-assert "prefix+glob"
  (glob? "foo*" "foobuz"))

(test-assert "glob+infix+glob"
  (glob? "*foo*" "barfoobuz"))
  
(test-assert "Multiple globs"
  (glob? "*foo*buz*baz*" "foo123buz456baz"))

(test-assert "glob+suffix"
  (glob? "*buz" "foobuz"))

(test-group "No match"
  (test-assert
      (not
       (or
        (glob? "foo" "fooo")
        (glob? "*foo" "foobuz")))))

(test-end "glob")

Full code

;;; SPDX-License-Identifier: GPL-3.0-or-later
;;;
;;; Copyright (c) 2022 Olivier Dion <olivier.dion@polymtl.ca>

(define-module (glob)
  #:use-module (srfi srfi-26)
  #:export-syntax (match-glob)
  #:export (compile-glob-pattern))

(define (make-node match next)
  (lambda (string)
    (let ((rest (match string)))
      (if rest
          (next rest)
          #f))))

(define the-terminal-node
  (make-node
   string-null?
   (const #t)))

(define (chain . matches)
  (if (null? matches)
      the-terminal-node
      (make-node (car matches)
                 (apply chain (cdr matches)))))

(define (make-prefix-match prefix)
  (lambda (string)
    (let ((len (string-prefix-length prefix string)))
      (if (= 0 len)
          (if (string-null? prefix) string #f)
          (substring string len)))))

(define (make-contains-match suffix)
  (if (string-null? suffix)
      (const "")
      (lambda (string)
        (let ((k (string-contains string suffix)))
          (if k
              (substring string (+ k (string-length suffix)))
              #f)))))

(define compile-glob-pattern
  (let ((cache (make-hash-table)))
    (lambda (pattern)
      (or (hash-ref cache pattern)
          (hash-set! cache pattern
           (apply chain
                  (let ((lst (string-split pattern (cut char=? <> #\*))))
                    (cons
                     (make-prefix-match (car lst))
                     (map make-contains-match (cdr lst))))))))))

(define-syntax match-glob
  (syntax-rules (else)
    ((_ string ((pattern ...) expr expr* ...) ... (else eexpr eexpr* ...))
     (cond
      ((or ((compile-glob-pattern pattern) string) ...) expr expr* ...) ...
      (else eexpr eexpr* ...)))))