09 Dec 2017

Haskell deep dive: zsh-git-prompt

Tags	deep-dive haskell zsh zsh-git-prompt

In How To Become A Hacker, Eric S. Raymond gives the following golden advice:

Learning to program is like learning to write good natural language. The best way to do it is to read some stuff written by masters of the form, write some things yourself, read a lot more, write a little more, read a lot more, write some more … and repeat until your writing begins to develop the kind of strength and economy you see in your models.

This year, when I decided to give learning Haskell another shot, I realized that I needed not just tutorials to study, but also actual code. As much as tutorials help to illustrate concepts, it is in actual code that one learns how to compose things together and picks up tricks that tutorials do not cover. About 6 years ago, I was an active user of Arch Linux and wanted to contribute to their package manager, Pacman. Pacman was written in C, which was a language I was using rather heavily at that time. I thought I knew C, but it was a rather eye-opening experience to study the Pacman source code and see some real-world C code from a program that I used on a day-to-day basis. Heck, I even contributed slightly to pacman-key, probably as a result of that.

Ok, enough with the stuff that doesn’t concern anyone else.

After some serious searching, I found zsh-git-prompt. It is probably the first serious Haskell program I’ve studied and understood. Here is what makes this codebase so good for a beginner:

It is pretty short: 464 lines for the .hs files in the src dir, based on the output of a find command.
It is a real world program, at least for zsh users. Whenever you cd into a directory that is a git repository (or any of its subdirs), zsh-git-prompt shows you some information about the repo: for instance, whether it is clean, the number of staged changes, how many commits it has diverged from its tracking branch, etc.
Once you install it, you see it all the time you are working with code. If you happen to be learning Haskell and happen to hit a wall and feel like giving up (happens to most people I believe), look at that shiny zsh-git-prompt showing you your git repo’s status and you know that Haskell is capable of doing so much and the difference maker is the person that is between the chair and the keyboard. Extra motivation to work harder to eventually be able to write something useful in Haskell!

Prerequisite knowledge

As I was writing this post, I realized that there are a number of things that the reader must know to truly understand the code (even with my guidance) and that for me to explain those concepts in detail will make an already long post even longer.

This knowledge is often summarized by the phrase “the first N chapters of LYAH”, where N is usually 7 and LYAH is the Learn You a Haskell book. I would say that the prereqs for understanding this post are pretty much the first 12 chapters of LYAH. Specifically, the following:

Some knowledge of Monads
Definition of the Maybe monad and the List monad. Specifically, their respective definitions of >>= and how those behave in do notation

Non Haskell related knowledge:

Some knowledge of git and shell scripting

Target Audience

Haskell beginners who have some / all of the prereq knowledge above. You should also be willing to google to find out more information about concepts I didn’t explain too well / skipped over.

If you have read LYAH or similar but you are finding it very hard to use your newfound knowledge to write a real world application, I believe that you will find this post helpful.

Software required

It is also highly recommended that you install zsh and zsh-git-prompt; you will doubly appreciate this post and what the zsh-git-prompt does. If you are a zsh user but just lack zsh-git-prompt, check out our blog post on how to install zsh-git-prompt.

Alternatively, if you do not wish to go through the hassle of installing zsh and zsh-git-prompt on your system, you can head over to https://github.com/yanhan/zsh-git-prompt-docker to pull / build our Docker image; simply follow the instructions in the README of that repo.

Version we are covering

We will be going through tag v0.5 of zsh-git-prompt. At the time of writing, it happens to be the HEAD of master branch. You can also go to https://github.com/olivierverdier/zsh-git-prompt/tree/v0.5 and browse the files there.

Throughout this post, we will be referencing the zsh-git-prompt source code on its GitHub repo that falls under the v0.5 tag.

Finding main

Looking at stack.yaml, we see:

packages:
- 'src'

which tells us that we should look at the src directory. Listing that directory shows a .cabal file, git-prompt.cabal. In its executable section, we see the following:

executable gitstatus
  hs-source-dirs:      app
  main-is:             Main.hs
  ghc-options:         -threaded -rtsopts -with-rtsopts=-N
  build-depends:       base, git-prompt, parsec >=3.1, process>=1.1.0.2, QuickCheck
  default-language:    Haskell2010
  ghc-options: -Wall -O2 -fno-warn-tabs -fno-warn-unused-do-bind
  cc-options: -O3

So the main function sits at app/Main.hs (within the top level src dir). As an aside, there are very few dependencies on third party libraries.

I have to admit that this is a rather roundabout way to find the main function. In practice, it is much easier to do a git grep -n main. But this process teaches us some stuff about Stack and Cabal.

The main function

main :: IO ()
main = do -- IO
  status <- getContents
  mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
  let result = do -- Maybe
        strings <- stringsFromStatus mhash status
        return (unwords strings)
  putStr (fromMaybe "" result)

Ok. This is short but not very straightforward at first glance. There are some functions that we may not be familiar with, so we turn to Hoogle.

getContents :: IO String
-- The getContents operation returns all user input as a single string, which
-- is read lazily as it is needed (same as hGetContents stdin).

unsafeInterleaveIO :: IO a -> IO a
-- unsafeInterleaveIO allows an IO computation to be deferred lazily. When
-- passed a value of type IO a, the IO will only be performed when the value of
-- the a is demanded. This is used to implement lazy file reading, see
-- hGetContents.

unwords :: [String] -> String
-- unwords is an inverse operation to words. It joins words with separating
-- spaces.

Ok. The first question is, what is with the status <- getContents? It is not like we are supplying any input via stdin to zsh-git-prompt; we simply see the prompt displayed on our terminal when we are in a git repo, without having to do anything. So this input must be coming from somewhere else.

Indeed, if we look at the Install section of the README, we see the following:

Source the file zshrc.sh from your ~/.zshrc config file, and configure your prompt. So, somewhere in ~/.zshrc, you should have:

source path/to/zshrc.sh
# an example prompt
PROMPT='%B%m%~%b$(git_super_status) %# '

The magic lies with the git_super_status zsh function and the zshrc.sh script. We open that file and find the git_super_status function. This is where the prompt gets constructed. Most notably, it starts with:

git_super_status() {
    precmd_update_git_vars

Here’s the definition of the precmd_update_git_vars function:

function precmd_update_git_vars() {
    if [ -n "$__EXECUTED_GIT_COMMAND" ] || [ ! -n "$ZSH_THEME_GIT_PROMPT_CACHE" ]; then
        update_current_git_vars
        unset __EXECUTED_GIT_COMMAND
    fi
}

which points to the update_current_git_vars function as the likely workhorse:

function update_current_git_vars() {
    unset __CURRENT_GIT_STATUS

    if [[ "$GIT_PROMPT_EXECUTABLE" == "python" ]]; then
        local gitstatus="$__GIT_PROMPT_DIR/gitstatus.py"
        _GIT_STATUS=`python ${gitstatus} 2>/dev/null`
    fi
    if [[ "$GIT_PROMPT_EXECUTABLE" == "haskell" ]]; then
        _GIT_STATUS=`git status --porcelain --branch &> /dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus`
    fi
     __CURRENT_GIT_STATUS=("${(@s: :)_GIT_STATUS}")
  GIT_BRANCH=$__CURRENT_GIT_STATUS[1]
  GIT_AHEAD=$__CURRENT_GIT_STATUS[2]
  GIT_BEHIND=$__CURRENT_GIT_STATUS[3]
  GIT_STAGED=$__CURRENT_GIT_STATUS[4]
  GIT_CONFLICTS=$__CURRENT_GIT_STATUS[5]
  GIT_CHANGED=$__CURRENT_GIT_STATUS[6]
  GIT_UNTRACKED=$__CURRENT_GIT_STATUS[7]
}
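To make the last few lines of update_current_git_vars concrete, here is a rough Haskell sketch of the same field splitting. The PromptFields record and the parseFields function are hypothetical names of mine and are not part of zsh-git-prompt. Also note that words splits on any run of whitespace, whereas the zsh (@s: :) flag splits on single spaces, so treat this as an approximation of the shell behavior.

```haskell
-- Hypothetical record mirroring the GIT_* shell variables; not upstream code.
data PromptFields = PromptFields
  { branch    :: String
  , ahead     :: String
  , behind    :: String
  , staged    :: String
  , conflicts :: String
  , changed   :: String
  , untracked :: String
  } deriving (Eq, Show)

-- Split the single line printed by gitstatus into its seven fields.
-- Anything that does not have exactly seven fields is rejected.
parseFields :: String -> Maybe PromptFields
parseFields line =
  case words line of
    [b, a, be, s, c, ch, u] -> Just (PromptFields b a be s c ch u)
    _                       -> Nothing
```

For a line such as master 95 0 0 0 1 1, parseFields would give a record whose branch is "master" and whose ahead is "95"; an empty line gives Nothing.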

What should catch our attention is the following 3 lines:

    if [[ "$GIT_PROMPT_EXECUTABLE" == "haskell" ]]; then
        _GIT_STATUS=`git status --porcelain --branch &> /dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus`
    fi

Suppose GIT_PROMPT_EXECUTABLE has the value haskell. Then git status --porcelain --branch &>/dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus is executed. Despite some experience in Bash, the &> tripped me up because I had never used it. After some googling, I found out that in Bash, &> redirects both standard output and standard error to the same location, which in this case is /dev/null.

That doesn’t make sense. If both standard output and standard error are redirected to /dev/null, wouldn’t the $__GIT_PROMPT_DIR/src/.bin/gitstatus program not get any input? Or, does that program not require any standard input and it will just work? To verify, I ran the following commands in a git repo:

git status --porcelain --branch &>/dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus

versus

$__GIT_PROMPT_DIR/src/.bin/gitstatus </dev/null

The first showed me:

master 95 0 0 0 1 1

and the second did not output anything. So clearly, it was receiving standard input from the git status --porcelain --branch command!

At this point, I was wondering, what the hell was going on? If all output from the git status --porcelain --branch command was redirected to /dev/null, shouldn’t it effectively be doing the same thing as supplying no standard input to the next program?

I tried a few other things but this one kind of blew my mind:

git status --porcelain --branch &>/dev/null >a >o

Both a and o contained the output of the command! Seems like there is multiple output redirection going on. Something I didn’t know was possible.

A google search for “stdout redirect to multiple linux” turned out the usual answers (most commonly using tee), but also this answer on Unix & Linux Stack Exchange:

With zsh:

ls > file1 > file2

(internally, zsh creates a pipe and spawns a process that reads from that pipe and writes to the two files as tee does. ls stdout is the other end of the pipe).

and also the following answer:

As @jofel mentioned in a comment under the answer, this can be done natively in zsh:

echo foobar >file1 >file2 >file3

or, with brace expansion:

echo foobar >file{1..3}

Internally this works very similarly to the tee answers provided above. The shell connects the command’s stdout to a process that pipes to multiple files; therefore, there isn’t any compelling technical advantage to doing it this way (but it does look real good). See the zsh manual for more.

And it links to the Redirection chapter of the zsh manual. Turns out zsh has a feature known as Multios that allows multiple output redirection. That section opens with:

If the user tries to open a file descriptor for writing more than once, the shell opens the file descriptor as a pipe to a process that copies its input to all the specified outputs, similar to tee, provided the MULTIOS option is set, as it is by default. Thus:

date >foo >bar

writes the date to two files, named ‘foo’ and ‘bar’. Note that a pipe is an implicit redirection; thus

date >foo | cat

writes the date to the file ‘foo’, and also pipes it to cat.

So we totally misunderstood the context. Our premise of reasoning about the behavior of the command in Bash is totally wrong because we are not using Bash but zsh!

Therefore

git status --porcelain --branch &>/dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus

does indeed redirect the standard output of git status --porcelain --branch to the $__GIT_PROMPT_DIR/src/.bin/gitstatus program.

Looking at the lines 30 to 34 of stack.yaml:

# Extra directories used by stack for building
# extra-include-dirs: [/path/to/dir]
# extra-lib-dirs: [/path/to/dir]

local-bin-path: './src/.bin'

and line 23 of src/git-prompt.cabal:

executable gitstatus

We see that stack install will indeed build a program named gitstatus and place it in the src/.bin directory of the repo. So indeed our guess that something else is piping its output as standard input to the main function of the zsh-git-prompt Haskell program is correct. So we explained a grand total of… one truly meaningful line of Haskell code:

main :: IO ()
main = do -- IO
  status <- getContents

Nevertheless, we have learnt a lot more about how zsh-git-prompt works overall. Let’s return to our main function:

main :: IO ()
main = do -- IO
  status <- getContents
  mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
  let result = do -- Maybe
        strings <- stringsFromStatus mhash status
        return (unwords strings)
  putStr (fromMaybe "" result)

The next line of code is:

  mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash

and from our Hoogle search above:

unsafeInterleaveIO :: IO a -> IO a
-- unsafeInterleaveIO allows an IO computation to be deferred lazily. When
-- passed a value of type IO a, the IO will only be performed when the value of
-- the a is demanded. This is used to implement lazy file reading, see
-- hGetContents.

So unsafeInterleaveIO gitrevparse will only call the gitrevparse function when necessary. As for why it is unsafe, please read this Stack Overflow question and its answers. Truth be told, I do not know enough to explain it, and any explanation would make this already long post even longer.
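A small experiment of my own (not code from zsh-git-prompt) makes the deferral visible. The sketch below wraps an action that flips a flag in unsafeInterleaveIO, then checks the flag before and after the result is demanded. The name deferredDemo and the IORef flag are purely illustrative.

```haskell
import Control.Exception (evaluate)
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Returns whether the deferred action had run before and after its
-- result was demanded.
deferredDemo :: IO (Bool, Bool)
deferredDemo = do
  ran <- newIORef False
  -- The action is not performed here; we only get a lazy placeholder.
  value <- unsafeInterleaveIO (writeIORef ran True >> return (42 :: Int))
  before <- readIORef ran
  -- Forcing the value is what finally triggers the action.
  _ <- evaluate value
  after <- readIORef ran
  return (before, after)
```

Running deferredDemo should yield (False, True): the flag is untouched until the value is forced. In main, the same mechanism means git rev-parse only runs if something downstream actually inspects mhash.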

The gitrevparse function is defined in the src/app/Main.hs file and is as follows:

gitrevparse :: IO (Maybe Hash)
gitrevparse = do -- IO
    mresult <- safeRun "git" ["rev-parse", "--short", "HEAD"]
    let rev = do -- Maybe
          result <- mresult
          return (MkHash (init result))
    return rev

Here is the safeRun function, also in the src/app/Main.hs file:

safeRun :: String -> [String] -> IO (Maybe String)
safeRun command arguments =
  do -- IO
    output <- readProcessWithExitCode command arguments ""
    return (successOrNothing output)

Some relevant documentation for the System.Process.readProcessWithExitCode function:

readProcessWithExitCode
  :: FilePath                         -- Filename of the executable
  -> [String]                         -- any arguments
  -> String                           -- standard input
  -> IO (ExitCode, String, String)    -- exitcode, stdout, stderr

-- readProcessWithExitCode is like readProcess but with two differences:
-- * it returns the ExitCode of the process, and does not throw any exception if
--   the code is not ExitSuccess
-- * it reads and returns the output from process' standard error handle, rather
--   than the process inheriting the standard error handle.

Some relevant documentation for the System.Process.readProcess function:

readProcess
  :: FilePath     -- Filename of the executable (see RawCommand for details)
  -> [String]     -- any arguments
  -> String       -- standard input
  -> IO String    -- stdout

-- readProcess forks an external process, reads its standard output strictly,
-- blocking until the process terminates, and returns the output string. The
-- external process inherits the standard error.
--
-- If an asynchronous exception is thrown to the thread executing readProcess,
-- the forked process will be terminated and readProcess will wait (block) until
-- the process has been terminated.
--
-- Output is returned strictly, so this is not suitable for interactive
-- applications.

Hence, the following code:

gitrevparse :: IO (Maybe Hash)
gitrevparse = do -- IO
    mresult <- safeRun "git" ["rev-parse", "--short", "HEAD"]
    -- some code omitted

safeRun :: String -> [String] -> IO (Maybe String)
safeRun command arguments =
  do -- IO
    output <- readProcessWithExitCode command arguments ""
    return (successOrNothing output)

is equivalent to running git rev-parse --short HEAD on the command line with the empty string as stdin, waiting for it to finish, then passing the (ExitCode, stdout, stderr) 3-tuple to the successOrNothing function, which is also defined in the src/app/Main.hs file:

successOrNothing :: (ExitCode, a, b) -> Maybe a
successOrNothing (exitCode, output, _) =
  if exitCode == ExitSuccess then Just output else Nothing

successOrNothing is pretty straightforward; if our git rev-parse --short HEAD command exited successfully, then it will return the standard output string wrapped in a Just. Otherwise, it returns a Nothing.
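As a quick worked example, here is successOrNothing (copied from above) applied to two hand-written tuples. The tuples are made up for illustration: the exit code 128 and the error text are what I would expect from git outside a repository, but I have not captured them from a real run.

```haskell
import System.Exit (ExitCode (..))

-- Same definition as in src/app/Main.hs.
successOrNothing :: (ExitCode, a, b) -> Maybe a
successOrNothing (exitCode, output, _) =
  if exitCode == ExitSuccess then Just output else Nothing

-- Illustrative tuple for a successful run: stdout holds the short hash.
succeeded :: Maybe String
succeeded = successOrNothing (ExitSuccess, "055f126c\n", "")

-- Illustrative tuple for a failed run: stdout is ignored entirely.
failed :: Maybe String
failed = successOrNothing (ExitFailure 128, "", "fatal: not a git repository")
```

Here succeeded is Just "055f126c\n" and failed is Nothing. Notice that the stderr component never influences the result; only the exit code decides.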

Going back to the gitrevparse function:

gitrevparse :: IO (Maybe Hash)
gitrevparse = do -- IO
    mresult <- safeRun "git" ["rev-parse", "--short", "HEAD"]
    let rev = do -- Maybe
          result <- mresult
          return (MkHash (init result))
    return rev

we see the use of the Maybe monad. If git rev-parse --short HEAD ran successfully, then mresult will be a Just String. The result <- mresult will then extract the standard output string, and init result will return everything except the last character, which in this case is a newline. If you run git rev-parse --short HEAD in a git repo, its standard output will be a short git commit SHA1 similar to 055f126c and ending with a newline. This git commit SHA1 is then passed to the MkHash data constructor:

newtype Hash = MkHash {getHash :: String}

which turns out to be a newtype wrapper. The return then wraps the whole thing in a Just again.

To summarize what the gitrevparse function does:

It runs git rev-parse --short HEAD and if successful, returns a Just (MkHash s) where s is a String wrapped in a Hash newtype that represents the git commit SHA1 that the HEAD is on
If git rev-parse --short HEAD fails, then a Nothing is returned.
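If the do block in the Maybe monad still feels opaque, it may help to see that it is just fmap in disguise. The sketch below isolates that step; hashFromOutput and hashFromOutput' are names I made up, and only the Hash newtype is taken from the source.

```haskell
newtype Hash = MkHash {getHash :: String}

-- The do-notation version, mirroring the inner block of gitrevparse.
hashFromOutput :: Maybe String -> Maybe Hash
hashFromOutput mresult = do
  result <- mresult              -- stops here with Nothing if mresult is Nothing
  return (MkHash (init result))  -- init drops the trailing newline

-- An equivalent formulation using fmap.
hashFromOutput' :: Maybe String -> Maybe Hash
hashFromOutput' = fmap (MkHash . init)
```

Both versions turn Just "055f126c\n" into a Just holding the hash 055f126c, and both pass Nothing through untouched. Which one reads better is a matter of taste; the author's do block arguably makes the Maybe context more explicit for beginners.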

Let us revisit the main function again:

main :: IO ()
main = do -- IO
  status <- getContents
  mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
  let result = do -- Maybe
        strings <- stringsFromStatus mhash status
        return (unwords strings)
  putStr (fromMaybe "" result)

Tying together all that we know so far: we may or may not need the output of git rev-parse --short HEAD, hence the use of unsafeInterleaveIO to defer the computation. The resulting lazily computed Maybe Hash value, mhash, along with status (which contains the output of git status --porcelain --branch &>/dev/null), is passed to the stringsFromStatus function, which seems to be doing the bulk of the work. We know this because the only other functions involved are unwords and fromMaybe, and a Hoogle search shows the following docs for them:

unwords :: [String] -> String
-- unwords is an inverse operation to words. It joins words with separating
-- spaces.

fromMaybe :: a -> Maybe a -> a
-- The fromMaybe function takes a default value and a Maybe value. If the Maybe
-- is a Nothing, it returns the default value; otherwise it returns the value
-- contained in the Maybe.
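Put together, the tail end of main boils down to something like the following sketch. The render function is a hypothetical name of mine; it just combines unwords and fromMaybe the way main does.

```haskell
import Data.Maybe (fromMaybe)

-- Join the fields with spaces, or fall back to the empty string when
-- there is nothing to show.
render :: Maybe [String] -> String
render result = fromMaybe "" (fmap unwords result)
```

So render (Just ["master", "95", "0", "0", "0", "1", "1"]) gives "master 95 0 0 0 1 1", which matches the output we saw earlier, and render Nothing gives the empty string, which is why nothing is printed when the input cannot be processed.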

So we pretty much covered the main function. Let’s get to the stringsFromStatus function next.

The `stringsFromStatus` function

We can find the stringsFromStatus function in src/src/Utils.hs:

stringsFromStatus :: Maybe Hash
                  -> String -- status
                  -> Maybe [String]
stringsFromStatus h status = do -- List
    processed <- processGitStatus (lines status)
    return (showGitInfo h processed)

The -- List comment on line 67 is a mistake; this function lives inside the Maybe monad, not the List monad. Anyway, here is some relevant documentation for lines:

lines :: String -> [String]
-- lines breaks a string up into a list of strings at newline characters. The
-- resulting strings do not contain newlines.
-- Note that after splitting the string at newline characters, the last part of
-- the string is considered a line even if it doesn't end with a newline.

So lines status will break the output of git status --porcelain --branch, which can consist of multiple lines, into a list of String, with each element in the list being one line in the original string. This list of strings is then passed to processGitStatus, defined as follows:

processGitStatus :: [String] -> Maybe GitInfo
processGitStatus [] = Nothing
processGitStatus (branchLine:statusLines) =
    do -- Maybe
      mbranch <- processBranch branchLine
      status <- processStatus statusLines
      return (MkGitInfo mbranch status)

As its name suggests, processGitStatus handles output from git status. Specifically, git status --porcelain --branch &>/dev/null.

We will deal with the easy case first, where processGitStatus pattern matches its first argument against the empty list. In this case, a Nothing is returned. This case happens when git status --porcelain --branch &>/dev/null does not print anything to standard output, which occurs when we are not in a git repo. (Verify it!)

The other pattern match will lead us deeper into the code. It is a pattern match against a non-empty list. For this pattern match, we see that the author once again uses the do notation and we are inside the Maybe monad. First, the head of the list is bound to branchLine and passed to the processBranch function, which also lives inside the Maybe monad.
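The overall shape of processGitStatus can be imitated with much simpler stand-ins. In the sketch below, parseBranchLine and summarize are hypothetical simplifications of mine: the former merely strips the "## " marker, and the latter counts the status lines instead of analysing them.

```haskell
-- Stand-in for processBranch: accept only lines that start with "## ".
parseBranchLine :: String -> Maybe String
parseBranchLine ('#' : '#' : ' ' : rest) = Just rest
parseBranchLine _                        = Nothing

-- Stand-in for stringsFromStatus + processGitStatus.
summarize :: String -> Maybe (String, Int)
summarize status =
  case lines status of
    [] -> Nothing                       -- no output at all: not a git repo
    (branchLine : statusLines) -> do    -- Maybe monad
      branch <- parseBranchLine branchLine
      return (branch, length statusLines)
```

With the input "## master\n M a.txt\n?? b.txt\n", summarize returns Just ("master", 2). With empty input, or with a first line that parseBranchLine rejects, the whole computation collapses to Nothing, which is the short-circuiting behavior of the Maybe monad at work.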

To understand the motivation behind this code, we have to know what the git status --porcelain --branch command is outputting. Here is the documentation for the --porcelain flag from the git status 2.15.0 manpage:

--porcelain[=<version>]

    Give the output in an easy-to-parse format for scripts. This is similar to the short output, but will remain stable across Git versions and regardless of user configuration. See below for details.

    The version parameter is used to specify the format version. This is optional and defaults to the original version v1 format.

and documentation for the --branch flag:

-b

--branch

    Show the branch and tracking info even in short-format.

and the final part of the docs explaining the short-format output:

If -b is used the short-format status is preceded by a line

    ## branchname tracking info

Armed with this information, we know that git status --porcelain --branch:

is a form of git status whose output is easy to parse for scripts and is similar to the short-format output
uses the porcelain v1 format
will show the branch and tracking info as the first line. This line looks similar to "## branchname tracking info" and is precisely what the processGitStatus function passes to the processBranch function

The processBranch function is defined in the same file:

processBranch :: String -> Maybe MBranchInfo
processBranch = rightOrNothing . branchInfo

From line 4 of the same file:

import BranchParse (Branch(MkBranch), MBranchInfo, BranchInfo(MkBranchInfo), branchInfo, getDistance, pairFromDistance, Remote)

we see that both the MBranchInfo type and the branchInfo function are defined in src/src/BranchParse.hs. That is where we shall go to next.

The `branchInfo` function

The branchInfo function is defined at line 150 of src/src/BranchParse.hs:

branchInfo :: String -> Either ParseError MBranchInfo
branchInfo = parse branchParser' ""

The parse function is from the Parsec library. I am not the best person to explain what Parsec does even though I know how to use it, but a simple explanation is that Parsec allows one to write parsers that look and work very much like Context Free Grammars. Since Context Free Languages are a superset of Regular Languages, one can by extension use Parsec wherever one would use Regular Expressions (even though the parsers will look like CFGs). Do note that regexes in many languages are not truly regular, and I am not certain how many of these non-regular features Parsec provides.

The docs for parse are slightly… difficult. But we will be needing its type signature, so here goes:

parse :: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a

The simpler way to explain it is, it takes in a Parsec “object” which is the parser, followed by a String (actually a type synonym named SourceName that is equivalent to String; usually I just use the empty string), followed by a String / Text / similar (in this case a String) containing the content we want to parse using the parser given in the first argument. Note that currying is used here because only 2 arguments were given to parse when it needs 3; that is reflected in the type signature of branchInfo, because it returns a function that takes in a String argument.

On success, parse returns a Right a. Based on the type signature of branchInfo, this a is an MBranchInfo - that is defined in the same file. On failure, parse returns a Left ParseError; a ParseError is a data type defined in the Parsec library that represents, well, a parse error.

Before we get into branchParser', recall how we got here. We were in the second half of processGitStatus:

processGitStatus (branchLine:statusLines) =
    do -- Maybe
      mbranch <- processBranch branchLine

where we were handed the first line in the output of git status --porcelain --branch, which is bound to branchLine.

processBranch :: String -> Maybe MBranchInfo
processBranch = rightOrNothing . branchInfo

And processBranch calls branchInfo and hands branchLine to it. Which branchInfo will now attempt to parse using the branchParser' parser.
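To see parse, the partial application, and the Either-to-Maybe conversion in one place, here is a minimal sketch of my own. The parser restAfterMarker and the function parseLine are hypothetical, and the definition of rightOrNothing is my guess at what the helper does based on its name and how processBranch uses it; I have not copied it from the source.

```haskell
import Text.Parsec (ParseError, anyChar, many, parse, string)
import Text.Parsec.String (Parser)

-- A toy parser: expect the "## " marker, then return the rest of the line.
restAfterMarker :: Parser String
restAfterMarker = do
  _ <- string "## "
  many anyChar

-- parse is given only two of its three arguments, so parseLine is a
-- function that still expects the String to parse.
parseLine :: String -> Either ParseError String
parseLine = parse restAfterMarker ""

-- Assumed behavior of rightOrNothing: discard the error, keep the result.
rightOrNothing :: Either a b -> Maybe b
rightOrNothing = either (const Nothing) Just
```

With these definitions, rightOrNothing (parseLine "## master") is Just "master", while a line without the marker, such as "master", ends up as Nothing. The composition rightOrNothing . parseLine has the same shape as processBranch.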

Parsing the branch line

branchParser' is defined in src/src/BranchParse.hs:

branchParser' :: Parser MBranchInfo
branchParser' =
  do -- Parsec
    string "## "
    branchParser

Parser is a type synonym for Parsec String () and is defined in Text.Parsec.String of the Parsec library. MBranchInfo is a type synonym for Maybe BranchInfo and is defined on line 63 of src/src/BranchParse.hs. BranchInfo is a type constructor defined on line 61 of src/src/BranchParse.hs. Hence, Parser MBranchInfo expands to Parsec String () (Maybe BranchInfo). What this means is, if there are no parsing errors, we get a Maybe BranchInfo. (Not exactly but we’ll get to that later).

The string function is from the Parsec library. It literally looks for the string supplied to it in the content it is supposed to parse. In this case, it looks for the ## string (note the trailing space) at the start of the first line of the git status --porcelain --branch output. If that line starts with the given string, we move on to the next parser, branchParser. Otherwise, a ParseError results and parsing stops.

Note that the branchParser' code uses do notation (once again) but this time we are in the Parser or equivalently Parsec String () monad.

Here is the definition of branchParser:

branchParser :: Parser MBranchInfo
branchParser =
      try noBranch
    <|> try newRepo
    <|> try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly

This consumes the remainder of the line after the ## (with a trailing space). try and <|> are both defined in the Parsec library and they are often used together. The documentation for <|> is especially good:

(<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
-- This combinator implements choice. The parser p <|> q first applies p. If it
-- succeeds, the value of p is returned. If p fails without consuming any input,
-- parser q is tried. This combinator is defined equal to the mplus member of
-- the MonadPlus class and the (<|>) member of Alternative.

The initial part of the documentation for the try function is pretty good too:

try :: ParsecT s u m a -> ParsecT s u m a
-- The parser try p behaves like parser p, except that it pretends that it
-- hasn't consumed any input when an error occurs.
--
-- This combinator is used whenever arbitrary look ahead is needed. Since it
-- pretends that it hasn't consumed any input when p fails, the (<|>) combinator
-- will try its second alternative even when the first parser failed while
-- consuming input.

Essentially, branchParser will first attempt to parse the input string using the try noBranch parser, then if that fails, the try will ensure that no input is consumed by noBranch - and because no input is consumed, the <|> will then move on to the next parser, which is try newRepo. And if that fails, no input will be consumed, and it moves on to try branchRemoteTracking, and so on, in the specified order. If there is any successful parse, the parsing halts. People with knowledge of CFGs will appreciate how this code looks.
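The importance of try is easiest to see with two alternatives that share a prefix. The following is a toy example of mine, unrelated to the zsh-git-prompt source; the names withoutTry, withTry and run are made up.

```haskell
import Text.Parsec (parse, string, try, (<|>))
import Text.Parsec.String (Parser)

-- Without try: string "abc" consumes "ab" before failing on "abd",
-- so <|> never gets to attempt the second alternative.
withoutTry :: Parser String
withoutTry = string "abc" <|> string "abd"

-- With try: the failed first alternative pretends it consumed nothing,
-- so the second alternative is attempted from the start of the input.
withTry :: Parser String
withTry = try (string "abc") <|> string "abd"

-- Small helper that throws away the error message.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

Here run withoutTry "abd" is Nothing, whereas run withTry "abd" is Just "abd". That is the reason every alternative in branchParser except the last one is wrapped in try.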

To understand each of these parsers, we need to play around with some git repositories and observe the output of the git status --porcelain --branch command. In this process, we will also be learning more about Parsec.

The `noBranch` parser

First off the list, the noBranch parser (defined here):

noBranch :: Parser MBranchInfo
noBranch =
  do -- Parsec
    manyTill anyChar (try (string " (no branch)"))
    eof
    return Nothing

The manyTill, anyChar and eof parsers are new to us. They are defined in the Parsec library and pretty much do what they say.

manyTill :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a]
-- manyTill p end applies parser p zero or more times until parser end succeeds.
-- Returns the list of values returned by p.

anyChar :: Stream s m Char => ParsecT s u m Char
-- This parser succeeds for any character. Returns the parsed character.

eof :: (Stream s m t, Show t) => ParsecT s u m ()
-- This parser only succeeds at the end of the input. This is not a primitive
-- but it is defined using notFollowedBy.

So manyTill anyChar (try (string " (no branch)")) will apply the anyChar parser zero or more times until the try (string " (no branch)") parser succeeds. On success, it returns a list of all the Char consumed by anyChar. We know that the string " (no branch)" parser expects and consumes the string " (no branch)"; wrapping it in a try allows us to avoid a parse error while it is used in conjunction with manyTill anyChar, as more and more characters are consumed by repeated applications of anyChar until we finally encounter the string " (no branch)". Then the manyTill anyChar (try (string " (no branch)")) parser succeeds.

The eof parser then expects us to have reached the end of the input. Or in this case, the end of the first line of the output of git status --porcelain --branch. If everything goes well, the parser succeeds and yields Nothing.

To put this in simpler terms, noBranch is expecting a single line that looks like abcdefgh ijklm nopqrs (no branch). Notice how the list of characters accumulated by manyTill anyChar are discarded.

We can probably guess that noBranch is meant for parsing a branch line for a git repo that isn’t on a branch. This happens in the detached HEAD state. To see what the line looks like, simply go to any of your git repos with at least 2 commits, make sure you have committed / stashed all your changes, then run the following commands:

git checkout HEAD~
git status --porcelain --branch

The first line should look similar to the following:

## HEAD (no branch)

and this will be happily parsed by branchParser' first with string "## " followed by branchParser using the try noBranch parser, yielding Nothing. So now we know that if branchParser' succeeds with Nothing, then the git repo is in the detached HEAD state. Nice.
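To convince myself of how manyTill behaves, I find it useful to keep the characters that noBranch throws away. The sketch below is a variation of mine on the real parser: detachedLine is a hypothetical name, it includes the "## " step, and it returns the discarded prefix instead of Nothing.

```haskell
import Text.Parsec (anyChar, eof, manyTill, parse, string, try)
import Text.Parsec.String (Parser)

-- Like string "## " followed by noBranch, but returning the text that
-- precedes " (no branch)" rather than discarding it.
detachedLine :: Parser String
detachedLine = do
  _ <- string "## "
  prefix <- manyTill anyChar (try (string " (no branch)"))
  eof
  return prefix

-- Small helper that throws away the error message.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

With this, run detachedLine "## HEAD (no branch)" is Just "HEAD", and a line such as "## master" gives Nothing because anyChar runs out of input before " (no branch)" is ever found. A line with several spaces before the marker also works, since each failed attempt at string " (no branch)" is rolled back by try.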

The `newRepo` parser

The try newRepo parser will be used by branchParser if parsing using try noBranch fails.

branchParser :: Parser MBranchInfo
branchParser =
      try noBranch
    <|> try newRepo
    <|> try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly

Its definition is as follows:

newRepo :: Parser MBranchInfo
newRepo =
  do -- Parsec
    string "Initial commit on "
    branchOnly

Based on the string "Initial commit on " parser alone, we can safely assume that this is for a new git repo. By now, we are quite familiar with what string does, so let’s look at the branchOnly parser, defined here:

branchOnly :: Parser MBranchInfo
branchOnly =
  do -- Parsec
    branch <- many (noneOf " ")
    eof
    let bi = MkBranchInfo (MkBranch branch) Nothing
    return (Just bi)

Documentation for the noneOf parser combinator:

noneOf :: Stream s m Char => [Char] -> ParsecT s u m Char
-- As the dual of oneOf, noneOf cs succeeds if the current character is not in
-- the supplied list of characters cs. Returns the parsed character.

-- Example code:
    consonant = noneOf "aeiou"

When used with many, this will consume as many characters as possible, as long as they are not the space character, and return the list of characters consumed. Notice that this time, the author binds the list of characters consumed by many (noneOf " ") to branch. Immediately following that, an eof is expected. Therefore, branchOnly expects the input to consist of only non-space characters.

I was expecting newRepo to handle the first line of git status --porcelain --branch for new git repositories but that was not the case. On git 2.15.0 for a new repo initialized using git init but with zero commits, I am getting the following output:

## No commits yet on master

That is in the Porcelain v1 output format, which zsh-git-prompt expects. Porcelain v2 is in a different format and is not supported by zsh-git-prompt. I do not see anything on my zsh prompt that indicates this new directory I ran git init in is a git repo. Since this doesn’t work for a git repo that was just created using git init and has zero commits, I added the initial commit and ran git status --porcelain --branch again and… the first line wasn’t what we were expecting either; it was simply ## master. Changing the commit message to Initial commit or something similar does not change anything either.

The only explanation I can come up with is this: perhaps the git status Porcelain format changed since the last version of zsh-git-prompt? After all, at this time of writing, the most recent commit was on 15 Feb 2016 and for v0.5, which is what we are studying right now.

Regardless, let’s go back to branchOnly and go through the final 2 lines:

    let bi = MkBranchInfo (MkBranch branch) Nothing
    return (Just bi)

MkBranch is a newtype wrapper defined at line 38 of src/src/BranchParse.hs:

newtype Branch = MkBranch String deriving (Eq)

while MkBranchInfo is a data constructor defined at line 61 of the same file:

data BranchInfo = MkBranchInfo Branch (Maybe Remote) deriving (Eq, Show)

We can see that Branch just wraps a String that is a git branch name. BranchInfo has the one MkBranchInfo data constructor which takes in 2 arguments: a Branch and a Maybe Remote. We shall not cover the Remote type for now. Essentially, this code:

    let bi = MkBranchInfo (MkBranch branch) Nothing
    return (Just bi)

Creates a representation for a git branch with a Nothing for the Maybe Remote part, then returns a Just BranchInfo if the parsing succeeds.

Putting everything together:

newRepo :: Parser MBranchInfo
newRepo =
  do -- Parsec
    string "Initial commit on "
    branchOnly

branchOnly :: Parser MBranchInfo
branchOnly =
  do -- Parsec
    branch <- many (noneOf " ")
    eof
    let bi = MkBranchInfo (MkBranch branch) Nothing
    return (Just bi)

We see that the newRepo parser expects a string similar to:

Initial commit on some-branch-name

and on a successful parse, returns a Just BranchInfo which represents a git branch.
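To make the git 2.15.0 observation concrete, here is a small sketch that runs simplified copies of newRepo and branchOnly against both the old and the new wording. The parsers return a plain String instead of a Just BranchInfo to keep the example short, and the names branchOnlySketch, newRepoSketch and parseAfterPrefix are made up for this example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified stand-in for branchOnly: returns the branch name as a String.
branchOnlySketch :: Parser String
branchOnlySketch = do
  branch <- many (noneOf " ")
  eof
  return branch

-- Simplified stand-in for newRepo.
newRepoSketch :: Parser String
newRepoSketch = do
  _ <- string "Initial commit on "
  branchOnlySketch

-- Hypothetical helper: parse the text that follows the "## " prefix, trying
-- newRepoSketch first and falling back to branchOnlySketch.
parseAfterPrefix :: String -> Maybe String
parseAfterPrefix =
  either (const Nothing) Just . parse (try newRepoSketch <|> branchOnlySketch) ""
```

With these definitions, parseAfterPrefix "Initial commit on master" and parseAfterPrefix "master" should both give Just "master", while parseAfterPrefix "No commits yet on master" should give Nothing: newRepoSketch rejects the new wording at its first character, and branchOnlySketch reads "No" and then finds a space where it demands the end of the input. That would be consistent with the prompt showing nothing for a freshly initialized repo.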

The `branchRemoteTracking` parser

If both try noBranch and try newRepo fail, then branchParser tries the try branchRemoteTracking parser.

branchParser :: Parser MBranchInfo
branchParser =
      try noBranch
    <|> try newRepo
    <|> try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly

The branchRemoteTracking parser is the most complicated of the bunch, at line 84 of src/src/BranchParse.hs:

branchRemoteTracking :: Parser MBranchInfo
branchRemoteTracking =
  do -- Parsec
    branch <- trackedBranch
    tracking <- many (noneOf " ")
    char ' '
    behead <- inBrackets
    let remote = MkRemote (MkBranch tracking) (Just behead)
    let bi = MkBranchInfo branch  (Just remote)
    return (Just bi)

Definition of trackedBranch:

trackedBranch :: Parser Branch
trackedBranch =
    do -- Parsec
      b <- manyTill anyChar (try (string "..."))
      return (MkBranch b)

Our experience with Parsec tells us that trackedBranch will consume as many characters as possible until it hits the string .... The list of characters consumed is bound to b and then wrapped in the MkBranch newtype wrapper and returned.

Following that (still in branchRemoteTracking), tracking <- many (noneOf " ") will consume as many characters as possible until it hits the space character. The list of characters consumed is bound to tracking. Subsequently, char ' ' expects a single space character and consumes and discards it.
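The first two steps can be tried in isolation. The sketch below splits a local...remote string into its two halves, and also includes a variant without try to show why the try inside manyTill matters once a branch name contains a dot. The names localName, localNameNoTry and splitWith are made up for this example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Local branch name: everything before the "..." separator.
localName :: Parser String
localName = manyTill anyChar (try (string "..."))

-- The same parser without try, for comparison only.
localNameNoTry :: Parser String
localNameNoTry = manyTill anyChar (string "...")

-- Hypothetical helper: run the given local-name parser, then read the
-- remote tracking branch name up to the next space (or the end of input).
splitWith :: Parser String -> String -> Maybe (String, String)
splitWith local = either (const Nothing) Just . parse both ""
  where
    both = do
      l <- local
      r <- many (noneOf " ")
      return (l, r)
```

splitWith localName "master...origin/master" should give Just ("master","origin/master"), and splitWith localName "v1.2...origin/v1.2" should give Just ("v1.2","origin/v1.2"). With splitWith localNameNoTry "v1.2...origin/v1.2" the result should be Nothing: string "..." matches the single dot in v1.2, consumes it, and then fails on the 2, and without try that consumed input stops manyTill from carrying on with anyChar.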

inBrackets is defined as follows, on line 128:

inBrackets :: Parser Distance
inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)

The Distance type constructor is defined at line 21, but I will be showing the comments from lines 11 to 19 as well because they pretty much describe what we will be covering next:

{-
 The idea is to parse the first line of the git status command.
 Such a line may look like:
  ## master
or
  ## master...origin/master
or
  ## master...origin/master [ahead 3, behind 4]
 -}

data Distance = Ahead Int | Behind Int | AheadBehind Int Int deriving (Eq)

So Distance represents how many commits the current branch is ahead and/or behind its remote tracking branch; its data constructors are all aptly named.

Going back to inBrackets:

inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)

between is a function defined in the Parsec library. Documentation as follows:

between :: Stream s m t => ParsecT s u m open -> ParsecT s u m close -> ParsecT s u m a -> ParsecT s u m a
-- between open close p parses open, followed by p and close. Returns the value returned by p.

So essentially, inBrackets expects some string that satisfies one of behind, try aheadBehind or ahead in between a [ and ]. There is a subtlety with the use of try in try aheadBehind that we will explain later. Now, let’s take a look at behind, aheadBehind and ahead.

behind is defined at line 140 of src/src/BranchParse.hs:

behind :: Parser Distance
behind = makeAheadBehind "behind" Behind

Recall that Behind is one of the data constructors of Distance. makeAheadBehind is defined at line 131 of the same file:

makeAheadBehind :: String -> (Int -> Distance) -> Parser Distance
makeAheadBehind name constructor =
  do -- Parsec
    string (name ++ " ")
    dist <- many1 digit
    return (constructor (read dist))

Documentation for many1 and digit, both in the Parsec library:

many1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m [a]
-- many1 p applies the parser p one or more times. Returns a list of the
-- returned values of p.

digit :: Stream s m Char => ParsecT s u m Char
-- Parses a digit. Returns the parsed character.

We see that behind = makeAheadBehind "behind" Behind. This will first consume the string "behind " (and discard it), then consume 1 or more digits and bind the list of digits to dist. Since constructor has type Int -> Distance, read dist will convert the list of digits into an Int, then pass it to constructor to create a Distance. In this case, the constructor is the Behind data constructor, which takes in 1 Int and creates a Distance.

The behind parser wants to parse a string similar to behind 5 and returns a Behind n. inBrackets can therefore consume a string similar to [behind 5].

inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)

The other possibility that inBrackets can go down is try aheadBehind. Let’s look at the aheadBehind parser, defined at line 142 of src/src/BranchParse.hs:

aheadBehind :: Parser Distance
aheadBehind =
  do -- Parsec
    Ahead aheadBy <- ahead
    string ", "
    Behind behindBy <- behind
    return (AheadBehind aheadBy behindBy)

ahead is defined at line 138 of the same file:

ahead :: Parser Distance
ahead = makeAheadBehind "ahead" Ahead

aheadBehind will first call ahead, which calls makeAheadBehind, which consumes the string "ahead " (and discards it), then consumes 1 or more digits and creates an Ahead Int. Pattern matching on Ahead aheadBy binds that Int to aheadBy. The string ", " parser then consumes the string ", ". Next, behind springs into action (we covered that above) and consumes "behind " followed by 1 or more digits. Pattern matching is done again to get the Int in the Behind so that it is bound to behindBy. Finally, an AheadBehind Int Int is created. All in all, inBrackets that goes down the route of aheadBehind consumes a string similar to the following:

[ahead 13, behind 7]

Returning to inBrackets once again:

inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)

We see that the final possible branch is ahead. We have already covered this while going through aheadBehind. For completeness, if inBrackets goes down the route of ahead, a string similar to [ahead 10] is desired.

Earlier, we mentioned a subtlety in the use of try in try aheadBehind for the inBrackets parser. One might ask, why only wrap aheadBehind in a try? Why not wrap behind and ahead in try as well?

We do not have to wrap the behind parser in a try, because it uses the string "behind " parser to consume the string "behind ". Notice that the string "behind " and the string "ahead " differ in the first character (b vs. a) - this causes the behind parser to fail immediately without consuming any input. Since it does not consume any input, the <|> ensures that it will go on to try the next parser in try aheadBehind.

We see this fine print in the documentation for (<|>):

(<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
-- This combinator implements choice. The parser p <|> q first applies p. If it
-- succeeds, the value of p is returned. If p fails without consuming any input,
-- parser q is tried. This combinator is defined equal to the mplus member of
-- the MonadPlus class and the (<|>) member of Alternative.

Specifically, the part that says If p fails without consuming any input, parser q is tried.

There is overlap between the strings that aheadBehind and ahead parse. aheadBehind expects strings of the form ahead M, behind N, while ahead expects a string similar to ahead M, with M and N being non-negative integers. If we were to rearrange things and use behind <|> ahead <|> try aheadBehind, then for the input string ahead 7, behind 9, the behind parser will fail without consuming any input, then <|> will use the ahead parser to consume the string "ahead 7" and stop there. The (behind <|> ahead <|> try aheadBehind) parser succeeds, but between (char '[') (char ']') (behind <|> ahead <|> try aheadBehind) will fail because the next character is not a ] but a ,. Hence, aheadBehind must be attempted before ahead.

So we have established that aheadBehind must be attempted before ahead. Minimally, we have to use behind <|> aheadBehind <|> ahead. Now for the try. What happens if behind <|> aheadBehind <|> ahead parses the string "ahead 5" (which is valid for a git branch that is only ahead but not behind its remote tracking branch)? The behind parser fails without consuming any input, so <|> tries aheadBehind, which consumes the entire "ahead 5" but then that expects a ", ", so parsing fails. Because input was consumed, the next <|> does not try the ahead parser. Hence we need to wrap aheadBehind in a try so it will not consume any input on parse failure and chaining it with <|> ahead will move on to try the ahead parser.
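The three orderings discussed above can be compared side by side. The sketch below is not the repository code: to avoid pattern matching inside the do block, it uses a made-up helper countAfter in place of makeAheadBehind, and the names inBracketsWith, asWritten, aheadFirst and withoutTry are also invented for this example. Show is derived on Distance only so that results can be printed.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Distance = Ahead Int | Behind Int | AheadBehind Int Int
  deriving (Eq, Show)

-- Hypothetical helper: parse e.g. "ahead 5" and return the number.
countAfter :: String -> Parser Int
countAfter name = do
  _ <- string (name ++ " ")
  digits <- many1 digit
  return (read digits)

ahead, behind, aheadBehind :: Parser Distance
ahead = Ahead <$> countAfter "ahead"
behind = Behind <$> countAfter "behind"
aheadBehind = do
  a <- countAfter "ahead"
  _ <- string ", "
  b <- countAfter "behind"
  return (AheadBehind a b)

-- Hypothetical helper: run a choice of parsers between square brackets.
inBracketsWith :: Parser Distance -> String -> Maybe Distance
inBracketsWith p =
  either (const Nothing) Just . parse (between (char '[') (char ']') p) ""

asWritten, aheadFirst, withoutTry :: String -> Maybe Distance
asWritten  = inBracketsWith (behind <|> try aheadBehind <|> ahead)
aheadFirst = inBracketsWith (behind <|> ahead <|> try aheadBehind)
withoutTry = inBracketsWith (behind <|> aheadBehind <|> ahead)
```

asWritten should handle all three shapes: "[behind 2]" gives Just (Behind 2), "[ahead 5]" gives Just (Ahead 5), and "[ahead 7, behind 9]" gives Just (AheadBehind 7 9). aheadFirst "[ahead 7, behind 9]" should give Nothing because ahead succeeds early and the closing ] is then missing, and withoutTry "[ahead 5]" should give Nothing because aheadBehind fails after consuming input.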

Now that we know what the inBrackets parser does, we go back to what brought us here in the first place, branchRemoteTracking:

branchRemoteTracking :: Parser MBranchInfo
branchRemoteTracking =
  do -- Parsec
    branch <- trackedBranch
    tracking <- many (noneOf " ")
    char ' '
    behead <- inBrackets
    let remote = MkRemote (MkBranch tracking) (Just behead)
    let bi = MkBranchInfo branch  (Just remote)
    return (Just bi)

Because inBrackets took a while to explain, if necessary, you might want to read what we previously covered for branchRemoteTracking to refresh your knowledge before carrying on.

To understand the data structures involved, we have to know what we are trying to do here. branchRemoteTracking is trying to parse a string where the current git branch has a remote tracking branch and falls under one of these 3 cases:

it is some commits ahead of its remote tracking branch
it is some commits behind its remote tracking branch
it is some commits ahead AND some commits behind its remote tracking branch

An example of a string that satisfies case 3 is:

master...origin/feat [ahead 5, behind 3]

Armed with this information, we know that

    let remote = MkRemote (MkBranch tracking) (Just behead)

Captures the information about the remote tracking branch in MkBranch tracking and the number of commits the current branch is ahead and/or behind the remote tracking branch in Just behead.

The Remote type constructor and the MkRemote data constructor are defined at line 56 of src/src/BranchParse.hs:

data Remote = MkRemote Branch (Maybe Distance) deriving (Eq, Show)

There is only 1 data constructor, MkRemote. We see that a remote represents a remote tracking branch (the Branch parameter) and the number of commits the current branch is ahead and/or behind this remote tracking branch (the Maybe Distance parameter). It is possible that the current branch and its remote tracking branch are in sync and Maybe Distance allows us to use Nothing to represent that.

The remaining lines in branchRemoteTracking:

    let bi = MkBranchInfo branch  (Just remote)
    return (Just bi)

creates a BranchInfo object using its single data constructor MkBranchInfo, passing in the current branch (in branch) and information about the remote tracking branch (in Just remote). Then it wraps the BranchInfo inside a Just and uses return on it.

Here’s the definition for the BranchInfo type constructor:

data BranchInfo = MkBranchInfo Branch (Maybe Remote) deriving (Eq, Show)

Earlier when we covered the branchOnly parser, we mentioned we will explain the Maybe Remote part in MkBranchInfo. See how branchOnly also uses MkBranchInfo but passes in a Nothing for the Maybe Remote:

branchOnly :: Parser MBranchInfo
branchOnly =
    -- omitted
    let bi = MkBranchInfo (MkBranch branch) Nothing
    -- omitted

The Nothing indicates that there is no remote tracking branch for the current branch.

To summarize, the branchRemoteTracking parser wants to consume a string similar to one of the three variants below:

master...origin/feat [ahead 7]
bourbon...origin/rice-noodles [ahead 10, behind 4]
fix-a-pesky-bug...workplace/nice-feature-work [behind 2]

In other words, a branch that has a remote tracking branch and is some commits ahead and/or behind that remote tracking branch.

The `branchRemote` parser

In the event that try noBranch, try newRepo and try branchRemoteTracking all fail, branchParser attempts the try branchRemote parser.

branchParser :: Parser MBranchInfo
branchParser =
      try noBranch
    <|> try newRepo
    <|> try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly

The branchRemote parser is defined at line 96 of src/src/BranchParse.hs:

branchRemote :: Parser MBranchInfo
branchRemote =
  do -- Parsec
    branch <- trackedBranch
    tracking <- many (noneOf " ")
    eof
    let remote = MkRemote (MkBranch tracking) Nothing
    let bi = MkBranchInfo branch (Just remote)
    return (Just bi)

Its definition is eerily similar to that of branchRemoteTracking:

branchRemoteTracking :: Parser MBranchInfo
branchRemoteTracking =
  do -- Parsec
    branch <- trackedBranch
    tracking <- many (noneOf " ")
    char ' '
    behead <- inBrackets
    let remote = MkRemote (MkBranch tracking) (Just behead)
    let bi = MkBranchInfo branch  (Just remote)
    return (Just bi)

Except that in terms of parsers, these 2 are not there:

    char ' '
    behead <- inBrackets

but are instead replaced by the eof parser, which expects there to be no more input.

With what we have covered for branchRemoteTracking, it should not be difficult to see that branchRemote expects a string similar to:

refactoring...origin/refactoring

which is a git branch that has a remote tracking branch and is perfectly in sync with it. From

    let remote = MkRemote (MkBranch tracking) Nothing
    let bi = MkBranchInfo branch (Just remote)

we see that the 2nd argument to MkRemote is a Nothing, which indicates that the git branch and its remote tracking branch are perfectly in sync.

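The strings that branchRemoteTracking and branchRemote consume share the same local...remote prefix and differ only in what follows the remote tracking branch name: branchRemoteTracking requires a space and a bracketed distance, while branchRemote requires the end of the input. Because both are wrapped in try and each rejects the other’s input, the order between these two does not look critical, but attempting the more specific try branchRemoteTracking before try branchRemote is the natural choice.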

The `branchOnly` parser

The final parser the branchParser will use, when all else fails, is the branchOnly parser:

branchParser :: Parser MBranchInfo
branchParser =
      try noBranch
    <|> try newRepo
    <|> try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly

Notice that it is not wrapped in a try, because this is the final parser in the chain and we do not need to care about whether input is consumed upon failure and we can just let it fail.

branchOnly is defined at line 106 of src/src/BranchParse.hs:

branchOnly :: Parser MBranchInfo
branchOnly =
  do -- Parsec
    branch <- many (noneOf " ")
    eof
    let bi = MkBranchInfo (MkBranch branch) Nothing
    return (Just bi)

We covered it when we went through the newRepo parser, so we shall not cover it here again. In short, branchOnly consumes a string containing just a branch name, which is what we get for a branch that has no remote tracking branch. To see this in an actual git repo, simply do a git checkout -b some-crazy-weird-branch-name and run git status --porcelain --branch. The first line in the output will look similar to:

## some-crazy-weird-branch-name

Because this overlaps with what the try branchRemoteTracking and try branchRemote parsers consume, we have to attempt those before the branchOnly parser.

With that, we have completed our coverage of branchParser.

Summary of `branchParser'`

branchParser' :: Parser MBranchInfo
branchParser' =
  do -- Parsec
    string "## "
    branchParser

branchParser :: Parser MBranchInfo
branchParser =
      try noBranch
    <|> try newRepo
    <|> try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly

To summarize branchParser', below, we give one example on each line for each of the parsers that branchParser can use:

## HEAD (no branch)
## Initial commit on something-that-doesnt-seem-to-work-for-git-2-15-0
## localbranch...remote/remote-tracking-branch [ahead 5, behind 5]
## localbranch...remote-two/another-remote-tracking-branch
## just-a-local-branch
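To try these example lines out, here is a condensed sketch of the whole branch line parser. It follows the structure described in this post, but it is not a copy of src/src/BranchParse.hs: noBranch is rebuilt from its description, MBranchInfo is assumed to be Maybe BranchInfo, Show is derived everywhere for printing, and countAfter and parseBranchLine are made-up helpers.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

newtype Branch = MkBranch String deriving (Eq, Show)
data Distance = Ahead Int | Behind Int | AheadBehind Int Int deriving (Eq, Show)
data Remote = MkRemote Branch (Maybe Distance) deriving (Eq, Show)
data BranchInfo = MkBranchInfo Branch (Maybe Remote) deriving (Eq, Show)
type MBranchInfo = Maybe BranchInfo  -- assumption: Nothing means detached HEAD

-- Hypothetical helper standing in for makeAheadBehind.
countAfter :: String -> Parser Int
countAfter name = string (name ++ " ") >> (read <$> many1 digit)

inBrackets :: Parser Distance
inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)
  where
    behind = Behind <$> countAfter "behind"
    ahead = Ahead <$> countAfter "ahead"
    aheadBehind = do
      a <- countAfter "ahead"
      _ <- string ", "
      b <- countAfter "behind"
      return (AheadBehind a b)

trackedBranch :: Parser Branch
trackedBranch = MkBranch <$> manyTill anyChar (try (string "..."))

noBranch, newRepo, branchRemoteTracking, branchRemote, branchOnly :: Parser MBranchInfo
noBranch = manyTill anyChar (try (string " (no branch)")) >> eof >> return Nothing
newRepo = string "Initial commit on " >> branchOnly
branchRemoteTracking = do
  branch <- trackedBranch
  tracking <- many (noneOf " ")
  _ <- char ' '
  dist <- inBrackets
  return (Just (MkBranchInfo branch (Just (MkRemote (MkBranch tracking) (Just dist)))))
branchRemote = do
  branch <- trackedBranch
  tracking <- many (noneOf " ")
  eof
  return (Just (MkBranchInfo branch (Just (MkRemote (MkBranch tracking) Nothing))))
branchOnly = do
  branch <- many (noneOf " ")
  eof
  return (Just (MkBranchInfo (MkBranch branch) Nothing))

branchParser' :: Parser MBranchInfo
branchParser' = string "## " >> branchParser
  where
    branchParser =
      try noBranch <|> try newRepo <|> try branchRemoteTracking
        <|> try branchRemote <|> branchOnly

-- Hypothetical helper: Nothing on a parse error, Just the result otherwise.
parseBranchLine :: String -> Maybe MBranchInfo
parseBranchLine = either (const Nothing) Just . parse branchParser' ""
```

Under these assumptions, parseBranchLine "## HEAD (no branch)" should give Just Nothing (a successful parse of a detached HEAD), parseBranchLine "## just-a-local-branch" should give Just (Just (MkBranchInfo (MkBranch "just-a-local-branch") Nothing)), and parseBranchLine "## No commits yet on master" should give Nothing, since none of the five alternatives accepts that line.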

Going back to the caller of `branchParser'`

Now that we are done with branchParser (and branchParser'), we go back to what led us down this path:

branchInfo :: String -> Either ParseError MBranchInfo
branchInfo = parse branchParser' ""

processBranch :: String -> Maybe MBranchInfo
processBranch = rightOrNothing . branchInfo

On parse success, branchInfo returns a Right MBranchInfo. On parse failure, branchInfo returns a Left ParseError. Its calling function processBranch uses rightOrNothing, defined at line 15 of src/src/Utils.hs:

rightOrNothing :: Either a b -> Maybe b
rightOrNothing = either (const Nothing) Just

to convert a Left ParseError into a Nothing, and convert a Right MBranchInfo into a Just MBranchInfo. The either function is from the Data.Either module:

either :: (a -> c) -> (b -> c) -> Either a b -> c
-- Case analysis for the Either type. If the value is Left a, apply the first
-- function to a; if it is Right b, apply the second function to b.

while the const function should be a familiar staple:

const :: a -> b -> a
-- const x is a unary function which evaluates to x for all inputs.
-- For instance,
-- >>> map (const 42) [0..3]
-- [42, 42, 42, 42]

Notice that rightOrNothing will discard the ParseError that is embedded in the Left on a parse failure. In other applications, the ParseError may be used to display a meaningful error message giving some hints as to why parsing failed. But in this case, we do not care for that.
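A tiny example of rightOrNothing at work, using only the Prelude. The values failed and succeeded are made up for illustration, with a String standing in for ParseError and an Int standing in for MBranchInfo.

```haskell
rightOrNothing :: Either a b -> Maybe b
rightOrNothing = either (const Nothing) Just

-- Hypothetical sample values standing in for a parse failure and a success.
failed :: Either String Int
failed = Left "unexpected end of input"

succeeded :: Either String Int
succeeded = Right 42
```

rightOrNothing failed evaluates to Nothing, dropping the error message, while rightOrNothing succeeded evaluates to Just 42.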

processBranch is invoked by processGitStatus, defined at line 21 of src/src/Utils.hs:

processGitStatus :: [String] -> Maybe GitInfo
processGitStatus [] = Nothing
processGitStatus (branchLine:statusLines) =
    do -- Maybe
      mbranch <- processBranch branchLine
      status <- processStatus statusLines
      return (MkGitInfo mbranch status)

On a successful parse of branchLine by processBranch, mbranch will be a MBranchInfo. Do note that we are in the Maybe monad. On an unsuccessful parse, processBranch branchLine will result in Nothing and the rest of the computations in processGitStatus will not be performed and a Nothing will be its return value.
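The short-circuiting behavior of the Maybe monad can be seen with a stripped-down sketch of the same shape. The functions processBranchStub, processStatusStub and summarize are stubs invented for this example; they only mimic the structure of processGitStatus and do none of the real parsing.

```haskell
-- Stub: accept only lines that start with "## " and return the rest.
processBranchStub :: String -> Maybe String
processBranchStub ('#':'#':' ':rest) = Just rest
processBranchStub _ = Nothing

-- Stub: always succeeds, returning the number of status lines.
processStatusStub :: [String] -> Maybe Int
processStatusStub = Just . length

-- Same shape as processGitStatus: an empty input or a failed step gives Nothing.
summarize :: [String] -> Maybe (String, Int)
summarize [] = Nothing
summarize (branchLine:statusLines) =
  do -- Maybe
    branch <- processBranchStub branchLine
    count <- processStatusStub statusLines
    return (branch, count)
```

summarize ["## master", " M a.txt"] gives Just ("master",1). summarize ["garbage", " M a.txt"] gives Nothing, because the first bind fails and the remaining steps are skipped.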

We shall move on to processStatus, the next major piece of this program.

The `processStatus` function

processStatus is defined at line 50 of src/src/StatusParse.hs:

processStatus :: [String] -> Maybe (Status Int)
processStatus statLines =
  do -- Maybe
    statList <- for statLines extractMiniStatus
    return (countStatus statList)

This function parses all the lines from the 2nd line to the final line of the output of git status --porcelain --branch. The function for is defined in the Data.Traversable module:

for :: (Traversable t, Applicative f) => t a -> (a -> f b) -> f (t b)
-- for is traverse with its arguments flipped. For a version that ignores the
-- results see for_

traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
-- Map each element of a structure to an action, evaluate these actions from
-- left to right, and collect the results. For a version that ignores the
-- results see traverse_

traverse is part of the Traversable type class. We include it here because for is defined in terms of traverse.

When we fit for statLines extractMiniStatus to the type signature of for, we get:

for :: (Applicative f) => [] String -> (String -> f b) -> f ([] b)

The Traversable here is the [] constructor (not to be confused with the empty list). extractMiniStatus is the String -> f b function. It is defined at line 45 of src/src/StatusParse.hs and has the following type signature:

extractMiniStatus :: String -> Maybe MiniStatus

Maybe is an Applicative, and we see that our b is MiniStatus. Using this new found information about the types, we have:

for :: [] String -> (String -> Maybe MiniStatus) -> Maybe ([] MiniStatus)

Hence, statList in:

statList <- for statLines extractMiniStatus

has type [MiniStatus].

MiniStatus is a type constructor defined at line 14 of the same file. To better explain things, we include the comment above it as well:

{- The two characters starting a git status line: -}
data MiniStatus = MkMiniStatus Char Char

Here is the definition of extractMiniStatus:

extractMiniStatus :: String -> Maybe MiniStatus
extractMiniStatus [] = Nothing
extractMiniStatus [_] = Nothing
extractMiniStatus (index:work:_) = Just (MkMiniStatus index work)

We see that if a string has fewer than 2 characters, it returns a Nothing. Otherwise, it uses pattern matching to extract the first 2 characters and pass them to the MkMiniStatus data constructor. The author uses index and work for the name bindings for the first and second characters respectively, which is a hint that this has something to do with the git index and the work tree.

To understand the behavior of for, we look at its definition:

for = flip traverse

It is as the documentation says. This is not very meaningful, so we have to look at the definition of traverse for lists:

instance Traversable [] where
    {-# INLINE traverse #-} -- so that traverse can fuse
    traverse f = List.foldr cons_f (pure [])
      where cons_f x ys = liftA2 (:) (f x) ys

In our case, extractMiniStatus is the f. Notice the liftA2 (:) (f x) ys. If f x returns a Nothing at some point, then we have:

liftA2 (:) Nothing ys

which should stay a Nothing for the remainder of the computation and there is no escape from it. But let us verify whether this is the case, by looking at the definition of liftA2 in the Applicative instance of Maybe:

instance Applicative Maybe where
    pure = Just

    Just f  <*> m       = fmap f m
    Nothing <*> _m      = Nothing

    liftA2 f (Just x) (Just y) = Just (f x y)
    liftA2 _ _ _ = Nothing

    Just _m1 *> m2      = m2
    Nothing  *> _m2     = Nothing

liftA2 (:) Nothing ys is covered by the case

    liftA2 _ _ _ = Nothing

Therefore, once we get a Nothing in traverse, this definition of liftA2 ensures that we will always get a Nothing. Which means that extractMiniStatus is banking on its final pattern match:

extractMiniStatus (index:work:_) = Just (MkMiniStatus index work)

for any meaningful computation to be done. The other pattern matches (which return Nothing) all indicate failure.

If the 2nd till the final line of git status --porcelain --branch all pattern match against the final pattern match in extractMiniStatus, then for statLines extractMiniStatus returns a Just [MiniStatus]. If even one line doesn’t pattern match against the final pattern match, then for statLines extractMiniStatus returns a Nothing.
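The all-or-nothing behavior is easy to check with a couple of sample inputs. In the sketch below, extractMiniStatus is as shown earlier, Eq and Show are derived on MiniStatus only so that results can be compared and printed, and the sample lines goodLines and badLines are made up for this example.

```haskell
import Data.Traversable (for)

data MiniStatus = MkMiniStatus Char Char deriving (Eq, Show)

extractMiniStatus :: String -> Maybe MiniStatus
extractMiniStatus [] = Nothing
extractMiniStatus [_] = Nothing
extractMiniStatus (index:work:_) = Just (MkMiniStatus index work)

-- Hypothetical sample inputs: badLines contains one line that is too short.
goodLines, badLines :: [String]
goodLines = [" M README.md", "?? notes.txt"]
badLines = [" M README.md", "x"]

goodResult, badResult :: Maybe [MiniStatus]
goodResult = for goodLines extractMiniStatus
badResult = for badLines extractMiniStatus
```

goodResult is Just [MkMiniStatus ' ' 'M',MkMiniStatus '?' '?'], while badResult is Nothing even though its first line is perfectly fine. An empty list of lines gives Just [], so a clean repo with no status lines still counts as a success.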

To understand what extractMiniStatus is pattern matching on, we quote some relevant documentation from the short format section of the git status manpage for git 2.15.0:

In the short-format, the status of each path is shown as

    XY PATH1 -> PATH2

where PATH1 is the path in the HEAD, and the " -> PATH2" part is shown only
when PATH1 corresponds to a different path in the index/worktree (i.e. the file
is renamed). The XY is a two-letter status code.

For paths with merge conflicts, X and Y show the modification states of each
side of the merge. For paths that do not have merge conflicts, X shows the
status of the index, and Y shows the status of the work tree. For untracked
paths, XY are ??. Other status codes can be interpreted as follows:

...omitted...

Indeed the first character of each line shows the state of the file in the index, while the second character shows the state of the file in the work tree. Notice how extractMiniStatus does not care about the rest of the characters on each line.

The final line of processStatus:

    return (countStatus statList)

calls countStatus on the [MiniStatus] gathered, assuming all went well. If for statLines extractMiniStatus returns Nothing, then processStatus also returns a Nothing. Let us look at the countStatus function next.

The `countStatus` function

The countStatus function is defined at line 36 of src/src/StatusParse.hs:

countStatus :: [MiniStatus] -> Status Int
countStatus l = MakeStatus
  {
  staged=countByType isStaged l,
  conflict=countByType isConflict l,
  changed=countByType isChanged l,
  untracked=countByType isUntracked l
  }

It returns a Status Int. The Status type constructor is defined at line 7 of the same file. But we include the comment at line 6 as well:

{- Full status information -}
data Status a = MakeStatus {
  staged :: a,
  conflict :: a,
  changed :: a,
  untracked :: a} deriving (Eq, Show)

With Status Int, all the fields in MakeStatus will be Int. This seems to be used to count the number of files in the git repo that are not in a “clean” state.

We see that the countStatus function uses the countByType function to compute each of the fields in MakeStatus. The countByType function is defined at line 33 of the same file:

countByType :: (MiniStatus -> Bool) -> [MiniStatus] -> Int
countByType isType = length . filter isType

countByType counts the number of entries in the [MiniStatus] computed by for statLines extractMiniStatus that fulfil the isType predicate. Based on the usage of countByType that we see in the MakeStatus data constructor, the isStaged, isConflict, isChanged and isUntracked predicates are used as the isType argument to countByType. Let’s take a look at isStaged, defined at line 21 of src/src/StatusParse.hs:

isStaged :: MiniStatus -> Bool
isStaged (MkMiniStatus index work) =
    (index `elem` "MRC") || (index == 'D' && work /= 'D') || (index == 'A' && work /= 'A')

There are 3 distinct cases where isStaged returns True:

First character of a status line is one of M, R, C
First character of a status line is D and the second character is not D
First character of a status line is A and the second character is not A

The code is simple enough, but what exactly do these characters stand for? To find out, we consult the documentation for the short-format of git status:

In the short-format, the status of each path is shown as

    XY PATH1 -> PATH2

where PATH1 is the path in the HEAD, and the " -> PATH2" part is shown only
when PATH1 corresponds to a different path in the index/worktree (i.e. the file
is renamed). The XY is a two-letter status code.

For paths with merge conflicts, X and Y show the modification states of each
side of the merge. For paths that do not have merge conflicts, X shows the
status of the index, and Y shows the status of the work tree. For untracked
paths, XY are ??. Other status codes can be interpreted as follows:

- ' ' = unmodified
- M = modified
- A = added
- D = deleted
- R = renamed
- C = copied
- U = updated but unmerged

Ignored files are not listed, unless --ignored option is in effect, in which
case XY are !!.

X          Y     Meaning
-------------------------------------------------
          [MD]   not updated
M        [ MD]   updated in index
A        [ MD]   added to index
D         [ M]   deleted from index
R        [ MD]   renamed in index
C        [ MD]   copied in index
[MARC]           index and work tree matches
[ MARC]     M    work tree changed since index
[ MARC]     D    deleted in work tree
-------------------------------------------------
D           D    unmerged, both deleted
A           U    unmerged, added by us
U           D    unmerged, deleted by them
U           A    unmerged, added by them
D           U    unmerged, deleted by us
A           A    unmerged, both added
U           U    unmerged, both modified
-------------------------------------------------
?           ?    untracked
!           !    ignored
-------------------------------------------------

The table of codes for X and Y is very useful to us and allows us to show some of the cases covered by the isStaged function.

index `elem` "MRC" covers these cases:

M        [ MD]   updated in index
R        [ MD]   renamed in index
C        [ MD]   copied in index
[MARC]           index and work tree matches

(index == 'D' && work /= 'D') covers these cases:

D         [ M]   deleted from index
D           U    unmerged, deleted by us

while (index == 'A' && work /= 'A') covers these cases:

A        [ MD]   added to index
[MARC]           index and work tree matches
[ MARC]     M    work tree changed since index
[ MARC]     D    deleted in work tree
A           U    unmerged, added by us

But based on first principles, index `elem` "MRC" covers the case where the file in the index has been modified, renamed, or copied, relative to HEAD. Starting from a clean repository, M can be achieved by making a change to a file tracked by git and then using git add on that file. R can be achieved by using git mv. I have no idea how we can get a C but I am guessing it might have something to do with one of git rebase, git merge, git am and similar.

One way to satisfy (index == 'D' && work /= 'D') is to use git rm on a tracked file. To be precise, that shows "D ", with a D for the first character and a space for the second character. If the table is exhaustive, it seems that we are ok with every entry that has a D in the first character, except for this one case:

D           D    unmerged, both deleted

which, it seems, will only arise during a git merge when there’s a merge conflict in another file that’s awaiting manual resolution by the user, or in a similar situation involving some merge conflict. This is just a guess and I am not certain that it is correct.

One way to satisfy (index == 'A' && work /= 'A') is to git add a previously untracked file. That gives us a "A " to be precise. It seems that we are trying to avoid this case:

A           A    unmerged, both added

which, once again, seems to arise only during a merge conflict pending human resolution.

Whether or not the cases covered by the isStaged function are exhaustive, they all indicate that the file has changed in the index relative to HEAD, except in the case of merge conflicts.

We shall do a quick walkthrough of isConflict, isChanged and isUntracked.

isConflict :: MiniStatus -> Bool
isConflict (MkMiniStatus index work) =
    index == 'U' || work == 'U' || (index == 'A' && work == 'A') || (index == 'D' && work == 'D')

As its name suggests, isConflict covers the case where a file has a merge conflict.

isChanged :: MiniStatus -> Bool
isChanged (MkMiniStatus index work) =
    work == 'M' || (work == 'D' && index /= 'D')

isChanged takes care of files which are modified in the work tree relative to the index (work == 'M', "work tree changed since index" in the table) and files deleted from the work tree but not deleted in the index (this can be reproduced by using rm to remove a tracked file).

isUntracked :: MiniStatus -> Bool
isUntracked (MkMiniStatus index _) =
    index == '?'

and finally, isUntracked takes care of files which are not tracked by git.
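To make the four predicates concrete, here is a small self-contained sketch that classifies a few two-character codes from the table. The bodies of isConflict, isChanged and isUntracked are copied from above. isStaged is reconstructed from the three conditions discussed earlier, and the MiniStatus declaration and the categories helper are assumptions made for this illustration, not code from the repository.

```haskell
-- Assumed shape of MiniStatus: the index (X) and work tree (Y) characters.
data MiniStatus = MkMiniStatus Char Char

-- Reconstructed from the three conditions discussed above.
isStaged :: MiniStatus -> Bool
isStaged (MkMiniStatus index work) =
    index `elem` "MRC"
      || (index == 'D' && work /= 'D')
      || (index == 'A' && work /= 'A')

isConflict :: MiniStatus -> Bool
isConflict (MkMiniStatus index work) =
    index == 'U' || work == 'U'
      || (index == 'A' && work == 'A')
      || (index == 'D' && work == 'D')

isChanged :: MiniStatus -> Bool
isChanged (MkMiniStatus index work) =
    work == 'M' || (work == 'D' && index /= 'D')

isUntracked :: MiniStatus -> Bool
isUntracked (MkMiniStatus index _) =
    index == '?'

-- Hypothetical helper: the name of every category a status falls into.
categories :: MiniStatus -> [String]
categories s =
    [ name
    | (name, test) <- [ ("staged", isStaged), ("conflict", isConflict)
                      , ("changed", isChanged), ("untracked", isUntracked) ]
    , test s
    ]
```

For example, categories (MkMiniStatus 'M' ' ') gives ["staged"], while categories (MkMiniStatus 'M' 'M') gives ["staged","changed"] and categories (MkMiniStatus 'A' 'U') gives ["staged","conflict"]. In other words, the predicates are not mutually exclusive, so one file can be counted in more than one field of the Status.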

Returning to processStatus and countStatus:

processStatus :: [String] -> Maybe (Status Int)
processStatus statLines =
  do -- Maybe
    statList <- for statLines extractMiniStatus
    return (countStatus statList)

countStatus :: [MiniStatus] -> Status Int
countStatus l = MakeStatus
  {
  staged=countByType isStaged l,
  conflict=countByType isConflict l,
  changed=countByType isChanged l,
  untracked=countByType isUntracked l
  }

we see that for statLines extractMiniStatus computes a list of MiniStatus values from the status lines in the output of git status --porcelain --branch. Then, countStatus is used to create a Status with 4 fields that count the number of files which are modified in the index relative to HEAD (staged), in a merge conflict (conflict), modified in the work tree relative to the index (changed) and untracked. This Status is then wrapped in a Just and returned by processStatus.

In the event that some line in the output of git status --porcelain --branch has fewer than 2 characters, for statLines extractMiniStatus results in a Nothing, which is returned by processStatus without running return (countStatus statList), because we are inside the Maybe monad.
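The short-circuiting behaviour of for in the Maybe monad can be tried out in isolation. In the sketch below, extractMiniStatus and countByType are assumed definitions written to match the behaviour described in this post (they may well differ from the ones in the repository), countUntracked is a made-up cut-down version of processStatus, and MiniStatus derives Eq and Show only for convenience.

```haskell
import Data.Traversable (for)

data MiniStatus = MkMiniStatus Char Char deriving (Eq, Show)

-- Assumed definition: succeed only when the line has at least 2 characters.
extractMiniStatus :: String -> Maybe MiniStatus
extractMiniStatus (index:work:_) = Just (MkMiniStatus index work)
extractMiniStatus _ = Nothing

-- Assumed definition: count the elements that satisfy a predicate.
countByType :: (MiniStatus -> Bool) -> [MiniStatus] -> Int
countByType isType = length . filter isType

isUntracked :: MiniStatus -> Bool
isUntracked (MkMiniStatus index _) = index == '?'

-- Hypothetical, cut-down processStatus: Just the number of untracked
-- files, or Nothing as soon as any line fails to parse.
countUntracked :: [String] -> Maybe Int
countUntracked statLines =
  do -- Maybe
    statList <- for statLines extractMiniStatus
    return (countByType isUntracked statList)
```

With these definitions, countUntracked ["?? a.txt", " M b.hs", "?? c.md"] is Just 2, whereas countUntracked ["?? a.txt", "x"] is Nothing: a single bad line discards the whole result. An empty list of lines is fine and gives Just 0.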

That finishes our coverage of processStatus.

Going back to `processGitStatus`

processGitStatus :: [String] -> Maybe GitInfo
processGitStatus [] = Nothing
processGitStatus (branchLine:statusLines) =
    do -- Maybe
      mbranch <- processBranch branchLine
      status <- processStatus statusLines
      return (MkGitInfo mbranch status)

In the final line, MkGitInfo mbranch status constructs a GitInfo (defined at line 11 of src/src/Utils.hs):

data GitInfo = MkGitInfo MBranchInfo (Status Int)

which wraps over the MBranchInfo from processBranch branchLine and the Status Int from processStatus statusLines. Assuming everything went smoothly and both processBranch and processStatus returned Justs, the GitInfo itself will be wrapped inside Just. Otherwise, processGitStatus returns a Nothing.

The GitInfo value captures all the information obtained from the output of git status --porcelain --branch.

Going back to `stringsFromStatus`

stringsFromStatus :: Maybe Hash
                  -> String -- status
                  -> Maybe [String]
stringsFromStatus h status = do -- List
    processed <- processGitStatus (lines status)
    return (showGitInfo h processed)

stringsFromStatus lives inside the Maybe monad (the return type Maybe [String] tells us so, even though the comment after the do says List). processGitStatus returns either a Just GitInfo or a Nothing. If it is a Nothing, everything else is skipped and stringsFromStatus returns a Nothing. If it is a Just GitInfo, the GitInfo is bound to processed. That, along with h, is passed to showGitInfo, defined at line 57 of src/src/Utils.hs:

showGitInfo :: Maybe Hash
      -> GitInfo
      -> [String]
showGitInfo mhash (MkGitInfo bi stat) = branchInfoString ++ showStatusNumbers stat
  where
    branchInfoString = showBranchInfo (branchOrHashWith ':' mhash bi)

This pattern matches the GitInfo argument using its only constructor, MkGitInfo, and binds its 2 components to the names bi and stat.

Because the return type of showGitInfo is [String] and ++ is used to concatenate branchInfoString and showStatusNumbers stat, branchInfoString must be a [String].

Let’s look at the definition of branchOrHashWith, along with its comment at line 50:

{- Combine status info, branch info and hash -}

branchOrHashWith :: Char -> Maybe Hash -> Maybe BranchInfo -> BranchInfo
branchOrHashWith _ _ (Just bi) = bi
branchOrHashWith c (Just hash) Nothing = MkBranchInfo (MkBranch (c : getHash hash)) Nothing
branchOrHashWith _ Nothing _ = MkBranchInfo (MkBranch "") Nothing

The first pattern match ignores the first 2 arguments and tries to pattern match against the MBranchInfo inside the GitInfo. Recall that this is the result of the processBranch function and captures all the important information about the current git branch. Also recall that MBranchInfo is a type synonym for Maybe BranchInfo. If this is a Just, then branchOrHashWith simply returns the BranchInfo value that’s wrapped inside the Just.

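The second pattern match covers the case where the MBranchInfo is a Nothing. Note that this is not a parse failure: processBranch is run inside the Maybe do block of processGitStatus, so if it had failed, processGitStatus would already have returned a Nothing and we would never get here. Instead, this is the case where the branch line was parsed successfully but yielded no branch, which as far as I can tell is what the noBranch parser covered earlier produces. The second argument passed to branchOrHashWith is originally from the main function: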

main = do -- IO
  status <- getContents
  mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
  -- omitted

gitrevparse :: IO (Maybe Hash)
gitrevparse = do -- IO
    mresult <- safeRun "git" ["rev-parse", "--short", "HEAD"]
    let rev = do -- Maybe
          result <- mresult
          return (MkHash (init result))
    return rev

to be exact, it is the result of unsafeInterleaveIO gitrevparse, which is a deferred run of git rev-parse --short HEAD. This command shows the abbreviated SHA1 of the commit at HEAD. We covered this early on and noted that the result of unsafeInterleaveIO gitrevparse will be a Just Hash if git rev-parse --short HEAD runs successfully and a Nothing otherwise. So we finally see the purpose of this deferred computation: it allows us to obtain a git commit SHA1 as a fallback in the event that there is no branch name to show. The unsafeInterleaveIO will prevent it from running until it is actually needed.
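The deferral can be observed directly with a small experiment. The sketch below does not run git at all; instead, it uses a counter in an IORef to record whether the deferred action has run. The name deferredRuns and the placeholder hash string are made up for this illustration.

```haskell
import Control.Exception (evaluate)
import Data.IORef (modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Reports how many times the deferred action had run before and after
-- its result was demanded.
deferredRuns :: IO (Int, Int)
deferredRuns = do
  counter <- newIORef (0 :: Int)
  mhash <- unsafeInterleaveIO $ do
    modifyIORef' counter (+ 1)         -- stands in for running git rev-parse
    return (Just "abc1234")            -- placeholder, not a real commit hash
  before <- readIORef counter          -- the result has not been demanded yet
  _ <- evaluate (length (show mhash))  -- demanding the result runs the action
  after <- readIORef counter
  return (before, after)
```

Running deferredRuns should yield (0,1): the action has not run by the time the counter is first read, and it runs exactly once when mhash is finally inspected. If nothing ever inspects mhash, the action never runs, which is the behaviour the Haskell program relies on to avoid an unnecessary call to git.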

Returning to the second pattern match of branchOrHashWith:

branchOrHashWith c (Just hash) Nothing = MkBranchInfo (MkBranch (c : getHash hash)) Nothing

The (Just hash) will only pattern match on a successful execution of git rev-parse --short HEAD. The c here is a colon character. The getHash function is defined at line 9 of src/src/Utils.hs:

newtype Hash = MkHash {getHash :: String}

getHash hash extracts the String that is wrapped by the MkHash newtype constructor, which the gitrevparse function uses to wrap the git commit SHA1 (minus the trailing newline character, which init drops).

Overall, this second pattern match of branchOrHashWith returns a BranchInfo value whose Branch component is the git commit SHA1 prepended with a colon character, and whose Maybe Remote component is a Nothing.

The third and final pattern match of branchOrHashWith:

branchOrHashWith _ Nothing _ = MkBranchInfo (MkBranch "") Nothing

covers the case where there is no branch information and the command git rev-parse --short HEAD failed as well. In this case, a BranchInfo value is created whose Branch component is a MkBranch "" and whose Maybe Remote component is a Nothing.
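The three clauses can be exercised side by side. In the sketch below, branchOrHashWith, the Hash newtype and the Show instance for Branch are as shown in this post. The BranchInfo declaration is reconstructed from how it is used, Remote is a simplified stand-in, and branchLabel is a hypothetical helper added for this illustration.

```haskell
newtype Hash = MkHash {getHash :: String}

newtype Branch = MkBranch String deriving (Eq)

instance Show Branch where
    show (MkBranch b) = b

-- Simplified stand-in: the real Remote also carries a Maybe Distance.
newtype Remote = MkRemote Branch deriving (Eq)

-- Reconstructed from the way MkBranchInfo is used in this post.
data BranchInfo = MkBranchInfo Branch (Maybe Remote) deriving (Eq)

branchOrHashWith :: Char -> Maybe Hash -> Maybe BranchInfo -> BranchInfo
branchOrHashWith _ _ (Just bi) = bi
branchOrHashWith c (Just hash) Nothing = MkBranchInfo (MkBranch (c : getHash hash)) Nothing
branchOrHashWith _ Nothing _ = MkBranchInfo (MkBranch "") Nothing

-- Hypothetical helper: the branch string that ends up in the output.
branchLabel :: Maybe Hash -> Maybe BranchInfo -> String
branchLabel mhash mbi = show branch
  where
    MkBranchInfo branch _ = branchOrHashWith ':' mhash mbi
```

With a made-up hash, branchLabel (Just (MkHash "abc1234")) (Just (MkBranchInfo (MkBranch "master") Nothing)) is "master", branchLabel (Just (MkHash "abc1234")) Nothing is ":abc1234", and branchLabel Nothing Nothing is the empty string. Note that the first clause matches its second argument with a wildcard, so the hash is never inspected when a branch name is available. That is what lets the deferred git rev-parse stay unexecuted in the common case.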

Going back to showGitInfo, we see that the BranchInfo returned by branchOrHashWith is passed to showBranchInfo:

    branchInfoString = showBranchInfo (branchOrHashWith ':' mhash bi)

showBranchInfo is defined at line 47 of src/src/Utils.hs:

showBranchInfo :: BranchInfo -> [String]
showBranchInfo (MkBranchInfo branch mremote) = show branch : showRemoteNumbers mremote

This first runs show branch to convert the Branch value within MkBranchInfo into a String. The Show instance of Branch is defined at line 40 of src/src/BranchParse.hs:

instance Show Branch where
    show (MkBranch b) = b

Because Branch is just a newtype wrapper over String, this essentially just returns the String that is being wrapped. The value of this String can be the current git branch name or, if there is no branch information, the current git commit SHA1 prepended by a colon or, if that fails too, the empty string.

This String is prepended to the [String] created by showRemoteNumbers mremote. The showRemoteNumbers function is defined at line 35 of src/src/Utils.hs:

showRemoteNumbers :: Maybe Remote -> [String]
showRemoteNumbers mremote =
    do -- List
      ab <- [ahead, behind]
      return (show ab)
  where
    (ahead, behind) = fromMaybe (0,0) distance  -- the script needs some value, (0,0) means no display
    distance = do -- Maybe
      remote <- mremote
      dist <- getDistance remote
      return (pairFromDistance dist)

And it makes use of the list monad. The idea is simple. ahead and behind will each be bound to ab (one at a time) and then show ab converts it to a String, which will be in the resulting [String]. Hence the return value of showRemoteNumbers will always be a list of 2 strings.

ahead and behind are defined in the where clause by fromMaybe (0,0) distance. The fromMaybe function is part of the Data.Maybe module. We covered it earlier but to refresh our memory, here is its documentation:

fromMaybe :: a -> Maybe a -> a
-- The fromMaybe function takes a default value and a Maybe value. If the Maybe
-- is a Nothing, it returns the default values; otherwise, it returns the value
-- contained in the Maybe.

If distance is a Just _, then we will be taking (ahead, behind) from inside it. Otherwise, ahead and behind will both be 0. distance is defined as follows:

    distance = do -- Maybe
      remote <- mremote
      dist <- getDistance remote
      return (pairFromDistance dist)

It lives in the Maybe monad. The mremote is the Maybe Remote part of the bigger BranchInfo value passed to showBranchInfo. If it is a Nothing, all bets are off and fromMaybe (0,0) distance will return (0,0). This applies for the case where there is no information on the number of commits the current branch is ahead and/or behind its remote tracking branch, or perhaps the current branch does not have a remote tracking branch.

If there is a Remote value, it is bound to the name remote and passed to the getDistance function, defined at line 58 of src/src/BranchParse.hs:

getDistance :: Remote -> Maybe Distance
getDistance (MkRemote _ md) = md

Here is the definition of the Remote data type:

data Remote = MkRemote Branch (Maybe Distance) deriving (Eq, Show)

so getDistance is essentially extracting the Maybe Distance part. This will only be a Just if parsing the branch line was successful and it is one of the following variants:

## master...origin/feat [ahead 7]
## bourbon...origin/rice-noodles [ahead 10, behind 4]
## fix-a-pesky-bug...workplace/nice-feature-work [behind 2]

which will be parsed by the branchParser' parser using the branchParser parser which goes down the route of the branchRemoteTracking parser, all of which we covered earlier.

The Distance type is defined at line 21 of src/src/BranchParse.hs:

data Distance = Ahead Int | Behind Int | AheadBehind Int Int deriving (Eq)

If getDistance extracts a Just Distance value, the Distance value is bound to the name dist, which is then passed to the pairFromDistance function, defined at line 153 of src/src/BranchParse.hs:

pairFromDistance :: Distance -> (Int, Int)
pairFromDistance (Ahead n) = (n,0)
pairFromDistance (Behind n) = (0,n)
pairFromDistance (AheadBehind m n) = (m,n)

which covers all the different data constructors of Distance. It returns a 2-tuple, whose elements are the number of commits the current branch is ahead of and behind its remote tracking branch, respectively.

showRemoteNumbers :: Maybe Remote -> [String]
showRemoteNumbers mremote =
    do -- List
      ab <- [ahead, behind]
      return (show ab)
  where
    (ahead, behind) = fromMaybe (0,0) distance  -- the script needs some value, (0,0) means no display
    distance = do -- Maybe
      remote <- mremote
      dist <- getDistance remote
      return (pairFromDistance dist)

With our newfound knowledge, what showRemoteNumbers does is pretty obvious. It returns a list of 2 strings indicating how many commits the current branch is ahead or behind its remote tracking branch respectively, if applicable. Otherwise, both elements will be "0".
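To check this understanding, the pieces can be put into one self-contained file and tried on a few values. The function bodies below are the ones shown above. The type declarations are simplified (Branch and Distance simply derive Show here, which is not how the repository does it), and origin is a hypothetical helper for building sample values.

```haskell
import Data.Maybe (fromMaybe)

-- Simplified declarations, for illustration only.
newtype Branch = MkBranch String deriving (Eq, Show)
data Distance = Ahead Int | Behind Int | AheadBehind Int Int deriving (Eq, Show)
data Remote = MkRemote Branch (Maybe Distance) deriving (Eq, Show)

getDistance :: Remote -> Maybe Distance
getDistance (MkRemote _ md) = md

pairFromDistance :: Distance -> (Int, Int)
pairFromDistance (Ahead n) = (n,0)
pairFromDistance (Behind n) = (0,n)
pairFromDistance (AheadBehind m n) = (m,n)

showRemoteNumbers :: Maybe Remote -> [String]
showRemoteNumbers mremote =
    do -- List
      ab <- [ahead, behind]
      return (show ab)
  where
    (ahead, behind) = fromMaybe (0,0) distance
    distance = do -- Maybe
      remote <- mremote
      dist <- getDistance remote
      return (pairFromDistance dist)

-- Hypothetical helper: a remote tracking branch with the given distance.
origin :: Maybe Distance -> Maybe Remote
origin md = Just (MkRemote (MkBranch "origin/master") md)
```

Here showRemoteNumbers (origin (Just (Ahead 7))) gives ["7","0"], showRemoteNumbers (origin (Just (AheadBehind 10 4))) gives ["10","4"], and both showRemoteNumbers (origin Nothing) and showRemoteNumbers Nothing give ["0","0"]. The do block in the list monad has a single generator, so it behaves the same as map show [ahead, behind].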

Backtracking to showBranchInfo:

showBranchInfo (MkBranchInfo branch mremote) = show branch : showRemoteNumbers mremote

We see that it returns a list of 3 strings:

the current git branch / SHA1 / the empty string
the number of commits the current git branch is ahead of its remote tracking branch
the number of commits the current git branch is behind its remote tracking branch

And backtracking to showGitInfo:

showGitInfo :: Maybe Hash
      -> GitInfo
      -> [String]
showGitInfo mhash (MkGitInfo bi stat) = branchInfoString ++ showStatusNumbers stat
  where
    branchInfoString = showBranchInfo (branchOrHashWith ':' mhash bi)

After having generated the list of 3 strings in branchInfoString, we concatenate it with the result of showStatusNumbers stat, defined at line 29 of src/src/Utils.hs:

showStatusNumbers :: Status Int -> [String]
showStatusNumbers (MakeStatus s x c t) =
    do -- List
      nb <- [s, x, c, t]
      return (show nb)

Looking at the definition of the Status data type:

{- Full status information -}
data Status a = MakeStatus {
  staged :: a,
  conflict :: a,
  changed :: a,
  untracked :: a} deriving (Eq, Show)

we see that showStatusNumbers extracts the number of staged, conflicted, changed and untracked files, converts each of them to String, then packs them into a list.

showGitInfo mhash (MkGitInfo bi stat) = branchInfoString ++ showStatusNumbers stat

and showGitInfo combines all the information into one list of 7 elements, which are String versions of the following:

the branch name / git commit sha1 / the empty string
the number of commits the current git branch is ahead of its remote tracking branch
the number of commits the current git branch is behind its remote tracking branch
the number of files that are modified in the index relative to HEAD
the number of files that are in a merge conflict
the number of files that are modified in the work tree relative to the index
the number of untracked files

Backtracking to stringsFromStatus:

stringsFromStatus :: Maybe Hash
                  -> String -- status
                  -> Maybe [String]
stringsFromStatus h status = do -- List
    processed <- processGitStatus (lines status)
    return (showGitInfo h processed)

If processGitStatus returns a Just GitInfo, the GitInfo is bound to the name processed, then showGitInfo h processed is executed and the list it returns is wrapped inside a Just and returned by stringsFromStatus. If processGitStatus returns a Nothing, then stringsFromStatus returns a Nothing.
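The lines status part is easy to overlook, so here is a minimal sketch of what it feeds into processGitStatus. splitStatus is a hypothetical helper, not a function from the repository; it only mirrors the two clauses of processGitStatus, which pattern match on the empty list and on branchLine:statusLines. The sample git output is made up.

```haskell
-- Hypothetical helper: separate the branch line from the status lines,
-- mirroring the two clauses of processGitStatus.
splitStatus :: String -> Maybe (String, [String])
splitStatus output =
  case lines output of
    [] -> Nothing
    (branchLine:statusLines) -> Just (branchLine, statusLines)

-- Made-up sample of what git status --porcelain --branch might print.
sampleOutput :: String
sampleOutput = "## master...origin/master [ahead 1]\n M README.md\n?? notes.txt\n"
```

splitStatus sampleOutput is Just ("## master...origin/master [ahead 1]", [" M README.md", "?? notes.txt"]), while splitStatus "" is Nothing because lines "" is the empty list. The latter corresponds to the processGitStatus [] = Nothing clause.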

Backtracking to the `main` function

main :: IO ()
main = do -- IO
  status <- getContents
  mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
  let result = do -- Maybe
        strings <- stringsFromStatus mhash status
        return (unwords strings)
  putStr (fromMaybe "" result)

If stringsFromStatus returns a Just [String], the [String] is bound to strings. The unwords function then joins the Strings in the list together into one big String, with each String in the list separated by a space character. This Maybe String is then bound to the result let binding. If result is a Just String, then putStr (fromMaybe "" result) will print the String to standard output; otherwise it will print the empty string to standard output.
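These last two steps of main can be isolated in a pure function to see exactly what gets printed. The render function below is a hypothetical helper written for this illustration, and the sample values are made up.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper mirroring the last two steps of main: join the
-- strings with spaces, or fall back to the empty string.
render :: Maybe [String] -> String
render mstrings = fromMaybe "" result
  where
    result = do -- Maybe
      strings <- mstrings
      return (unwords strings)
```

For example, render (Just ["master", "7", "0", "1", "0", "2", "3"]) is "master 7 0 1 0 2 3" and render Nothing is "". One detail worth noticing is that an empty branch name still takes up a slot: render (Just ["", "0", "0", "0", "0", "0", "0"]) is " 0 0 0 0 0 0", with a leading space.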

And… we are done with our main function.

The rest of the post covers how the output of this Haskell program is used to generate a prompt containing information about the git repo.

Generating the prompt

Very early on, we briefly covered the update_current_git_vars function defined in line 43 of zshrc.sh:

function update_current_git_vars() {
    unset __CURRENT_GIT_STATUS

    if [[ "$GIT_PROMPT_EXECUTABLE" == "python" ]]; then
        local gitstatus="$__GIT_PROMPT_DIR/gitstatus.py"
        _GIT_STATUS=`python ${gitstatus} 2>/dev/null`
    fi
    if [[ "$GIT_PROMPT_EXECUTABLE" == "haskell" ]]; then
        _GIT_STATUS=`git status --porcelain --branch &> /dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus`
    fi
    __CURRENT_GIT_STATUS=("${(@s: :)_GIT_STATUS}")
    GIT_BRANCH=$__CURRENT_GIT_STATUS[1]
    GIT_AHEAD=$__CURRENT_GIT_STATUS[2]
    GIT_BEHIND=$__CURRENT_GIT_STATUS[3]
    GIT_STAGED=$__CURRENT_GIT_STATUS[4]
    GIT_CONFLICTS=$__CURRENT_GIT_STATUS[5]
    GIT_CHANGED=$__CURRENT_GIT_STATUS[6]
    GIT_UNTRACKED=$__CURRENT_GIT_STATUS[7]
}

This is the line that runs the Haskell program to process the output of git status --porcelain --branch:

        _GIT_STATUS=`git status --porcelain --branch &> /dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus`

and the output of the Haskell program is stored in the _GIT_STATUS variable. The line

__CURRENT_GIT_STATUS=("${(@s: :)_GIT_STATUS}")

splits the _GIT_STATUS variable using space as the delimiter and stores the result as an array in the __CURRENT_GIT_STATUS variable. Right after that

  GIT_BRANCH=$__CURRENT_GIT_STATUS[1]
  GIT_AHEAD=$__CURRENT_GIT_STATUS[2]
  GIT_BEHIND=$__CURRENT_GIT_STATUS[3]
  GIT_STAGED=$__CURRENT_GIT_STATUS[4]
  GIT_CONFLICTS=$__CURRENT_GIT_STATUS[5]
  GIT_CHANGED=$__CURRENT_GIT_STATUS[6]
  GIT_UNTRACKED=$__CURRENT_GIT_STATUS[7]

we see that the author makes the assumption that there are 7 elements in the __CURRENT_GIT_STATUS array and assigns each element to a variable. These are the same 7 elements in the list created by the showGitInfo Haskell function.

We go back to the git_super_status function, defined at line 64 of zshrc.sh:

git_super_status() {
  precmd_update_git_vars
  if [ -n "$__CURRENT_GIT_STATUS" ]; then
    STATUS="$ZSH_THEME_GIT_PROMPT_PREFIX$ZSH_THEME_GIT_PROMPT_BRANCH$GIT_BRANCH%{${reset_color}%}"
    # omitted
}

In the if statement, the __CURRENT_GIT_STATUS variable is checked for non-emptiness. If it is non-empty, STATUS is assigned a value which begins with $ZSH_THEME_GIT_PROMPT_PREFIX, defined at line 96 of zshrc.sh:

ZSH_THEME_GIT_PROMPT_PREFIX="("

followed by $ZSH_THEME_GIT_PROMPT_BRANCH, defined at line 99 of the same file:

ZSH_THEME_GIT_PROMPT_BRANCH="%{$fg_bold[magenta]%}"

This changes the foreground color (text color) to magenta.

This is followed by $GIT_BRANCH, which gives us the branch name produced by the Haskell program. Then we have a %{${reset_color}%} which resets the foreground color.

If the current directory is in a git repo and the branch is named my-branch, the STATUS variable will have the following value:

(my-branch

Next up, we have the following code inside the overall if branch in git_super_status:

    if [ "$GIT_AHEAD" -ne "0" ]; then
      STATUS="$STATUS$ZSH_THEME_GIT_PROMPT_AHEAD$GIT_AHEAD%{${reset_color}%}"
    fi

This appends extra stuff to STATUS, but only if GIT_AHEAD is a non-zero value. It starts with ZSH_THEME_GIT_PROMPT_AHEAD, defined at line 104 of zshrc.sh:

ZSH_THEME_GIT_PROMPT_AHEAD="%{UpArrow%G%}"

There is an up arrow character ↑ which I have replaced with the text UpArrow because of some technical issues that prevent it from rendering in a code block.

This is then followed by GIT_AHEAD, which is the number of git commits the current branch is ahead of its remote tracking branch (if any). Then we have another %{${reset_color}%}.

The %{UpArrow%G%} is used to include a ‘glitch’ to output the ↑ character. According to zsh documentation:

%G

Within a %{…%} sequence, include a ‘glitch’: that is, assume that a single character width will be output. This is useful when outputting characters that otherwise cannot be correctly handled by the shell, such as the alternate character set on some terminals. The characters in question can be included within a %{…%} sequence together with the appropriate number of %G sequences to indicate the correct width. An integer between the ‘%’ and ‘G’ indicates a character width other than one. Hence %{seq%2G%} outputs seq and assumes it takes up the width of two standard characters.

Multiple uses of %G accumulate in the obvious fashion; the position of the %G is unimportant. Negative integers are not handled.

Note that when prompt truncation is in use it is advisable to divide up output into single characters within each %{…%} group so that the correct truncation point can be found.

Building on our hypothetical example, if my-branch is 5 commits ahead of its remote tracking branch, the GIT_AHEAD variable will have value 5 and the STATUS variable will have the value (my-branch↑5. However, if my-branch is not ahead of its remote tracking branch, then GIT_AHEAD will be zero and STATUS will still be (my-branch.

The next line in git_super_status:

    STATUS="$STATUS$ZSH_THEME_GIT_PROMPT_SEPARATOR"

appends ZSH_THEME_GIT_PROMPT_SEPARATOR, which is defined at line 98:

ZSH_THEME_GIT_PROMPT_SEPARATOR="|"

so it is a pipe character. This separates the (git branch, number of commits ahead and number of commits behind) from the rest of the information.

The rest of the code in git_super_status is of a similar nature and we shall not go through it here. We make an exception for line 91, where echo "$STATUS" prints the prompt that has been built. For zsh-git-prompt to display information about a git repo, code which calls the git_super_status function has to be in the user’s ~/.zshrc (or included by it). Example code from the README:

source path/to/zshrc.sh
# an example prompt
PROMPT='%B%m%~%b$(git_super_status) %# '

The prompt from the STATUS variable printed by the git_super_status function will be part of the PROMPT variable, which presumably forms the actual prompt that the user sees. Thus when the user is in a directory which is a git repository, information about that repository will be shown.

Note that in git_super_status, if __CURRENT_GIT_STATUS is empty, which can happen from a failure to parse the branch line, a failure to parse any of the status lines from the output of git status --porcelain --branch, or there being no output to parse at all (for instance outside a git repository), then git_super_status will not print anything and hence in

PROMPT='%B%m%~%b$(git_super_status) %# '

the $(git_super_status) part will interpolate to nothing. A “conventional” prompt will be shown.

With that, our deep dive into zsh-git-prompt has come to an end.

Conclusion / Ramblings

We have not covered all the important code in the zsh-git-prompt repo, only the code that is actually run during normal usage. There is some test code in the src/test directory that the reader might want to take a look at, along with supporting code that is littered throughout the main code but used in tests as well. For instance, line 28 of src/src/BranchParse.hs. This code offers some insight on how one can use the venerable QuickCheck library for testing Haskell code. I could go through that in a follow up post, or maybe not, because it has taken me about a week of my free time to write this post and I need to get back to other stuff I was working on.

This is a pretty intense post (hence I called it a deep dive) and sometimes even I was lost in the details (but I managed to find my way back). The parts where I pasted previously discussed code were more for myself to refresh my memory than for you the reader. If you have made it all the way here and understood most of the content, then you deserve a pat on the back and my mission was successful.

References

https://github.com/olivierverdier/zsh-git-prompt (zsh-git-prompt GitHub repo)
https://unix.stackexchange.com/a/129184 (Unix & Linux Stack Exchange: Redirect output of a command to two different files)
https://unix.stackexchange.com/a/345508 (Unix & Linux Stack Exchange: how to redirect output to multiple log files)
http://zsh.sourceforge.net/Doc/Release/Redirection.html (The Z Shell Manual chapter 7: Redirection)
https://stackoverflow.com/q/13263692 (Stack Overflow: When is unsafeInterleaveIO unsafe?)
http://hackage.haskell.org/package/base-4.10.1.0/docs/System-IO-Unsafe.html#v:unsafeInterleaveIO (unsafeInterleaveIO documentation)
http://hackage.haskell.org/package/process-1.6.2.0/docs/System-Process.html#v:readProcessWithExitCode (readProcessWithExitCode documentation)
http://hackage.haskell.org/package/process-1.6.2.0/docs/System-Process.html#v:readProcess (readProcess documentation)
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.html#v:unwords (unwords documentation)
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Maybe.html#v:fromMaybe (fromMaybe documentation)
http://hackage.haskell.org/package/base-4.10.1.0/docs/Prelude.html#v:lines (lines documentation)
https://hackage.haskell.org/package/parsec-3.1.11/docs/Text-Parsec.html#v:parse (parse documentation)
https://hackage.haskell.org/package/parsec-3.1.11/docs/Text-Parsec.html#v:-60--124--62- ((<|>) documentation)
https://hackage.haskell.org/package/parsec-3.1.11/docs/Text-Parsec.html#v:try (try documentation)
https://hackage.haskell.org/package/parsec-3.1.11/docs/Text-Parsec.html#v:manyTill (manyTill documentation)
https://hackage.haskell.org/package/parsec-3.1.11/docs/Text-Parsec-Char.html#v:anyChar (anyChar documentation)
https://hackage.haskell.org/package/parsec-3.1.11/docs/Text-Parsec.html#v:eof (eof documentation)
https://hackage.haskell.org/package/parsec-3.1.11/docs/Text-Parsec-Char.html#v:noneOf (noneOf documentation)
https://hackage.haskell.org/package/parsec-3.1.11/docs/Text-Parsec.html#v:between (between documentation)
https://hackage.haskell.org/package/parsec-3.1.11/docs/Text-Parsec.html#v:many1 (many1 documentation)
https://hackage.haskell.org/package/parsec-3.1.11/docs/Text-Parsec-Char.html#v:digit (digit documentation)
http://hackage.haskell.org/package/base-4.10.1.0/docs/Prelude.html#v:either (either documentation)
http://hackage.haskell.org/package/base-4.10.1.0/docs/Prelude.html#v:const (const documentation)
http://hackage.haskell.org/package/base-4.10.1.0/docs/Data-Traversable.html#v:for (for documentation)
http://hackage.haskell.org/package/base-4.10.1.0/docs/Data-Traversable.html#v:traverse (traverse documentation)
https://hackage.haskell.org/package/base-4.10.1.0/docs/src/Data.Traversable.html#line-235 (definition of traverse for lists)
https://hackage.haskell.org/package/base-4.10.1.0/docs/src/GHC.Base.html#line-716 (Applicative instance for Maybe)
https://git-scm.com/docs/git-status/2.15.0 (git status manpage, for git 2.15.0)
http://zsh.sourceforge.net/Doc/Release/Prompt-Expansion.html (The Z Shell Manual, Chapter 13: Prompt Expansion)

