09 Dec 2017
On Computer Technology
In How To Become A Hacker, Eric S. Raymond gives the following golden advice:
Learning to program is like learning to write good natural language. The best way to do it is to read some stuff written by masters of the form, write some things yourself, read a lot more, write a little more, read a lot more, write some more … and repeat until your writing begins to develop the kind of strength and economy you see in your models.
This year, when I decided to give learning Haskell another shot, I realized that I needed not just tutorials to study, but also actual code. As much as tutorials help to illustrate concepts, it is in actual code that one learns how to compose things together and picks up tricks that tutorials do not cover. About 6 years ago, I was an active user of Arch Linux and wanted to contribute to its package manager, Pacman. Pacman was written in C, which was a language I was using rather heavily at that time. I thought I knew C, but it was a rather eye-opening experience to study the Pacman source code and see some real world C code from a program that I used on a day to day basis. Heck, I even contributed slightly to pacman-key, probably as a result of that.
Ok, enough with the stuff that doesn’t concern anyone else.
After some serious searching, I found zsh-git-prompt. It is probably the first serious Haskell program I’ve studied and understood. The things that make this codebase so good for a beginner are:
.hs files in the src dir based on the output of a find command
As I was writing this post, I realized that there are a number of things that the reader must know to truly understand the code (even with my guidance), and that explaining those concepts in detail would make an already long post even longer.
This knowledge is often summarized by the phrase “the first N chapters of LYAH”, where N is usually 7 and LYAH is the Learn You a Haskell book. I would say that the prereqs for understanding this post are pretty much the first 12 chapters of LYAH. Specifically, the following:
>>= and what it does in do notation
Non-Haskell-related knowledge:
Haskell beginners who have some / all of the prereq knowledge above. You should also be willing to google to find out more information about concepts I didn’t explain too well / skipped over.
If you have read LYAH or similar but you are finding it very hard to use your newfound knowledge to write a real world application, I believe that you will find this post helpful.
It is also highly recommended that you install zsh and zsh-git-prompt; you will doubly appreciate this post and what zsh-git-prompt does. If you are a zsh user but just lack zsh-git-prompt, check out our blog post on how to install zsh-git-prompt.
Alternatively, if you do not wish to go through the hassle of installing zsh and zsh-git-prompt on your system, you can head over to https://github.com/yanhan/zsh-git-prompt-docker to pull / build our Docker image; simply follow the instructions in the README of that repo.
We will be going through tag v0.5 of zsh-git-prompt. At the time of writing, it happens to be the HEAD of the master branch. You can also go to https://github.com/olivierverdier/zsh-git-prompt/tree/v0.5 and browse the files there.
Throughout this post, we will be referencing zsh-git-prompt source code on its GitHub repo that falls under the v0.5 tag.
Looking at stack.yaml, we see:
packages:
- 'src'
which tells us that we should look at the src directory. Listing that directory shows us there is a .cabal file, git-prompt.cabal. In its executable section, we see the following:
executable gitstatus
hs-source-dirs: app
main-is: Main.hs
ghc-options: -threaded -rtsopts -with-rtsopts=-N
build-depends: base, git-prompt, parsec >=3.1, process>=1.1.0.2, QuickCheck
default-language: Haskell2010
ghc-options: -Wall -O2 -fno-warn-tabs -fno-warn-unused-do-bind
cc-options: -O3
So the main function sits at app/Main.hs (within the top level src dir). As an aside, there are very few dependencies on third party libraries.
I have to admit that this is a rather roundabout way to find the main function. In practice, it is much easier to run git grep -n main. But this process teaches us a few things about Stack and Cabal.
main :: IO ()
main = do -- IO
status <- getContents
mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
let result = do -- Maybe
strings <- stringsFromStatus mhash status
return (unwords strings)
putStr (fromMaybe "" result)
Ok. This is short but not very straightforward at first glance. There are some functions that we may not be familiar with, so we turn to Hoogle.
getContents :: IO String
-- The getContents operation returns all user input as a single string, which
-- is read lazily as it is needed (same as hGetContents stdin).
unsafeInterleaveIO :: IO a -> IO a
-- unsafeInterleaveIO allows an IO computation to be deferred lazily. When
-- passed a value of type IO a, the IO will only be performed when the value of
-- the a is demanded. This is used to implement lazy file reading, see
-- hGetContents.
unwords :: [String] -> String
-- unwords is an inverse operation to words. It joins words with separating
-- spaces.
Ok. The first question is: what is with the status <- getContents? It is not like we are supplying any input via stdin to zsh-git-prompt; the prompt simply shows up on our terminal whenever we are in a git repo, without us having to do anything. So this input must be coming from somewhere else.
Indeed, if we look at the Install section of the README, we see the following:
Source the file zshrc.sh from your ~/.zshrc config file, and configure your prompt. So, somewhere in ~/.zshrc, you should have:
source path/to/zshrc.sh
# an example prompt
PROMPT='%B%m%~%b$(git_super_status) %# '
The magic lies in the git_super_status zsh function in the zshrc.sh script. We open that file and find the git_super_status function. This is where the prompt gets constructed. Most notably, it starts with:
git_super_status() {
precmd_update_git_vars
Here’s the definition of the precmd_update_git_vars function:
function precmd_update_git_vars() {
if [ -n "$__EXECUTED_GIT_COMMAND" ] || [ ! -n "$ZSH_THEME_GIT_PROMPT_CACHE" ]; then
update_current_git_vars
unset __EXECUTED_GIT_COMMAND
fi
}
which points to the update_current_git_vars function as the likely workhorse:
function update_current_git_vars() {
unset __CURRENT_GIT_STATUS
if [[ "$GIT_PROMPT_EXECUTABLE" == "python" ]]; then
local gitstatus="$__GIT_PROMPT_DIR/gitstatus.py"
_GIT_STATUS=`python ${gitstatus} 2>/dev/null`
fi
if [[ "$GIT_PROMPT_EXECUTABLE" == "haskell" ]]; then
_GIT_STATUS=`git status --porcelain --branch &> /dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus`
fi
__CURRENT_GIT_STATUS=("${(@s: :)_GIT_STATUS}")
GIT_BRANCH=$__CURRENT_GIT_STATUS[1]
GIT_AHEAD=$__CURRENT_GIT_STATUS[2]
GIT_BEHIND=$__CURRENT_GIT_STATUS[3]
GIT_STAGED=$__CURRENT_GIT_STATUS[4]
GIT_CONFLICTS=$__CURRENT_GIT_STATUS[5]
GIT_CHANGED=$__CURRENT_GIT_STATUS[6]
GIT_UNTRACKED=$__CURRENT_GIT_STATUS[7]
}
What should catch our attention are the following 3 lines:
if [[ "$GIT_PROMPT_EXECUTABLE" == "haskell" ]]; then
_GIT_STATUS=`git status --porcelain --branch &> /dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus`
fi
Suppose GIT_PROMPT_EXECUTABLE has the value haskell. Then git status --porcelain --branch &>/dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus is executed. Despite some experience in Bash, the &> tripped me up because I had never used it. So I did some googling and found out that in Bash, &> redirects both standard output and standard error to the same location, which in this case is /dev/null.
That doesn’t make sense. If both standard output and standard error are redirected to /dev/null, wouldn’t the $__GIT_PROMPT_DIR/src/.bin/gitstatus program receive no input at all? Or does that program not require any standard input, and it will just work? To verify, I ran the following commands in a git repo:
git status --porcelain --branch &>/dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus
versus
$__GIT_PROMPT_DIR/src/.bin/gitstatus </dev/null
The first showed me:
master 95 0 0 0 1 1
and the second did not output anything. So clearly, it was receiving standard input from the git status --porcelain --branch command!
At this point, I was wondering what the hell was going on. If all output from the git status --porcelain --branch command was redirected to /dev/null, shouldn’t it effectively be doing the same thing as supplying no standard input to the next program?
I tried a few other things but this one kind of blew my mind:
git status --porcelain --branch &>/dev/null >a >o
Both a and o contained the output of the command! It seems like there is multiple output redirection going on, something I didn’t know was possible.
A Google search for “stdout redirect to multiple linux” turned up the usual answers (most commonly using tee), but also this answer on Unix & Linux Stack Exchange:
With zsh:
ls > file1 > file2
(internally, zsh creates a pipe and spawns a process that reads from that pipe and writes to the two files as tee does. ls stdout is the other end of the pipe).
and also the following answer:
As @jofel mentioned in a comment under the answer, this can be done natively in zsh:
echo foobar >file1 >file2 >file3
or, with brace expansion:
echo foobar >file{1..3}
Internally this works very similarly to the tee answers provided above. The shell connects the command’s stdout to a process that pipes to multiple files; therefore, there isn’t any compelling technical advantage to doing it this way (but it does look real good). See the zsh manual for more.
And it links to the Redirection chapter of the zsh manual. It turns out zsh has a feature known as multios that allows multiple output redirection. That section opens with:
If the user tries to open a file descriptor for writing more than once, the shell opens the file descriptor as a pipe to a process that copies its input to all the specified outputs, similar to tee, provided the MULTIOS option is set, as it is by default. Thus:
date >foo >bar
writes the date to two files, named ‘foo’ and ‘bar’. Note that a pipe is an implicit redirection; thus
date >foo | cat
writes the date to the file ‘foo’, and also pipes it to cat.
So we totally misunderstood the context. Our reasoning was based on how the command behaves in Bash, but the script runs under zsh, not Bash!
Therefore
git status --porcelain --branch &>/dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus
does indeed send the standard output of git status --porcelain --branch to the $__GIT_PROMPT_DIR/src/.bin/gitstatus program: since the pipe counts as an implicit redirection, zsh copies standard output to both /dev/null and the pipe.
Looking at lines 30 to 34 of stack.yaml:
# Extra directories used by stack for building
# extra-include-dirs: [/path/to/dir]
# extra-lib-dirs: [/path/to/dir]
local-bin-path: './src/.bin'
and line 23 of src/git-prompt.cabal:
executable gitstatus
We see that stack install will indeed build a program named gitstatus and place it in the src/.bin directory of the repo. So our guess that something else is piping its output as standard input to the main function of the zsh-git-prompt Haskell program is correct. So we have explained a grand total of… one truly meaningful line of Haskell code:
main :: IO ()
main = do -- IO
status <- getContents
Nevertheless, we have learnt a lot more about how zsh-git-prompt works overall. Let’s return to our main function:
main :: IO ()
main = do -- IO
status <- getContents
mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
let result = do -- Maybe
strings <- stringsFromStatus mhash status
return (unwords strings)
putStr (fromMaybe "" result)
The next line of code is:
mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
and from our Hoogle search above:
unsafeInterleaveIO :: IO a -> IO a
-- unsafeInterleaveIO allows an IO computation to be deferred lazily. When
-- passed a value of type IO a, the IO will only be performed when the value of
-- the a is demanded. This is used to implement lazy file reading, see
-- hGetContents.
So unsafeInterleaveIO gitrevparse will only run the gitrevparse action when its result, mhash, is actually demanded. As for why it is unsafe, please read this Stack Overflow question and its answers. Truth be told, I do not know enough to explain it, and any explanation would make this already long post even longer.
The gitrevparse function is defined in the src/app/Main.hs file and is as follows:
gitrevparse :: IO (Maybe Hash)
gitrevparse = do -- IO
mresult <- safeRun "git" ["rev-parse", "--short", "HEAD"]
let rev = do -- Maybe
result <- mresult
return (MkHash (init result))
return rev
Here is the safeRun function, also in the src/app/Main.hs file:
safeRun :: String -> [String] -> IO (Maybe String)
safeRun command arguments =
do -- IO
output <- readProcessWithExitCode command arguments ""
return (successOrNothing output)
Some relevant documentation for the System.Process.readProcessWithExitCode function:
readProcessWithExitCode
:: FilePath -- Filename of the executable
-> [String] -- any arguments
-> String -- standard input
-> IO (ExitCode, String, String) -- exitcode, stdout, stderr
-- readProcessWithExitCode is like readProcess but with two differences:
-- * it returns the ExitCode of the process, and does not throw any exception if
-- the code is not ExitSuccess
-- * it reads and returns the output from process' standard error handle, rather
-- than the process inheriting the standard error handle.
Some relevant documentation for the System.Process.readProcess function:
readProcess
:: FilePath -- Filename of the executable (see RawCommand for details)
-> [String] -- any arguments
-> String -- standard input
-> IO String -- stdout
-- readProcess forks an external process, reads its standard output strictly,
-- blocking until the process terminates, and returns the output string. The
-- external process inherits the standard error.
--
-- If an asynchronous exception is thrown to the thread executing readProcess,
-- the forked process will be terminated and readProcess will wait (block) until
-- the process has been terminated.
--
-- Output is returned strictly, so this is not suitable for interactive
-- applications.
Hence, the following code:
gitrevparse :: IO (Maybe Hash)
gitrevparse = do -- IO
mresult <- safeRun "git" ["rev-parse", "--short", "HEAD"]
-- some code omitted
safeRun :: String -> [String] -> IO (Maybe String)
safeRun command arguments =
do -- IO
output <- readProcessWithExitCode command arguments ""
return (successOrNothing output)
is equivalent to running git rev-parse --short HEAD on the command line with the empty string as its standard input, waiting for it to finish, then passing the (ExitCode, stdout, stderr) 3-tuple to the successOrNothing function, which is also defined in the src/app/Main.hs file:
successOrNothing :: (ExitCode, a, b) -> Maybe a
successOrNothing (exitCode, output, _) =
if exitCode == ExitSuccess then Just output else Nothing
successOrNothing is pretty straightforward: if our git rev-parse --short HEAD command exited successfully, then it returns the standard output string wrapped in a Just. Otherwise, it returns Nothing.
Going back to the gitrevparse function:
gitrevparse :: IO (Maybe Hash)
gitrevparse = do -- IO
mresult <- safeRun "git" ["rev-parse", "--short", "HEAD"]
let rev = do -- Maybe
result <- mresult
return (MkHash (init result))
return rev
we see the use of the Maybe monad. If git rev-parse --short HEAD ran successfully, then mresult will be a Just containing a String. The result <- mresult line then extracts the standard output string, and init result returns everything except the last character, which in this case is a newline. (If mresult is Nothing instead, the whole do block short-circuits and evaluates to Nothing.) If you run git rev-parse --short HEAD in a git repo, its standard output will be a short git commit SHA1 similar to 055f126c, followed by a newline. This git commit SHA1 is then passed to the MkHash data constructor:
newtype Hash = MkHash {getHash :: String}
which turns out to be a newtype wrapper. The return inside the Maybe do block then wraps the whole thing in a Just again, and the final return rev wraps that Maybe Hash in IO.
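Because gitrevparse mixes IO and Maybe, it can help to pull the pure part out and exercise the Maybe monad on its own. The sketch below copies successOrNothing and the Hash newtype from src/app/Main.hs (with deriving (Eq, Show) added so results can be compared); hashFromProcessResult is a hypothetical name, not part of zsh-git-prompt.

```haskell
import System.Exit (ExitCode (..))

-- Same shape as zsh-git-prompt's Hash; deriving added for comparison/printing.
newtype Hash = MkHash {getHash :: String}
  deriving (Eq, Show)

-- As in src/app/Main.hs: keep stdout only if the process exited successfully.
successOrNothing :: (ExitCode, a, b) -> Maybe a
successOrNothing (exitCode, output, _) =
  if exitCode == ExitSuccess then Just output else Nothing

-- Hypothetical pure helper: the Maybe part of gitrevparse, taking the
-- (ExitCode, stdout, stderr) triple instead of running git itself.
hashFromProcessResult :: (ExitCode, String, String) -> Maybe Hash
hashFromProcessResult processResult = do -- Maybe
  result <- successOrNothing processResult -- Nothing here short-circuits
  -- Drop the trailing newline. Like the original, init would throw an
  -- error if a successful run ever produced empty output.
  return (MkHash (init result))
```

Feeding it (ExitSuccess, "055f126c\n", "") yields a Just wrapping MkHash "055f126c", while any ExitFailure code yields Nothing, which is exactly the IO-free half of what gitrevparse does.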
To summarize what the gitrevparse function does:
It runs git rev-parse --short HEAD and, if successful, returns a Just (MkHash s), where s is a String wrapped in the Hash newtype that represents the git commit SHA1 that HEAD is on.
If git rev-parse --short HEAD fails, then Nothing is returned.
Let us revisit the main function again:
main :: IO ()
main = do -- IO
status <- getContents
mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
let result = do -- Maybe
strings <- stringsFromStatus mhash status
return (unwords strings)
putStr (fromMaybe "" result)
Tying together all that we know so far: we may or may not need the output of git rev-parse --short HEAD, hence the use of unsafeInterleaveIO to defer the computation. The lazily computed Maybe Hash value mhash, along with status (which contains the output of git status --porcelain --branch &>/dev/null), is passed to the stringsFromStatus function, which seems to be doing the bulk of the work. We can tell because the only other functions involved, unwords and fromMaybe, are simple library functions, as their Hoogle docs show:
unwords :: [String] -> String
-- unwords is an inverse operation to words. It joins words with separating
-- spaces.
fromMaybe :: a -> Maybe a -> a
-- The fromMaybe function takes a default value and a Maybe value. If the Maybe
-- is a Nothing, it returns the default value; otherwise it returns the value
-- contained in the Maybe.
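To make that division of labour concrete, here is the tail end of main pulled out into a pure function. renderPrompt and renderPrompt' are hypothetical names; they take what stringsFromStatus would return and produce the text that putStr prints, showing that nothing beyond joining and defaulting happens after stringsFromStatus.

```haskell
import Data.Maybe (fromMaybe)

-- Mirrors the end of main: the `let result = do` block in the Maybe monad,
-- followed by fromMaybe "" to turn a failure into an empty prompt.
renderPrompt :: Maybe [String] -> String
renderPrompt mstrings = fromMaybe "" result
  where
    result = do -- Maybe
      strings <- mstrings
      return (unwords strings)

-- The same function written point-free: the do block is just fmap unwords.
renderPrompt' :: Maybe [String] -> String
renderPrompt' = fromMaybe "" . fmap unwords
```

For example, Just ["master", "95", "0", "0", "0", "1", "1"] renders as master 95 0 0 0 1 1, matching the output we saw earlier, while Nothing renders as the empty string, so no git segment appears in the prompt.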
So we have pretty much covered the main function. Let’s get to the stringsFromStatus function next.
stringsFromStatus function
We can find the stringsFromStatus function in src/src/Utils.hs:
stringsFromStatus :: Maybe Hash
-> String -- status
-> Maybe [String]
stringsFromStatus h status = do -- List
processed <- processGitStatus (lines status)
return (showGitInfo h processed)
The comment on line 67 is a mistake; this function lives inside the Maybe monad, not the List monad. Anyway, here is some relevant documentation for lines:
lines :: String -> [String]
-- lines breaks a string up into a list of strings at newline characters. The
-- resulting strings do not contain newlines.
-- Note that after splitting the string at newline characters, the last part of
-- the string is considered a line even if it doesn't end with a newline.
So lines status will break the output of git status --porcelain --branch, which can consist of multiple lines, into a list of Strings, with each element in the list being one line in the original string. This list of strings is then passed to processGitStatus, defined as follows:
processGitStatus :: [String] -> Maybe GitInfo
processGitStatus [] = Nothing
processGitStatus (branchLine:statusLines) =
do -- Maybe
mbranch <- processBranch branchLine
status <- processStatus statusLines
return (MkGitInfo mbranch status)
As its name suggests, processGitStatus handles output from git status; specifically, from git status --porcelain --branch &>/dev/null.
We will deal with the easy case first, where processGitStatus pattern matches its first argument against the empty list. In this case, Nothing is returned. This case happens when git status --porcelain --branch &>/dev/null does not print anything to standard output, which occurs when we are not in a git repo. (Verify it!)
The other pattern match will lead us deeper into the code. It is a pattern match against a non-empty list. For this pattern match, we see that the author once again uses do notation, and we are inside the Maybe monad. First, the head of the list is bound to branchLine and passed to the processBranch function, which also returns a Maybe value; the remaining lines are bound to statusLines and handed to processStatus.
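The two equations of processGitStatus boil down to splitting git’s output into a first line and the rest. The following sketch isolates that step so both cases can be checked without Parsec; splitStatus is a hypothetical name, and the sample output in the usage note is made up.

```haskell
-- Hypothetical sketch of how processGitStatus splits its input: the first
-- line (the "## ..." branch line) is separated from the per-file status
-- lines. The real function hands these to processBranch and processStatus.
splitStatus :: String -> Maybe (String, [String])
splitStatus status = go (lines status)
  where
    -- No output at all, e.g. when we are not inside a git repo.
    go [] = Nothing
    -- Non-empty output: the head is the branch line, the tail the file lines.
    go (branchLine : statusLines) = Just (branchLine, statusLines)
```

For the made-up input "## master\n M README.md\n?? notes.txt\n", lines drops the final newline, so the result is Just ("## master", [" M README.md", "?? notes.txt"]); for the empty string, lines returns [] and we get Nothing.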
To understand the motivation behind this code, we have to know what the git status --porcelain --branch command is outputting. Here is the documentation for the --porcelain flag from the git status 2.15.0 manpage:
--porcelain[=<version>]
Give the output in an easy-to-parse format for scripts. This is similar to the short output, but will remain stable across Git versions and regardless of user configuration. See below for details.
The version parameter is used to specify the format version. This is optional and defaults to the original version v1 format.
and documentation for the --branch flag:
-b, --branch
Show the branch and tracking info even in short-format.
and the final part of the docs explaining the short-format output:
If -b is used the short-format status is preceded by a line
## branchname tracking info
Armed with this information, we know that the first line printed by git status --porcelain --branch looks like:
## branchname tracking info
and this line is precisely what the processGitStatus function passes to the processBranch function.
The processBranch function is defined in the same file:
processBranch :: String -> Maybe MBranchInfo
processBranch = rightOrNothing . branchInfo
From line 4 of the same file:
import BranchParse (Branch(MkBranch), MBranchInfo, BranchInfo(MkBranchInfo), branchInfo, getDistance, pairFromDistance, Remote)
we see that both the MBranchInfo type and the branchInfo function are defined in src/src/BranchParse.hs. That is where we shall go next.
branchInfo function
The branchInfo function is defined at line 150 of src/src/BranchParse.hs:
branchInfo :: String -> Either ParseError MBranchInfo
branchInfo = parse branchParser' ""
The parse function is from the Parsec library. I am not the best guy to explain what Parsec does even though I know how to use it, but a simple explanation is: Parsec lets one build parsers by combining smaller parsers, and the resulting code looks and reads very much like a Context Free Grammar. Since Context Free Languages are a superset of Regular Languages, by extension, one can use Parsec to do the job of Regular Expressions as well (even though the parsers will look like CFGs). Do note that regexes in many languages are not truly regular, and I am not certain how many of these non-regular features Parsec provides.
The docs for parse are slightly… difficult. But we will be needing its type signature, so here goes:
parse :: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a
The simpler way to explain it is: it takes a Parsec “object”, which is the parser, followed by a String (actually a type synonym named SourceName that is equivalent to String and is only used in error messages; usually I just use the empty string), followed by a String / Text / similar (in this case a String) containing the content we want to parse using the parser given in the first argument. Note that parse is partially applied here: only 2 of its 3 arguments are supplied, so the result is a function that still expects the String to parse. That is reflected in the type signature of branchInfo, which takes a String argument.
On success, parse returns a Right a. Based on the type signature of branchInfo, this a is an MBranchInfo, which is defined in the same file. On failure, parse returns a Left ParseError; a ParseError is a data type defined in the Parsec library that represents, well, a parse error.
Before we get into branchParser', recall how we got here. We were in the second half of processGitStatus:
processGitStatus (branchLine:statusLines) =
do -- Maybe
mbranch <- processBranch branchLine
where we were handed the first line in the output of git status --porcelain --branch, which is bound to branchLine.
processBranch :: String -> Maybe MBranchInfo
processBranch = rightOrNothing . branchInfo
processBranch calls branchInfo and hands branchLine to it, and branchInfo then attempts to parse it using the branchParser' parser.
branchParser' is defined in src/src/BranchParse.hs:
branchParser' :: Parser MBranchInfo
branchParser' =
do -- Parsec
string "## "
branchParser
Parser is a type synonym for Parsec String () and is defined in Text.Parsec.String of the Parsec library. MBranchInfo is a type synonym for Maybe BranchInfo and is defined on line 63 of src/src/BranchParse.hs. BranchInfo is a data type defined on line 61 of src/src/BranchParse.hs. Hence, Parser MBranchInfo expands to Parsec String () (Maybe BranchInfo). What this means is, if there are no parsing errors, we get a Maybe BranchInfo. (Not exactly, but we’ll get to that later.)
The string function is from the Parsec library. It matches exactly the string supplied to it, starting at the current position of the content it is parsing. In this case, it looks for "## " (note the trailing space) at the start of the first line of git status --porcelain --branch. If that line starts with the given string, we move on to the next parser, branchParser. Otherwise, a ParseError results and parsing stops.
Note that the branchParser' code uses do notation (once again), but this time we are in the Parser (or equivalently, Parsec String ()) monad.
Here is the definition of branchParser:
branchParser :: Parser MBranchInfo
branchParser =
try noBranch
<|> try newRepo
<|> try branchRemoteTracking
<|> try branchRemote
<|> branchOnly
This consumes the remainder of the line after the "## " prefix. try and <|> are both defined in the Parsec library, and they are often used together. The documentation for <|> is especially good:
(<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
-- This combinator implements choice. The parser p <|> q first applies p. If it
-- succeeds, the value of p is returned. If p fails without consuming any input,
-- parser q is tried. This combinator is defined equal to the mplus member of
-- the MonadPlus class and the (<|>) member of Alternative.
The initial part of the documentation for the try function is pretty good too:
try :: ParsecT s u m a -> ParsecT s u m a
-- The parser try p behaves like parser p, except that it pretends that it
-- hasn't consumed any input when an error occurs.
--
-- This combinator is used whenever arbitrary look ahead is needed. Since it
-- pretends that it hasn't consumed any input when p fails, the (<|>) combinator
-- will try its second alternative even when the first parser failed while
-- consuming input.
Essentially, branchParser will first attempt to parse the input string using the try noBranch parser. If that fails, the try makes it look as if noBranch consumed no input, and because no input was consumed, <|> moves on to the next parser, which is try newRepo. If that fails too, again no input is consumed, and it moves on to try branchRemoteTracking, and so on, in the specified order. The first parser that succeeds wins, and the remaining alternatives are not tried. People with knowledge of CFGs will appreciate how this code looks.
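The classic way to see why try matters is two alternatives that share a prefix. In this sketch (withoutTry, withTry and succeeds are hypothetical names), Parsec’s string "abc" has already consumed "ab" by the time it fails on the input "abd", so plain <|> refuses to try the second alternative; wrapping the first alternative in try rewinds that failure.

```haskell
import Text.Parsec (parse, string, try, (<|>))
import Text.Parsec.String (Parser)

-- Both alternatives start with "ab": on "abd", the first one consumes "ab"
-- before failing, so <|> does not fall through to the second.
withoutTry :: Parser String
withoutTry = string "abc" <|> string "abd"

-- try turns the first alternative's failure into one that consumed nothing,
-- so <|> goes on to run string "abd".
withTry :: Parser String
withTry = try (string "abc") <|> string "abd"

-- Hypothetical helper: True when the parser accepts the input.
succeeds :: Parser String -> String -> Bool
succeeds p input = either (const False) (const True) (parse p "" input)
```

Both parsers accept "abc", but only withTry accepts "abd". The same reasoning explains why branchParser wraps every alternative except the last one (branchOnly) in try: each of those parsers may consume part of the line before discovering that it does not match.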
To understand each of these parsers, we need to play around with some git repositories and observe the output of the git status --porcelain --branch command. In this process, we will also be learning more about Parsec.
noBranch parser
First off the list, the noBranch parser (defined here):
noBranch :: Parser MBranchInfo
noBranch =
do -- Parsec
manyTill anyChar (try (string " (no branch)"))
eof
return Nothing
The manyTill, anyChar and eof parsers are new to us. They are defined in the Parsec library and pretty much do what they say.
manyTill :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a]
-- manyTill p end applies parser p zero or more times until parser end succeeds.
-- Returns the list of values returned by p.
anyChar :: Stream s m Char => ParsecT s u m Char
-- This parser succeeds for any character. Returns the parsed character.
eof :: (Stream s m t, Show t) => ParsecT s u m ()
-- This parser only succeeds at the end of the input. This is not a primitive
-- but it is defined using notFollowedBy.
So manyTill anyChar (try (string " (no branch)")) will apply the anyChar parser zero or more times until the try (string " (no branch)") parser succeeds. On success, it returns a list of all the Chars consumed by anyChar. The string " (no branch)" parser expects and consumes the string " (no branch)", but it can also fail after consuming part of its input, for example on a space that is not followed by (no branch). Wrapping it in try makes such a failure consume nothing, so manyTill can fall back to anyChar and keep going, consuming more and more characters until we finally encounter the string " (no branch)". Then the manyTill anyChar (try (string " (no branch)")) parser succeeds.
The eof parser then expects us to have reached the end of the input, which in this case is the end of the first line of the output of git status --porcelain --branch. If everything goes well, the parser returns Nothing (so parse gives us Right Nothing).
To put this in simpler terms, noBranch is expecting a single line that looks like abcdefgh ijklm nopqrs (no branch). Notice how the list of characters accumulated by manyTill anyChar is discarded.
We can probably guess that noBranch is meant for parsing the branch line of a git repo that isn’t on a branch. This happens in the detached HEAD state. To see what the line looks like, simply go to any of your git repos with at least 2 commits, make sure you have committed / stashed all your changes, then run the following commands:
git checkout HEAD~
git status --porcelain --branch
The first line should look similar to the following:
## HEAD (no branch)
and this will be happily parsed by branchParser', first with string "## ", followed by branchParser using the try noBranch parser, producing Nothing. So now we know that if branchParser' succeeds with Nothing, then the git repo is in the detached HEAD state. Nice.
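Here is noBranch lifted into a standalone sketch so it can be tried on a few inputs. Branch, BranchInfo and MBranchInfo are simplified stand-ins for the types in src/src/BranchParse.hs (the real BranchInfo also carries a Maybe Remote), and parseMaybe is a hypothetical helper. The inputs omit the "## " prefix because branchParser' has already consumed it by the time noBranch runs.

```haskell
import Text.Parsec (anyChar, eof, manyTill, parse, string, try)
import Text.Parsec.String (Parser)

-- Simplified stand-ins for the types in src/src/BranchParse.hs.
newtype Branch = MkBranch String deriving (Eq, Show)
data BranchInfo = MkBranchInfo Branch deriving (Eq, Show)
type MBranchInfo = Maybe BranchInfo

-- noBranch as shown above: skip characters until " (no branch)", then
-- require the end of the input, and report "no branch" as Nothing.
noBranch :: Parser MBranchInfo
noBranch =
  do -- Parsec
    _ <- manyTill anyChar (try (string " (no branch)"))
    eof
    return Nothing

-- Hypothetical helper: collapse the Either returned by parse into a Maybe.
parseMaybe :: Parser a -> String -> Maybe a
parseMaybe p input = either (const Nothing) Just (parse p "" input)
```

parseMaybe noBranch "HEAD (no branch)" is Just Nothing: the parse succeeded, and its result, Nothing, means detached HEAD. Both "master" and "HEAD (no branch) extra" give Nothing from parseMaybe, the first because " (no branch)" never appears, the second because eof fails on the leftover text.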
newRepo parser
The try newRepo parser will be used by branchParser if parsing using try noBranch fails.
branchParser :: Parser MBranchInfo
branchParser =
try noBranch
<|> try newRepo
<|> try branchRemoteTracking
<|> try branchRemote
<|> branchOnly
Its definition is as follows:
newRepo :: Parser MBranchInfo
newRepo =
do -- Parsec
string "Initial commit on "
branchOnly
Based on the string "Initial commit on " parser alone, we can reasonably assume that this is for a new git repo. By now, we are quite familiar with what string does, so let’s look at the branchOnly parser, defined here:
branchOnly :: Parser MBranchInfo
branchOnly =
do -- Parsec
branch <- many (noneOf " ")
eof
let bi = MkBranchInfo (MkBranch branch) Nothing
return (Just bi)
Documentation for the noneOf
parser combinator:
noneOf :: Stream s m Char => [Char] -> ParsecT s u m Char
-- As the dual of oneOf, noneOf cs succeeds if the current character is not in
-- the supplied list of characters cs. Returns the parsed character.
-- Example code:
consonant = noneOf "aeiou"
When used with many, this will consume as many characters as possible, as long as they are not the space character, and return the list of characters consumed. Notice that this time, the author binds the list of characters consumed by many (noneOf " ") to branch. Immediately following that, an eof is expected. Therefore, branchOnly expects the rest of its input to consist only of non-space characters.
I was expecting newRepo to handle the first line of git status --porcelain --branch for new git repositories, but that was not the case. On git 2.15.0, for a new repo initialized using git init but with zero commits, I am getting the following output:
## No commits yet on master
That is in the Porcelain v1 output format, which zsh-git-prompt expects. Porcelain v2 is in a different format and is not supported by zsh-git-prompt. I do not see anything on my zsh prompt that indicates this new directory I ran git init in is a git repo. Since this doesn’t work for a git repo that was just created using git init and has zero commits, I added the initial commit and ran git status --porcelain --branch again and… it wasn’t what we were expecting, but was instead ## master. Changing the commit message to Initial commit and similar does not change anything either.
The only explanation I can come up with is this: perhaps the git status Porcelain format has changed since the last version of zsh-git-prompt? After all, at the time of writing, the most recent commit was made on 15 Feb 2016, and that commit is the v0.5 we are studying right now.
Regardless, let’s go back to branchOnly and go through the final 2 lines:
let bi = MkBranchInfo (MkBranch branch) Nothing
return (Just bi)
MkBranch is a newtype wrapper defined at line 38 of src/src/BranchParse.hs:
newtype Branch = MkBranch String deriving (Eq)
while MkBranchInfo is a data constructor defined at line 61 of the same file:
data BranchInfo = MkBranchInfo Branch (Maybe Remote) deriving (Eq, Show)
We can see that Branch just wraps a String that is a git branch name. BranchInfo has a single data constructor, MkBranchInfo, which takes 2 arguments: a Branch and a Maybe Remote. We shall not cover the Remote type for now. Essentially, this code:
let bi = MkBranchInfo (MkBranch branch) Nothing
return (Just bi)
creates a representation of a git branch with Nothing for the Maybe Remote part, then returns a Just BranchInfo if parsing succeeds.
Putting everything together:
newRepo :: Parser MBranchInfo
newRepo =
    do -- Parsec
        string "Initial commit on "
        branchOnly
branchOnly :: Parser MBranchInfo
branchOnly =
    do -- Parsec
        branch <- many (noneOf " ")
        eof
        let bi = MkBranchInfo (MkBranch branch) Nothing
        return (Just bi)
We see that the newRepo parser expects a string similar to:
Initial commit on some-branch-name
and, on a successful parse, returns a Just BranchInfo which represents a git branch.
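To experiment with these two parsers in isolation, here is a minimal, self-contained sketch. It assumes the parsec package is available and uses the Parser type from Text.Parsec.String; Remote is replaced by a simplified stand-in (the real type also carries a Maybe Distance), and Show is derived everywhere purely for convenience.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified stand-ins for the types in src/src/BranchParse.hs.
newtype Branch = MkBranch String deriving (Eq, Show)

-- Assumption: Remote is trimmed down, since these two parsers never build one.
newtype Remote = MkRemote Branch deriving (Eq, Show)

data BranchInfo = MkBranchInfo Branch (Maybe Remote) deriving (Eq, Show)

type MBranchInfo = Maybe BranchInfo

-- A branch name made of non-space characters, followed by end of input.
branchOnly :: Parser MBranchInfo
branchOnly = do
  branch <- many (noneOf " ")
  eof
  let bi = MkBranchInfo (MkBranch branch) Nothing
  return (Just bi)

-- The fixed prefix is consumed and discarded; branchOnly does the rest.
newRepo :: Parser MBranchInfo
newRepo = do
  _ <- string "Initial commit on "
  branchOnly
```

With this, parse newRepo "" "Initial commit on master" gives Right (Just (MkBranchInfo (MkBranch "master") Nothing)), while the git 2.15.0 wording No commits yet on master is rejected at its very first character.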
branchRemoteTracking parser
If both try noBranch and try newRepo fail, then branchParser tries the try branchRemoteTracking parser.
branchParser :: Parser MBranchInfo
branchParser =
    try noBranch
    <|> try newRepo
    <|> try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly
The branchRemoteTracking parser is the most complicated of the bunch, at line 84 of src/src/BranchParse.hs:
branchRemoteTracking :: Parser MBranchInfo
branchRemoteTracking =
    do -- Parsec
        branch <- trackedBranch
        tracking <- many (noneOf " ")
        char ' '
        behead <- inBrackets
        let remote = MkRemote (MkBranch tracking) (Just behead)
        let bi = MkBranchInfo branch (Just remote)
        return (Just bi)
Definition of trackedBranch:
trackedBranch :: Parser Branch
trackedBranch =
    do -- Parsec
        b <- manyTill anyChar (try (string "..."))
        return (MkBranch b)
Our experience with Parsec tells us that trackedBranch will consume characters until it reaches the first occurrence of the string "...", which manyTill also consumes and discards. The list of characters consumed before the "..." is bound to b, then wrapped in the MkBranch newtype wrapper and returned.
Following that (still in branchRemoteTracking), tracking <- many (noneOf " ") will consume as many characters as possible until it hits the space character. The list of characters consumed is bound to tracking. Subsequently, char ' ' expects a single space character, which it consumes and discards.
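The two steps above can be tried on their own with the sketch below, which assumes the parsec package. trackedBranch mirrors the quoted definition, while branchAndTracking is a hypothetical helper of ours that runs the first three steps of branchRemoteTracking and stops just before the bracketed part.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

newtype Branch = MkBranch String deriving (Eq, Show)

-- Consume characters up to the first "..."; manyTill consumes the
-- terminator as well but only returns the characters before it.
trackedBranch :: Parser Branch
trackedBranch = do
  b <- manyTill anyChar (try (string "..."))
  return (MkBranch b)

-- Hypothetical helper (not part of zsh-git-prompt): local branch,
-- tracking branch and the separating space, stopping before "[...]".
branchAndTracking :: Parser (Branch, Branch)
branchAndTracking = do
  branch <- trackedBranch
  tracking <- many (noneOf " ")
  _ <- char ' '
  return (branch, MkBranch tracking)
```

As we understand Parsec's string, the try around string "..." is what makes names containing a single dot work: on an input such as v1.2...origin/v1.2, the attempt to match "..." at the lone dot fails partway, and try lets manyTill fall back to anyChar instead of failing outright.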
inBrackets is defined as follows, on line 128:
inBrackets :: Parser Distance
inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)
The Distance type constructor is defined at line 21, but I will also show the comments from lines 11 to 19 because they pretty much describe what we will be covering next:
{-
The idea is to parse the first line of the git status command.
Such a line may look like:
## master
or
## master...origin/master
or
## master...origin/master [ahead 3, behind 4]
-}
data Distance = Ahead Int | Behind Int | AheadBehind Int Int deriving (Eq)
So Distance represents how many commits the current branch is ahead and/or behind its remote tracking branch; its data constructors are all aptly named.
Going back to inBrackets:
inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)
between is a function defined in the Parsec library. Documentation as follows:
between :: Stream s m t => ParsecT s u m open -> ParsecT s u m close -> ParsecT s u m a -> ParsecT s u m a
-- between open close p parses open, followed by p and close. Returns the value returned by p.
So essentially, inBrackets expects some string that satisfies one of behind, try aheadBehind or ahead, in between a [ and a ]. There is a subtlety with the use of try in try aheadBehind that we will explain later. Now, let’s take a look at behind, aheadBehind and ahead.
behind is defined at line 140 of src/src/BranchParse.hs:
behind :: Parser Distance
behind = makeAheadBehind "behind" Behind
Recall that Behind is one of the data constructors of Distance. makeAheadBehind is defined at line 131 of the same file:
makeAheadBehind :: String -> (Int -> Distance) -> Parser Distance
makeAheadBehind name constructor =
    do -- Parsec
        string (name ++ " ")
        dist <- many1 digit
        return (constructor (read dist))
Documentation for many1 and digit, both in the Parsec library:
many1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m [a]
-- many1 p applies the parser p one or more times. Returns a list of the
-- returned values of p.
digit :: Stream s m Char => ParsecT s u m Char
-- Parses a digit. Returns the parsed character.
We see that behind = makeAheadBehind "behind" Behind. This will first consume the string "behind " (and discard it), then consume 1 or more digits and bind the list of digits to dist. Since constructor has type Int -> Distance, read dist converts the list of digits into an Int, which is then passed to constructor to create a Distance. In this case, constructor is the Behind data constructor, which takes 1 Int and creates a Distance.
The behind parser parses a string similar to behind 5 and returns a Behind 5. inBrackets can therefore consume a string similar to [behind 5].
inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)
The other route that inBrackets can go down is try aheadBehind. Let’s look at the aheadBehind parser, defined at line 142 of src/src/BranchParse.hs:
aheadBehind :: Parser Distance
aheadBehind =
    do -- Parsec
        Ahead aheadBy <- ahead
        string ", "
        Behind behindBy <- behind
        return (AheadBehind aheadBy behindBy)
ahead is defined at line 138 of the same file:
ahead :: Parser Distance
ahead = makeAheadBehind "ahead" Ahead
aheadBehind will first call ahead, which calls makeAheadBehind, which consumes the string "ahead " (and discards it), then consumes 1 or more digits and creates an Ahead Int. The string ", " parser then consumes the string ", ". Next, behind springs into action (we covered that above) and consumes "behind " followed by 1 or more digits. Note that pattern matching is used to get at the Int inside the Behind, so that the Int is bound to behindBy (likewise, the Int inside the Ahead is bound to aheadBy). Finally, an AheadBehind Int Int is created. All in all, inBrackets going down the route of aheadBehind consumes a string similar to the following:
[ahead 13, behind 7]
Returning to inBrackets once again:
inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)
We see that the final possible branch is ahead. We have already covered it while going through aheadBehind. For completeness, if inBrackets goes down the route of ahead, a string similar to [ahead 10] is expected.
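Here is a self-contained sketch that assembles makeAheadBehind, ahead, behind, aheadBehind and inBrackets exactly as quoted above. It assumes the parsec package; Distance additionally derives Show (the original derives only Eq) so that results can be inspected in GHCi.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Distance = Ahead Int | Behind Int | AheadBehind Int Int
  deriving (Eq, Show)

-- "<name> <digits>", wrapped in the given Distance constructor.
makeAheadBehind :: String -> (Int -> Distance) -> Parser Distance
makeAheadBehind name constructor = do
  _ <- string (name ++ " ")
  dist <- many1 digit
  return (constructor (read dist))

ahead, behind :: Parser Distance
ahead = makeAheadBehind "ahead" Ahead
behind = makeAheadBehind "behind" Behind

-- "ahead M, behind N"; the pattern binds unwrap the two Ints.
aheadBehind :: Parser Distance
aheadBehind = do
  Ahead aheadBy <- ahead
  _ <- string ", "
  Behind behindBy <- behind
  return (AheadBehind aheadBy behindBy)

inBrackets :: Parser Distance
inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)
```

For example, parse inBrackets "" "[ahead 13, behind 7]" yields Right (AheadBehind 13 7), and "[behind 5]" yields Right (Behind 5).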
Earlier, we mentioned a subtlety in the use of try in try aheadBehind for the inBrackets parser. One might ask: why wrap only aheadBehind in a try? Why not wrap behind and ahead in try as well?
We do not have to wrap the behind parser in a try, because it uses the string "behind " parser to consume the string "behind ". Notice that the strings "behind " and "ahead " differ in the first character (b vs. a). This causes the behind parser to fail immediately, without consuming any input, when the input starts with ahead. Since it does not consume any input, the <|> ensures that the next parser, try aheadBehind, is tried.
We see this fine print in the documentation for (<|>):
(<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
-- This combinator implements choice. The parser p <|> q first applies p. If it
-- succeeds, the value of p is returned. If p fails without consuming any input,
-- parser q is tried. This combinator is defined equal to the mplus member of
-- the MonadPlus class and the (<|>) member of Alternative.
Specifically, the part that says: "If p fails without consuming any input, parser q is tried."
There is overlap between the strings that aheadBehind and ahead parse. aheadBehind expects strings of the form ahead M, behind N, while ahead expects a string similar to ahead M, with M and N being non-negative integers. If we were to rearrange things and use behind <|> ahead <|> try aheadBehind, then for the input string ahead 7, behind 9, the behind parser would fail without consuming any input, then <|> would use the ahead parser to consume the string "ahead 7" and stop there. The (behind <|> ahead <|> try aheadBehind) parser succeeds, so try aheadBehind is never attempted, but between (char '[') (char ']') (behind <|> ahead <|> try aheadBehind) fails because the next character is not a ] but a ,. Hence, aheadBehind must be attempted before ahead.
So we have established that aheadBehind must be attempted before ahead. Minimally, we have to use behind <|> aheadBehind <|> ahead. Now for the try. What happens if behind <|> aheadBehind <|> ahead parses the string "ahead 5" (which is valid for a git branch that is only ahead of, but not behind, its remote tracking branch)? The behind parser fails without consuming any input, so <|> tries aheadBehind, which consumes the entire "ahead 5" but then expects a ", ", so parsing fails. Because input was consumed, the next <|> does not try the ahead parser. Hence we need to wrap aheadBehind in a try, so that it does not consume any input on parse failure, and chaining it with <|> ahead then moves on to the ahead parser.
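Both failure modes can be checked directly. The sketch below repeats the Distance parsers and adds two hypothetical variants of inBrackets (the names are ours, not zsh-git-prompt's): inBracketsNoTry drops the try, and inBracketsAheadFirst tries ahead before aheadBehind. It assumes the parsec package.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Distance = Ahead Int | Behind Int | AheadBehind Int Int
  deriving (Eq, Show)

makeAheadBehind :: String -> (Int -> Distance) -> Parser Distance
makeAheadBehind name constructor = do
  _ <- string (name ++ " ")
  dist <- many1 digit
  return (constructor (read dist))

ahead, behind, aheadBehind :: Parser Distance
ahead = makeAheadBehind "ahead" Ahead
behind = makeAheadBehind "behind" Behind
aheadBehind = do
  Ahead aheadBy <- ahead
  _ <- string ", "
  Behind behindBy <- behind
  return (AheadBehind aheadBy behindBy)

-- The ordering used by zsh-git-prompt.
inBrackets :: Parser Distance
inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)

-- Hypothetical: without try, a failed aheadBehind has already consumed
-- "ahead 5", so <|> never falls back to ahead.
inBracketsNoTry :: Parser Distance
inBracketsNoTry = between (char '[') (char ']') (behind <|> aheadBehind <|> ahead)

-- Hypothetical: ahead succeeds on the "ahead 7" prefix, aheadBehind is
-- never attempted, and the closing ']' check then fails on ','.
inBracketsAheadFirst :: Parser Distance
inBracketsAheadFirst = between (char '[') (char ']') (behind <|> ahead <|> try aheadBehind)
```

Running parse inBracketsNoTry "" "[ahead 5]" fails, just as described above, while parse inBrackets "" "[ahead 5]" returns Right (Ahead 5); inBracketsAheadFirst accepts [ahead 7] but rejects [ahead 7, behind 9].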
Now that we know what the inBrackets parser does, we go back to what brought us here in the first place, branchRemoteTracking:
branchRemoteTracking :: Parser MBranchInfo
branchRemoteTracking =
    do -- Parsec
        branch <- trackedBranch
        tracking <- many (noneOf " ")
        char ' '
        behead <- inBrackets
        let remote = MkRemote (MkBranch tracking) (Just behead)
        let bi = MkBranchInfo branch (Just remote)
        return (Just bi)
Because inBrackets took a while to explain, you might want to reread what we previously covered for branchRemoteTracking to refresh your memory before carrying on.
To understand the data structures involved, we have to know what we are trying to do here. branchRemoteTracking tries to parse a string where the current git branch has a remote tracking branch and is ahead of it, behind it, or both.
An example of a string for the last case, where the branch is both ahead and behind, is:
master...origin/feat [ahead 5, behind 3]
Armed with this information, we know that
let remote = MkRemote (MkBranch tracking) (Just behead)
captures the information about the remote tracking branch in MkBranch tracking, and the number of commits the current branch is ahead and/or behind the remote tracking branch in Just behead.
The Remote type constructor and the MkRemote data constructor are defined at line 56 of src/src/BranchParse.hs:
data Remote = MkRemote Branch (Maybe Distance) deriving (Eq, Show)
There is only 1 data constructor, MkRemote. We see that a remote represents a remote tracking branch (the Branch parameter) and the number of commits the current branch is ahead and/or behind this remote tracking branch (the Maybe Distance parameter). It is possible that the current branch and its remote tracking branch are in sync, and Maybe Distance allows us to use Nothing to represent that.
The remaining lines in branchRemoteTracking:
let bi = MkBranchInfo branch (Just remote)
return (Just bi)
create a BranchInfo value using its single data constructor MkBranchInfo, passing in the current branch (in branch) and information about the remote tracking branch (in Just remote). They then wrap the BranchInfo inside a Just and use return on it.
Here’s the definition of the BranchInfo type constructor:
data BranchInfo = MkBranchInfo Branch (Maybe Remote) deriving (Eq, Show)
Earlier, when we covered the branchOnly parser, we said we would explain the Maybe Remote part in MkBranchInfo. See how branchOnly also uses MkBranchInfo but passes in Nothing for the Maybe Remote:
branchOnly :: Parser MBranchInfo
branchOnly =
    -- omitted
        let bi = MkBranchInfo (MkBranch branch) Nothing
    -- omitted
The Nothing indicates that there is no remote tracking branch for the current branch.
To summarize, the branchRemoteTracking parser wants to consume a string similar to one of the three variants below:
master...origin/feat [ahead 7]
bourbon...origin/rice-noodles [ahead 10, behind 4]
fix-a-pesky-bug...workplace/nice-feature-work [behind 2]
In other words, a branch that has a remote tracking branch and is some commits ahead of and/or behind that remote tracking branch.
branchRemote parser
In the event that try noBranch, try newRepo and try branchRemoteTracking all fail, branchParser attempts the try branchRemote parser.
branchParser :: Parser MBranchInfo
branchParser =
    try noBranch
    <|> try newRepo
    <|> try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly
The branchRemote parser is defined at line 96 of src/src/BranchParse.hs:
branchRemote :: Parser MBranchInfo
branchRemote =
    do -- Parsec
        branch <- trackedBranch
        tracking <- many (noneOf " ")
        eof
        let remote = MkRemote (MkBranch tracking) Nothing
        let bi = MkBranchInfo branch (Just remote)
        return (Just bi)
Its definition is eerily similar to that of branchRemoteTracking:
branchRemoteTracking :: Parser MBranchInfo
branchRemoteTracking =
    do -- Parsec
        branch <- trackedBranch
        tracking <- many (noneOf " ")
        char ' '
        behead <- inBrackets
        let remote = MkRemote (MkBranch tracking) (Just behead)
        let bi = MkBranchInfo branch (Just remote)
        return (Just bi)
Except that, in terms of parsers, these 2 are missing:
char ' '
behead <- inBrackets
and are instead replaced by the eof parser, which expects there to be no more input.
With what we have covered for branchRemoteTracking, it should not be difficult to see that branchRemote expects a string similar to:
refactoring...origin/refactoring
which is a git branch that has a remote tracking branch and is perfectly in sync with it. From
let remote = MkRemote (MkBranch tracking) Nothing
let bi = MkBranchInfo branch (Just remote)
we see that the 2nd argument to MkRemote is Nothing, which indicates that the git branch and its remote tracking branch are perfectly in sync.
Due to the overlap between the strings that the branchRemoteTracking and branchRemote parsers consume (specifically, branchRemoteTracking consumes what branchRemote consumes and more), try branchRemoteTracking has to be attempted before try branchRemote.
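To check these ordering claims, here is a self-contained sketch of the last three alternatives of branchParser. It assumes the parsec package, derives Show where the original may not, and omits noBranch and newRepo; branchParserTail is a hypothetical name for the shortened chain.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

newtype Branch = MkBranch String deriving (Eq, Show)
data Distance = Ahead Int | Behind Int | AheadBehind Int Int deriving (Eq, Show)
data Remote = MkRemote Branch (Maybe Distance) deriving (Eq, Show)
data BranchInfo = MkBranchInfo Branch (Maybe Remote) deriving (Eq, Show)
type MBranchInfo = Maybe BranchInfo

makeAheadBehind :: String -> (Int -> Distance) -> Parser Distance
makeAheadBehind name constructor = do
  _ <- string (name ++ " ")
  dist <- many1 digit
  return (constructor (read dist))

ahead, behind, aheadBehind, inBrackets :: Parser Distance
ahead = makeAheadBehind "ahead" Ahead
behind = makeAheadBehind "behind" Behind
aheadBehind = do
  Ahead aheadBy <- ahead
  _ <- string ", "
  Behind behindBy <- behind
  return (AheadBehind aheadBy behindBy)
inBrackets = between (char '[') (char ']') (behind <|> try aheadBehind <|> ahead)

trackedBranch :: Parser Branch
trackedBranch = MkBranch <$> manyTill anyChar (try (string "..."))

-- local...remote [distance]
branchRemoteTracking :: Parser MBranchInfo
branchRemoteTracking = do
  branch <- trackedBranch
  tracking <- many (noneOf " ")
  _ <- char ' '
  behead <- inBrackets
  let remote = MkRemote (MkBranch tracking) (Just behead)
  return (Just (MkBranchInfo branch (Just remote)))

-- local...remote, perfectly in sync
branchRemote :: Parser MBranchInfo
branchRemote = do
  branch <- trackedBranch
  tracking <- many (noneOf " ")
  eof
  let remote = MkRemote (MkBranch tracking) Nothing
  return (Just (MkBranchInfo branch (Just remote)))

-- just a local branch name
branchOnly :: Parser MBranchInfo
branchOnly = do
  branch <- many (noneOf " ")
  eof
  return (Just (MkBranchInfo (MkBranch branch) Nothing))

-- Hypothetical: branchParser without its first two alternatives.
branchParserTail :: Parser MBranchInfo
branchParserTail =
  try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly
```

One thing worth noticing when playing with this: on its own, branchOnly happily accepts refactoring...origin/refactoring as one long branch name, since the line contains no spaces, which is why branchRemote has to be tried first.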
branchOnly parser
The final parser that branchParser uses, when all else fails, is the branchOnly parser:
branchParser :: Parser MBranchInfo
branchParser =
    try noBranch
    <|> try newRepo
    <|> try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly
Notice that it is not wrapped in a try, because it is the final parser in the chain: we do not need to care whether input is consumed upon failure, and can just let it fail.
branchOnly is defined at line 106 of src/src/BranchParse.hs:
branchOnly :: Parser MBranchInfo
branchOnly =
    do -- Parsec
        branch <- many (noneOf " ")
        eof
        let bi = MkBranchInfo (MkBranch branch) Nothing
        return (Just bi)
We covered it when we went through the newRepo parser, so we shall not cover it again here. In short, branchOnly consumes a string containing just a branch name, for a branch that has no remote tracking branch. To see this in an actual git repo, simply do a git checkout -b some-crazy-weird-branch-name and run git status --porcelain --branch. The first line of the output will look similar to:
## some-crazy-weird-branch-name
Because this overlaps with what the try branchRemoteTracking and try branchRemote parsers consume, we have to attempt those before the branchOnly parser.
With that, we have completed our coverage of branchParser.
branchParser'
branchParser' :: Parser MBranchInfo
branchParser' =
    do -- Parsec
        string "## "
        branchParser
branchParser :: Parser MBranchInfo
branchParser =
    try noBranch
    <|> try newRepo
    <|> try branchRemoteTracking
    <|> try branchRemote
    <|> branchOnly
To summarize branchParser', here is one example per line for each of the parsers that branchParser can use, in the order they are tried:
## HEAD (no branch)
## Initial commit on something-that-doesnt-seem-to-work-for-git-2-15-0
## localbranch...remote/remote-tracking-branch [ahead 5, behind 5]
## localbranch...remote-two/another-remote-tracking-branch
## just-a-local-branch
branchParser'
Now that we are done with branchParser (and branchParser'), we go back to what led us down this path:
branchInfo :: String -> Either ParseError MBranchInfo
branchInfo = parse branchParser' ""
processBranch :: String -> Maybe MBranchInfo
processBranch = rightOrNothing . branchInfo
On parse success, branchInfo returns a Right MBranchInfo. On parse failure, branchInfo returns a Left ParseError. Its calling function, processBranch, uses rightOrNothing, defined at line 15 of src/src/Utils.hs:
rightOrNothing :: Either a b -> Maybe b
rightOrNothing = either (const Nothing) Just
to convert a Left ParseError into a Nothing, and a Right MBranchInfo into a Just MBranchInfo. The either function is from the Data.Either module:
either :: (a -> c) -> (b -> c) -> Either a b -> c
-- Case analysis for the Either type. If the value is Left a, apply the first
-- function to a; if it is Right b, apply the second function to b.
while the const function should be a familiar staple:
const :: a -> b -> a
-- const x is a unary function which evaluates to x for all inputs.
-- For instance,
-- >>> map (const 42) [0..3]
-- [42, 42, 42, 42]
Notice that rightOrNothing discards the ParseError embedded in the Left on a parse failure. In other applications, the ParseError may be used to display a meaningful error message hinting at why parsing failed, but in this case we do not care about that.
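The behaviour of rightOrNothing is easy to check in isolation. In the sketch below, digitsOnly and processDigits are hypothetical stand-ins for branchParser' and processBranch, chosen to keep the example short, and describeDigits shows one way (ours, not zsh-git-prompt's) to keep the ParseError around when an error message is wanted. It assumes the parsec package.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- As in src/src/Utils.hs: keep a Right, turn any Left into Nothing.
rightOrNothing :: Either a b -> Maybe b
rightOrNothing = either (const Nothing) Just

-- Hypothetical stand-in for branchParser': one or more digits, nothing else.
digitsOnly :: Parser String
digitsOnly = many1 digit <* eof

-- Hypothetical stand-in for processBranch = rightOrNothing . branchInfo.
processDigits :: String -> Maybe String
processDigits = rightOrNothing . parse digitsOnly ""

-- Alternative that keeps the error: render the ParseError with show.
describeDigits :: String -> String
describeDigits s =
  either (\err -> "failed: " ++ show err) ("ok: " ++) (parse digitsOnly "" s)
```

Here processDigits "2017" gives Just "2017", and processDigits "20x7" gives Nothing, with the ParseError silently dropped.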
processBranch is invoked by processGitStatus, defined at line 21 of src/src/Utils.hs:
processGitStatus :: [String] -> Maybe GitInfo
processGitStatus [] = Nothing
processGitStatus (branchLine:statusLines) =
    do -- Maybe
        mbranch <- processBranch branchLine
        status <- processStatus statusLines
        return (MkGitInfo mbranch status)
On a successful parse of branchLine by processBranch, mbranch will be an MBranchInfo. Do note that we are in the Maybe monad. On an unsuccessful parse, processBranch branchLine results in Nothing, so the rest of the computations in processGitStatus are not performed and Nothing is its return value.
We shall move on to processStatus, the next major piece of this program.
processStatus function
processStatus is defined at line 50 of src/src/StatusParse.hs:
processStatus :: [String] -> Maybe (Status Int)
processStatus statLines =
    do -- Maybe
        statList <- for statLines extractMiniStatus
        return (countStatus statList)
This function parses all the lines of the output of git status --porcelain --branch, from the 2nd line to the last. The function for is defined in the Data.Traversable module:
for :: (Traversable t, Applicative f) => t a -> (a -> f b) -> f (t b)
-- for is traverse with its arguments flipped. For a version that ignores the
-- results see for_
traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
-- Map each element of a structure to an action, evaluate these actions from
-- left to right, and collect the results. For a version that ignores the
-- results see traverse_
traverse is part of the Traversable type class. We include it here because for is defined in terms of traverse.
When we fit for statLines extractMiniStatus to the type signature of for, we get:
for :: (Applicative f) => [] String -> (String -> f b) -> f ([] b)
The Traversable here is the list type constructor [] (not to be confused with the empty list). extractMiniStatus is the String -> f b function. It is defined at line 45 of src/src/StatusParse.hs and has the following type signature:
extractMiniStatus :: String -> Maybe MiniStatus
Maybe is an Applicative, and we see that our b is MiniStatus. Using this newfound information about the types, we have:
for :: [] String -> (String -> Maybe MiniStatus) -> Maybe ([] MiniStatus)
Hence, statList in:
statList <- for statLines extractMiniStatus
has type [MiniStatus].
MiniStatus is a type constructor defined at line 14 of the same file. To better explain things, we include the comment above it as well:
{- The two characters starting a git status line: -}
data MiniStatus = MkMiniStatus Char Char
Here is the definition of extractMiniStatus:
extractMiniStatus :: String -> Maybe MiniStatus
extractMiniStatus [] = Nothing
extractMiniStatus [_] = Nothing
extractMiniStatus (index:work:_) = Just (MkMiniStatus index work)
We see that if a string has fewer than 2 characters, extractMiniStatus returns Nothing. Otherwise, it uses pattern matching to extract the first 2 characters and passes them to the MkMiniStatus data constructor. The author uses index and work as the names for the first and second characters respectively, which hints that this has something to do with the git index and the work tree.
To understand the behavior of for, we look at its definition:
for = flip traverse
It is just as the documentation says. This is not very illuminating, so we have to look at the definition of traverse for lists:
instance Traversable [] where
    {-# INLINE traverse #-} -- so that traverse can fuse
    traverse f = List.foldr cons_f (pure [])
      where cons_f x ys = liftA2 (:) (f x) ys
In our case, extractMiniStatus is the f. Notice the liftA2 (:) (f x) ys. If f x returns a Nothing at some point, then we have:
liftA2 (:) Nothing ys
which should stay Nothing for the remainder of the computation, with no escape from it. But let us verify whether this is the case by looking at the definition of liftA2 in the Applicative instance of Maybe:
instance Applicative Maybe where
    pure = Just
    Just f  <*> m       = fmap f m
    Nothing <*> _m      = Nothing
    liftA2 f (Just x) (Just y) = Just (f x y)
    liftA2 _ _ _ = Nothing
    Just _m1 *> m2      = m2
    Nothing  *> _m2     = Nothing
liftA2 (:) Nothing ys is covered by the case
liftA2 _ _ _ = Nothing
Therefore, once we get a Nothing in traverse, this definition of liftA2 ensures that we will always get a Nothing. This means that extractMiniStatus is banking on its final pattern match:
extractMiniStatus (index:work:_) = Just (MkMiniStatus index work)
for any meaningful computation to be done. The other pattern matches (which return Nothing) all indicate failure.
If every line from the 2nd to the last line of git status --porcelain --branch matches the final pattern in extractMiniStatus, then for statLines extractMiniStatus returns a Just [MiniStatus]. If even one line doesn't match the final pattern, then for statLines extractMiniStatus returns Nothing.
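The short-circuiting described above can be observed directly. This sketch uses extractMiniStatus as quoted, with Eq and Show derived for convenience, alongside forMaybe, a hypothetical hand-written equivalent of for specialised to lists and Maybe that follows the foldr/liftA2 definition of traverse. The porcelain lines used with it are made up.

```haskell
import Control.Applicative (liftA2)
import Data.Traversable (for)

-- As in src/src/StatusParse.hs, plus derived Eq/Show for inspection.
data MiniStatus = MkMiniStatus Char Char deriving (Eq, Show)

extractMiniStatus :: String -> Maybe MiniStatus
extractMiniStatus [] = Nothing
extractMiniStatus [_] = Nothing
extractMiniStatus (index:work:_) = Just (MkMiniStatus index work)

-- Hypothetical hand-rolled for, specialised to lists and Maybe.
-- A single Nothing from f makes liftA2 return Nothing for the whole list.
forMaybe :: [a] -> (a -> Maybe b) -> Maybe [b]
forMaybe xs f = foldr consF (pure []) xs
  where
    consF x ys = liftA2 (:) (f x) ys
```

For instance, for ["M  staged.txt", "?? new.txt"] extractMiniStatus evaluates to Just [MkMiniStatus 'M' ' ', MkMiniStatus '?' '?'], but slipping a one-character line such as "x" anywhere into the list turns the whole result into Nothing.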
To understand what extractMiniStatus is pattern matching on, we quote some relevant documentation from the short format section of the git status manpage for git 2.15.0:
In the short-format, the status of each path is shown as
XY PATH1 -> PATH2
where PATH1 is the path in the HEAD, and the " -> PATH 2" part is shown only
when PATH1 corresponds to a different path in the index/worktree (i.e. the file
is renamed). The XY is a two-letter status code.
For paths with merge conflicts, X and Y show the modification states of each
side of the merge. For paths that do not have merge conflicts, X shows the
status of the index, and Y shows the status of the work tree. For untracked
paths, XY are ??. Other status codes can be interpreted as follows:
...omitted...
Indeed, the first character of each line shows the state of the file in the index, while the second character shows the state of the file in the work tree. Notice how extractMiniStatus does not care about the rest of the characters on each line.
The final line of processStatus:
return (countStatus statList)
calls countStatus on the [MiniStatus] gathered, assuming all went well. If for statLines extractMiniStatus returns Nothing, then processStatus also returns Nothing. Let us look at the countStatus function next.
countStatus function
The countStatus function is defined at line 36 of src/src/StatusParse.hs:
countStatus :: [MiniStatus] -> Status Int
countStatus l = MakeStatus
    {
        staged=countByType isStaged l,
        conflict=countByType isConflict l,
        changed=countByType isChanged l,
        untracked=countByType isUntracked l
    }
It returns a Status Int. The Status type constructor is defined at line 7 of the same file; we include the comment at line 6 as well:
{- Full status information -}
data Status a = MakeStatus {
    staged :: a,
    conflict :: a,
    changed :: a,
    untracked :: a} deriving (Eq, Show)
With Status Int, all the fields in MakeStatus will be Int. This seems to be used to count the number of files in the git repo that are not in a “clean” state.
We see that the countStatus function uses the countByType function to compute each of the fields in MakeStatus. The countByType function is defined at line 33 of the same file:
countByType :: (MiniStatus -> Bool) -> [MiniStatus] -> Int
countByType isType = length . filter isType
countByType counts the number of entries in the [MiniStatus] computed by for statLines extractMiniStatus that fulfil the isType predicate. Based on the usage of countByType that we see in the MakeStatus data constructor, the isStaged, isConflict, isChanged and isUntracked predicates are used as the isType argument to countByType. Let’s take a look at isStaged, defined at line 21 of src/src/StatusParse.hs:
isStaged :: MiniStatus -> Bool
isStaged (MkMiniStatus index work) =
    (index `elem` "MRC") || (index == 'D' && work /= 'D') || (index == 'A' && work /= 'A')
There are 3 distinct cases where isStaged returns True: the first character is one of M, R or C; the first character is D and the second character is not D; or the first character is A and the second character is not A.
The code is simple enough, but what exactly do these characters stand for? To find out, we consult the documentation for the short format of git status:
In the short-format, the status of each path is shown as
XY PATH1 -> PATH2
where PATH1 is the path in the HEAD, and the " -> PATH 2" part is shown only
when PATH1 corresponds to a different path in the index/worktree (i.e. the file
is renamed). The XY is a two-letter status code.
For paths with merge conflicts, X and Y show the modification states of each
side of the merge. For paths that do not have merge conflicts, X shows the
status of the index, and Y shows the status of the work tree. For untracked
paths, XY are ??. Other status codes can be interpreted as follows:
- ' ' = unmodified
- M = modified
- A = added
- D = deleted
- R = renamed
- C = copied
- U = updated but unmerged
Ignored files are not listed, unless --ignored option is in effect, in which
case XY are !!.
X        Y      Meaning
-------------------------------------------------
         [MD]   not updated
M        [ MD]  updated in index
A        [ MD]  added to index
D        [ M]   deleted from index
R        [ MD]  renamed in index
C        [ MD]  copied in index
[MARC]          index and work tree matches
[ MARC]  M      work tree changed since index
[ MARC]  D      deleted in work tree
-------------------------------------------------
D        D      unmerged, both deleted
A        U      unmerged, added by us
U        D      unmerged, deleted by them
U        A      unmerged, added by them
D        U      unmerged, deleted by us
A        A      unmerged, both added
U        U      unmerged, both modified
-------------------------------------------------
?        ?      untracked
!        !      ignored
-------------------------------------------------
The table of codes for X and Y is very useful to us and allows us to show some of the cases covered by the isStaged function.
index `elem` "MRC" covers these cases:
M [ MD] updated in index
R [ MD] renamed in index
C [ MD] copied in index
[MARC] index and work tree matches
(index == 'D' && work /= 'D') covers these cases:
D [ M] deleted from index
D U unmerged, deleted by us
while (index == 'A' && work /= 'A') covers these cases:
A [ MD] added to index
[MARC] index and work tree matches
[ MARC] M work tree changed since index
[ MARC] D deleted in work tree
A U unmerged, added by us
But from first principles, index `elem` "MRC" covers the case where the file in the index has been modified, renamed or copied, relative to HEAD. Starting from a clean repository, M can be achieved by making a change to a file tracked by git and then using git add on that file. R can be achieved by using git mv. I have no idea how we can get a C, but I am guessing it might have something to do with one of git rebase, git merge, git am and similar.
One way to satisfy (index == 'D' && work /= 'D') is to use git rm on a tracked file. To be precise, that shows a "D ": a D for the first character and a space for the second character. If the table is exhaustive, it seems that we are OK with every entry that has a D as the first character, except for this one case:
D D unmerged, both deleted
which presumably only arises during a git merge, when there’s a merge conflict in another file awaiting manual resolution by the user, or in a similar situation involving some merge conflict. This is just a guess and I am not certain it is correct.
One way to satisfy (index == 'A' && work /= 'A') is to git add a previously untracked file. That gives us an "A ", to be precise. It seems that we are trying to avoid this case:
A A unmerged, both added
which, once again, presumably only arises during a merge conflict pending human resolution.
Whether or not the cases covered by the isStaged function are exhaustive, they all indicate that the file has changed in the index relative to HEAD, except in the case of merge conflicts.
We shall do a quick walkthrough of isConflict, isChanged and isUntracked.
isConflict :: MiniStatus -> Bool
isConflict (MkMiniStatus index work) =
    index == 'U' || work == 'U' || (index == 'A' && work == 'A') || (index == 'D' && work == 'D')
As its name suggests, isConflict covers the case where a file has a merge conflict.
isChanged :: MiniStatus -> Bool
isChanged (MkMiniStatus index work) =
    work == 'M' || (work == 'D' && index /= 'D')
isChanged takes care of files which are modified in the work tree relative to the index (work == 'M') and files deleted from the work tree but not deleted in the index (which you can get by using plain rm, rather than git rm, to remove a tracked file).
isUntracked :: MiniStatus -> Bool
isUntracked (MkMiniStatus index _) =
    index == '?'
and finally, isUntracked takes care of files which are not tracked by git.
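Putting the four predicates together with countByType, countStatus and processStatus gives the self-contained sketch below. The definitions follow the ones quoted in this section, with record syntax laid out conventionally and the two failing clauses of extractMiniStatus merged into one; the porcelain lines in the usage note are made up for illustration.

```haskell
import Data.Traversable (for)

data MiniStatus = MkMiniStatus Char Char

data Status a = MakeStatus
  { staged :: a
  , conflict :: a
  , changed :: a
  , untracked :: a
  } deriving (Eq, Show)

-- Two status characters, or failure for lines shorter than that.
extractMiniStatus :: String -> Maybe MiniStatus
extractMiniStatus (index:work:_) = Just (MkMiniStatus index work)
extractMiniStatus _ = Nothing

isStaged, isConflict, isChanged, isUntracked :: MiniStatus -> Bool
isStaged (MkMiniStatus index work) =
  (index `elem` "MRC") || (index == 'D' && work /= 'D') || (index == 'A' && work /= 'A')
isConflict (MkMiniStatus index work) =
  index == 'U' || work == 'U' || (index == 'A' && work == 'A') || (index == 'D' && work == 'D')
isChanged (MkMiniStatus index work) =
  work == 'M' || (work == 'D' && index /= 'D')
isUntracked (MkMiniStatus index _) =
  index == '?'

-- Number of entries satisfying the predicate.
countByType :: (MiniStatus -> Bool) -> [MiniStatus] -> Int
countByType isType = length . filter isType

countStatus :: [MiniStatus] -> Status Int
countStatus l = MakeStatus
  { staged = countByType isStaged l
  , conflict = countByType isConflict l
  , changed = countByType isChanged l
  , untracked = countByType isUntracked l
  }

-- Nothing as soon as any line is too short; otherwise the four counts.
processStatus :: [String] -> Maybe (Status Int)
processStatus statLines = do
  statList <- for statLines extractMiniStatus
  return (countStatus statList)
```

For the made-up lines "M  staged.txt", " M edited.txt", "MM both.txt", "UU conflict.txt", "?? new.txt" and "AA both-added.txt", processStatus should give a Status with staged, conflict and changed all equal to 2 and untracked equal to 1: MM counts as both staged and changed, while AA counts only as a conflict.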
Returning to processStatus and countStatus:
processStatus :: [String] -> Maybe (Status Int)
processStatus statLines =
    do -- Maybe
        statList <- for statLines extractMiniStatus
        return (countStatus statList)
countStatus :: [MiniStatus] -> Status Int
countStatus l = MakeStatus
    {
        staged=countByType isStaged l,
        conflict=countByType isConflict l,
        changed=countByType isChanged l,
        untracked=countByType isUntracked l
    }
we see that for statLines extractMiniStatus computes a list of MiniStatus values from the output of git status --porcelain --branch. Then, countStatus is used to create a Status with 4 fields that count the number of files which are modified in the index relative to HEAD (staged), in a merge conflict (conflict), modified in the work tree relative to the index (changed), and untracked (untracked). This Status is then wrapped in a Just and returned by processStatus.
In the event that some line in the output of git status --porcelain --branch has fewer than 2 characters, for statLines extractMiniStatus results in Nothing, which is returned by processStatus without running return (countStatus statList), because we are inside the Maybe monad.
That finishes our coverage of processStatus.
processGitStatus
processGitStatus :: [String] -> Maybe GitInfo
processGitStatus [] = Nothing
processGitStatus (branchLine:statusLines) =
    do -- Maybe
        mbranch <- processBranch branchLine
        status <- processStatus statusLines
        return (MkGitInfo mbranch status)
In the final line, MkGitInfo mbranch status
constructs a GitInfo
(defined at line 11 of src/src/Utils.hs):
data GitInfo = MkGitInfo MBranchInfo (Status Int)
which wraps the MBranchInfo from processBranch branchLine and the Status Int from processStatus statusLines. Assuming everything went smoothly and both processBranch and processStatus returned Just values, the GitInfo itself will be wrapped inside a Just. Otherwise (or if there are no input lines at all), processGitStatus returns a Nothing.
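The same two-step pattern can be sketched with toy parsers. parseBranch, parseCounts, Info and processAll are all hypothetical and much cruder than the real parsers. The sketch only shows how a Nothing from either step, or an empty input, makes the whole result Nothing.

```haskell
-- Hypothetical result type standing in for GitInfo.
data Info = MkInfo String Int deriving (Eq, Show)

-- Toy branch parser: accept only lines of the form "## <name>".
parseBranch :: String -> Maybe String
parseBranch ('#':'#':' ':name) | not (null name) = Just name
parseBranch _ = Nothing

-- Toy status parser: succeed only if every line has at least 2 characters.
parseCounts :: [String] -> Maybe Int
parseCounts ls
  | all ((>= 2) . length) ls = Just (length ls)
  | otherwise                = Nothing

processAll :: [String] -> Maybe Info
processAll [] = Nothing                  -- no branch line at all
processAll (branchLine:statusLines) = do -- Maybe
  name <- parseBranch branchLine         -- Nothing here stops everything
  n    <- parseCounts statusLines        -- and so does Nothing here
  return (MkInfo name n)

main :: IO ()
main = do
  print (processAll ["## main", "M  a.hs"])  -- Just (MkInfo "main" 1)
  print (processAll ["oops", "M  a.hs"])     -- Nothing
```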
The GitInfo value captures all the information obtained from the output of git status --porcelain --branch.
stringsFromStatus
stringsFromStatus :: Maybe Hash
                  -> String -- status
                  -> Maybe [String]
stringsFromStatus h status = do -- Maybe
  processed <- processGitStatus (lines status)
  return (showGitInfo h processed)
stringsFromStatus lives inside the Maybe monad. processGitStatus returns either a Just GitInfo or a Nothing. If it is a Nothing, everything else is skipped and stringsFromStatus returns a Nothing. If it is a Just GitInfo, the GitInfo is bound to processed. That, along with h, is passed to showGitInfo, defined at line 57 of src/src/Utils.hs:
showGitInfo :: Maybe Hash
            -> GitInfo
            -> [String]
showGitInfo mhash (MkGitInfo bi stat) = branchInfoString ++ showStatusNumbers stat
  where
    branchInfoString = showBranchInfo (branchOrHashWith ':' mhash bi)
This pattern matches the GitInfo argument using its only constructor, MkGitInfo, and binds its 2 components to the names bi and stat.
Because the return type of showGitInfo is [String] and ++ is used to concatenate branchInfoString and showStatusNumbers stat, branchInfoString must also be a [String].
Let’s look at the definition of branchOrHashWith, along with its comment at line 50:
{- Combine status info, branch info and hash -}
branchOrHashWith :: Char -> Maybe Hash -> Maybe BranchInfo -> BranchInfo
branchOrHashWith _ _ (Just bi) = bi
branchOrHashWith c (Just hash) Nothing = MkBranchInfo (MkBranch (c : getHash hash)) Nothing
branchOrHashWith _ Nothing _ = MkBranchInfo (MkBranch "") Nothing
The first pattern match ignores the first 2 arguments and tries to pattern match against the MBranchInfo inside the GitInfo. Recall that this is the result of the processBranch function and captures all the important information about the current git branch. Also recall that MBranchInfo is a type synonym for Maybe BranchInfo. If this is a Just, then branchOrHashWith simply returns the BranchInfo value that’s wrapped inside the Just.
The second pattern match covers the case where the MBranchInfo produced by processBranch is a Nothing. This happens when parsing the branch line fails and we have no information on the current git branch. The second argument passed to branchOrHashWith originally comes from the main function:
main = do -- IO
  status <- getContents
  mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
  -- omitted

gitrevparse :: IO (Maybe Hash)
gitrevparse = do -- IO
  mresult <- safeRun "git" ["rev-parse", "--short", "HEAD"]
  let rev = do -- Maybe
        result <- mresult
        return (MkHash (init result))
  return rev
to be exact, it is the result of unsafeInterleaveIO gitrevparse, which is a deferred run of git rev-parse --short HEAD. This command prints the abbreviated SHA1 of the commit that HEAD points to, i.e. the topmost commit on the current git branch. We covered this early on and noted that the result of unsafeInterleaveIO gitrevparse will be a Just Hash if git rev-parse --short HEAD runs successfully and a Nothing otherwise. So we finally see the purpose of this deferred computation: it allows us to obtain a git commit SHA1 as a fallback in the event that we cannot obtain any information about the git branch. unsafeInterleaveIO prevents the command from running until its result is actually needed.
Returning to the second pattern match of branchOrHashWith:
branchOrHashWith c (Just hash) Nothing = MkBranchInfo (MkBranch (c : getHash hash)) Nothing
The (Just hash) pattern only matches after a successful execution of git rev-parse --short HEAD. The c here is a colon character. The getHash function is defined at line 9 of src/src/Utils.hs:
newtype Hash = MkHash {getHash :: String}
getHash hash extracts the String that is wrapped by the MkHash newtype constructor, which the gitrevparse function uses to wrap the git commit SHA1 (minus the trailing newline, which init strips off).
Overall, this second pattern match of branchOrHashWith returns a BranchInfo value whose Branch component is the git commit SHA1 prepended with a colon character, and whose Maybe Remote component is a Nothing.
The third and final pattern match of branchOrHashWith:
branchOrHashWith _ Nothing _ = MkBranchInfo (MkBranch "") Nothing
covers the case where parsing the branch line failed and the command git rev-parse --short HEAD also failed. In this case, a BranchInfo value is created whose Branch component is MkBranch "" and whose Maybe Remote component is a Nothing.
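The three-way fallback can be condensed into a small sketch. It swaps the real Hash, Branch and BranchInfo types for plain Strings to keep things short. branchOrHash is a hypothetical name, and the hash value below is made up.

```haskell
-- Simplified sketch of branchOrHashWith: plain Strings stand in for the
-- real Hash, Branch and BranchInfo types (an assumption for brevity).
branchOrHash :: Char -> Maybe String -> Maybe String -> String
branchOrHash _ _ (Just branch)     = branch   -- branch line parsed: use it
branchOrHash c (Just hash) Nothing = c : hash -- fall back to ":<sha1>"
branchOrHash _ Nothing _           = ""       -- nothing known: empty name

main :: IO ()
main = mapM_ print
  [ branchOrHash ':' (Just "3f2a9c1") (Just "my-branch")  -- "my-branch"
  , branchOrHash ':' (Just "3f2a9c1") Nothing             -- ":3f2a9c1"
  , branchOrHash ':' Nothing Nothing                      -- ""
  ]
```

Because the equations are tried top to bottom, a successfully parsed branch always wins, even when a hash is also available.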
Going back to showGitInfo, we see that the BranchInfo returned by branchOrHashWith is passed to showBranchInfo.
branchInfoString = showBranchInfo (branchOrHashWith ':' mhash bi)
which is defined at line 47 of src/src/Utils.hs:
showBranchInfo :: BranchInfo -> [String]
showBranchInfo (MkBranchInfo branch mremote) = show branch : showRemoteNumbers mremote
This first runs show branch to convert the Branch value within MkBranchInfo into a String. The Show instance of Branch is defined at line 40 of src/src/BranchParse.hs:
instance Show Branch where
  show (MkBranch b) = b
Because Branch is just a newtype wrapper over String, this essentially just returns the String that is being wrapped. This String can be the current git branch name. If parsing the branch line fails, it is the current git commit SHA1 prepended by a colon, and if that also fails, it is the empty string.
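To see why the hand-written Show instance matters, compare it with a derived one. The Branch instance below copies the one from BranchParse.hs. DerivedBranch is a hypothetical type added only for contrast: a derived instance would put the constructor name and quotes into the prompt.

```haskell
-- Copied shape of the Branch newtype and its hand-written Show instance.
newtype Branch = MkBranch String

instance Show Branch where
  show (MkBranch b) = b

-- Hypothetical newtype with a derived Show instance, for comparison only.
newtype DerivedBranch = MkDerivedBranch String deriving Show

main :: IO ()
main = do
  putStrLn (show (MkBranch "my-branch"))         -- my-branch
  putStrLn (show (MkDerivedBranch "my-branch"))  -- MkDerivedBranch "my-branch"
```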
This String is prepended to the [String] created by showRemoteNumbers mremote. The showRemoteNumbers function is defined at line 35 of src/src/Utils.hs:
showRemoteNumbers :: Maybe Remote -> [String]
showRemoteNumbers mremote =
  do -- List
    ab <- [ahead, behind]
    return (show ab)
  where
    (ahead, behind) = fromMaybe (0,0) distance -- the script needs some value, (0,0) means no display
    distance = do -- Maybe
      remote <- mremote
      dist <- getDistance remote
      return (pairFromDistance dist)
It makes use of the list monad. The idea is simple: ahead and behind will each be bound to ab (one at a time), and show ab converts each one to a String, which ends up in the resulting [String]. Hence the return value of showRemoteNumbers will always be a list of 2 strings.
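Here is a minimal sketch of the same list-monad shape. The (ahead, behind) pair is passed in directly instead of being computed from a Maybe Remote, and showPair and showPair' are hypothetical names. Each bind picks one element in turn, so the do block is equivalent to map show [ahead, behind].

```haskell
-- Same list-monad shape as showRemoteNumbers, with the pair passed in
-- directly (a simplification for illustration).
showPair :: (Int, Int) -> [String]
showPair (ahead, behind) = do -- List
  ab <- [ahead, behind]  -- ab takes each element in turn
  return (show ab)       -- each result is collected into the output list

-- The same computation written with map.
showPair' :: (Int, Int) -> [String]
showPair' (ahead, behind) = map show [ahead, behind]

main :: IO ()
main = print (showPair (10, 4), showPair' (10, 4))  -- (["10","4"],["10","4"])
```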
ahead and behind are defined in the where clause by fromMaybe (0,0) distance. The fromMaybe function is part of the Data.Maybe module. We covered it earlier, but to refresh our memory, here is its documentation:
fromMaybe :: a -> Maybe a -> a
-- The fromMaybe function takes a default value and a Maybe value. If the Maybe
-- is Nothing, it returns the default value; otherwise, it returns the value
-- contained in the Maybe.
If distance is a Just _, then we will be taking (ahead, behind) from inside it. Otherwise, ahead and behind will both be 0. distance is defined as follows:
distance = do -- Maybe
  remote <- mremote
  dist <- getDistance remote
  return (pairFromDistance dist)
It lives in the Maybe monad. mremote is the Maybe Remote part of the bigger BranchInfo value passed to showBranchInfo. If it is a Nothing, the rest of the do block is skipped, distance is Nothing, and fromMaybe (0,0) distance returns (0,0). This covers the case where there is no information on the number of commits the current branch is ahead of and/or behind its remote tracking branch, or perhaps where the current branch does not have a remote tracking branch at all.
If there is a Remote value, it is bound to the name remote and passed to the getDistance function, defined at line 58 of src/src/BranchParse.hs:
getDistance :: Remote -> Maybe Distance
getDistance (MkRemote _ md) = md
Here is the definition of the Remote data type:
data Remote = MkRemote Branch (Maybe Distance) deriving (Eq, Show)
so getDistance is essentially extracting the Maybe Distance part. This will only be a Just if parsing the branch line was successful and the line is one of the following variants:
## master...origin/feat [ahead 7]
## bourbon...origin/rice-noodles [ahead 10, behind 4]
## fix-a-pesky-bug...workplace/nice-feature-work [behind 2]
which will be parsed by the branchParser' parser using the branchParser parser, which goes down the route of the branchRemoteTracking parser, all of which we covered earlier.
The Distance type is defined at line 21 of src/src/BranchParse.hs:
data Distance = Ahead Int | Behind Int | AheadBehind Int Int deriving (Eq)
If getDistance extracts a Just Distance value, the Distance value is bound to the name dist, which is then passed to the pairFromDistance function, defined at line 153 of src/src/BranchParse.hs:
pairFromDistance :: Distance -> (Int, Int)
pairFromDistance (Ahead n) = (n,0)
pairFromDistance (Behind n) = (0,n)
pairFromDistance (AheadBehind m n) = (m,n)
which covers all the different data constructors of Distance. It returns a 2-tuple whose elements are the number of commits the current branch is ahead of and behind its remote tracking branch, respectively.
showRemoteNumbers :: Maybe Remote -> [String]
showRemoteNumbers mremote =
  do -- List
    ab <- [ahead, behind]
    return (show ab)
  where
    (ahead, behind) = fromMaybe (0,0) distance -- the script needs some value, (0,0) means no display
    distance = do -- Maybe
      remote <- mremote
      dist <- getDistance remote
      return (pairFromDistance dist)
With our newfound knowledge, what showRemoteNumbers does is pretty obvious. It returns a list of 2 strings indicating how many commits the current branch is ahead of and behind its remote tracking branch, respectively, if applicable. Otherwise, both elements will be "0".
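As a self-contained check of the pieces above, here is a sketch that copies Distance, getDistance, pairFromDistance and showRemoteNumbers with one simplification. The Branch inside Remote is replaced by a plain String, and Show is derived on both types. The remote names come from the example branch lines above.

```haskell
import Data.Maybe (fromMaybe)

-- Simplified copies of the types above; Branch is replaced by String.
data Distance = Ahead Int | Behind Int | AheadBehind Int Int deriving (Eq, Show)
data Remote = MkRemote String (Maybe Distance) deriving (Eq, Show)

getDistance :: Remote -> Maybe Distance
getDistance (MkRemote _ md) = md

pairFromDistance :: Distance -> (Int, Int)
pairFromDistance (Ahead n)         = (n, 0)
pairFromDistance (Behind n)        = (0, n)
pairFromDistance (AheadBehind m n) = (m, n)

showRemoteNumbers :: Maybe Remote -> [String]
showRemoteNumbers mremote =
  do -- List
    ab <- [ahead, behind]
    return (show ab)
  where
    -- (0,0) whenever there is no remote or no distance information
    (ahead, behind) = fromMaybe (0, 0) distance
    distance = do -- Maybe
      remote <- mremote
      dist <- getDistance remote
      return (pairFromDistance dist)

main :: IO ()
main = mapM_ (print . showRemoteNumbers)
  [ Just (MkRemote "origin/rice-noodles" (Just (AheadBehind 10 4)))  -- ["10","4"]
  , Just (MkRemote "origin/feat" Nothing)                            -- ["0","0"]
  , Nothing                                                          -- ["0","0"]
  ]
```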
Backtracking to showBranchInfo:
showBranchInfo (MkBranchInfo branch mremote) = show branch : showRemoteNumbers mremote
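We see that it returns a list of 3 strings: the branch name (or the colon-prefixed commit SHA1, or the empty string), the number of commits ahead of the remote tracking branch, and the number of commits behind it.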
And backtracking to showGitInfo:
showGitInfo :: Maybe Hash
            -> GitInfo
            -> [String]
showGitInfo mhash (MkGitInfo bi stat) = branchInfoString ++ showStatusNumbers stat
  where
    branchInfoString = showBranchInfo (branchOrHashWith ':' mhash bi)
After having generated the list of 3 strings in branchInfoString, we concatenate it with the result of showStatusNumbers stat, defined at line 29 of src/src/Utils.hs:
showStatusNumbers :: Status Int -> [String]
showStatusNumbers (MakeStatus s x c t) =
  do -- List
    nb <- [s, x, c, t]
    return (show nb)
Looking at the definition of the Status data type:
{- Full status information -}
data Status a = MakeStatus {
  staged :: a,
  conflict :: a,
  changed :: a,
  untracked :: a} deriving (Eq, Show)
we see that showStatusNumbers extracts the number of staged, conflicted, changed and untracked files, converts each of them to a String, then packs them into a list.
showGitInfo mhash (MkGitInfo bi stat) = branchInfoString ++ showStatusNumbers stat
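and showGitInfo combines all the information into one list of 7 elements, which are String versions of the following: the branch name (or its fallback), the number of commits ahead, the number of commits behind, and the numbers of staged, conflicted, changed and untracked files.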
Backtracking to stringsFromStatus:
stringsFromStatus :: Maybe Hash
                  -> String -- status
                  -> Maybe [String]
stringsFromStatus h status = do -- Maybe
  processed <- processGitStatus (lines status)
  return (showGitInfo h processed)
If processGitStatus returns a Just GitInfo, the GitInfo is bound to the name processed, then showGitInfo h processed is executed, and the list it returns is wrapped inside a Just and returned by stringsFromStatus. If processGitStatus returns a Nothing, then stringsFromStatus returns a Nothing.
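Putting showStatusNumbers and the branch information together, a sketch of the 7-string list might look like the following. Status copies the data type above. sevenStrings is a hypothetical helper standing in for showGitInfo: it takes an already resolved branch string and (ahead, behind) pair rather than a GitInfo. The numbers are made up.

```haskell
-- Copy of the Status data type discussed above.
data Status a = MakeStatus
  { staged    :: a
  , conflict  :: a
  , changed   :: a
  , untracked :: a
  } deriving (Eq, Show)

showStatusNumbers :: Status Int -> [String]
showStatusNumbers (MakeStatus s x c t) =
  do -- List
    nb <- [s, x, c, t]
    return (show nb)

-- Hypothetical stand-in for showGitInfo: 3 branch strings ++ 4 status strings.
sevenStrings :: String -> (Int, Int) -> Status Int -> [String]
sevenStrings branch (ahead, behind) stat =
  (branch : map show [ahead, behind]) ++ showStatusNumbers stat

main :: IO ()
main = print (sevenStrings "my-branch" (5, 0) (MakeStatus 1 0 2 3))
-- ["my-branch","5","0","1","0","2","3"]
```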
main function
main :: IO ()
main = do -- IO
  status <- getContents
  mhash <- unsafeInterleaveIO gitrevparse -- defer the execution until we know we need the hash
  let result = do -- Maybe
        strings <- stringsFromStatus mhash status
        return (unwords strings)
  putStr (fromMaybe "" result)
If stringsFromStatus returns a Just [String], the [String] is bound to strings. The unwords function then joins the Strings in the list into one big String, with each String in the list separated by a space character. The resulting Maybe String is bound to result by the let binding. If result is a Just String, then putStr (fromMaybe "" result) prints the String to standard output; otherwise it prints the empty string.
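The last two steps of main can be isolated in a pure helper. render is a hypothetical name that performs the same Maybe do block, unwords and fromMaybe "" steps. Isolating them makes it easy to see the exact line that zsh will receive.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper mirroring the end of main: join the strings with
-- spaces inside Maybe, then fall back to "" when there is no result.
render :: Maybe [String] -> String
render mstrings = fromMaybe "" result
  where
    result = do -- Maybe
      strings <- mstrings
      return (unwords strings)

main :: IO ()
main = do
  putStrLn (render (Just ["my-branch", "5", "0", "1", "0", "2", "3"]))  -- my-branch 5 0 1 0 2 3
  putStrLn (show (render Nothing))                                     -- ""
```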
And… we are done with our main function.
The rest of the post covers how the output of this Haskell program is used to generate a prompt containing information about the git repo.
Very early on, we briefly covered the update_current_git_vars function defined at line 43 of zshrc.sh:
function update_current_git_vars() {
    unset __CURRENT_GIT_STATUS
    if [[ "$GIT_PROMPT_EXECUTABLE" == "python" ]]; then
        local gitstatus="$__GIT_PROMPT_DIR/gitstatus.py"
        _GIT_STATUS=`python ${gitstatus} 2>/dev/null`
    fi
    if [[ "$GIT_PROMPT_EXECUTABLE" == "haskell" ]]; then
        _GIT_STATUS=`git status --porcelain --branch &> /dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus`
    fi
    __CURRENT_GIT_STATUS=("${(@s: :)_GIT_STATUS}")
    GIT_BRANCH=$__CURRENT_GIT_STATUS[1]
    GIT_AHEAD=$__CURRENT_GIT_STATUS[2]
    GIT_BEHIND=$__CURRENT_GIT_STATUS[3]
    GIT_STAGED=$__CURRENT_GIT_STATUS[4]
    GIT_CONFLICTS=$__CURRENT_GIT_STATUS[5]
    GIT_CHANGED=$__CURRENT_GIT_STATUS[6]
    GIT_UNTRACKED=$__CURRENT_GIT_STATUS[7]
}
This is the line that runs the Haskell program to process the output of git status --porcelain --branch:
_GIT_STATUS=`git status --porcelain --branch &> /dev/null | $__GIT_PROMPT_DIR/src/.bin/gitstatus`
and the output of the Haskell program is stored in the _GIT_STATUS variable. The line
__CURRENT_GIT_STATUS=("${(@s: :)_GIT_STATUS}")
splits the _GIT_STATUS variable using space as the delimiter and stores the result as an array in the __CURRENT_GIT_STATUS variable. Right after that,
GIT_BRANCH=$__CURRENT_GIT_STATUS[1]
GIT_AHEAD=$__CURRENT_GIT_STATUS[2]
GIT_BEHIND=$__CURRENT_GIT_STATUS[3]
GIT_STAGED=$__CURRENT_GIT_STATUS[4]
GIT_CONFLICTS=$__CURRENT_GIT_STATUS[5]
GIT_CHANGED=$__CURRENT_GIT_STATUS[6]
GIT_UNTRACKED=$__CURRENT_GIT_STATUS[7]
we see that the author assumes there are 7 elements in the __CURRENT_GIT_STATUS array and assigns each element to a variable. These are the same 7 elements as in the list created by the showGitInfo Haskell function.
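For readers following along in Haskell, here is an analogue of what the zsh code does with that line (it is not part of zsh-git-prompt). It splits on single spaces and expects exactly 7 fields. The splitter keeps empty fields, which, per the zsh manual, is what the @ flag does for "${(@s: :)...}" inside double quotes. splitOnSpace, PromptVars and toPromptVars are hypothetical names.

```haskell
-- Split on single spaces, keeping empty fields (so an empty branch name
-- still occupies the first slot).
splitOnSpace :: String -> [String]
splitOnSpace s = case break (== ' ') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnSpace rest

-- Hypothetical record mirroring the 7 zsh variables.
data PromptVars = PromptVars
  { gitBranch, gitAhead, gitBehind, gitStaged
  , gitConflicts, gitChanged, gitUntracked :: String
  } deriving (Eq, Show)

toPromptVars :: String -> Maybe PromptVars
toPromptVars line = case splitOnSpace line of
  [b, a, bh, s, x, c, u] -> Just (PromptVars b a bh s x c u)
  _                      -> Nothing  -- not the 7 fields the script assumes

main :: IO ()
main = print (toPromptVars "my-branch 5 0 1 0 2 3")
```

Unlike the zsh code, which simply indexes positions 1 to 7, this version makes the 7-field assumption explicit by returning Nothing otherwise.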
We go back to the git_super_status function, defined at line 64 of zshrc.sh:
git_super_status() {
    precmd_update_git_vars
    if [ -n "$__CURRENT_GIT_STATUS" ]; then
        STATUS="$ZSH_THEME_GIT_PROMPT_PREFIX$ZSH_THEME_GIT_PROMPT_BRANCH$GIT_BRANCH%{${reset_color}%}"
        # omitted
}
In the if statement, the __CURRENT_GIT_STATUS variable is checked for non-emptiness. If it is non-empty, STATUS is assigned a value that begins with $ZSH_THEME_GIT_PROMPT_PREFIX, defined at line 96 of zshrc.sh:
ZSH_THEME_GIT_PROMPT_PREFIX="("
followed by $ZSH_THEME_GIT_PROMPT_BRANCH, defined at line 99 of the same file:
ZSH_THEME_GIT_PROMPT_BRANCH="%{$fg_bold[magenta]%}"
This changes the foreground color (text color) to magenta.
This is followed by $GIT_BRANCH, which gives us the branch name produced by the Haskell program. Then we have a %{${reset_color}%}, which resets the foreground color.
If the current directory is in a git repo and the branch is named my-branch, the STATUS variable will have the following value (ignoring the color escape sequences):
(my-branch
Next up, we have the following code inside the overall if branch in git_super_status:
if [ "$GIT_AHEAD" -ne "0" ]; then
    STATUS="$STATUS$ZSH_THEME_GIT_PROMPT_AHEAD$GIT_AHEAD%{${reset_color}%}"
fi
This appends extra stuff to STATUS, but only if GIT_AHEAD is non-zero. It starts with ZSH_THEME_GIT_PROMPT_AHEAD, defined at line 104 of zshrc.sh:
ZSH_THEME_GIT_PROMPT_AHEAD="%{UpArrow%G%}"
There is an up arrow character ↑ which I have replaced with the text UpArrow because of some technical issues that prevented it from rendering in a code block.
This is then followed by GIT_AHEAD, which is the number of git commits the current branch is ahead of its remote tracking branch (if any). Then we have another %{${reset_color}%}.
The %{UpArrow%G%} is used to include a ‘glitch’ to output the ↑ character. According to the zsh documentation:
%G
Within a %{…%} sequence, include a ‘glitch’: that is, assume that a single character width will be output. This is useful when outputting characters that otherwise cannot be correctly handled by the shell, such as the alternate character set on some terminals. The characters in question can be included within a %{…%} sequence together with the appropriate number of %G sequences to indicate the correct width. An integer between the ‘%’ and ‘G’ indicates a character width other than one. Hence %{seq%2G%} outputs seq and assumes it takes up the width of two standard characters.
Multiple uses of %G accumulate in the obvious fashion; the position of the %G is unimportant. Negative integers are not handled.
Note that when prompt truncation is in use it is advisable to divide up output into single characters within each %{…%} group so that the correct truncation point can be found.
Building on our hypothetical example, if my-branch is 5 commits ahead of its remote tracking branch, the GIT_AHEAD variable will have the value 5 and the STATUS variable will have the value (my-branch↑5. However, if my-branch is not ahead of its remote tracking branch, then GIT_AHEAD will be zero and STATUS will still be (my-branch.
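A plain-text sketch of the STATUS built so far can be written as a small Haskell function that ignores the color and %{...%}/%G escapes. statusSoFar is a hypothetical name. The "(" and the up arrow come from ZSH_THEME_GIT_PROMPT_PREFIX and ZSH_THEME_GIT_PROMPT_AHEAD as described above.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- Plain-text sketch (escape sequences omitted) of the start of STATUS.
statusSoFar :: String -> Int -> String
statusSoFar branch ahead = "(" ++ branch ++ aheadPart
  where
    -- Only shown when the branch is ahead of its remote tracking branch.
    aheadPart
      | ahead /= 0 = '\x2191' : show ahead  -- U+2191 is the up arrow
      | otherwise  = ""

main :: IO ()
main = do
  hSetEncoding stdout utf8              -- make sure the arrow can be printed
  putStrLn (statusSoFar "my-branch" 5)  -- (my-branch↑5
  putStrLn (statusSoFar "my-branch" 0)  -- (my-branch
```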
The next line in git_super_status:
STATUS="$STATUS$ZSH_THEME_GIT_PROMPT_SEPARATOR"
appends ZSH_THEME_GIT_PROMPT_SEPARATOR, which is defined at line 98:
ZSH_THEME_GIT_PROMPT_SEPARATOR="|"
so it is a pipe character. It separates the branch-related information (the branch name and the number of commits ahead and behind) from the rest of the information.
The rest of the code in git_super_status is of a similar nature and we shall not go through it here. We make an exception for line 91, where echo "$STATUS" prints the prompt that has been built. For zsh-git-prompt to display information about a git repo, code that calls the git_super_status function has to be in the user’s ~/.zshrc (or in a file it includes). Example code from the README:
source path/to/zshrc.sh
# an example prompt
PROMPT='%B%m%~%b$(git_super_status) %# '
The prompt from the STATUS variable printed by the git_super_status function becomes part of the PROMPT variable, which holds the prompt that the user actually sees. Thus when the user is in a directory which is a git repository, information about that repository will be shown.
Note that in git_super_status, if __CURRENT_GIT_STATUS is empty, then git_super_status will not print anything. This can happen from a failure to parse the branch line, from a failure to parse any of the status lines in the output of git status --porcelain --branch, or from git status producing no output at all (processGitStatus [] is Nothing). Hence in
PROMPT='%B%m%~%b$(git_super_status) %# '
the $(git_super_status) part will interpolate to nothing, and a “conventional” prompt will be shown.
With that, our deep dive into zsh-git-prompt has come to an end.
We have not covered all the important code in the zsh-git-prompt repo, only the code that is actually run during normal usage. There is some test code in the src/test directory that the reader might want to take a look at, along with supporting code that is scattered throughout the main code but also used in tests, for instance line 28 of src/src/BranchParse.hs. This code offers some insight into how one can use the venerable QuickCheck library for testing Haskell code. I could go through that in a follow-up post, or maybe not, because it has taken me about a week of my free time to write this post and I need to get back to other stuff I was working on.
This is a pretty intense post (hence I called it a deep dive), and sometimes even I was lost in the details (but I managed to find my way back). The parts where I pasted previously discussed code were more for myself, to refresh my memory, than for you the reader. If you have made it all the way here and understood most of the content, then you deserve a pat on the back and my mission was successful.
Disclaimer: Opinions expressed on this blog are solely my own and do not express the views or opinions of my employer(s), past or present.