cft

Performing a code review

Code analysis


user

Programming Percy

3 years ago | 9 min read

Earlier today a user with the handle /aliezsid made a post on Reddit asking for a code review. I needed something to do for my cofee break, so I thought why not.

Aliezsid is asking for a code review for his new project commitlog. I visited his GitHub to see what the project was about, seems its a tool to generate change logs based on commit history for a repository on GitHub. I really like his idea, so I decided to do a small review.
You can read about his projects here.

Reconnaissance

The first thing I do when performing code reviews are always checking out the documentation. I need to gather a understanding of the library and what the idea behind it is.

The README.md specifies that this library is under heavy development, and it states that there is no tests so to be considered unstable. The second thing I usually do during a code review is checkout the tests to see usage examples. Tests are not only a great thing to have to make sure things don’t break, but I find them to be useful when understanding a libraries usage as well.

But as Aliezsid himself has written, he knows there are no tests at the moment, so leave that be.

Code analysis

The next thing I do is to grab the code.

mkdir codereview-gitcommit
cd codereview-gitcommit
git init
git pull https://github.com/barelyhuman/commitlog

Lets open up main.go. I am going to go see what is happening in the program and walk it through step by step.

path := os.Args[1]
r, err := git.PlainOpen(path)
if err != nil {
log.Fatal("Error opening Repository: ", err)
}

The first rows of gitcommit main function.

The first thing we see is that he expects an Input argument to the program. He will then use go-git to open that GitHub repository. If we check what PlainOpen does we will see the following.

// PlainOpen opens a git repository from the given path. It detects if the
// repository is bare or a normal one. If the path doesn't contain a valid
// repository ErrRepositoryNotExists is returned
func PlainOpen(path string) (*Repository, error) {
return PlainOpenWithOptions(path, &PlainOpenOptions{})
}

PlainOpen from go-git

So, the program expects an PATH to a GitHub repository, not a URL. That’s good to know, so lets modify this a bit, to make it more clear.
Lets use the golang Flag package to help us. The reason why I want to use flags is that it will help us with generating an user error, it is also more scalable.

path := os.Args[1]
r, err := git.PlainOpen(path)
if err != nil {
log.Fatal("Error opening Repository: ", err)
}

Updated version of reading arguments

Now its easier for users to get a grasp of how to use it, when executing the program we will now see

Error output when using gitcommit wrong.

Lets move on. What happens next in the code is that he opens up the git repository, grabs the git header reference and extracts the commit history. This is pretty unclear since there is no comments.

r, err: = git.PlainOpen (path)
if err! = nil {
log.Fatal ("Error opening Repository:", err)
}
ref, err: = r.Head ()

if err! = nil {
log.Fatal ("Unable to get repository HEAD:", err)
}
cIter, err: = r.Log (& git.LogOptions {From: ref.Hash ()})

if err! = nil {
log.Fatal ("Unable to get repository commits:", err)
}

Code where gitcommit opens the repo and extracts the header reference and commits

Now this code is isolated to the main package and pretty unclear whats going on. I’d like to break this apart some. I’m going to create a new Struct called Repository which will hold these functions. This will make testing a lot easier, and code cleaner and more scalable.

// Repository is a struct that holds an Opened github repository and the reference
type Repository struct {
path string
repo *git.Repository
ref *plumbing.Reference
}

// Open will open up a git repository
// and load the Head reference
func Open(path string) (*Repository, error) {
// Open the github repository
r, err := git.PlainOpen(path)
if err != nil {
return nil, fmt.Errorf("%v:%w", "Error opening Repository: ", err)
}

// Grab the Git HEAD reference
ref, err := r.Head()
if err != nil {
return nil, fmt.Errorf("%v:%w", "Unable to get repository HEAD:", err)
}
return &Repository{
path: path,
repo: r,
ref: ref,
}, nil
}

// GetCommits will extract commit history from the git repository
func (r *Repository) GetCommits() ([]*object.Commit, error) {
// Get the Commit history
cIter, err := r.repo.Log(&git.LogOptions{From: r.ref.Hash()})

if err != nil {
return nil, fmt.Errorf("%v:%w", "Unable to get repository commits:", err)
}

var commits []*object.Commit

err = cIter.ForEach(func(c *object.Commit) error {
commits = append(commits, c)
return nil
})

if err != nil {
return nil, fmt.Errorf("%v:%w", "Error getting commits :", err)
}
return commits, nil
}

A new Repository struct which holds our functions, testing will be a lot easier

The next thing he does is to extract the Latest tags from the repository. This is done by extracting the latestTag with the utils package. Actually, we can remove the utils package now and insert this into a function in the Repository struct. Since utils only holds this one function, I feel its more appropriate to insert the logic into our Repository struct.

logContainer := new(logcategory.LogsByCategory)
latestTag, _, err := utils.GetLatestTagFromRepository(repo.repo)

if err != nil {
log.Fatal("Error Getting Tag Pairs", err)
}

tillLatest := false

if latestTag != nil {
if latestTag.Hash().String() == repo.ref.Hash().String() {
tillLatest = false
} else {
tillLatest = true
}
}

This is the old gitcommit code where he extracts the latestTags.

// GetLatestTag is used to check the latestTag
// it will return a reference to the LatestTag and the PreviousTag
func (r *Repository) GetLatestTag() (*plumbing.Reference, *plumbing.Reference, error) {
tagRefs, err := r.repo.Tags()
if err != nil {
return nil, nil, err
}

var latestTagCommit *object.Commit
var latestTagName *plumbing.Reference
var previousTag *plumbing.Reference
var previousTagReturn *plumbing.Reference

err = tagRefs.ForEach(func(tagRef *plumbing.Reference) error {
revision := plumbing.Revision(tagRef.Name().String())

tagCommitHash, err := r.repo.ResolveRevision(revision)
if err != nil {
return err
}

commit, err := r.repo.CommitObject(*tagCommitHash)
if err != nil {
return err
}

if latestTagCommit == nil {
latestTagCommit = commit
latestTagName = tagRef
previousTagReturn = previousTag
}

if commit.Committer.When.After(latestTagCommit.Committer.When) {
latestTagCommit = commit
latestTagName = tagRef
previousTagReturn = previousTag
}

previousTag = tagRef

return nil
})
if err != nil {
return nil, nil, err
}

return latestTagName, previousTagReturn, nil
}

Next he checks if the latestTag is the same as the repository HEAD.

tillLatest := false

if latestTag != nil {
if latestTag.Hash().String() == repo.ref.Hash().String() {
tillLatest = false
} else {
tillLatest = true
}
}

Code from main function to check if the first commit is the same as the repository HEAD.

I’d also like to remove the tillLatest check and instead make that it’s own function. This is so that its easier to test and also easier to reuse in case the package grows.

He also reuses the exact same code in isCommitToNearest function later at line 116, every time we use code more than once, break it out into its own function to avoid duplicate code. We need to see what’s wanted in both use cases and make sure we can use our new function in both cases.

isCommitToNearestTag also contains a small bug, It double checks an error and uses log.Fatal. The second error check is actually never accessible, since the first error check would cancel since its using log.Fatal. It seems the tillLatest boolean flag is used to force a check against the latestTag.Hash().

So the code will actually perform this check once, only to force it to be performed again later. This is something we can remove so its only performed once, not that we have a optimization problem, but no reason to rerun code.

func isCommitToNearestTag(repo *git.Repository, commit *object.Commit, tillLatest bool) bool {
latestTag, previousTag, err := utils.GetLatestTagFromRepository(repo)

if err != nil {
log.Fatal("Couldn't get latest tag...", err)
}
if err != nil {
log.Fatal("Couldn't access tag...", err)
}

if latestTag != nil {
if tillLatest {
return latestTag.Hash().String() == commit.Hash.String()
}
return previousTag.Hash().String() == commit.Hash.String()

}
return false
}

Old isCommittoNearestTag code.

I’m going to remove the code that does the first tillLatest check completely, it actually serves no purpose (Line 58–79). And when removing that, we will notice that we can remove the fetching of the latest tag altogether. Lets save that for later.

Our new function is going to be called IsCommitLatest and it will accept a pointer to object.Commit.

// IsCommitNearest will check if a commit tag Hash is equal to the current repository HEAD tag
// If the Hashes matches, it will return true
func (r *Repository) IsCommitNearest(commit *object.Commit) (bool, error) {
latestTag, previousTag, err := r.GetLatestTag()

if err != nil {
return false, fmt.Errorf("%v:%w", "Couldn't get latest tag...", err)
}

if latestTag != nil {
// Compare latest tag Hash with the repository HEAD hash
// Hash() returns a Slice which can be compared without converting to string
// Noticed by /OfficialTomCruise on reddit comments
if latestTag.Hash() == r.ref.Hash() {
return true, nil
}
return previousTag.Hash() == commit.Hash, nil

}
return false, nil
}

isCommitNearest can now be used inside the Switch without running twice.

After this is a final iteration that ranges through all the commits and appends them into a log. The logs are handeld by the logcategory package, so lets digest it to learn what it does.

Its also a very small package like utils was. Actually, its one struct with one function attached.

package logcategory

// LogsByCategory - Type to hold logs by each's category
type LogsByCategory struct {
CI []string
FIX []string
REFACTOR []string
FEATURE []string
DOCS []string
OTHER []string
}

// GenerateMarkdown - Generate markdown output for the collected commits
func (logContainer *LogsByCategory) GenerateMarkdown() string {
markDownString := ""

markDownString += "# Changelog \n"

if len(logContainer.CI) > 0 {
markDownString += "\n\n## CI Changes \n"

for _, item := range logContainer.CI {
markDownString += item + "\n"
}
}

if len(logContainer.FIX) > 0 {
markDownString += "\n\n## Fixes \n"
for _, item := range logContainer.FIX {
markDownString += item + "\n"
}
}

if len(logContainer.REFACTOR) > 0 {
markDownString += "\n\n## Performance Fixes \n"

for _, item := range logContainer.REFACTOR {
markDownString += item + "\n"
}
}

if len(logContainer.FEATURE) > 0 {

markDownString += "\n\n## Feature Fixes \n"
for _, item := range logContainer.FEATURE {
markDownString += item + "\n"
}
}

if len(logContainer.DOCS) > 0 {

markDownString += "\n\n## Doc Updates \n"
for _, item := range logContainer.DOCS {
markDownString += item + "\n"
}
}

if len(logContainer.OTHER) > 0 {

markDownString += "\n\n## Other Changes \n"
for _, item := range logContainer.OTHER {
markDownString += item + "\n"
}
}

return markDownString
}

gitcommits logcategory package.

So lets start with the struct. Its very rare to see capitalized names in structs, the last place I saw this used regularly was in Java. Maybe our author is a old java shark? Its nothing wrong with it, I kinda like it, but I know not every one does. Lets leave it as the author wants it.

The GenerateMarkDown function looks through the length of all its slices and creates strings based on them, along with a small header. It uses regular string concatenation (oldString += “hello ” + “world ”). It works in this case because the strings are pretty simple, but I’d actually recommend using a String.Builder. You can read about efficient strings.

It also repeats the same thing over and over for each Slice. And each slice in the struct is a []string. So lets instead create a function that does what we need.

// printCategory will output all items inside a Log slice and a title
func printCategory(output *strings.Builder, title string, logs []string) {
if len(logs) > 0 {
output.WriteString(fmt.Sprintf("\n\n## %s \n", title))
for _, item := range logs {
output.WriteString(item + "\n")
}
}
}

Accept a pointer to a string builder, a title and the logs to print

And now the GenerateMarkdown would instead look like this

// GenerateMarkdown - Generate markdown output for the collected commits
func (logContainer *LogsByCategory) GenerateMarkdown() string {
var output strings.Builder
output.WriteString("# Changelog \n")

printCategory(&output, "CI Changes", logContainer.CI)
printCategory(&output, "Fixes", logContainer.FIX)
printCategory(&output, "Performance Fixes", logContainer.REFACTOR)
printCategory(&output, "Feature Fixes", logContainer.FEATURE)
printCategory(&output, "Doc Updates", logContainer.DOCS)
printCategory(&output, "Other Changes", logContainer.OTHER)

return output.String()
}

Also, I’d rather have that in the main package, and remove the extra package of logcategory. So I’m putting all logcategory related logic into a new file in the main package called logcategories.go.

I also recommend keeping the main function clean and easy to view, so lets also create a file called repository.go. All code related to the Repository struct goes in there.

So for the last part of the code review, the commit iteration mentioned before. It goes through all commits and appends those messages to the appropriate Slice in the logcategory. You know, since I’d recommend keeping main clean. This is actually something that I’d move into the logcategories file. Also I want to remove it from the main so It can be tested easier.

I find it hard to test things residing in the main. I recommend always breaking things out so its easier accessible in tests.

I’ve created a new function for LogsByCategory struct which will handle the logic for us.

// AddCommitLog will take a commitHash and a commitMessage and append them to their
// apropriate Slice
func (logContainer *LogsByCategory) AddCommitLog(commitHash, commitMessage string) {
message := fmt.Sprintf("%s - %s", commitHash, normalizeCommit(commitMessage))

switch {
case strings.Contains(commitMessage, "ci:"):
logContainer.CI = append(logContainer.CI, message)
case strings.Contains(commitMessage, "fix:"):
logContainer.FIX = append(logContainer.FIX, message)
case strings.Contains(commitMessage, "refactor:"):
logContainer.REFACTOR = append(logContainer.REFACTOR, message)
// Golang Switch allows multiple values in cases with , separation
case strings.Contains(commitMessage, "feat:"), strings.Contains(commitMessage, "feature:"):
logContainer.FEATURE = append(logContainer.FEATURE, message)
case strings.Contains(commitMessage, "docs:"):
logContainer.DOCS = append(logContainer.DOCS, message)

default:
logContainer.OTHER = append(logContainer.OTHER, message)
}
}

Logic is broken out from main so its testable

package main

import (
"flag"
"fmt"
"log"
"os"
"strings"
)

func main() {

var path string
// Read user input
flag.StringVar(&path, "path", "", "A filepath to a folder containing a github repository")
// Parse Flags
flag.Parse()

// Make sure user has inserted the needed flags
if path == "" {
flag.Usage()
os.Exit(0)
}

repo, err := Open(path)
if err != nil {
log.Fatal(err)
}

commits, err := repo.GetCommits()
if err != nil {
log.Fatal(err)
}

logContainer := new(LogsByCategory)

// we no longer need to fetch latestTag here to compare tillLatest.

// itterate all commits and add them to the log based on hash and Message
for _, c := range commits {

logContainer.AddCommitLog(c.Hash.String(), c.Message)

nearestTag, err := repo.IsCommitNearest(c)
if err != nil {
log.Fatal(err)
}
if nearestTag {
break
}
}

fmt.Println(logContainer.GenerateMarkdown())

}

func normalizeCommit(commitMessage string) string {
var message string
for i, msg := range strings.Split(commitMessage, "\n") {
if i == 0 {
message = msg
break
}
}
return strings.TrimSuffix(message, "\n")
}

This is what our final main looks like.

Final thing, he mentions that he currently creates his own changelog in git actions. So lets modify .github/workflows/create-binaries.yml since we changed the way the program is executed. Remember we added the -path flag?

Line 30: ./commitlog . > CHANGELOG.txt
// Replaced with
.commitlog -path=. > CHANGELOG.txt

So! I’m done for now.

Hopefully you’ve enjoyed my small code review and my process of making it.

Upvote


user
Created by

Programming Percy


people
Post

Upvote

Downvote

Comment

Bookmark

Share


Related Articles