Memory management - Deep and Shallow Copying

Memory allocation in variable assignment. What happens when we duplicate variables?


Marvin Kweyu

3 years ago | 4 min read

Let's go back one moment. A little further down to our data structures. The dear heaps and stacks of them. Quite literally.


What happens when I assign a variable? What about when I pass it as a parameter?

How does the program know this is what I'm talking about?

Before we move up to a higher level of abstraction, where a variable is like a title jotted on a piece of paper that you can flip to whenever you need its value, we should look at how the machine 'thinks' about it.


Close your eyes for a moment and declare a variable:

my_variable = "random";

See, when you do this, you're adding to the list of things the program has to remember. It's like 'cart' lists piled one on top of another. A stack.


[Figure: data to remember]


As it stands, your program will store your string as below:

[Figure: memory allocation]


The first diagram, with the pointer (the location of the data in memory), len (the length of the string) and capacity (the amount of memory, in bytes, that the machine has given this variable to use), is the stack; the other is the heap. (Rust's String type, for one, is laid out this way.) Whatever we do with this variable, be it a mutation, a concatenation, taking a substring, or anything else, goes through the stack entry.

In the stack, the pointer holds the location of where the actual data is stored. So we are just given a reference to it.
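
To make the picture concrete, here is a minimal toy model of it. Python hides the real stack and heap from us, so everything here is an assumption made for illustration: the `heap` dictionary, the fake address `0x1000` and the `StackEntry` class are hypothetical names, not how any interpreter is actually built.

```python
from dataclasses import dataclass

# Toy model only: a dict stands in for the heap, keyed by a made-up address.
heap = {0x1000: list("random")}


@dataclass
class StackEntry:
    pointer: int   # where the data lives in the toy heap
    length: int    # how many characters are currently stored
    capacity: int  # how many characters the buffer may hold


# The "variable" is just this small, fixed-size record.
my_variable = StackEntry(pointer=0x1000, length=6, capacity=6)

# Reading the value means following the pointer into the heap.
value = "".join(heap[my_variable.pointer][: my_variable.length])
print(value)  # random
```

The record itself never grows, no matter how long the string is. Only the heap buffer it points to does.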


Why do we do this? Why do we store it in different data structures?

Simple. It's about the speed.

Stacks are faster than heaps. So instead of moving a whole chunk of data (the contents of the heap) around while working with it, the program just carries the reference to it. The program is already busy with its own tasks, heavy or not, so there is no need to add overhead here.


A point to remember: not all data is stored this way. For static data types (those with a fixed size), that is, booleans, integers, floats, and chars, values go directly onto the stack. We need no heap to store a simple 456.98, because the program already knows the size of these types up front. The exception is user input, which typically arrives as text of unknown length.
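
A small sketch of what "the size is already known" means. Python keeps its own values as objects, so this leans on the standard `struct` module, which describes C-style fixed-size layouts; the `sizes` dictionary and its labels are names chosen for this example.

```python
import struct

# "<" asks struct for its standard sizes, which do not vary by platform.
sizes = {
    "boolean": struct.calcsize("<?"),
    "32-bit integer": struct.calcsize("<i"),
    "64-bit float": struct.calcsize("<d"),
}

for name, size in sizes.items():
    print(f"{name}: {size} byte(s)")
```

None of these sizes depends on the value being stored, which is what lets a compiler reserve the space ahead of time.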

How large a number such a type can hold depends on its width in bits and on whether it can be negative (signed) or is never negative (unsigned). This should remind you of how you write numbers in math: any number on your paper is positive unless it carries a sign, or, as we call it here, unless it is signed.
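
As a quick illustration of signed versus unsigned, the hypothetical helper `int_range` below works out which values fit in a given number of bits. It assumes the usual two's complement representation for signed integers.

```python
def int_range(bits, signed):
    """Return (smallest, largest) value an integer of this width can hold."""
    if signed:
        # One bit pattern in two goes to the negative numbers.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


print(int_range(8, signed=True))   # (-128, 127)
print(int_range(8, signed=False))  # (0, 255)
```

Same eight bits, same 256 patterns; the sign only shifts where the range starts.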


So the pointer-on-the-stack, data-on-the-heap arrangement applies to compound data types - the result of combining two or more static types.

Examples:

  • string (a combination of chars)
  • arrays
  • tuples

... and so forth, depending on what your language of choice calls them, for instance, a dictionary vs a JavaScript object.


Back to copying.

You want two variables to hold the same data, and you want to edit one of them without affecting the other.

I suppose you could assume a simple re-assignment, right?


my_variable = "random"
my_other_variable = my_variable


Whether it's console.log(), print() or printf() (choose your weapon and let's make the battle legendary!), use whatever your muscle memory reaches for right now. Both variables will show random in stdout. The caveat?

[Figure: memory duplication]


Yes, you've duplicated the stack entry, not the heap. That is by design: languages aim for optimum performance, keeping things as tidy and efficient as possible. No overhead. Both variables point to the same memory space.
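
You can check this for yourself in Python. The sketch below uses the `is` operator and the built-in `id()` to ask whether two names refer to one and the same object; the variable `same_object` is a name added for this example.

```python
my_variable = "random"
my_other_variable = my_variable  # copies the reference, not the data

# `is` compares identity: are both names bound to the very same object?
same_object = my_other_variable is my_variable
print(same_object)  # True

# id() gives each object's identity, so the two must match as well.
print(id(my_variable) == id(my_other_variable))  # True
```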

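So what happens if I mutate one variable? Python strings cannot be changed in place, so let's use a list to see the effect:


my_variable = ["random"]
my_other_variable = my_variable
my_other_variable.append("ramblings")  # mutates the shared list in place
print(f"My second variable: {my_other_variable}")
print(f"My variable: {my_variable}")


In both cases, the output is the same list:

['random', 'ramblings']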

To get two completely separate items with the same data, so that each can be mutated independently, you have to take a different approach: deep copy.

A warning

As far as memory is concerned, deep copying is expensive: the program has to take the pointer, follow it to where the data is stored, and then duplicate that heap data.

How you do it depends on your language: Python ships a built-in copy module, JavaScript has several approaches of its own, lower-level languages such as Rust offer clone, and so on and so forth. (We cannot simply list all the ways to deep copy across the multiverse.)


import copy
my_variable = "random"
my_other_variable = copy.deepcopy(my_variable)
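
The difference between a shallow and a deep copy only shows up once the data is nested. A minimal sketch, with `original`, `shallow` and `deep` as names chosen for this example:

```python
import copy

original = [["random"], ["ramblings"]]

shallow = copy.copy(original)   # new outer list, same inner lists
deep = copy.deepcopy(original)  # new outer list and new inner lists

# Mutate an inner list through the original.
original[0].append("edited")

print(shallow[0])  # ['random', 'edited']
print(deep[0])     # ['random']
```

The shallow copy still shares its inner lists with the original, so the edit leaks through. The deep copy followed every pointer and duplicated what it found.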

Love JavaScript much?

let my_variable = "ramblings"
let my_second_variable = `${my_variable}`
my_second_variable = 'random ' + my_second_variable
console.log('My first variable',my_variable)
console.log('My second variable',my_second_variable)

There are, of course, multiple other ways of doing this. It is, after all, JavaScript.

A point to mark: for objects in particular, lodash, dearest ramda or rfdc work perfectly. Custom method for your implementation? Go ahead, just not JSON.stringify(). A round trip through JSON silently drops functions and undefined values and turns Date objects into strings.


The mad rustacean?

let my_variable = String::from("random");
let my_other_variable = my_variable.clone();


Having done this, you can manipulate your new variables in any way you want. Go to the moon if need be. Just need a couple of dollars more.

[Figure: deep copy]


It is this same principle that governs the passing of variables across functions and objects: what gets passed is a pointer to the original data, not the whole heap. Comprende? I sure hope so. So go forth and choose wisely.
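
Here is a small Python sketch of that last point. The function `add_ramblings` and the variable `protected` are hypothetical names made up for the example.

```python
import copy


def add_ramblings(words):
    # `words` refers to the caller's list, so this edits it in place.
    words.append("ramblings")


my_variable = ["random"]
add_ramblings(my_variable)
print(my_variable)  # ['random', 'ramblings']

# Hand over a deep copy when the original must stay untouched.
protected = ["random"]
add_ramblings(copy.deepcopy(protected))
print(protected)  # ['random']
```

Passing the reference is the cheap default. The copy costs extra memory, so reach for it only when the function must not touch your data.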

Let's leave this piece at that, and chat in the comments if need be.

And yes, we can chat tech on Twitter too. marvinus_j

Originally published on TheGreenCodes
