Backing up to USB with a Batch Script in Microsoft Windows

I have recently reentered the world of work and have been issued a brand new computer running Windows 10. In years past, I developed an aversion to using Windows and was frustrated with each institution that pressed a computer running this proprietary operating system into my hands.

But with age, my zealousness and my idealism have waned and I am much more comfortable with using Windows, particularly now that Windows has an embedded Ubuntu subsystem.

I find, too, that once in a while I come across something about Windows that I genuinely like. With a growing number of training materials on my hard drive, I have become increasingly paranoid that I will suffer a hard drive failure and lose all of them. For the moment I have taken to backing up my data to a USB thumb drive. In this article I will show you my approach in the hope that it will be of some value to you.

Fixing the USB Thumb Drive's Drive Letter

To make the script easier to write, we can first ensure that each time we mount our USB thumb drive it is assigned the same logical drive letter. In my case, I chose Z: so that other media are unlikely to be mapped to the same drive letter by accident.

To achieve this, first launch the Disk Management utility by pressing the Windows key, typing diskmgmt.msc and choosing the result that is listed. Insert your USB thumb drive and wait for it to appear in the list of drives in Disk Management.

Right-click on the drive and choose “Change Drive Letter and Paths…”. Then change the drive letter to Z:. Each time you mount your drive in Windows from now on, it will mount to the drive letter Z.

Write the Script

Now that the drive is mounted predictably, a very simple batch script can be created. The purpose of this script is to back up the contents of our Documents folder to Z:\ and then unmount the USB drive so we can pull it out as soon as the script has finished. The script gives an error message if Z: is not mounted. Otherwise, it uses ROBOCOPY to efficiently mirror the contents of the Documents directory to Z: and then unmounts the volume.

Paste the script into a text editor and save it as a .bat file on your desktop. Whenever you want to back up, insert your USB stick and then double-click the script. When it has successfully unmounted the drive, it will tell you that you can pull out the USB thumb drive.

Bioconductor Tip: Use affycoretools to get Gene Symbols for your ExpressionSet

For whatever reason, following on from my despair with normalizing gene expression data earlier in the week, my most recent challenge has been to take a Bioconductor ExpressionSet of gene expression data measured using an Affymetrix GeneChip® Human Transcriptome Array 2.0 and, instead of labeling each row with its probe ID, label it with the corresponding gene symbol.

I have seen a lot of code samples that suggest variations on a theme of using the biomaRt package or querying a SQL database of annotation data directly. With the former, I gave up trying; with the latter, I ran away to hide, having recently interacted with SQL databases only through Java’s JPA abstraction layer.

It turns out to be very easy to do this using the affycoretools package by James W MacDonald which contains ‘various wrapper functions that have been written to streamline the more common analyses that a core Biostatistician might see.’

As you can see below, you can very easily extract a vector of gene symbols for each of your probe IDs and assign it as the rownames to your gene expression data.frame.
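A minimal sketch of the idea, not necessarily the exact code used for this post. The helper name set_symbol_rownames is hypothetical, and the annotation package shown in the comments (hta20transcriptcluster.db) is an assumption about which package matches the array; the toy data only stand in for a real ExpressionSet. Because data.frame row names must be unique and non-missing, the sketch drops probes without a symbol and de-duplicates repeated symbols.

```r
# Hypothetical helper: label the rows of an expression data.frame with gene
# symbols. Probes without a symbol are dropped and repeated symbols are made
# unique, because data.frame row names cannot be NA or duplicated.
set_symbol_rownames <- function(expr, symbols) {
  keep <- !is.na(symbols)
  expr <- expr[keep, , drop = FALSE]
  rownames(expr) <- make.unique(as.character(symbols[keep]))
  expr
}

# Toy stand-in for real expression data (four probes, two samples).
toy_expr <- data.frame(
  sample1 = c(5.1, 6.2, 7.3, 4.4),
  sample2 = c(5.0, 6.5, 7.1, 4.9),
  row.names = c("probe1", "probe2", "probe3", "probe4")
)
toy_symbols <- c("TP53", NA, "BRCA1", "TP53")

labelled <- set_symbol_rownames(toy_expr, toy_symbols)
print(labelled)

# With a real ExpressionSet called eset (requires Bioconductor; the annotation
# package named here is an assumption):
# library(affycoretools)
# library(hta20transcriptcluster.db)
# eset <- annotateEset(eset, hta20transcriptcluster.db)
# expr <- set_symbol_rownames(as.data.frame(exprs(eset)), fData(eset)$SYMBOL)
```

Dropping unannotated probes is a design choice made for the sketch; you may prefer to keep them under their probe IDs instead.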

I hope this will save you the trouble of finding this gem of a package.

Be pragmatic about your choice of laptop in Bioinformatics

Recently I have been familiarising myself with analysing microarray data in R. Statistics and Analysis for Microarrays Using R and Bioconductor by Sorin Draghici is proving to be indispensable in guiding me through retrieving microarray data from the Gene Expression Omnibus (GEO), performing typical quality control on samples and normalizing expression data over multiple samples.

As an example, I wanted to examine the gene expression profiles from GSE76250 which is a comprehensive set of 165 Triple-Negative Breast Cancer samples. In order to perform the quality control on this dataset as detailed by the book, I needed to download the Affymetrix .CEL files and then load them into R as an AffyBatch object:
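A minimal sketch of that loading step, assuming the .CEL files have already been downloaded and unpacked into a local directory. The cel_dir path is hypothetical, and depending on the array type ReadAffy may also need a matching CDF package to be available. The guard simply lets the snippet run harmlessly when the affy package or the data are missing.

```r
# Hypothetical location of the unpacked .CEL files; adjust to your own setup.
cel_dir <- file.path("data", "GSE76250")

if (requireNamespace("affy", quietly = TRUE) && dir.exists(cel_dir)) {
  # Read every .CEL file in the directory into a single AffyBatch object.
  raw.data <- affy::ReadAffy(celfile.path = cel_dir)
  # Report how much memory the object occupies.
  print(object.size(raw.data), units = "Gb")
} else {
  message("affy is not installed or ", cel_dir, " does not exist; nothing loaded.")
}
```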

The raw.data AffyBatch object representing these data when loaded into R takes over 4 gigabytes of memory. When you then perform normalization on this data using rma(raw.data) (Robust Multi-Array Average), this creates an ExpressionSet that effectively doubles that.

This is where I come a bit unstuck. My laptop is an Asus Zenbook 13-inch UX303A which comes with (what I thought to be) a whopping 11 gigabytes of RAM. This meant that after loading and transforming all the data onto my laptop, I had effectively maxed out my RAM. The obvious answer would be to upgrade my RAM. Unfortunately, due to the small form factor of my laptop, I only have one accessible RAM slot meaning my options are limited.

So, I have concluded that I have three other options to resolve this issue.

  1. Firstly, I could buy a machine that has significantly more memory capacity at the expense of portability. Ideally, I don’t want to do this because it is the most expensive approach to tackling the problem.
  2. Another option would be to rent a Virtual Private Server (VPS) with lots of RAM and to install RStudio Server on it. I’m quite tempted by the idea of this but I don’t like the idea of my programming environment being exposed to the internet. Having said this, the data I am analysing is not sensitive and any code that I write could be safely committed to a private Bitbucket or GitHub repository.
  3. Or, I could invest the time in approaching the above problem in a less naive way! This would mean reading the documentation for the various R and Bioconductor packages to uncover a more memory-efficient method, or it could mean scoping my data tactically so that, for instance, the AffyBatch object will be garbage collected, thereby freeing up memory once I no longer need it.
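The third option can be sketched roughly as follows. This is only an illustration under stated assumptions: normalise_in_scope is a hypothetical helper, and the toy load and normalise functions stand in for ReadAffy and rma so that the snippet runs without Bioconductor. The raw object exists only inside the function, so it becomes eligible for garbage collection as soon as the normalised result has been produced.

```r
# Hypothetical helper: load the raw data, normalise it, and return only the
# normalised result so that the raw object can be garbage collected.
normalise_in_scope <- function(load_raw, normalise) {
  raw <- load_raw()
  result <- normalise(raw)
  rm(raw)          # drop the only reference to the raw data
  invisible(gc())  # ask R to reclaim the memory now rather than later
  result
}

# Toy stand-ins so the sketch runs without any microarray data.
toy_load <- function() matrix(2^(1:12), nrow = 4)
toy_normalise <- function(x) log2(x)

normalised <- normalise_in_scope(toy_load, toy_normalise)
print(dim(normalised))
```

With real data the call might look like normalise_in_scope(function() ReadAffy(celfile.path = cel_dir), rma), where cel_dir is again a hypothetical path. Note that this does not lower the peak memory used while normalization is running, since both objects exist at that point; it only frees the raw data afterwards.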

In any case, I have learned to be reluctant to follow the final path unless it is absolutely necessary. I don’t particularly want to risk obfuscating my code by adding extra layers of indirection while, at the same time, leaving myself open to more mistakes by making my code more convoluted.

The moral of the story is not to buy a laptop for its form factor if your plan is to do some real work on it. Go for the clunkier option that has more than one RAM slot.

Either that or I could Download More Ram.

Personal project success

With a glut of free time of late, I have chosen to take some time to
write some code as part of a personal project. Primarily, I wanted to
really dive deeply into Spring framework beyond the basics of web
application development. At the same time I didn’t want the effort to
go to waste and so I have decided that I really want the project to
have a use in the real world.

With this in mind, I have decided to develop an app that allows
learners and speakers of the Welsh language to find
restaurants, coffee shops and (possibly) other kinds of services that
cater to their language needs. It’s a simple idea but, surprisingly,
there is very little in the way of this kind of thing online.

But this is a bit pie in the sky. I’ve had a number of ideas in the
past that have never come to anything. At work I’ve found that
implementing software with another person’s criteria in mind is
usually fairly easy to accomplish but, invariably, I’ve failed
miserably when it comes to putting my own ideas into action. I’ve been
wondering why that is.

I have a fair few ideas about why this is the case. Suffice it to say,
here are some approaches I’ve tried in order to overcome these
difficulties that, so far, seem to be proving successful.

  1. Breaking down the task

    When I settle on an idea, if I meditate for a little while on the
    totality of all the work to be done, I find the whole thing
    impossible!

    If I use a physical kanban board I find this helps a lot. For more
    details on Kanban, see this great Google Talk with Eric
    Brechner. Writing small tasks on little cards and placing
    them on a cork board, I can instantly see the subsystems of the
    software and the features I have considered.

    I’ve taken to placing a pen and a small pile of cards next to the
    board so that the instant I have an idea, I can write it on a card
    and place it in the list of pending tasks. As I’m both building the
    software and deciding the parameters of what it’s supposed to do, I
    can then easily discard any cards for ideas that turn out not to be
    so good.

    Also, without being tied down by anyone to a deadline, it’s
    quite easy to arrange the order of the cards for any reason that
    suits you. In my case, I try to work on tasks that present parts of
    Spring Framework that I haven’t used before and flip between these
    tasks and others that might be easier for me.

  2. Little and often

    When committing to read a book, I will never hold myself to a
    deadline. More often than not it’s because I find that I rarely
    have the time to commit to it. Some days I am far more motivated
    and can read a number of chapters in one sitting, whereas on
    other days I have more important things to do.

    What keeps me going is making a commitment to do a little bit each
    day. Ideally I try to read for an hour but there are many times
    where I will only read for twenty minutes. But, by consistently
    doing something each day, I eventually get through the book and I
    don’t lose track of the narrative.

    Similarly, with my personal project, I am in the habit of doing a
    little programming each day. If I have little or no motivation,
    writing a small unit test or refactoring a small piece of code
    helps me to continue to progress with the project as well as making
    me more familiar with the code in the code base.

    Another thing that has really helped me in this regard is the
    contribution graph on GitHub. Trying to keep as much of it green
    as possible gives me a sense of achievement. For reading, I update
    my progress on Goodreads. Sorry to all who are on the receiving
    end of those updates!

  3. Stick to the stack

    This has been a huge impediment to me in the past. Sticking to one
    technology to implement a personal project is particularly
    difficult when there are so many languages, frameworks and
    databases.

    In this sense, the best tool is the one that you are already
    using. Of course you can do a rewrite if absolutely necessary but
    context switching to another language or framework is probably more
    costly than the gains you might get from a more modern set of
    libraries. If I really want to learn the latest and greatest of the
    JavaScript MVC frameworks, I’ll do all that I can to make sure I do it
    on a new project instead of retrofitting it to this one!

    In this way, I can concentrate on becoming better at using this
    framework or that language in a way that I wouldn’t have had I
    switched technologies frequently. Also, having a deeper knowledge
    of one technology may help you to better understand the
    capabilities of another. For example, understanding web development
    using the Spring framework will help you to understand the
    capabilities and underlying mechanisms of Groovy on Grails.

    Using a particular set of technologies for an extended period of
    time will also allow you to write code in a more idiomatic way that
    makes writing code more enjoyable and also, easier to write. Your
    project will become less of a burden and you will find that you can
    achieve more in a shorter period of time.

Ultimately, to make a personal project succeed, I think it’s
important to have a clear idea set in your mind, hopefully to program
something that is going to have a real purpose. While the idea needs
to be clear, the detail should be broken down. Any thought of a
potential feature should be written down before it’s forgotten and can
then be discarded later if it turns out not to be valuable.

Making sure that the project is worked on often, ideally daily, helps
to keep thoughts flowing and ensures that things stay productive.

Most importantly, not getting sidetracked by other thoughts and ideas,
usually other technologies, avoids the tendency to slow progress by
switching the focus or the underlying tools used to complete the project.