You can be an academic YouTube STAR!

Many universities have begun exploring the use of the Internet for sharing academic coursework, either via “flipped classrooms” or Massive Open On-Line Courses (MOOCs).  Over the last year, I have uploaded approximately 50 videos to my YouTube Channel, most of them academic lectures.  I hope that I have learned something in this process that will help you to publish your own work more broadly, as well!

I should start by explaining that my lectures serve multiple purposes and come from multiple university campuses.  My longest-running series of lectures came from a weekly seminar on topics of my own choosing called “the Useful Hour.”  I produced fourteen of these sessions (with help from Brigitte Glanzmann when I had to be away for a week), though I only started recording them on video for the last twelve.  I recorded the eight-session bioinformatics module from our division’s B.Sc. Honours program as a trial run for creating a “flipped classroom” in future years (a model where students watch lectures outside of class and spend in-class time working exercises).  More recently, I collaborated with the H3Africa BioNet to produce a four-lecture module on Gene Expression.  From time to time, I help the Tygerberg Postgraduate Student Council by recording a lecturer.  Each of these experiences has had its own lessons to convey.

The technical aspects of recording a video are generally easy enough that even a Ph.D. can do it!  Today’s budget camcorders capture more detail with better sound under lousier conditions than did cameras that cost five times as much even five years ago.  Best of all, one no longer needs to wrestle with tapes and analog-to-digital transfer loss.  Today we simply pull the Secure Digital card out of the camcorder and plug it into the socket on a laptop, where the video files are instantly accessible.  Of course, many people record video using digital cameras or cell phones.  Preparing videos for upload to a public server, however, is frequently more difficult than the initial capture.  I’ll talk about these aspects below.

Focus on the speaker

Speak softly, and carry a big stick!

We must start with video that is worth watching.  Far too frequently, I see that people recording lectures focus on the slides rather than the person who is delivering the lecture.  Reading text from video is generally unpleasant, and the reality is that looking at people fires circuits in our brains that academic content does not.  Video is a format designed to capture motion; it is a notoriously inefficient method for capturing still images, though!  Keeping the camera on the speaker, then, makes more sense.  This comes with some caveats:

  1. Viewers still need to be able to see the slides.  My answer has been to produce a PDF from the PowerPoint or other presentation software, since almost everyone has the ability to view PDFs on any platform.  I post the PDF to a shared directory on Google Drive, and I include the URL leading to the PDF in the YouTube description.
  2. From time to time a researcher will point to a particular part of a slide.  Pointing is problematic on video; if he or she has used a laser pointer, the spot of light will either be too bright (green) or too dim (red) to show up well on camera.  A moving mouse pointer might be better.  If the speaker is old-school (like me), he or she may use a stick to point at the slide instead.  This creates its own exposure problem: the lecturer “blooms” while standing in the bright field created by the projector and falls into relative darkness as soon as he or she steps outside the projector’s light.
  3. How will a person watching the video know to advance to the next slide?  Hopefully the speaker says “next slide” out loud.  When my parents recorded my brother’s and my first efforts to read aloud, they told us to bang a spoon against a mug to produce an audible chime with each page turn.  That was even more fun than reading!
  4. Software is publicly available to integrate the slowly-changing slide video with the quickly-moving speaker video.  Screencast-O-Matic will produce videos of up to fifteen minutes in its free version.  This approach will guarantee that your viewers are seeing the same slide the lecturer is seeing as the talk progresses.

Screencast-O-Matic insets your image atop the slides you are presenting.

Light and detail go hand in hand

As I alluded to above, lighting is frequently a problem in academic lecture videos.  We frequently keep our lecture halls very dim in order to make the slides stand out as much as possible.  In a large venue, you may have a spotlight on the speaker, which will help.  In a medium venue, you may have a light in the ceiling directly above the speaker, which can make him or her appear somewhat ghoulish.  The more you rely upon zoom, the less light will reach your camera!  Keep that camera close.  If you can open the blinds on a window so that your speaker is lit, you will have a more interesting video.  Try to find ways to position your camera between the light and the subject (without casting a shadow, of course).  Never forget that the projected slides are much brighter than the subject you are trying to record.  If even the corner of the projected image appears in-shot, expect the speaker to become a flat silhouette.

Today’s cameras can record in very high resolutions, such as 1080p (the same as your HD television).  If lighting is truly problematic, you may want to consider forcing your high-resolution camera to a lower resolution, such as 720p; this may allow it to combine the light gathered by several sensor photosites into each output pixel (“pixel binning”).  Similarly, you should expect that a camera with a larger “retina” (its sensor) will outperform one with a tiny CCD in low light.  To put this in plain terms, do not expect a cell phone to produce quality video in semi-darkness, no matter the name on the label.  That said, I have observed that my “mirrorless” Canon EOS-M2 is inferior to my much cheaper Canon VIXIA HF R62 for video.  The lenses and electronics of the EOS-M2 are optimized for photos, not video.

Privacy issues are a big deal

Ensure that your audience knows that the lecture is being recorded.  Bad things can happen when a person does not want his or her image to be on-line and somebody else decides that they shall be.  Imagine how much worse this becomes when that member of the audience is a minor!  Nobody should be forced into public view because he or she attends a talk.

We frequently expect a period of questions and answers at the end of a lecture (and sometimes in the middle).  A novice camera operator may automatically swing to capture the questioner in action.  Depending on the situation, this part of the video may need to be truncated outright due to privacy issues.

Video is big and hard to handle well

I use my hands a lot.

When I upgraded to my Canon VIXIA HF R62 from a JVC Everio (GZ-HM30AU), I had a rude shock.  My old camera had captured 720p video in very manageable MTS files, but the new camera captured 1080p video in massive MP4 volumes.  I used a 16 GB SDHC card for videos.  The cameras assumed that no file should be allowed to be larger than 4 GB (a limit inherited from the FAT32 file system used on these cards).  With the new camera, I consume 4 GB every 33 minutes!  At a couple of long events I recorded, I found that I needed more storage than the 16 GB card could provide.  I solved that problem by upgrading to a 64 GB card.

Naturally, keeping the raw footage of every event I record is not practical.  If each of the 50 videos I posted to YouTube over the last year produced 66 minutes of raw footage, I would need to archive 400 GB for just this period!  Similarly, posting these videos to YouTube would be a problem.  Each hour would span two files, which would require my viewers to watch part ‘A’ and then queue up part ‘B’ immediately afterwards; many would just skip watching the end, humans being humans.  To compound the problem, I live in South Africa, which means my upload speeds to network servers are dreadfully slow.  My home DSL line, for example, achieves only 0.3 Mbps.  I have uploaded a full gigabyte before, but it takes hours.  In any case, I will probably need to trim a bit of time off the front and the back of the video.  In short, I need to do video editing.

While semi-professionals might opt for Adobe Premiere and those who “think different” will break out iMovie, I am a bioinformaticist, and I like software that lets me master high-quality videos with a minimum of fuss and bother.  I use ffmpeg, a very powerful suite of tools that one can use directly on the command line.  Most of the time, I am (a) concatenating my source video files into one movie, (b) including only a middle section, and (c) writing a more compact movie from the source materials.  To use a recent example, I have two input files; I write their names into a file called list.txt:

file mvi_0031.mp4
file mvi_0032.mp4

Next, I run a command line that looks like this:

ffmpeg.exe -ss 00:00:15 -f concat -safe 0 -i list.txt -t 00:50:00 -c:v libx264 -preset slow -c:a copy output.mp4

In order, the options do the following:

  • -ss specifies where in the combined files ffmpeg will start the output video (in this example, after the first fifteen seconds).
  • -f concat -safe 0 -i list.txt tells ffmpeg to join the files listed in list.txt into a single video (the concat demuxer expects them to share the same format); -safe 0 relaxes the check that would otherwise reject unusual file names.
  • -t specifies the total duration of the video to be encoded (in this example, exactly fifty minutes).
  • -c:v libx264 -preset slow specifies that my output video will be encoded as H.264 (MPEG-4 Part 10), a very common format for storing video (and one that YouTube knows how to read); the slow preset spends more encoding time to achieve better compression.
  • -c:a copy directs ffmpeg not to re-compress the audio, making it sound just as nice in the output as it did in the original.

The ffmpeg software is very good at reducing the size of videos without compromising their quality.  I find that I can represent an hour-long lecture in a two GB 1080p video, rather than the nearly 8 GB of source footage.  If I am filled with caffeine for my lecture, the video size increases a bit (more motion requires more bits for accurate representation).
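
If you need the output smaller still, libx264 exposes a “constant rate factor” that trades file size against visual quality.  As a rough sketch (the CRF value and output name are just illustrative; 23 is the libx264 default, and larger values give smaller, lower-quality files):

ffmpeg.exe -ss 00:00:15 -f concat -safe 0 -i list.txt -t 00:50:00 -c:v libx264 -preset slow -crf 26 -c:a copy output_small.mp4

Everything else stays as before; only the -crf option changes how aggressively the video is compressed.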

These smaller videos can then be uploaded to my YouTube account.  Happily, if you have a Gmail account (or if you use a different email address to log into Google Services), you can simply use that login for YouTube.  Click the upward-pointing arrow, and a screen will appear onto which you can drag your video file.  All done, right?

No job is finished until the paperwork is through!

Metadata is key to your video reaching an audience, and too few people spend adequate time on this step.  I would call your attention to both the “Basic Info” and “Advanced Settings” pages that video authors can complete.  Of course, you should enter a paragraph of information in the basic description blank.  Ask yourself what web searches should find your video, and be sure you include those key terms in the text.  For good measure, add them again in the keywords section!  I like to include the name of the university where the recording took place.  Hopefully the social media minders for these schools will highlight your video to their large audiences.  YouTube will sniff the video for still frames that might be representative of the video.  I always try to pick the one in which I do not look like I’m suffering a fit of some sort.

Advanced Settings has more options to help users find your video.  Pick a category; generally my lectures fall in the “Science and Technology” category.  Be sure to enter a video location.  Google will translate your information to GPS coordinates so people can find videos shot near particular locations.  Enter a recording date, and select the language of your video (especially if you are not using English).

In many cases, you will have several videos that belong together as a set.  When I produced a short biography and four videos on Gene Expression for H3A BioNet, I also created a “playlist” that contained all five videos in the correct order.  Remember, if you can hook a viewer into watching one of your videos, you might be able to retain their interest for a few more!  Ideally, people will like your stuff enough that they subscribe to your YouTube channel, receiving a notification every time you post a new video.  You will be launched on your next career as a YouTube star!

What protein database is best for tuberculosis?

As many of you know, I have specialized in the field of proteomics, the study of complex mixtures of proteins that may be characteristic of a disease state, development stage, tissue type, etc.  Here in South Africa, my application focus has shifted from colon cancer to tuberculosis.  As a newcomer to this field, I’ve been curious to know whether the field of tuberculosis has good information resources to leverage in its fight against the disease.

The key resource any proteomics group can leverage is the sequence database, specifically the list of all protein sequences encoded by the genome in question.  The human genome incorporates around 20,310 protein-coding genes (reduced from estimates of 26,588 from the 2001 publication), but those genes code for upwards of 70,000 distinct proteins through alternative splicing. Bacteria are able to get by with far smaller numbers of genes.  E. coli, for example, functions with only 4309 proteins.  The organism that infects humans and other animals to produce tuberculosis is named Mycobacterium tuberculosis.  If we were to rely upon the excellent UniProt database, from which I quoted E. coli protein-coding gene counts, we would probably conclude that M. tuberculosis relies upon even fewer genes: only 3993 (3997 proteins)!

UniProt is an excellent all-around resource for proteomics, but researchers in a particular field usually gravitate to a data resource that is particular to their organism.  People who work with C. elegans for developmental studies, for example, use WormBase.  People who study genetics with D. melanogaster would use FlyBase.  People in tuberculosis research have frequently turned to TubercuList for its annotation of the M.tb genome (comprising 4031 proteins).  This database, however, has not been updated since March of 2013 (as noted on its “What’s New” page).  Can it still be considered current, four years later?

As a recent import from clinical proteogenomics, my first impulse is still to run to the genome-derived sequence databases of NCBI, particularly its RefSeq collection.  I found an NCBI genome for M. tuberculosis there, with a last modification date of May 21, 2016, indicating that its annotation was based upon “ASM19595v2,” a particular assembly of the sequencing data.  This was echoed when I ran to Ensembl, another site most commonly used for eukaryotic species (such as humans) rather than prokaryotic organisms (such as bacteria).  The Ensembl tuberculosis proteome was built upon the same assembly as the one from NCBI.

As a former post-doc from Oak Ridge National Laboratory, I am always likely to think of the Department of Energy’s Joint Genome Institute.  The DOE sequences “bugs” (slang for bacteria) like nobody’s business.  Invariably, I find that I can retrieve a complete proteome for a rare bacterium at JGI which is represented by only a handful of proteins in UniProt!  This makes JGI a great resource for people who work in “microbiome” projects, where samples contain proteins from an unknown number of micro-organisms.  In any case, they had many genomes that had been sequenced for tuberculosis (using the Genome Portal, I enumerated projects for Taxonomy ID 1773).  I settled on two that were in a finished state, one by Manoj Pillay that appeared to serve as the reference genome and another by Cole that appeared to be an orthogonal attempt to re-annotate the genome from fresh sequencing experiments.

The easiest way to compare the six databases I had accumulated for M. tuberculosis is to enumerate the sequences in each database.  The FASTA file format is very simple; if you can count the number of lines in the file that start with ‘>’, you know how many different sequences there are!  I used the GNU tool “grep” to count them:

grep -c "^>" *.fasta
  • TubercuList: 4031 proteins
  • NCBI GCF: 3906 proteins
  • DOE JGI Cole: 4076 proteins
  • DOE JGI Pillay: 4048 proteins
  • Ensembl: 4018 proteins
  • UniProt: 3997 proteins

So far, one could certainly be excused for thinking that these databases are very nearly identical.  Of course, databases may contain very similar numbers of sequences without containing the same sequences.  One might count how many sequences are duplicated among these databases, but identity is too tough a criterion (sequences can be similar without being identical).  For example, database A may contain a long protein for gene 1 while database B contains just part of that long protein sequence for gene 1.  Database A may also be constructed from one genome assembly while database B is constructed from an altogether different genome assembly, meaning that small genetic variations may lead to small proteomic variations.

I opted to use OrthoVenn, a rather powerful tool for analyzing these sequence database similarities.  The tool was published in 2015.  Almost immediately, I ran into a vexing problem.  The Venn diagram created by the software left out TubercuList!  I was delighted to get a rapid response from Yi Wang, the author of the tool (through funding of the United States Department of Agriculture’s Agricultural Research Service).  The tool could not process TubercuList because it contained disallowed characters in its sequence!  I followed his tip to sniff the file very closely.  I found that both sequence entries and accession numbers contained characters they should not.  Specifically, I found these interloping characters:

+ * ' #

OrthoVenn Venn chart

Scrubbing those bonus characters from the database allowed the OrthoVenn software to run perfectly.  Before we leave the subject, I would comment that these characters would cause problems for almost any program designed to read FASTA databases; a protein containing one of these characters, for example, might never be identified because of the inclusion!  My read is that they were introduced by manual typing errors; they are not frequent, and they appeared at a variety of locations.  Let’s remember that they have been in place for four years, with no subsequent database release!
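
If you face a similar cleanup, a one-line filter can strip the offending characters before any downstream tool sees them.  A minimal sketch (the file names are placeholders, and note that ‘*’ is sometimes used as a legitimate stop marker in protein FASTA files, so check your database before deleting it):

sed "s/[+*'#]//g" tuberculist.fasta > tuberculist_clean.fasta

Keeping the original file alongside the cleaned copy makes it easy to document exactly what was changed.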

Most people are accustomed to seeing Venn diagrams that incorporate two or three circles.  In this case I compelled the software to compare six different sets.  The bars shown at the bottom of the image show the numbers of clusters in each database; note that these differ from the number of sequences reported in my bullet list above because OrthoVenn recognizes that sequences within a single database may be highly redundant of each other!  (If sequences were completely identical, they could be screened out by the Proteomic Analysis Workbench from OHSU.)  Looking back at the six-pointed star drawn by the software, we might conclude that the overlap is nearly perfect among these databases.  We see four clusters specific to the JGI Pillay database, and 131 clusters specific to some sub-population of the databases, but the great bulk of clusters (3667) are apparently shared among all six databases!

The Edwards visualization from OrthoVenn

Oh, how much difference a visualization makes!  Shifting the visualization to “Edwards’ Venn” alters the picture considerably.  Now we see that the star version hides the labels for some combinations of databases.  We see that 3667 clusters are indeed shared among all six databases.  After that, we can descend in counts to 131 clusters found only in the Pillay and Cole databases from JGI; does this reflect a difference in how JGI runs its assemblies?  Next we step to 106 clusters found in UniProt, Ensembl, TubercuList, and NCBI GCF, but in neither of the JGI databases.  The next sets down represent 70 clusters found in all but NCBI GCF and 25 clusters found in all but the two JGI databases and NCBI GCF.

I interpret this set of intersections to say that tuberculosis researchers are faced with a bit of a dilemma.  If they use a JGI database, they’ll miss the 106 clusters present in all the other databases.  If they use Ensembl or TubercuList, they will include those 106 but lose the 131 clusters specific to the JGI databases.  Helpfully, OrthoVenn shows explicitly which sequences map to which clusters.  Remember that when I downloaded the Ensembl and NCBI databases, I saw that they were both based upon a single genome assembly called ASM19595v2.  Did they contain exactly the same genes?  No!  Ensembl contained two fairly big sets of genes that NCBI omitted, comprising 70 and 25 protein clusters, respectively.  NCBI contains another 11 protein clusters that were omitted from Ensembl.  Just because two databases stem from the same assembly does not imply that they have identical content.

For my part, I may use some non-quantitative means to decide upon a database.  I do not like making manual edits to a database, since others would then need to know exactly which edits I’ve made in order to reproduce my work.  That rules out TubercuList.  Next, I feel strongly that the FASTA database should contain useful text descriptions for each accession.  Take a look at the lack of information TubercuList provides for its first protein:

Rv0001_dnaA

That’s right.  Nothing!  The Joint Genome Institute databases are quite similar in omitting the description lines. Compare that to what we see in the NCBI and UniProt databases:

NP_214515.1 chromosomal replication initiator protein DnaA [Mycobacterium tuberculosis H37Rv]
sp|P9WNW3|DNAA_MYCTU Chromosomal replication initiator protein DnaA OS=Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) GN=dnaA PE=1 SV=1

That’s much more informative. We’ve got missing data here, too, though. Tuberculosis researchers have grown accustomed to their “Rv numbers” to describe their most familiar genes/proteins, but NCBI and UniProt leave those numbers out of well-characterized genes; the Rv numbers still appear for less well-characterized proteins, such as hypothetical proteins. By comparison, Ensembl includes textual descriptions as well as Rv numbers in a machine-parseable format for every entry:

CCP42723 pep chromosome:ASM19595v2:Chromosome:1:1524:1 gene:Rv0001 transcript:CCP42723 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:dnaA description:Chromosomal replication initiator protein DnaA
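
Because the Ensembl description line is built from key:value pairs, pulling the accession and Rv number out of every entry takes only a line of shell.  A rough sketch, assuming the downloaded proteome is saved as ensembl.fasta (headers whose gene field does not match simply pass through unchanged):

grep "^>" ensembl.fasta | sed 's/^>\([^ ]*\).* gene:\([^ ]*\).*/\1 \2/'

For the entry above, this prints “CCP42723 Rv0001”, exactly the kind of cross-reference a proteomics pipeline can use.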

On this basis, I believe Ensembl may be the best option for tuberculosis researchers. It is kept up-to-date while TubercuList is not, and it allows researchers to refer back to the old Rv number system in each description.

I hope that this view “under the hood” has helped you understand a bit more of the kind of question that occasionally bedevils a bioinformaticist!

Are you ready to start a molecular biology M.Sc.?

Professors receive a lot of requests from international students for admission to post-graduate training.  In South Africa, that training could be for “Honours” (a one-year course), an “M.Sc.” (a two-year Master’s program), or a “Ph.D.” (typically three years, post Master’s).  For students changing from one country to another, however, the question of “equivalencies” is key.  Could a four-year B.Sc. (Bachelor of Science) from Egypt, for example, be treated as the same thing as a three-year B.Sc. followed by one year of Honours in South Africa?  This post gives an example of the questions I asked as I recently tried to determine the right level of admissions for an international student.

The international office for my university had declared that the student’s four-year degree was certainly equivalent to a three-year B.Sc. in South Africa, but it left to the department’s discretion whether or not Honours training was required before an M.Sc.  To support the department’s decision, I decided to build an interview from questions that would delineate the limits of the candidate’s knowledge.  I used the roster of topics for the Division of Molecular Biology and Human Genetics 2017 Honours as a guide.  I used the number of didactic training days for each topic as a weight:

Field Duration
Molecular Biology 8 days
Mycobacteriology 7 days
Biostatistics 12 days
Bioinformatics 8 days
Immunology 8 days
Cell Biology 8 days
Scientific Communication 2 days

I also gave some consideration to the M.Sc. project the student would pursue in my laboratory.  In this case, the work related to the reproducibility of mass spectrometry experiments.  After pondering before my word processor, I selected these questions for the candidate’s interview:

# Field Question
1 Cell Biology What biological processes are described by the Central Dogma of molecular biology? Walk us through each.
2 Biochemistry What do we describe with Michaelis-Menten kinetics?
3 Computer Science How does iteration differ from recursion?
4 Analytical Chemistry By what property does a mass spectrometer separate ions?
5 Medicine In HIV treatment, what is the purpose of a “protease inhibitor?”
6 Biostatistics What role does the “null hypothesis” play in Student’s t-test?
7 Medicine What type of pathogen causes tuberculosis?
8 Genetics What is the purpose of a plasmid vector in cloning? What features do such vectors commonly contain?
9 Cell Biology What cellular process includes prophase, metaphase, anaphase, and telophase?
10 Mathematics The log ratio (base 2) between two numbers is 3. What is the linear ratio?
11 Immunology What is an antibody, and what is its relationship to an antigen? What are the major families of antibodies?
12 Computer Science What is the purpose of an Application Programming Interface (API) or “library?”
13 Biochemistry What do we describe as the secondary structure of a protein?
14 Genetics Of what components are nucleic acids constructed?
15 Biostatistics What is a Coefficient of Variation?
16 Mathematics If I divide the circumference of a circle by its diameter, what value do I get?
17 Immunology What type of immune cell is the primary factory for antibodies?

The interview, conducted via Skype, lasted approximately an hour. As I asked each question, I read it aloud and pasted its text into the chat session. Remember that as an American, I have a “foreign” accent for the English-speaking population of Africa! I did not want that to be a factor in the candidate’s performance. I was grateful that our division’s Honours program coordinator, Dr. Jennifer Jackson, accompanied me during the interview, both to monitor that the candidate was treated fairly and to ask follow-up questions of her own.

Why did it take an hour to answer these questions? As is customary in post-graduate education, each answer opened the door to a series of other questions. A student may give an answer that covers only part of the question, and the follow-up will poke into the omitted area to see if it is an area of weakness, almost like a dentist with an explorer goes after a darkened area of a tooth to see if it represents dental decay!

Another factor that I want to measure for students is the degree of integration that they have achieved in their education. To recognize that a word has been mentioned in class is not sufficient; I need to see that students understand how key concepts relate to each other. This synthesis is sometimes hard to evaluate, but it’s important. A student who doesn’t understand how a concept integrates with others will not be able to apply the principle or recognize when it should come into play.

Before the readers of this blog begin showering me with applications, I need to emphasize that the questions I framed for this particular interview are not the questions I would ask of another candidate. The ones above were chosen to reflect the background of the candidate, the diploma program to which he or she had applied, and the nature of the project I had in mind.

I hope that this post will help you decide whether or not you are ready to plunge into post-graduate education!

Young David steps out of his comfort zone

Sometimes, a look through the scrapbook can be a very humbling experience.  I resolved this month to finish a project I launched in 1994.  At last I am publishing the journal I recorded during my first trip to Europe!  For the first time, I am bringing together the forty-two journal entries, my photographs, and the video camera footage that I recorded during my clockwise circuit around the continent.  Before you jump right into the journal, though, could I ask you to read a few thoughts?

More time has passed since I wrote that journal (23 years) than I had lived at that point (I was 20 years old).  The experiences of the last two decades have certainly left their mark.  Since that time, I’ve graduated from two degree programs; I’ve filled my passport with stamps; I’ve built my career in academia; I’ve achieved some level of comfort in finance; I’ve married and divorced.  All of these changes make it hard to recognize the person who wrote those entries as the same person writing this blog!

Setting the scene

I’m sitting by “Le Crayon,” the tower of Credit Lyonnais.

The David who wrote this journal was experiencing profound discomfort.  As a fellow in the University of Arkansas Sturgis Fellows program, I was strongly pushed to spend at least a semester of my junior year abroad.  My undergraduate advisor, Doug Rhoads, arranged for me to visit the laboratories of Jean-Jacques Madjar at the University of Lyons, where Thierry Masse mentored my project.  The fact is that I did not enjoy “wet bench” research, and I was becoming concerned that my Biology degree could equip me for a career I did not want!  To complicate the matter further, we never formalized my visa to work in the laboratory for a year-long stretch, and so I needed to leave France well before even a semester had passed.  Scheduling this journey through many countries was my fall-back plan, and my mother was working with the University of Arkansas to get a formal plan in place for the spring of 1995.  In short, I felt that I was failing in this first real test of applying my academic skills.

If you mainly know me as a globe-trotter who uprooted his career and moved to South Africa, you might be surprised to know that as a young man I disliked travel, and I feared change.  Ask the members of Yates Lab how huge a step it seemed to me to move from Seattle, Washington to San Diego, California in the year 2000.  I spent six months poring over maps and dawdling over last details in Seattle.  To go back further in time, I was always the first member of the family to feel it was time for us to return to Kansas City when our family took long road trips in the summer time.  If you read the journal, you will see a David feeling perpetually out of place and coping badly with exhaustion and self-induced malnutrition because I wasn’t willing to spend enough money on food.

The most repetitive feature of the journal is that the 20-year-old me was completely agog at the young women I encountered on my travels.  Although a disproportionate number of my friends since elementary school have been female, I must say that I was essentially undateable until my mid-twenties.  I would summarize by saying that I routinely put women on a pedestal and couldn’t see myself as desirable.  This aspect of the journal is high on my list of cringe-inducers.

I had already given up cursive in college.

What should we call the nexus of judgmental, puritanical, dismissive, and obsessed with money?  I am reminded in this journal that the person I am today was distilled from common mud.  Today I am not immune from these traits, but I do try to improve myself with time.  I have been tagged with the label “stubborn” more times than I would like to admit, but I hope that I can manage open-mindedness and respect for others at least from time to time.  In particular, I struggled to read the passages I wrote about the Turks in Budapest or the drive-by racism I dumped on Latin culture.  At least I realized that smug American chest-thumping was not preferable.  My memories of myself from that time have been substantially white-washed, but my text makes it clear I had a long way to go.  In my memories of that time, I mostly remember that the international relations scholar from Turkey taught me that a bishop or a castle is generally more reliable than a knight in the chess end-game.

From 1994 to now

Travel in Europe today is considerably simpler than it was in 1994.  Moving from country to country is considerably easier because of the Schengen agreement, which eliminates passport checks at borders between member countries, and the Economic and Monetary Union, which makes the Euro the only currency you need for much of the continent.  The traveler’s checks that fueled my travel are not needed in Europe; instead, you feed your bank card into an ATM, and out pops money.  My single telephone call home from Vienna would likely be replaced today by Skype; I could use my phone or computer on the WiFi of any hostel to chat right away with folks at home.

My account book, in many currencies

I wrote my journal narrative in a spiral-bound notebook, and I kept strict accounts of every franc, Deutschmark, schilling, crown, etc. in a separate small notebook, both of which I acquired while living in Lyon.  I was very fond of Pilot rolling ball pens at the time, and so each page is filled with cramped blue writing.

While my parents used 35mm slide cameras to capture my early years, I carried a 126 film cartridge camera made by Vivitar with me to Europe.  As you will see, many of the images I mention never made it to print when I developed those films, and the term “focus” does not really apply.  In three cases, I used Microsoft’s Image Composite Editor to stitch together multiple photos into a single panorama.

The two most visible cathedrals of Lyon, France

Computer video has come quite some distance since 1994.  I originally recorded the video on an analog Sharp “Video8” camera.  When I subsequently upgraded to a miniDV camera, I was able to transfer the video from the old camera to a new one via an S-video cable; this process recorded the video in a digital format on the new tape.  I was able to transfer that digital video without loss to a desktop computer with a FireWire card.  To deinterlace and compress the section of video I’ve posted to YouTube, I used the “yadif” filter of FFMPEG:

ffmpeg.exe -ss 00:00:09 -i input.avi -vf yadif -t 00:45:05 -c:v libx264 -preset slow output.mov

With those comments in place, I hope you enjoy reading the journal, a project 23 years in the making!

Teaching on the sly

When I first arrived at Stellenbosch University, I was a bit concerned.  I had thoroughly enjoyed organizing my own semester-long class in bioinformatics for M.Sc. and Ph.D. students at Vanderbilt University.  Under the “British System,” though, students encounter their final classes in the “Honours” year, crammed between the three-year Bachelor’s program and the two-year Master’s program.  Interestingly, a student may attend Honours at a different college than where he or she completed a bachelor’s degree, and the student may go to yet another university for a Master of Science after the Honours, so long as the training is judged to be relevant.

Overview of South African education program

This sequence describes the common route through South African education, from kindergarten to a terminal degree.

I would take a moment to explain a couple of important features here.  In South Africa, students are required to complete only the first nine grades, called “General Education and Training.”  In the United States, graduation from high school means that you have met your high school’s requirements for that goal (which in turn must meet state requirements).  In South Africa, however, high schools essentially serve to prepare students to take the “matric” exams, which are set (created) and marked (graded) nationally.  Matric successes or failures are what decide a student’s opportunities going forward.  I should also say that the chart above describes the academic track.  Many students take advantage of TVET (Technical and Vocational Education and Training) schools that lead to a certificate or diploma rather than a degree (these campuses have also experienced significant protests).  Each of these training types is considered in determining the SAQA level for a job candidate.

The 2016 Honours class for the Division of Molecular Biology and Human Genetics

Students who come to Honours in the Division of Molecular Biology and Human Genetics (MBHG) may come from quite a variety of schools and backgrounds.  Like other divisions throughout Stellenbosch University and the University of Cape Town, we are trying to “transform,” or more faithfully represent the broader population of South Africa, and so we seek out candidates who may not have been able to afford the best schools for bachelor’s training.  Transformation is a hard task, and many universities are struggling [Note to self: read that overview chapter!].

My first exposure to teaching at Stellenbosch, then, was to create a bioinformatics “module” for our Honours students.  The group above got to serve as test subjects for my new curriculum, which spanned just four days in 2016.  Instead of 43 one-hour classes from my old Vanderbilt BMIF 310, I adjusted to four morning laboratories (each three hours) and four afternoon lectures (each two hours).  With so little time, my coverage was obviously quite superficial.  For 2017, though, I will conduct a bioinformatics module that extends for eight days (during the first eight business days of May).  I am keeping the hands-on and lecture split the same as last year.  I think the doubling to eight days will be good for both the students and the professor!

In this still from Useful Hour 3, Haiko, Michael, and I impersonate parts of a linked list.

Lecturing just eight days a year isn’t really satisfying my itch to teach, though.  This year I initiated a wildcat “course” of sorts.  The “Useful Hour” takes place each Wednesday at 1:30 PM.  Anyone on campus can attend, and we record videos each week for those who cannot.  The topics have generally been focused on computers, bioinformatics, or biostatistics, though in the coming week we will branch out into biochemistry, as well.  Since the Useful Hour covers so much terrain, I have tried to treat each segment as an independent story, with the topic for each Wednesday announced by my listserv on Monday.  It could be that the loose structure of the Useful Hour will cause its undoing, but for now I am really enjoying its playful vibe.

My work with the Blackburn Lab at the University of Cape Town on Tuesdays has led to another opportunity.  I have teamed up with Nelson Soares, a staff scientist, to create a monthly “Big Show” tutorial for the community of proteomics researchers throughout Cape Town.  Our recent program gave graduate students and post-docs the opportunity to present the essentials of protein identification and quantitation.  In April, we will look at the opportunities that acquiring a SCIEX TripleTOF will open up for the group.  I appreciate that the students are also willing to listen to a lecture from me, from time to time!

The very latest teaching gig is one I hesitate to mention, since we are still formulating it.  In talking with more members of the Biotechnology Department at the University of the Western Cape, I’ve realized that they have a critical need for more biostatistics training.  I have never taught this subject formally, though I was part of the weekly “Omics” clinic for Biostatistics at Vanderbilt University for a few years.  Certainly one cannot function for long in genomics, transcriptomics, or proteomics without knowing something about biostatistics.  Teaching biostatistics formally is likely to teach me as much about the subject as the students who attend!  I hoped to use slides from Stellenbosch University for teaching weekly courses at UWC, but I could not get that use approved.  Instead, I have once again borrowed the expertise of my friend Xia Wang at the University of Cincinnati.  I am hopeful that I will be able to understand and use her didactic materials.  They’re written in the LaTeX math formatting language, so I will need to remind myself how to edit and export to a format I can display, like PDF. My last real experience with LaTeX was when I wrote my Ph.D. dissertation in 2003.
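
For the record, turning those LaTeX sources into something I can project should be a one-liner once a TeX distribution is installed; assuming a hypothetical file name like slides.tex:

pdflatex slides.tex

That writes slides.pdf alongside the source, ready for a projector.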

With students on three university campuses, I think I will finally feel like I have some real momentum in my teaching!

Bang for the buck: U.S. aid to South Africa

Out of roughly $4 trillion in the U.S. federal budget, how much is spent on foreign aid?  While most people in a recent poll thought it was around a quarter of the annual budget, the true answer is around one percent.  In this post, I want to explain two key programs that have impacted my new home country: PEPFAR and AGOA.  The United States plays a substantial role in making the future of South Africa brighter!

PEPFAR: Curtailing the epidemic of HIV/AIDS

During the first eight years of the millennium, I rarely had anything positive to say about the President of the United States.  President George W. Bush, though, signed into law the “U.S. Leadership Against HIV/AIDS, Tuberculosis, and Malaria Act of 2003,” which transformed medical care in southern Africa.  His name is still respected in South Africa because of this law; it created the President’s Emergency Plan for AIDS Relief (PEPFAR).  This program has been renewed twice by bipartisan vote, in 2008 and 2014.  In the thirteenth year of the program, PEPFAR supported anti-retroviral treatment (ART) for 11.5 million people living with the Human Immunodeficiency Virus (HIV), a number that has climbed by 50% since 2014.  Some two million babies have been born HIV-free to mothers who carry the virus.  This is an amazing accomplishment, and it couldn’t have come at a more critical time.

The HIV crisis in South Africa began as it did in the United States, with AIDS appearing in the community of gay men during the early 1980s.  Cases were documented in the heterosexual community in 1987.  By 1990, the crisis had begun to grow rapidly.  It is worth noting that South Africa was coping with tremendous changes during this period as the Apartheid government was compelled to cede power; Nelson Mandela was released from prison in February of 1990.  When he became President in 1994, however, the new government was unable to do much about the growing epidemic.  1996 was a watershed year for HIV as ART was announced, and the first drugs became publicly available (though expensive).  In 1999, Thabo Mbeki was elected President, and the public thought that HIV prevention and treatment might become a priority under his leadership.  His Presidential AIDS Advisory Panel, however, was dominated by HIV denialists / “AIDS dissidents” who claimed the virus had nothing to do with AIDS.  Not only were ART drugs not made widely available, but ART was withheld from pregnant women carrying the virus.  Nelson Mandela re-entered the debate in 2000 with a powerful closing speech at a Durban international conference on AIDS.  The topic became even more personal to him when his son died of AIDS in 2005.  Against this complex historical background, the prevalence of heterosexually transmitted HIV/AIDS was surging.  “By 1994, this had risen to 7.6%, and by 2005 was 30.2%, with an estimated 5.5 million of South Africa’s 47 million people infected.  An estimated 1000 new HIV infections and 900 AIDS deaths occurred each day” [Giliomee and Mbenga, p. 418].

PEPFAR has a tremendous role to play in today’s South Africa.  The program currently estimates that 7,000,000 people in the country are living with HIV, with approximately half protected by ART.  180,000 people die of AIDS each year in South Africa. “South Africa now has the largest number of patients on anti-retroviral drugs in the world, and South African life expectancy has increased by more than a decade.” [Bekker et al.]  Just imagine the impact if PEPFAR were no longer paying for HIV treatment!

Please be aware that there have been changes in the Trump Administration that suggest this program may be in trouble.  It is no exaggeration to say that real people will die without PEPFAR.

AGOA: “Trade, not Aid!”

Debate may never end over the best way for wealthy nations to support the growth of poor nations.  When wealthy countries give food aid to poor nations, those efforts can undermine the economic growth of agriculture in those countries.  The African Growth and Opportunity Act (AGOA) was enacted in 2000, the last year of Bill Clinton’s presidency.  You may be thinking, “gosh, another economic treaty I need to know about!”  In fact, AGOA is not a treaty.  AGOA is a unilateral decision by the United States to drop taxes and quotas on imports of particular goods from countries in sub-Saharan Africa.  The program began by including 34 countries and soon expanded to 40.  After the first fifteen-year run of the program, the U.S. Congress decided to renew AGOA for an additional ten years in 2015.  Each year, the President decides exactly which countries will be extended these benefits.

The metrics for AGOA success paint a somewhat equivocal picture.  The 2016 biennial report shows $23.5 billion in exports from Sub-Saharan Africa in the year 2000.  This number grew to $86.1 billion in the year 2008 before falling back to $18.5 billion in 2015.  This might seem an abject failure, but much of the decline reflects reduced oil exports to the United States and the worldwide recession of 2009.  Most Sub-Saharan countries, of course, would like to export to the world’s biggest economy!  America, in turn, uses this desire to require progress toward “a market-based economy; the rule of law, political pluralism, and the right to due process; the elimination of barriers to U.S. trade and investment; economic policies to reduce poverty; a system to combat corruption and bribery; and the protection of internationally recognized worker rights” [2016 biennial report, p. 8].  Essentially, the United States waives taxes on imports from countries that behave as the United States would like.

South Africa has had an interesting story within the framework of AGOA.  As the continent’s most advanced and diversified economy, South Africa was a bit of a question mark for inclusion in the 2015 renewal of the law.  Did it make sense to give these trade benefits to an economy that was already moving rapidly?  South Africa made itself a less attractive trade partner by raising trade barriers against American farmers exporting meat to South Africa, which put it in violation of the “elimination of barriers to U.S. trade” requirement above.  At the start of 2016, the situation had deteriorated enough that Barack Obama suspended AGOA benefits for South Africa.  This action was enough to convince the foot-dragging South African government to drop its trade barriers, and so South Africa is once again an AGOA beneficiary in good standing.

What will happen to AGOA under the Trump Administration? Although President Trump has been ambivalent on the subject of free trade, he has not signaled that he will seek to end AGOA, either by unlisting all participant countries or by seeking its repeal through Congress.  Africans do not expect great things from President Trump, though.  His Tweets about South Africa have had a generally negative tone.

In the end, South Africa is proud of its ability to take care of its own problems.  If AGOA comes to an end, the country will lose one of its best customers for fruits and vegetables, and the automobile industry growing in the Eastern Cape would suffer.  The loss of PEPFAR, on the other hand, would devastate health care in South Africa.  The economy of South Africa is not strong enough to bear the cost of supporting ART on this scale.  The country already relies on the permissive, pro-public health intellectual property laws of India to have access to generic ART.  We can all hope that the PEPFAR and AGOA relationships between South Africa and the United States continue under President Trump!

An extraordinary journey in three universities

Last November, I received some very welcome news.  The Deputy Vice-Chancellor for Academics at the University of the Western Cape informed me that I had been named an Extraordinary Professor in the Department of Biotechnology!  My work within that department had been going well when persistent student protests closed the university through the end of 2016.  This letter reflected the ongoing hope of Biotechnology that our collaboration would continue when the students returned to their studies.  Today I received my official badge, so I would like to write about the work that is developing at each of the three local universities at which I have an appointment.

I have written about my travels among the campuses in and around Cape Town.  I would stress that I spend most of my time at my home institution, the Tygerberg campus for Stellenbosch University.  Bioinformatics has seen considerable investment by the university.  The South African Tuberculosis Bioinformatics Initiative represents the concentration of bioinformatics investigators for our campus: Gerard C. Tromp, Gian van der Spuy, and me.  There are other data scientists, though!  The Centre for Evidence-based Healthcare, led by Taryn Young, offers statistical expertise.  Tonya Esterhuizen specializes in biostatistics.  As I will explain in a moment, I hope to work with them more in the days to come.  This year, my formal teaching duties at my home campus will double.  Don’t worry for me, though, since I will host the Honours students for the Division of Molecular Biology and Human Genetics for only eight days!  I am glad that bioinformatics will have the “standard” module length for our Honours program, equal to Immunology and several other subjects.  I have been supplementing my teaching through an informal “course,” called the “Useful Hour.”  I have begun teaching all comers about a range of subjects, from computers to programming and statistics.  I hope to pull in some philosophy of science soon, as well.  I have been filming these subjects as a bit of an experiment, and it has been handy for those who cannot attend.

Hugh Patterton, Gerard Tromp, and I coordinate our efforts near Simonsberg.

The Stellenbosch campus of Stellenbosch University has made strides in bioinformatics, as well.  Hugh Patterton, a professor in the Department of Biochemistry, has been named to lead bioinformatics efforts at this campus.  Naturally, our group (SATBBI) has been talking with Hugh about ways we can reinforce each other’s efforts.  Some of our consultations on the Stellenbosch campus have pointed in the direction of microbiome research, an area that is replete with bioinformatics challenges.  I look forward to seeing what emerges!

I am highlighting the University of the Western Cape in this post, of course!  In describing bioinformatics at the campus, I should start by mentioning the South African National Bioinformatics Institute (SANBI).  Alan Christoffels leads this group of investigators.  They’re an interesting group, with considerable success in capacity development within South Africa and across the continent.  My home on the campus, however, has been with the Department of Biotechnology.  In many respects, this reflects how I have spent my career.  I set the mold in graduate school, when I was a bioinformaticist surrounded by analytical chemists.  I like being close to the people who generate the data I work with!  In the Department of Biotechnology, I work most closely with the group of Ashwil Klein, the lecturer who heads the Proteomics Research and Service Unit.  They have primarily emphasized a gel-based workflow, meaning that they partially isolate proteins on a 2D gel before identifying the spot based on the peptide masses they observe on the Bruker Ultraflex TOF/TOF.  The group is actively moving toward additional instruments, though, and the acquisitions should greatly broaden their capabilities.  I enjoy the intellectual challenges their group produces, since the rules of the road are somewhat less established for agricultural proteomics.

The new UWC Chemical Sciences and Biological Sciences Buildings rise above the Cape Flats Nature Reserve.

At the department’s recent strategic retreat, I was introduced to the researchers of UWC Biotechnology more broadly.  I was particularly glad to meet with Dr. Bronwyn Kirby, who heads the Next Generation Sequencing Facility.  We discussed the Honours course offered for the department (I taught bioinformatics for the proteomics module last year), and I believe I’ll get to add some bioinformatics for the sequencing module in 2017!  I was also delighted to meet the SARChI chair who heads the Institute for Microbial Biotechnology and Metagenomics (IMBM), Marla Trindade.  We spoke about what the students of the institute most needed, and establishing a structured curriculum for biostatistics seemed very high on the list.  I mentioned the biostatistics researchers at Stellenbosch above.  My hope is to be able to use much of the structure Stellenbosch has already built in its Biostatistics I and II classes as a model for teaching biostatistics at UWC Biotechnology.  It would be my first effort at teaching biostatistics formally; I hope that I have absorbed enough to be a good teacher for this subject!

I continue to spend my Tuesdays with the University of Cape Town medical school and to visit the Centre for Proteomics and Genomics, as well.  UCT named me an Honorary Professor in the Department of Integrative Biomedical Sciences halfway through 2016.  My interactions there have principally taken place within the Institute of Infectious Disease and Molecular Medicine (IDM), borrowing from the network of relationships that Jonathan Blackburn has established there.  I have worked with Nelson Soares, his Junior Research Fellow, to create monthly programs for the Cape Town community invested in proteomics.  This Tuesday, we started this series for 2017 with an introduction to the methods we use for identifying and quantifying proteins.  I was really pleased that Brandon Murugan, a senior graduate student in the Blackburn Lab, felt comfortable enough to present this material!

I enjoyed my sundown cruise with the SATVI team in May of last year!

From the very beginning of my time in South Africa, I have been working with the South African Tuberculosis Vaccine Initiative (SATVI).  Recently they began holding their research-in-progress meetings on Tuesday mornings, allowing me to take part.  I really like the interaction.  They take my questions seriously, and I think we all learn from working together.  Certainly I would find great meaning in being part of a successful vaccine trial for this disease!

I have another group I must mention in describing bioinformatics across these three universities.  Nicola Mulder’s “CBIO” team has been an opening wedge in bioinformatics education for South Africa.  Their H3Africa BioNet courses have been used to supplement the content of B.Sc. education in places like the University of Limpopo.  It should be no surprise that many of the people I have mentioned in today’s post have collaborated in a manuscript describing the growth of bioinformatics in South Africa.  Our field is key to the future of public health and to the advances in biotechnology yet to come!