Tag Archives: science

Zanzibar: the Victoria Garden Museums

An index to this series appears at the first post.

Our last full day on Zanzibar gave us a chance to visit a pair of museums grouped around the Victoria Garden.  They don’t get much attention in the guide books, but we enjoyed our look at the Zanzibar Museum of Art and the Natural History Museum.  Just what would we find beyond the archway at the southern traffic circle of Stone Town?

Rather than scurrying about as our time on the island drew to a close, Natasha and I relaxed with a bit of light shopping during the morning.  We began with a couple of women’s collective arts stores in the Hurumzi district.  We liked some appliqué pillows, though they were priced a bit higher than we thought appropriate.  We saw some shirts and shorts that might look nice for me, but again their prices were high (going to $30 USD for shorts seems excessive to a frugal mind).  We enjoyed a couple of antique shops.  At one, Natasha found a box with a pivoting lid intended for salt and pepper; she acquired that for holding earrings.

quarter-anna

A coin minted in India features a British monarch but is used off the coast of Africa…

IMG_1251

The High Court of Zanzibar

At another, I spotted an Imperial British coin from India featuring Queen Victoria. I think my brother might use that with his students to show that Africa and India were actively trading with the rest of the world around the time of the American Civil War.  I also found a Quran in Arabic that I wanted for my brother’s classroom.  We returned to a T-shirt shop near our jetty from last night to purchase some T-shirts for little ones in the family.  It was a good run!

From there, we took the road south past the High Court and State buildings (photos of government buildings are not permitted, though I snapped the High Court without realizing what it is).  The way ahead was blocked, so we headed away from the coast, and happily that course led next to the Victoria Gardens. This park, also called the People’s Gardens, was dedicated to the people of Zanzibar by Sultan Hamoud in 1899 to celebrate Queen Victoria’s Diamond Jubilee.  A 1996 renovation has produced a park that still looks a bit ragged, but some of the trees there are still rather pretty.  A large house adjoining the garden that was originally constructed as the British Residency now serves as the State House: the official residence of the Zanzibari president.

Zanzibar Museum of Art

IMG_1517

The Peace Memorial now houses the art museum.

The park adjoins a complex of two museums that we both enjoyed.  For 6000 Tanzanian shillings ($2.70 USD), we gained access to both the Peace Memorial Museum (now the Zanzibar Museum of Art) and the Natural History Museum.  The Peace Memorial building dates from 1920 during the reign of George V.  It was constructed in honor of those who lost their lives in the “Great War,” commemorating the “victorious peace.”  Why would Zanzibar have cared who “won” World War I?  As it turns out, the British used the island as a repair base for their navy.  The “Battle of Zanzibar” saw the German cruiser Königsberg sink the British cruiser Pegasus in 1914.  The Peace Memorial building looks quite unlike other World War I memorials that I have seen, such as the Liberty Memorial in Kansas City.  One might easily mistake it for a mosque, with its high dome surrounded by six smaller domes!

IMG_1486

The minaret of the Mnara Mosque may date to the seventeenth century.

As I mentioned, the building now houses an art museum.  Visitors are not going to see long galleries full of oils flanked by a massive sculpture garden, though.  I would highlight a few items as worth seeing.  The first is a set of miniatures.  Since Stone Town has deteriorated quite a bit, it can be hard to imagine this city in its prime.  The minaret for the Malindi / Mnara Mosque is one of the oldest structures standing in Stone Town, though it is now matched to a mosque below that was constructed in 1834/5 (Sheriff pg. 51).  It now abuts buildings on almost all sides, so the miniature version at the museum is the only way to see the mosque as a separate structure.  “Zanzibari mosques are very plain and unobtrusive, hardly distinguishable from domestic buildings.  They normally form a continuous line with neighbouring domestic houses…” (Sheriff pg. 5)

IMG_1487

The Old Dispensary (1899) incorporates a strong Indian influence.

The Old Dispensary is a major landmark in Stone Town.  Its story revolves around a fabulously wealthy Ismaili businessman of the late 19th century named Tharia Topan.  As one measure of his wealth, a tract of land he owned in the Ng’ambo (the other side of Creek Road) was so large that it contained 1300 huts (Andriananjanirana-Ruphin pg. 101).  When he decided to create a hospital to celebrate Queen Victoria’s Golden Jubilee, he spared no expense.  He chose a plot of land that would be prominent on the coast (though the extension of the port later blocked its view), and he brought architects and craftsmen from Bombay to create a building better suited as a palace than as a hospital.  The building uses teak imported from India throughout its structure.  He crafted a golden trowel for the ceremony of laying the building’s foundation stone and shipped it to London for exhibition.  Unsurprisingly, he was knighted in 1890, but a year later he was dead.  He never got to see the completion of his triumphant creation (Battle pg. 91-99).

IMG_1488

This evocative statue is displayed without details.

Next, the museum gives a corner chamber to the topic of ceramics, mostly a set of pots and vases.  On a shelf, though, stands a small statue of a chained female slave, looking down but not defeated.  I was really moved by the work, especially since our visit to the Slave Market Museum had reinforced the importance of female slaves in the role of “concubine” or “second wife.”  Many of these women decided against accepting freedom since it would mean separation from their children and other violations of dignity.  I had noted that the Slave Market Museum relied heavily on photographs and text; incorporating this statue could add depth to their presentation.  As it stands, the statue is presented without annotation of sculptor, date, or even title.

Natasha called my attention to Mr. Naaman’s brilliant recreation of an 1840 photograph by Gillian depicting Stone Town from above.  What makes it brilliant?  The artist made it entirely by pasting together fragments of different banana leaves in 2005, using different species to achieve different shadings.

IMG_1496

Stone Town, executed in banana leaves

Everywhere Natasha and I have gone in Zanzibar, we have been greeted with Jambo (“Hello”), Karibu (“You are welcome”), or Hakuna Matata (“No worries”).  I learned another phrase from a museum piece showing a woven fish trap.  It reads “kuingia demani,” which means getting into problems that one doesn’t know how to solve.  I think we can all relate to that!

Natural History Museum

IMG_1518

Natural History doesn’t get a dome.

Visitors to the Art Museum are also encouraged to visit the small natural history museum next door.  We were both worried that the chamber would be filled with dusty Victorian taxidermy animals. While some stuffed animals were indeed present, we encountered a few things that kept our attention.  For me, the first was a partial skeleton in a glass box locked in a wire cage on the wall.  The description indicated that the skeleton represented the bones of a dodo bird from Mauritius (a gift of W. Harold Ingrams, Esq.).  This might not seem so remarkable, but remember that the last accepted dodo sighting took place in 1662!  These bones are either fakes, or they are more than three and a half centuries old.

We puzzled over a really large vertebra standing on a small table. It must have been a foot across on the central column.  At first we thought it might be from an elephant, but then Natasha snapped her fingers and realized it was from a whale.  My attention was also grabbed by the jaws of a largetooth sawfish (Pristis microdon) and a common sawfish (Pristis pristis).  They look something like a chainsaw blade with inch-long teeth sticking out on either side. Outside the building, Natasha noticed that the museum was once home to giant tortoises.  Happily, the animals have been moved to nearby “Prison Island,” where we hope they have more room to maneuver.

IMG_1511

Sawfish teeth

Abyssinian Maritim Restaurant

For dinner, Natasha and I decided to break from Tanzanian food (which we like) to enjoy an Ethiopian restaurant we had spotted near the SW corner of Stone Town.  The restaurant had large posters of sites in the country to tell some of the nation’s history.  Because we started accumulating insect bites the moment we sat down, we decided to move to a more internal table; sadly, the insect bites continued.  We realized from the menu that our dinner was going to cost substantially more than we had been spending.  A normal lunch at a local food joint might cost 12,000 or 13,000 shillings.  We opted for a vegetarian entrée for me and a chicken entrée for Natasha, and we added a bottle of water and a spiced Ethiopian tea on top.  The total bill came to 49,000 shillings ($22 USD), so ultimately it was “much of a muchness.”

We wandered north toward the tourist area when Lady Hellen appeared at her shop door.  Where had we been?  Didn’t we know she was waiting for us?  Laughing, we stepped inside.  Natasha found two refrigerator magnets, and I bargained for a watercolor of a Zanzibar door that would form a nice triptych with our dhow and street paintings.  She seemed nonplussed at the small purchase, but she still showed good grace.

IMG_1755

Our three watercolors: town, dhow, door

Our efforts to get back to our hotel produced an unusual result.  I headed for the southeast corner of the Old Arab Fort, and then I marched us into the maze of alleys.  The Friday evening crowd on the streets had collectively decided to close up the shops.  Somehow I got us entirely turned around, and we popped back out near Freddie Mercury’s house!  This time Natasha took the fore, and she charged us back into the maze.  Once again, we took a wrong turn, and we bounced out of the maze near Lady Hellen’s art shop!  We decided to play it safe with our last effort.  We headed south and east along the belt road, and then we walked northeast along a familiar track back to our New Mkunazini Road, bought one last bottle of water, and then collapsed into our room at last!

The birth of a new conference

November 1, 2017

How does a new conference enter the academic calendar? I was encouraged by the example set by the Clinical Proteomics / Post-Genome Medicine meeting (ClinProt 2017), and I thought it might be useful to talk about some of the things that the group did really well, while relating a bit of what unfolded for me in my last day at the conference.

Logo_BGRS-shrink

First off, this is far from the first meeting to take place in Russia on the subject of human proteomics. The Russian Human Proteome Organization has been operating since 2002, and it sponsors two distinct meetings yearly. The main meeting takes place in the city of Kazan each October. Members who are particularly interested in bioinformatics may participate in an annual meeting at the city of Novosibirsk (Bioinformatics of Genome Regulation and Structure / Systems Biology). The RHUPO has also successfully organized a big event for the world HUPO; in 2009, Dr. Alexander Archakov hosted the third Human Proteome Project Workshop in Moscow!

img-42

Human Proteome Project, Moscow (2009)

The ClinProt 2017 meeting seemed special in that it sought to foster connections among many different institutions within Russia; the program was salted with several investigators across Europe and the broader world, but the emphasis seemed to be on developing networks within the country, including multidisciplinary links. As I look across the eleven-member organizational team in the conference program, I see five different research institutions, all in Russia, represented by post-doctoral scientists. This team of junior researchers will all have valuable experience for the future, and senior scientists who attended the meetings will remember who they could rely upon when trying to solve a last-minute problem before a talk!

I would catalog several things, then, that the organizers did right:

Skin in the game
Because several institutions contributed organizers, more schools sent speakers, poster presenters, and trainees. In total, 350 people registered, and 274 attended. That’s pretty great for a first conference!
Personal touch
Several speakers mentioned that they had been recruited by an organizer who knew them from prior contact. Since professors frequently get spammed by for-profit conferences, these personal contacts made a difference in getting the names they wanted for the meeting agenda.
Detail focus
I heard several of the organizers quietly worrying about whether something was going to go just right. Throughout, it was clear that each person knew what his or her responsibility included. The team was definitely committed.
Industry works
I occasionally hear academics sneer at the inclusion of instrument and reagent vendors in speaker rosters, but their participation in a meeting adds more than just money. I was glad to see a representative from Helicon lecture on the value of CyTOF for cell counting applications, since I am mentoring a student working with such data.

I became aware that we had some special guests today as I lingered in the speaker ready room. Several people in suits made an appearance. I had a rapid conversation with Sergey Suchkov, an M.D. and Ph.D. who has a relentless energy about him. He has a strong interest in developing relationships among BRICS nations in the field of “precision medicine” (sometimes called “personalized medicine”), and he wanted to talk about some possibilities between South Africa and Russia in that space. We agreed to touch base this afternoon when he could introduce me to another M.D. Ph.D. friend of his who has become involved in genome bioinformatics. That meeting put forward some interesting possibilities in tuberculosis, which has become problematic in the Russian prison system. I hope we will be able to define some projects we can pursue together in this space.

Right away, though, I had to leave our discussion to teach my afternoon workshop on performing post-hoc quality control assessment in large-scale proteome projects. I was very grateful that the conference organizers could add a link to their website so that participants could download the R statistical script and input files for the workshop directly from the link above. That way the conference attendees who needed to leave Moscow early can still get access to the tutorial.

20131230-Xia-Wang-TOC

Image from my paper with Xia Wang introducing “IDFree” metrics

This was my first time teaching a workshop on quality control. My normal curriculum has emphasized protein identification or the recognition of post-translational modifications. Since I am now chairing the HUPO-PSI working group on quality control, though, it was a good time for me to put together some training materials in this space. I chose a highly visible data set, the 1425 LC-MS/MS experiments that the Vanderbilt team produced from colorectal cancer samples for the National Cancer Institute CPTAC program. The workshop would focus on recreating figures that Xia Wang at U-Cincinnati had scripted in the R statistical environment from tables of QC metrics that my team had generated.

I was really pleased with the dozen or so students who attended the workshop. Their questions were very good, and their understanding of the statistical concepts was at a very high level. To give one example, a student asked how differently the files would have spread in my plot of the first two principal components if we had used ordinary PCA rather than robust PCA. Another asked how hierarchical clustering would visualize these data in principal components space. These are not the questions one encounters with people who have never seen PCA before!
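For readers who have not met the technique, here is a minimal sketch of ordinary PCA in Python with NumPy. It is not the robust PCA from the workshop (which used R scripts), and the table of "QC metrics" is synthetic; the function name, the table dimensions, and the shifted "aberrant file" are all assumptions made up for illustration.

```python
import numpy as np

def pca_scores(table, n_components=2):
    """Ordinary PCA: center each column, then project the rows
    onto the leading principal axes found by SVD."""
    centered = table - table.mean(axis=0)
    # Rows of vt are the principal axes, ordered by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical data: 20 files (rows) by 5 QC metrics (columns).
rng = np.random.default_rng(0)
metrics = rng.normal(size=(20, 5))
metrics[0] += 10  # one aberrant file, shifted on every metric

scores = pca_scores(metrics)
print(scores.shape)                       # one (PC1, PC2) pair per file
print(np.argmax(np.abs(scores[:, 0])))    # index of the file most extreme on PC1
```

In a setup like this one, the single aberrant file tends to dominate the first component, which is the kind of sensitivity that robust variants of PCA are meant to reduce.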

So color me impressed. This meeting ran like clockwork, and the students came ready to learn. The speaker list did not have some of the biggest names in world proteomics, but in fact I trusted what I was hearing more because it came from investigators who had worked at the bench more recently. I am of course grateful for the time I’ve been given to see Russia first-hand, but in the end I was brought here to teach and to learn. I enjoyed both missions!

 

Clinical proteomics in Russia and my last pair of pants

October 30, 2017

At last the first day of ClinProt 2017 had arrived! I set aside my now-muddy pairs of jeans in favor of my fresh and clean blue dress pants, laced up my shiny black shoes, and put on my enthusiastic green shirt. With a spot of breakfast downstairs (on my third morning eating there, I found that the milk jug was full for the first time!), I was ready to meet with the others for a shuttle van ride over to the conference.

Moscow traffic at 8:20 AM is a bit intense. The drivers here are a bit more careful of road laws than I have seen in other countries, but they still produce some pretty creative merges in their traffic jams. What would have been a few minutes on the subway was more like a half hour on the road, but my dress pants were still pristine when we arrived at the Congress Center at the I.M. Sechenov First Moscow State Medical University. The facility had a lovely central hall, with a graceful split staircase to the two main venues for our meeting. I hadn’t seen lecture halls in which an array of nine HDTVs replaced the more typical projector. It certainly produced a bright image, though the borders between screens were distracting.

IMG_7892

Why project when you can emit?

The Clinical Proteomics 2017 meeting was organized because a confluence of groups wanted to consolidate researchers in this country. EuPA, the European Proteomics Association, helps to integrate activities that span national proteomics societies. The Russian Human Proteome Organization (RHUPO) sought to foster a sense of community among Russian research groups in this area. The Sechenov First Moscow State Medical University was happy to contribute a venue for the event, and many instrument, reagent, and other vendors agreed to take part, as well. I haven’t learned the total count of attendees yet, but I know that there are 87 research posters. For a first effort, I think it is clear that a great many things have gone well.

From the very first talk, it was apparent that Russian clinical proteomics researchers are grappling with challenges that became familiar to me as part of the National Cancer Institute (NCI) CPTAC program. Anna Kudryavtseva discussed her efforts to reconcile proteomics data with those that had been produced by the NCI’s The Cancer Genome Atlas (TCGA), working in a particular sub-type of head and neck cancer. Prioritizing genes that were more frequent targets of mutation in tumors has value for understanding which proteins are most useful to monitor closely, for example. It was a great “plenary” (all attendees) talk to kick off these discussions.

As soon as we split to multiple sessions, I was on duty. I co-chaired the “Genomics and Beyond” panel with Sergey Moshkovskii. It was a bit odd to be fielding this panel while the Protein Informatics workshop was taking place in another room (that topic has been my bread and butter for two decades)! In this case, however, Sergey and I were not only chairing the session but also leading it with our two lectures, both in the field of proteogenomics.

DSC_3620-shrink

Photo credit: Olga Kiseleva

I defined the term by saying that we want to improve our interpretation of genomic data by integrating proteomics data, and we want to improve our interpretation of proteomics data by integrating genomic data (I was trying to be ecumenical). From there, I led the group through the new paper that I’ve published with Anzaan Dippenaar and Tiaan Heunis, in which we demonstrated our ability to recognize sequence variations and novel genes in Mycobacterium tuberculosis “bugs” that had been isolated from patient sputum in South Africa. Sergey followed up by presenting evidence of RNA editing in fruit flies.

DSC_3631-shrink

Photo credit: Olga Kiseleva

The other speakers in the panel were also quite interesting. Matthias Schwab was visiting from Germany, and he educated the group on the current status of the field of pharmacogenomics. Vladimir Strelnikov, a geneticist, described the value of bisulfite sequencing for measuring DNA methylation in breast cancer. Sergey Radko outlined a SISCAPA-like strategy for using “aptamers” to enrich proteins prior to Selected Reaction Monitoring. Artem Muravev closed out the session by discussing the challenges of biobanking. This last talk was delivered in Russian, so I benefited quite a lot from real-time translation to English by Anastasia, one of two translators fielding our session (during my talk, she had been translating my words to Russian as I worked through my slides). Finally all the speakers came together for fifteen minutes of question and answer. I tweaked our pharmacogenomics speaker a little bit by saying that even if we had the complete sequences for every human on earth in our hands today, personalized medicine would not have arrived!

With the morning complete, everyone adjourned to a nearby restaurant. I was a little leery when I learned our destination was the Black Market, but I needn’t have worried; we wandered down the street to a lovely restaurant named “Black Market.” I had the Black Market Burger and felt thoroughly happy. I felt very grateful that the European Proteomics Association picked up the bill for that morning’s speakers!

Back in the conference, I enjoyed hearing my long-time friend David Goodlett discuss his long-term monitoring study of diabetes. He’s a careful guy, and it is good to see that he can make label-free proteomics sing in biofluids (a tough space to work), recognizing protein pairs for which expression can flag the onset of disease. It’s very reminiscent of the kind of study Stellenbosch University has produced in the space of tuberculosis. Our next speaker returned to the subject of biobanking, and he delivered his talk via Skype, not my favorite format. I am a big believer in contact with my audience.

IMG_7886

Did I mention it was my enthusiastic green shirt?

I threw all my remaining energy into the poster session. Interacting with researchers at the start of their careers is very rewarding, and people who stand beside their work without knowing whether or not anyone will take interest have a hard job. These students were even braver, since they were prepared to defend their work in English!

I started with a poster very near and dear to my heart. A.V. Mikurova was evaluating the different levels of sequence coverage achieved by database search (Mascot, X!Tandem) and de novo algorithms (PepNovo+, Novor, and PEAKS) when working with 27 LC-MS/MS experiments for a defined mixture of human proteins. We discussed the relative unresponsiveness of sequence coverage as a metric for performance evaluation and the challenge of ensuring the algorithms had comparable configuration. I asked S.E. Novikova about her choices of statistical model for a time-series measurement of proteomes in response to all-trans retinoic acid. I hope my statistics lectures online will be useful to her, though it sounds like she’s already on the right track. N.V. Kuznetsova taught me a few things I didn’t know about celiac disease! She had been evaluating the ability of Triticain-α to degrade the most immunogenic peptide of gluten-family proteins. Finally, J. Bespyatykh was presenting a poster on the proteomics of Mycobacterium tuberculosis from a strain called Beijing B0/W148. Her work obviously had a strong relationship to what Tiaan and Anzaan had published with me, so we had a great conversation about the work. I hope we can help her find a sequence database that is a more ideal fit for her proteomes than the generic “H37Rv” protein database. I was really pleased to speak with so many students about their work at this meeting.

With that, I slumped against a wall and didn’t move very much. The other conference attendees had flowed back into the conference room for an afternoon round of talks. I let my mind wander for a bit, though I did have some nice conversations with the vendors. Soon, though, I heard some odd noises echoing through the entry hallway. Was there a music practice room somewhere in the building? Was that a tuba?

DSC_3712-shrink

Dixieland music in Moscow! Photo credit: Olga Kiseleva

My questions were answered when I eventually joined everyone downstairs for a catered closing reception. The organizers had invited a Dixieland band to perform for our reception! The group was really solid. I particularly liked one of their trumpeters, since he had a smooth Chuck Mangione vibe going on. I kept recognizing songs only part of the way, since they were singing many of the lyrics in Russian! I finally got a solid hit on “Mack the Knife!” I sat up close to enjoy the show.

With the evening at an end, I declined invitations to go hit a bar and walked to the nearby Frunzenskaya subway station. Two stops later, I was in my neighborhood. I trudged up the paved driveway to the street with my hotel. As I awaited the green light at my last crosswalk before the hotel entrance, a car drove too close to the curb where I was waiting, and dirty rainwater soaked my last clean pair of pants.

 

You can be an academic YouTube STAR!

Many universities have begun exploring the use of the Internet for sharing academic coursework, either via “flipped classrooms” or Massive Open On-Line Courses (MOOCs).  Over the last year, I have uploaded approximately 50 videos to my YouTube Channel, most of them academic lectures.  I hope that I have learned something in this process that will help you to publish your work more broadly, as well!

I would start by explaining that my lectures come from multiple purposes and even multiple university campuses.  My longest-running series of lectures came from a weekly seminar on topics of my own choosing called “the Useful Hour.”  I produced fourteen of these sessions (with help from Brigitte Glanzmann when I had to be away for a week), though I only started recording them on video for the last twelve.  I recorded the eight-session bioinformatics module from our division’s B.Sc. Honours program as a trial run for creating a “flipped classroom” in future years (a model where students watch lectures outside of class and spend in-class time working exercises).  More recently, I collaborated with the H3Africa BioNet to produce a four-lecture module on Gene Expression.  From time to time, I help the Tygerberg Postgraduate Student Council by recording a lecturer.  Each of these experiences has had its own lessons to convey.

The technical aspects of recording a video are generally easy enough that even a Ph.D. can do it!  Today’s budget camcorders capture more detail with better sound under lousier conditions than did cameras that cost five times as much even five years ago.  Best of all, one no longer needs to wrestle with tapes and analog-to-digital transfer loss.  Today we simply pull the Secure Digital card out of the camcorder and plug it into the socket on a laptop, where the video files are instantly accessible.  Of course, many people record video using digital cameras or cell phones.  Preparing videos for upload to a public server, however, is frequently more difficult than the initial capture.  I’ll talk about these aspects below.

Focus on the speaker

pointer

Speak softly, and carry a big stick!

We must start with video that is worth watching.  Far too frequently, I see that people recording lectures focus on the slides rather than the person who is delivering the lecture.  Reading text from video is generally unpleasant, and the reality is that looking at people fires circuits in our brains that academic content does not.  Video is a format designed to capture motion; it is a notoriously inefficient method for capturing still images, though!  Keeping the camera on the speaker, then, makes more sense.  This comes with some caveats:

  1. Viewers still need to be able to see the slides.  My answer has been to produce a PDF from the PowerPoint or other presentation software, since almost everyone has the ability to view PDFs on any platform.  I post the PDF to a shared directory on Google Drive, and I include the URL leading to the PDF in the YouTube description.
  2. From time to time a researcher will point to a particular part of a slide.  This is probably problematic on video; if he or she has used a laser pointer, the spot of light will either be too bright (green) or too dim (red) to appear well on video.  A moving mouse pointer might be better.  If the speaker is old-school (like me), he or she may use a stick to point at the slide instead.  This can create a problem of the lecturer “blooming” as he or she moves away from the bright field created by the projector into the relative dark outside the projector’s light.
  3. How will a person watching the video know to advance to the next slide?  Hopefully the speaker says “next slide” out loud.  When my parents recorded my brother’s and my first efforts to read aloud, they told us to bang a spoon against a mug to produce an audible chime with each page turn.  That was even more fun than reading!
  4. Software is publicly available to integrate the slowly-changing slide video with the quickly-moving speaker video.  Screencast-O-Matic will produce videos of up to fifteen minutes in its free version.  This approach will guarantee that your viewers are seeing the same slide the lecturer is seeing as the talk progresses.
screencast

Screencast-O-Matic insets your image atop the slides you are presenting.

Light and detail go hand in hand

As I alluded to above, lighting is frequently a problem in academic lecture videos.  We frequently keep our lecture halls very dim in order to make the slides stand out as much as possible.  In a large venue, you may have a spotlight on the speaker, which will help.  In a medium venue, you may have a light in the ceiling directly above the speaker, which can make him or her appear somewhat ghoulish.  The more you rely upon zoom, the less light will reach your camera!  Keep that camera close.  If you can open the blinds on a window so that your speaker is lit, you will have a more interesting video.  Try to find ways to position your camera between the light and the subject (without casting a shadow, of course).  Never forget that the projected slides are much brighter than the subject you are trying to record.  If even the corner of the projected image appears in-shot, expect the speaker to become a flat silhouette.

Today’s cameras can record in very high resolutions, such as 1080p (the same as your HD television).  If lighting is truly problematic, you may want to consider forcing your high-resolution camera to a lower resolution, such as 720p; this may allow it to combine intensities across multiple transistors for each pixel.  Similarly, you should expect that a camera with a larger “retina” will outperform one with a tiny CCD in low light.  To put this in plain terms, do not expect a cell phone to produce quality video in semi-darkness, no matter the name on the label.  That said, I have observed that my “mirrorless” Canon EOS-M2 is inferior to my much cheaper Canon VIXIA HF R62 for video.  The lenses and electronics of the EOS-M2 are optimized for photos, not video.

Privacy issues are a big deal

Ensure that your audience knows that the lecture is being recorded.  Bad things can happen when a person does not want his or her image to be on-line and somebody else decides that they shall be.  Imagine how much worse this becomes when that member of the audience is a minor!  Nobody should be forced into public view because he or she attends a talk.

We frequently expect a period of questions and answers at the end of a lecture (and sometimes in the middle).  A novice camera operator may automatically swing to capture the questioner in action.  Depending on the situation, this part of the video may need to be truncated outright due to privacy issues.

Video is big and hard to handle well

20170831-This-Big

I use my hands a lot.

When I upgraded to my Canon VIXIA HF R62 from a JVC Everio (GZ-HM30AU), I had a rude shock.  My old camera had captured 720p video in very manageable MTS files, but the new camera captured 1080p video in massive MP4 volumes.  I used a 16 GB SDHC card for videos.  The cameras assumed that no file should be allowed to be larger than 4 GB (the file-size ceiling of the FAT32 file system that SDHC cards use, itself a consequence of 32-bit file sizes).  With the new camera, I consume 4 GB every 33 minutes!  At a couple of long events I recorded, I found that I needed more storage than the 16 GB card could provide.  I solved that problem by upgrading to a 64 GB card.

Naturally, keeping the raw footage of every event I video is not practical.  If each of the 50 videos I posted to YouTube over the last year produced 66 minutes of raw footage, I would need to archive 400 GB for just this period!  Similarly, posting these videos to YouTube would be a problem.  Each hour would span two files, which would require my viewers to watch part ‘A’ and then queue up part ‘B’ immediately afterwards; many would just skip watching the end, humans being humans.  To compound the problem, I live in South Africa, which means my upload speeds to network servers are dreadfully slow.  My home DSL line, for example, achieves 0.3 Mbps.  I have uploaded one GB before, but it takes hours.  In any case, I will probably need to truncate a bit of time off the front and the back of the video.  In short, I need to do video editing.
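
The arithmetic behind these numbers is easy to check.  The short Python sketch below reuses the figures quoted above (4 GB per 33 minutes of footage, 50 videos of 66 minutes each, a 0.3 Mbps uplink).  It assumes decimal gigabytes and a connection that holds its rated speed the whole time, and the function names are purely illustrative.

```python
# Back-of-envelope storage and upload estimates from the figures in the text.
# Assumptions: 1 GB = 10**9 bytes, and the uplink runs at its full rated speed.

GB = 10**9


def footage_gb(minutes, gb_per_block=4, minutes_per_block=33):
    """Raw footage size in GB for a recording of the given length."""
    return minutes * gb_per_block / minutes_per_block


def upload_hours(size_gb, uplink_mbps):
    """Hours needed to upload size_gb at uplink_mbps (megabits per second)."""
    bits = size_gb * GB * 8
    seconds = bits / (uplink_mbps * 10**6)
    return seconds / 3600


per_video = footage_gb(66)
archive = 50 * per_video
print(f"Raw footage per 66-minute video: {per_video:.1f} GB")
print(f"Archive for 50 such videos: {archive:.0f} GB")
print(f"Uploading 1 GB at 0.3 Mbps: {upload_hours(1, 0.3):.1f} hours")
```

Under those assumptions a single gigabyte needs more than seven hours of uninterrupted uploading, which is why shrinking the file before upload matters so much on a slow line.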

While semi-professionals might opt for Adobe Premiere and those who “think different” will break out iMovie, I am a bioinformaticist, and I like software that lets me master high-quality videos with a minimum of fuss and bother.  I use ffmpeg, a very powerful suite of tools that one can use directly on the command line.  Most of the time, I am (a) concatenating my source video files into one movie, (b) including only a middle section, and (c) writing a more compact movie from the source materials.  To use a recent example, I have two input files; I write their names into a file called list.txt:

file mvi_0031.mp4
file mvi_0032.mp4

Next, I run a command line that looks like this:

ffmpeg.exe -ss 00:00:15 -f concat -safe 0 -i list.txt -t 00:50:00 -c:v libx264 -preset slow -c:a copy output.mp4

In order, the options do the following:

  • -ss specifies where in the combined files ffmpeg will start the output video (in this example, after the first fifteen seconds).
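  • -f concat -safe 0 -i list.txt specifies that the files listed in list.txt should be combined into one video, on the understanding that they are formatted the same way; -safe 0 tells ffmpeg to accept the file names in the list without its usual path checks.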
  • -t specifies the total duration of the video to be encoded (in this example, exactly fifty minutes).
  • -c:v libx264 -preset slow specifies that my output video will be encoded as H.264 (MPEG-4 Part 10), a very common format for storing video (and one that YouTube knows how to read); the slow preset spends more encoding time to achieve better compression.
  • -c:a copy directs ffmpeg not to re-compress the audio, making it sound just as nice in the output as it did in the original.
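
For anyone who would rather script these steps than retype them, here is a minimal Python sketch that produces the text for list.txt and assembles the same command line.  It only builds the argument list; running it for real assumes ffmpeg is installed and on your PATH.  The helper names (concat_list_text, build_ffmpeg_command) are hypothetical and are not part of ffmpeg.

```python
import shlex


def concat_list_text(paths):
    """Return the contents of list.txt: one "file" line per source clip.

    Assumes simple file names without apostrophes, so no escaping is attempted.
    """
    return "".join(f"file '{path}'\n" for path in paths)


def build_ffmpeg_command(list_file, start, duration, output, ffmpeg="ffmpeg"):
    """Assemble the concatenate, trim, and re-encode command as an argument list."""
    return [
        ffmpeg,
        "-ss", start,            # where the output begins
        "-f", "concat",          # treat the input as a list of files to join
        "-safe", "0",            # accept the listed file names as given
        "-i", list_file,
        "-t", duration,          # total length of the output
        "-c:v", "libx264",       # re-encode the video as H.264
        "-preset", "slow",
        "-c:a", "copy",          # pass the audio through untouched
        output,
    ]


clips = ["mvi_0031.mp4", "mvi_0032.mp4"]
list_text = concat_list_text(clips)
command = build_ffmpeg_command("list.txt", "00:00:15", "00:50:00", "output.mp4")

print(list_text, end="")
print(shlex.join(command))
# To run it for real, write list_text to list.txt and then call
# subprocess.run(command, check=True).
```

Keeping the command as a list of arguments, rather than one long string, avoids quoting trouble if a file name ever contains a space.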

The ffmpeg software is very good at reducing the size of videos without compromising their quality.  I find that I can represent an hour-long lecture in a two GB 1080p video, rather than the nearly 8 GB of source footage.  If I am filled with caffeine for my lecture, the video size increases a bit (more motion requires more bits for accurate representation).

These smaller videos can then be uploaded to my YouTube account.  Happily, if you have a Gmail account (or if you use a different email address to log into Google Services), you can simply use that login for YouTube.  One clicks the arrow pointing up, and a screen will appear to which you drag your video file.  All done, right?

No job is finished until the paperwork is through!

Meta-data is key to your video reaching an audience, and too few people spend adequate time on this step.  I would call your attention to both the “Basic Info” and “Advanced Settings” pages that video authors can complete.  Of course, you should enter a paragraph of information in the basic description blank.  Ask yourself what web searches should find your video, and be sure you include those key terms in the text.  For good measure, add them again in the keywords section!  I like to include the university name where the recording took place.  Hopefully the social media minders for these schools will highlight your video to their large audiences.  YouTube will sniff the video for still frames that might be representative for the video.  I always try to pick the one in which I do not look like I’m suffering a fit of some sort.

Advanced Settings has more options to help users find your video.  Pick a category; generally my lectures fall in the “Science and Technology” category.  Be sure to enter a video location.  Google will translate your information to GPS coordinates so people can find videos shot near particular locations.  Enter a recording date, and select the language of your video (especially if you are not using English).

In many cases, you will have several videos that belong together as a set.  When I produced a short biography and four videos on Gene Expression for H3A BioNet, I also created a “playlist” that contained all five videos in the correct order.  Remember, if you can hook a viewer into watching one of your videos, you might be able to retain their interest for a few more!  Ideally, people will like your stuff enough that they subscribe to your YouTube channel, receiving a notification every time you post a new video.  You will be launched on your next career as a YouTube star!

What protein database is best for tuberculosis?

As many of you know, I have specialized in the field of proteomics, the study of complex mixtures of proteins that may be characteristic of a disease state, development stage, tissue type, etc.  Here in South Africa, my application focus has shifted from colon cancer to tuberculosis.  As a newcomer to this field, I’ve been curious to know whether the field of tuberculosis has good information resources to leverage in its fight against the disease.

The key resource any proteomics group can leverage is the sequence database, specifically the list of all protein sequences encoded by the genome in question.  The human genome incorporates around 20,310 protein-coding genes (reduced from estimates of 26,588 from the 2001 publication), but those genes code for upwards of 70,000 distinct proteins through alternative splicing. Bacteria are able to get by with far smaller numbers of genes.  E. coli, for example, functions with only 4309 proteins.  The organism that infects humans and other animals to produce tuberculosis is named Mycobacterium tuberculosis.  If we were to rely upon the excellent UniProt database, from which I quoted E. coli protein-coding gene counts, we would probably conclude that M. tuberculosis relies upon even fewer genes: only 3993 (3997 proteins)!

UniProt is an excellent all-around resource for proteomics, but researchers in a particular field usually gravitate to a data resource that is particular to their organism.  People who work with C. elegans for developmental studies, for example, use WormBase.  People who study genetics with D. melanogaster would use FlyBase.  People in tuberculosis have frequently turned to TubercuList for its annotation of the M.tb genome (comprising 4031 proteins).  This database, however, has not been updated since March of 2013 (available from the “What’s New” page).  Can it still be considered current, four years later?

As a recent import from clinical proteogenomics, my first impulse is still to run to the genome-derived sequence databases of NCBI, particularly its RefSeq collection.  I found an NCBI genome for M. tuberculosis there, with a last modification date of May 21, 2016 and an indication that its annotation was based upon “ASM19595v2,” a particular assembly of the sequencing data.  This was echoed when I ran to Ensembl, another site most commonly used for eukaryotic species (such as humans) rather than prokaryotic organisms (such as bacteria).  Their Ensembl tuberculosis proteome was built upon the same assembly as was the one from NCBI.

As a former post-doc from Oak Ridge National Laboratory, I am always likely to think of the Department of Energy’s Joint Genome Institute.  The DOE sequences “bugs” (slang for bacteria) like nobody’s business.  Invariably, I find that I can retrieve a complete proteome for a rare bacterium at JGI which is represented by only a handful of proteins in UniProt!  This makes JGI a great resource for people who work in “microbiome” projects, where samples contain proteins from an unknown number of micro-organisms.  In any case, they had many genomes that had been sequenced for tuberculosis (using the Genome Portal, I enumerated projects for Taxonomy ID 1773).  I settled for two that were in finished state, one by Manoj Pillay that appeared to serve as the reference genome and another by Cole that appeared to be an orthogonal attempt to re-annotate the genome from fresh sequencing experiments.

The easiest way to compare the six databases I had accumulated for M. tuberculosis is to enumerate the sequences in each database.  The FASTA file format is very simple; if you can count the number of lines in the file that start with ‘>’, you know how many different sequences there are!  I used the GNU tool “grep” to count them:

grep -c "^>" *.fasta
  • TubercuList: 4031 proteins
  • NCBI GCF: 3906 proteins
  • DOE JGI Cole: 4076 proteins
  • DOE JGI Pillay: 4048 proteins
  • Ensembl: 4018 proteins
  • UniProt: 3997 proteins
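
The same count is easy to reproduce without grep.  The Python sketch below does what the command above does: it counts the lines that begin with ‘>’.  The two-record example is invented purely for illustration, and the file name in the closing comment is a placeholder.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting the header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# A tiny invented example, not taken from any of the databases discussed here.
toy_fasta = """>toy_protein_1 first invented entry
MKTAYIAKQR
QISFVKSHFS
>toy_protein_2 second invented entry
MSDNELLQ
"""

print(count_fasta_records(toy_fasta.splitlines()))

# For a real file (placeholder name), pass the open file handle directly:
# with open("my_database.fasta") as handle:
#     print(count_fasta_records(handle))
```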

So far, one could certainly be excused for thinking that these databases are very nearly identical.  Of course, databases may contain very similar numbers of sequences without containing the same sequences.  One might count how many sequences are duplicated among these databases, but identity is too tough a criterion (sequences can be similar without being identical).  For example, database A may contain a long protein for gene 1 while database B contains just part of that long protein sequence for gene 1.  Database A may be constructed from one genome assembly while database B is constructed from an altogether different genome assembly, meaning that small genetic variations may lead to small proteomic variations.

I opted to use OrthoVenn, a rather powerful tool for analyzing these sequence database similarities.  The tool was published in 2015.  Almost immediately, I ran into a vexing problem.  The Venn diagram created by the software left out TubercuList!  I was delighted to get a rapid response from Yi Wang, the author of the tool (through funding of the United States Department of Agriculture’s Agricultural Research Service).  The tool could not process TubercuList because it contained disallowed characters in its sequence!  I followed his tip to sniff the file very closely.  I found that both sequence entries and accession numbers contained characters they should not.  Specifically, I found these interloping characters:

+ * ' #
jVenn_chart

OrthoVenn Venn chart

Scrubbing those bonus characters from the database allowed the OrthoVenn software to run perfectly.  Before we leave the subject, I would comment that these characters would cause problems for almost any program designed to read FASTA databases; in some cases, for example, the protein containing one of those characters might be prevented from being identified because of these inclusions!  My read is that they were introduced by manual typing errors; they are not frequent, and they appeared at a variety of locations.  Let’s remember that they have been in place for four years, with no subsequent database release!
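
A scan of this kind is straightforward to automate.  The sketch below is one possible approach in Python, not the procedure actually used here: it reports every accession or sequence line containing one of the four characters listed above, and offers a helper that strips them from sequence lines.  It assumes the accession is the first whitespace-delimited token of each header, and the toy records are invented.

```python
SUSPECT = set("+*'#")  # the four interloping characters named in the text


def find_suspect_characters(lines):
    """Return (line_number, characters) for accessions and sequence lines
    that contain any suspect character. Header descriptions are not checked."""
    hits = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            checked = parts[0] if parts else ""  # accession only (assumption)
        else:
            checked = line
        found = sorted(SUSPECT.intersection(checked))
        if found:
            hits.append((number, found))
    return hits


def scrub_sequence_line(line):
    """Remove suspect characters from a sequence line; headers pass through."""
    if line.startswith(">"):
        return line
    return "".join(ch for ch in line if ch not in SUSPECT)


# Invented records for illustration only.
toy_lines = [
    ">toy_0001#a invented entry",
    "MTDDP*GSAL",
    "LWQ+RVL",
    ">toy_0002 clean invented entry",
    "MSEQ'T",
]

for number, found in find_suspect_characters(toy_lines):
    print(f"line {number}: {' '.join(found)}")
```

Reporting line numbers before changing anything keeps a record of exactly which entries were touched, which matters if someone else needs to reproduce the cleaned database.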

Most people are accustomed to seeing Venn diagrams that incorporate two or three circles.  In this case I compelled the software to compare six different sets.  The bars shown at the bottom of the image show the numbers of clusters in each database; note that these differ from the number of sequences reported in my bullet list above because OrthoVenn recognizes that sequences within a single database may be highly redundant of each other!  (If sequences were completely identical, they could be screened out by the Proteomic Analysis Workbench from OHSU.)  Looking back at the six-pointed star drawn by the software, we might conclude that the overlap is nearly perfect among these databases.  We see four clusters specific to the JGI Pillay database, and 131 clusters specific to some sub-population of the databases, but the great bulk of clusters (3667) are apparently shared among all six databases!

Venn

The Edwards visualization from OrthoVenn

Oh, how much difference a visualization makes!  Shifting the visualization to “Edwards’ Venn” alters the picture considerably.  Now we see that the star version hides the labels for some combinations of database.  We see that 3667 clusters are indeed shared among all six databases.  After that, we can descend in counts to 131 clusters found in the Pillay and Cole databases from JGI; does this reflect a difference in how JGI runs its assemblies?  Next we step to 106 clusters found in UniProt, Ensembl, TubercuList, and NCBI GCF, but neither of the JGI databases.  The next sets down represent 70 clusters found in all but NCBI GCF or 25 clusters found in all but the two JGI databases and NCBI GCF.

I interpret this set of intersections to say that tuberculosis researchers are faced with a bit of a dilemma.  If they use a JGI database, they’ll miss the 106 clusters in all the other databases.  If they use Ensembl or TubercuList, they will include those 106 but lose the 131 clusters specific to the JGI databases.  Helpfully, OrthoVenn shows explicitly which sequences map to which clusters.  Remember that when I downloaded the Ensembl and NCBI databases, I saw that they were both based upon a single genome assembly called ASM19595v2.  Did they contain exactly the same genes?  No!  Ensembl contained two fairly big sets of genes that NCBI omitted, including 70 and 25 protein clusters, respectively.  NCBI contains another 11 protein clusters that were omitted from Ensembl.  Just because two databases stem from the same assembly does not imply that they have identical content.
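
To make the trade-off concrete, the region counts quoted above can be tallied per database.  The Python sketch below is a rough illustration that uses only the six regions named in the text, so the totals are partial: the figure contains other, smaller regions (such as the 11 NCBI clusters absent from Ensembl) whose exact membership is not given here.  The variable and function names are illustrative.

```python
ALL = frozenset(
    {"TubercuList", "NCBI GCF", "JGI Cole", "JGI Pillay", "Ensembl", "UniProt"}
)
JGI = frozenset({"JGI Cole", "JGI Pillay"})

# Region of the Venn diagram (set of databases) -> number of clusters,
# restricted to the regions quoted in the text.
regions = {
    ALL: 3667,
    JGI: 131,
    ALL - JGI: 106,
    ALL - {"NCBI GCF"}: 70,
    ALL - JGI - {"NCBI GCF"}: 25,
    frozenset({"JGI Pillay"}): 4,
}


def clusters_covered(database):
    """Clusters (among the listed regions) that the database contains."""
    return sum(n for members, n in regions.items() if database in members)


def clusters_missed(database):
    """Clusters (among the listed regions) that the database lacks."""
    return sum(n for members, n in regions.items() if database not in members)


for name in sorted(ALL):
    print(f"{name}: covers {clusters_covered(name)}, misses {clusters_missed(name)}")
```

Within these six regions, the two JGI databases lack the 106 and 25 clusters found only elsewhere, while Ensembl lacks the 131 JGI-only clusters plus the four unique to Pillay.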

For my part, I may use some non-quantitative means to decide upon a database.  I do not like making manual edits to a database since then others need to know exactly which edits I’ve made to reproduce my work.  That takes away TubercuList.  Next, I feel strongly that the FASTA database should contain useful text descriptions for each accession.  Take a look at the lack of information TubercuList provides for its first protein:

Rv0001_dnaA

That’s right.  Nothing!  The Joint Genome Institute databases are quite similar in omitting the description lines. Compare that to what we see in the NCBI and UniProt databases:

NP_214515.1 chromosomal replication initiator protein DnaA [Mycobacterium tuberculosis H37Rv]
sp|P9WNW3|DNAA_MYCTU Chromosomal replication initiator protein DnaA OS=Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) GN=dnaA PE=1 SV=1

That’s much more informative. We’ve got missing data here, too, though. Tuberculosis researchers have grown accustomed to their “Rv numbers” to describe their most familiar genes/proteins, but NCBI and UniProt leave those numbers out of well-characterized genes; the Rv numbers still appear for less well-characterized proteins, such as hypothetical proteins. By comparison, Ensembl includes textual descriptions as well as Rv numbers in a machine-parseable format for every entry:

CCP42723 pep chromosome:ASM19595v2:Chromosome:1:1524:1 gene:Rv0001 transcript:CCP42723 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:dnaA description:Chromosomal replication initiator protein DnaA
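
To show what “machine-parseable” buys you, here is a small Python sketch that pulls the Rv number, gene symbol, and description out of a header shaped like the one above.  It assumes the layout seen in this single example (space-separated key:value tokens, with description: last); other Ensembl releases may differ, and the function name is hypothetical.

```python
def parse_ensembl_header(header):
    """Split an Ensembl-style FASTA header into a dictionary of fields.

    Assumes key:value tokens separated by spaces, with the free-text
    description as the final field.
    """
    header = header.lstrip(">").strip()
    main, _, description = header.partition(" description:")
    tokens = main.split()
    fields = {"accession": tokens[0], "description": description}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if sep:  # skip bare words such as "pep"
            fields[key] = value
    return fields


example = (
    "CCP42723 pep chromosome:ASM19595v2:Chromosome:1:1524:1 gene:Rv0001 "
    "transcript:CCP42723 gene_biotype:protein_coding "
    "transcript_biotype:protein_coding gene_symbol:dnaA "
    "description:Chromosomal replication initiator protein DnaA"
)

fields = parse_ensembl_header(example)
print(fields["gene"], fields["gene_symbol"], fields["description"])
```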

On this basis, I believe Ensembl may be the best option for tuberculosis researchers. It is kept up-to-date while TubercuList is not, and it allows researchers to refer back to the old Rv number system in each description.

I hope that this view “under the hood” has helped you understand a bit more of the kind of question that occasionally bedevils a bioinformaticist!

Are you ready to start a molecular biology M.Sc.?

Professors receive a lot of requests from international students for admission to post-graduate training.  In South Africa, that training could be for “Honours” (a one-year course), an “M.Sc.” (a two-year Master’s program), or a “Ph.D.” (typically three years, post Master’s).  For students changing from one country to another, however, the question of “equivalencies” is key.  Could a four-year B.Sc. (Bachelor’s of Science) from Egypt, for example, be treated as the same thing as a three-year B.Sc. followed by one year of Honours in South Africa?  This post gives an example of the questions I asked as I recently tried to determine the right level of admissions for an international student.

The international office for my university had declared that a student’s four-year degree was certainly equivalent to a three-year B.Sc. in South Africa, but it left to the department’s discretion whether or not Honours training was required before an M.Sc.  To support the department’s decision, I decided to build an interview from questions that would delineate the limits of the candidate’s knowledge.  I used the roster of topics for the Division of Molecular Biology and Human Genetics 2017 Honours as a guide.  I used the number of didactic training days for each topic as a weight:

Field | Duration
Molecular Biology | 8 days
Mycobacteriology | 7 days
Biostatistics | 12 days
Bioinformatics | 8 days
Immunology | 8 days
Cell Biology | 8 days
Scientific Communication | 2 days

I also gave some consideration to the M.Sc. project the student would pursue in my laboratory.  In this case, the work related to the reproducibility of mass spectrometry experiments.  After pondering before my word processor, I selected these questions for the candidate’s interview:

# | Field | Question
1 | Cell Biology | What biological processes are described by the Central Dogma of molecular biology? Walk us through each.
2 | Biochemistry | What do we describe with Michaelis-Menten kinetics?
3 | Computer Science | How does iteration differ from recursion?
4 | Analytical Chemistry | By what property does a mass spectrometer separate ions?
5 | Medicine | In HIV treatment, what is the purpose of a “protease inhibitor?”
6 | Biostatistics | What role does the “null hypothesis” play in Student’s t-test?
7 | Medicine | What type of pathogen causes tuberculosis?
8 | Genetics | What is the purpose of a plasmid vector in cloning? What features do such vectors commonly contain?
9 | Cell Biology | What cellular process includes prophase, metaphase, anaphase, and telophase?
10 | Mathematics | The log ratio (base 2) between two numbers is 3. What is the linear ratio?
11 | Immunology | What is an antibody, and what is its relationship to an antigen? What are the major families of antibodies?
12 | Computer Science | What is the purpose of an Application Programming Interface (API) or “library?”
13 | Biochemistry | What do we describe as the secondary structure of a protein?
14 | Genetics | Of what components are nucleic acids constructed?
15 | Biostatistics | What is a Coefficient of Variation?
16 | Mathematics | If I divide the circumference of a circle by its diameter, what value do I get?
17 | Immunology | What type of immune cell is the primary factory for antibodies?

The interview, conducted via Skype, lasted approximately an hour. As I asked each question, I gave the question orally and pasted the text of that question into the chat session. Remember that as an American, I have a “foreign” accent for the English-speaking population of Africa! I did not want that to be a factor in the candidate’s performance. I was grateful that our division’s Honours program coordinator, Dr. Jennifer Jackson, accompanied me during the interview, both to monitor that the candidate was treated fairly and to ask follow-up questions of her own.

Why did it take an hour to answer these questions? As is customary in post-graduate education, each answer opened the door to a series of other questions. A student may give an answer that covers only part of the question, and the follow-up will poke into the omitted area to see if it is an area of weakness, much as a dentist uses an explorer to probe a darkened area of a tooth for decay!

Another factor that I want to measure for students is the degree of integration that they have achieved in their educations. To recognize that a word has been mentioned in class is not sufficient; I need to see that students understand how key concepts relate to each other. This synthesis is sometimes hard to evaluate, but it’s important. A student who doesn’t understand how a concept integrates with others will not be able to apply the principle or recognize when it should come into play.

Before the readers of this blog begin showering me with applications, I need to emphasize that the questions I framed for this particular interview are not the questions I would ask of another candidate. The ones above were chosen to reflect the background of the candidate, the diploma program to which he or she had applied, and the nature of the project I had in mind.

I hope that this post will help you decide whether or not you are ready to plunge into post-graduate education!

Young David steps out of his comfort zone

Sometimes, a look through the scrapbook can be a very humbling experience.  I resolved this month to finish a project I launched in 1994.  At last I am publishing the journal I recorded during my first trip to Europe!  For the first time, I am bringing together the forty-two journal entries, my photographs, and the video camera footage that I recorded during my clockwise circuit around the continent.  Before you jump right into the journal, though, could I ask you to read a few thoughts?

More time has passed since I wrote that journal (23 years) than I had lived at that point (I was 20 years old).  The experiences of the last two decades have certainly left their mark.  Since that time, I’ve graduated from two degree programs; I’ve filled my passport with stamps; I’ve built my career in academia; I’ve achieved some level of comfort in finance; I’ve married and divorced.  All of these changes make it hard to recognize the person who wrote those entries as the same person writing this blog!

Setting the scene

19941002-Lyon photo01

I’m sitting by “Le Crayon,” the tower of Credit Lyonnais.

The David who wrote this journal was experiencing profound discomfort.  As a fellow in the University of Arkansas Sturgis Fellows program, I was strongly pushed to spend at least a semester of my junior year abroad.  My undergraduate advisor, Doug Rhoads, arranged for me to visit the laboratories of Jean-Jacques Madjar at the University of Lyons, where Thierry Masse mentored my project.  The fact is that I did not enjoy “wet bench” research, and I was becoming concerned that my Biology degree could equip me for a career I did not want!  To complicate the matter further, we never formalized my visa to work in the laboratory for a year-long stretch, and so I needed to leave France well before even a semester had passed.  Scheduling this journey through many countries was my fall-back plan, and my mother was working with the University of Arkansas to get a formal plan in place for the spring of 1995.  In short, I felt that I was failing in this first real test of applying my academic skills.

If you mainly know me as a globe-trotter who uprooted his career and moved to South Africa, you might be surprised to know that as a young man I disliked travel, and I feared change.  Ask the members of Yates Lab how huge a step it seemed to me to move from Seattle, Washington to San Diego, California in the year 2000.  I spent six months poring over maps and dawdling over last details in Seattle.  To go back further in time, I was always the first member of the family to feel it was time for us to return to Kansas City when our family took long road trips in the summer time.  If you read the journal, you will see a David feeling perpetually out of place and coping badly with exhaustion and self-induced malnutrition because I wasn’t willing to spend enough money on food.

The most redundant feature of the journal is that the 20-year-old me was completely agog at the young women I encountered on my travels.  Although a disproportionate number of my friends since elementary school have been female, I must say that I was essentially undateable until my mid-twenties.  I would summarize by saying that I routinely put women on a pedestal and couldn’t see myself as desirable.  This aspect of the journal is high on my list of cringe-inducers.

IMG_9804

I had already given up cursive in college.

What should we call the nexus of judgmental, puritanical, dismissive, and obsessed with money?  I am reminded in this journal that the person I am today was distilled from common mud.  Today I am not immune from these traits, but I do try to improve myself with time.  I have been tagged with the label “stubborn” more times than I would like to admit, but I hope that I can manage open-mindedness and respect for others at least from time to time.  In particular, I struggled to read the passages I wrote about the Turks in Budapest or the drive-by racism I dumped on Latin culture.  At least I realized that smug American chest-thumping was not preferable.  My memories of myself from that time have been substantially white-washed, but my text makes it clear I had a long way to go.  In my memories of that time, I mostly remember that the international relations scholar from Turkey taught me that a bishop or a castle is generally more reliable than a knight in the chess end-game.

From 1994 to now

Travel in Europe today is considerably simpler than it was in 1994.  Moving from country to country is easier because of the Schengen agreement, which eliminates routine border checks between member countries, and the Economic and Monetary Union, which makes the euro the only currency you need for much of the continent.  The traveler’s checks that fueled my travel are not needed in Europe; instead, you feed your bank card into an ATM, and out pops money.  My single telephone call home from Vienna would likely be replaced today by Skype; I could use my phone or computer in the WiFi of any hostel to chat right away with folks at home.

IMG_9801

My account book, in many currencies

I wrote my journal narrative in a spiral-bound notebook, and I kept strict accounts of every franc, Deutschmark, schilling, crown, etc. in a separate small notebook, both of which I acquired while living in Lyon.  I was very fond of Pilot rolling ball pens at the time, and so each page is filled with cramped blue writing.

While my parents used 35mm slide cameras to capture my early years, I carried a 126 film cartridge camera made by Vivitar with me to Europe.  As you will see, many of the images I mention never made it to print when I developed those films, and the term “focus” does not really apply.  In three cases, I used Microsoft’s Image Composite Editor to stitch together multiple photos into a single panorama.

19940618 Lyon cathedrals photo06

The two most visible cathedrals of Lyon, France

Computer video has come quite some distance since 1994.  I originally recorded the video on an analog Sharp “Video8” camera.  When I subsequently upgraded to a miniDV camera, I was able to transfer the video from the old camera to the new one via an S-video cable; this process recorded the video in a digital format on the new tape.  I was able to transfer that digital video without loss to a desktop computer with a FireWire card.  To deinterlace and compress the section of video I’ve posted to YouTube, I used the “yadif” filter of ffmpeg:

ffmpeg.exe -ss 00:00:09 -i input.avi -vf yadif -t 00:45:05 -c:v libx264 -preset slow output.mov

With those comments in place, I hope you enjoy reading the journal, a project 23 years in the making!