Tag Archives: computers

What protein database is best for tuberculosis?

As many of you know, I have specialized in the field of proteomics, the study of complex mixtures of proteins that may be characteristic of a disease state, development stage, tissue type, etc.  Here in South Africa, my application focus has shifted from colon cancer to tuberculosis.  As a newcomer to this field, I’ve been curious to know whether the field of tuberculosis has good information resources to leverage in its fight against the disease.

The key resource any proteomics group can leverage is the sequence database, specifically the list of all protein sequences encoded by the genome in question.  The human genome incorporates around 20,310 protein-coding genes (reduced from estimates of 26,588 from the 2001 publication), but those genes code for upwards of 70,000 distinct proteins through alternative splicing. Bacteria are able to get by with far smaller numbers of genes.  E. coli, for example, functions with only 4309 proteins.  The organism that infects humans and other animals to produce tuberculosis is named Mycobacterium tuberculosis.  If we were to rely upon the excellent UniProt database, from which I quoted E. coli protein-coding gene counts, we would probably conclude that M. tuberculosis relies upon even fewer genes: only 3993 (3997 proteins)!

UniProt is an excellent all-around resource for proteomics, but researchers in a particular field usually gravitate to a data resource that is particular to their organism.  People who work with C. elegans for developmental studies, for example, use WormBase.  People who study genetics with D. melanogaster would use FlyBase.  People in tuberculosis have frequently turned to TubercuList for its annotation of the M.tb genome (comprising 4031 proteins).  This database, however, has not been updated since March of 2013 (available from the “What’s New” page).  Can it still be considered current, four years later?

As a recent import from clinical proteogenomics, my first impulse is still to run to the genome-derived sequence databases of NCBI, particularly its RefSeq collection.  I found an NCBI genome for M. tuberculosis there, with a last modification date of May 21, 2016, and a note that its annotation was based upon “ASM19595v2,” a particular assembly of the sequencing data.  This was echoed when I ran to Ensembl, another site most commonly used for eukaryotic species (such as humans) rather than prokaryotic organisms (such as bacteria).  Their Ensembl tuberculosis proteome was built upon the same assembly as was the one from NCBI.

As a former post-doc from Oak Ridge National Laboratory, I am always likely to think of the Department of Energy’s Joint Genome Institute.  The DOE sequences “bugs” (slang for bacteria) like nobody’s business.  Invariably, I find that I can retrieve a complete proteome for a rare bacterium at JGI which is represented by only a handful of proteins in UniProt!  This makes JGI a great resource for people who work in “microbiome” projects, where samples contain proteins from an unknown number of micro-organisms.  In any case, they had many genomes that had been sequenced for tuberculosis (using the Genome Portal, I enumerated projects for Taxonomy ID 1773).  I settled for two that were in finished state, one by Manoj Pillay that appeared to serve as the reference genome and another by Cole that appeared to be an orthogonal attempt to re-annotate the genome from fresh sequencing experiments.

The easiest way to compare the six databases I had accumulated for M. tuberculosis is to enumerate the sequences in each database.  The FASTA file format is very simple; if you can count the number of lines in the file that start with ‘>’, you know how many different sequences there are!  I used the GNU tool “grep” to count them:

grep -c "^>" *.fasta

  • TubercuList: 4031 proteins
  • NCBI GCF: 3906 proteins
  • DOE JGI Cole: 4076 proteins
  • DOE JGI Pillay: 4048 proteins
  • Ensembl: 4018 proteins
  • UniProt: 3997 proteins
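
For readers without grep at hand, the same count can be sketched in Python. This is a minimal sketch, assuming the six databases sit in one directory with a `.fasta` extension; the function names `count_fasta_entries` and `count_all` are made up for illustration.

```python
from pathlib import Path


def count_fasta_entries(path):
    """Count sequences in a FASTA file by counting lines that start with '>'."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_all(directory=".", pattern="*.fasta"):
    """Return {file name: sequence count} for every matching file (assumed layout)."""
    files = sorted(Path(directory).glob(pattern))
    return {path.name: count_fasta_entries(path) for path in files}


if __name__ == "__main__":
    for name, count in count_all().items():
        print(f"{name}: {count} proteins")
```

Like the grep command, this counts header lines only; it says nothing about whether the sequences behind those headers agree between files.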

So far, one could certainly be excused for thinking that these databases are very nearly identical.  Of course, databases may contain very similar numbers of sequences without containing the same sequences.  One might count how many sequences are duplicated among these databases, but identity is too tough a criterion (sequences can be similar without being identical).  For example, database A may contain a long protein for gene 1 while database B contains just part of that long protein sequence for gene 1.  Database A may be constructed from one gene assembly while Database B is constructed from an altogether different gene assembly, meaning that small genetic variations may lead to small proteomic variations.
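
To see why exact identity is too strict, here is a small sketch on toy data (the sequences and accession names are invented, not taken from any of the six databases). It counts sequences that are identical between two databases and, separately, sequences in the second database that are only a fragment of a longer sequence in the first.

```python
def compare_by_identity(db_a, db_b):
    """Compare two {accession: sequence} dicts.

    Returns (identical, fragments): the number of distinct sequences present
    in both databases, and the number of distinct sequences in db_b that are
    not identical to anything in db_a but occur inside a longer db_a sequence.
    """
    seqs_a = {seq for seq in db_a.values() if seq}
    seqs_b = {seq for seq in db_b.values() if seq}
    identical = seqs_a & seqs_b
    # Brute-force substring search; fine for a few thousand proteins,
    # but a real comparison would use a clustering tool instead.
    fragments = {s for s in seqs_b - identical if any(s in t for t in seqs_a)}
    return len(identical), len(fragments)


# Toy example: gene 1 is full length in A but truncated in B.
database_a = {"gene1": "MKVLAAGIT", "gene2": "MSTNPKPQR"}
database_b = {"gene1": "VLAAG", "gene2": "MSTNPKPQR", "gene3": "MAAAA"}
print(compare_by_identity(database_a, database_b))
```

An identity count alone would report one shared protein here and miss that gene 1 is represented in both databases.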

I opted to use OrthoVenn, a rather powerful tool for analyzing these sequence database similarities.  The tool was published in 2015.  Almost immediately, I ran into a vexing problem.  The Venn diagram created by the software left out TubercuList!  I was delighted to get a rapid response from Yi Wang, the author of the tool (through funding of the United States Department of Agriculture’s Agricultural Research Service).  The tool could not process TubercuList because it contained disallowed characters in its sequences!  I followed his tip to sniff the file very closely.  I found that both sequence entries and accession numbers contained characters they should not.  Specifically, I found these interloping characters:

+ * ' #

OrthoVenn Venn chart

Scrubbing those bonus characters from the database allowed the OrthoVenn software to run perfectly.  Before we leave the subject, I would comment that these characters would cause problems for almost any program designed to read FASTA databases; in some cases, for example, the protein containing one of those characters might be prevented from being identified because of these inclusions!  My read is that they were introduced by manual typing errors; they are not frequent, and they appeared at a variety of locations.  Let’s remember that they have been in place for four years, with no subsequent database release!
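
The scrubbing step can be sketched as follows. This is a minimal sketch, assuming the four characters listed above are the only ones to remove and that they should simply be deleted wherever they occur; the function name `scrub_fasta_lines` is hypothetical. It also reports which lines were changed, so that the edits can be documented for anyone trying to reproduce the work.

```python
DISALLOWED = set("+*'#")  # the four characters named in the post


def scrub_fasta_lines(lines, disallowed=DISALLOWED):
    """Remove disallowed characters from FASTA lines.

    Returns (cleaned, report) where cleaned is a list of lines without
    trailing newlines and report lists (line number, removed characters)
    for every line that was altered, headers and sequences alike.
    """
    cleaned, report = [], []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        removed = "".join(ch for ch in text if ch in disallowed)
        if removed:
            report.append((number, removed))
            text = "".join(ch for ch in text if ch not in disallowed)
        cleaned.append(text)
    return cleaned, report


# Invented example lines, not actual TubercuList content.
example = [">Rv0001_dnaA#\n", "MTDD+PGS\n", "GFT*\n", "AAA\n"]
cleaned, report = scrub_fasta_lines(example)
print(cleaned)
print(report)
```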

Most people are accustomed to seeing Venn diagrams that incorporate two or three circles.  In this case I compelled the software to compare six different sets.  The bars shown at the bottom of the image show the numbers of clusters in each database; note that these differ from the number of sequences reported in my bullet list above because OrthoVenn recognizes that sequences within a single database may be highly redundant with each other!  (If sequences were completely identical, they could be screened out by the Proteomic Analysis Workbench from OHSU.)  Looking back at the six-pointed star drawn by the software, we might conclude that the overlap is nearly perfect among these databases.  We see four clusters specific to the JGI Pillay database, and 131 clusters specific to some sub-population of the databases, but the great bulk of clusters (3667) are apparently shared among all six databases!

The Edwards visualization from OrthoVenn

Oh, how much difference a visualization makes!  Shifting the visualization to “Edwards’ Venn” alters the picture considerably.  Now we see that the star version hides the labels for some combinations of databases.  We see that 3667 clusters are indeed shared among all six databases.  After that, we can descend in counts to 131 clusters found in the Pillay and Cole databases from JGI; does this reflect a difference in how JGI runs its assemblies?  Next we step to 106 clusters found in UniProt, Ensembl, TubercuList, and NCBI GCF, but neither of the JGI databases.  The next sets down represent 70 clusters found in all but NCBI GCF and 25 clusters found in all but the two JGI databases and NCBI GCF.

I interpret this set of intersections to say that tuberculosis researchers are faced with a bit of a dilemma.  If they use a JGI database, they’ll miss the 106 clusters in all the other databases.  If they use Ensembl or TubercuList, they will include those 106 but lose the 131 clusters specific to the JGI databases.  Helpfully, OrthoVenn shows explicitly which sequences map to which clusters.  Remember that when I downloaded the Ensembl and NCBI databases, I saw that they were both based upon a single genome assembly called ASM19595v2.  Did they contain exactly the same genes?  No!  Ensembl contained two fairly big sets of genes that NCBI omitted, including 70 and 25 protein clusters, respectively.  NCBI contains another 11 protein clusters that were omitted from Ensembl.  Just because two databases stem from the same assembly does not imply that they have identical content.
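
The trade-off can be made concrete with a little set arithmetic. The sketch below uses only the membership patterns and counts spelled out in the text; the 11 clusters that NCBI has and Ensembl lacks are left out because the text does not say which other databases share them, so the totals are lower bounds rather than exact figures. The names `patterns` and `clusters_missed` are illustrative.

```python
ALL = frozenset({"TubercuList", "NCBI GCF", "JGI Cole", "JGI Pillay", "Ensembl", "UniProt"})
JGI = frozenset({"JGI Cole", "JGI Pillay"})

# Membership pattern -> number of clusters, as read from the Edwards diagram.
patterns = {
    ALL: 3667,
    JGI: 131,
    ALL - JGI: 106,
    ALL - {"NCBI GCF"}: 70,
    ALL - JGI - {"NCBI GCF"}: 25,
    frozenset({"JGI Pillay"}): 4,
}


def clusters_missed(patterns, chosen):
    """Count the listed clusters that contain no sequence from the chosen database."""
    return sum(count for members, count in patterns.items() if chosen not in members)


for database in sorted(ALL):
    print(f"{database}: misses at least {clusters_missed(patterns, database)} clusters")
```

On these partial numbers a JGI database misses the 106 and the 25 clusters, while Ensembl misses the 131 JGI-specific clusters plus the four found only in Pillay.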

For my part, I may use some non-quantitative means to decide upon a database.  I do not like making manual edits to a database since then others need to know exactly which edits I’ve made to reproduce my work.  That takes away TubercuList.  Next, I feel strongly that the FASTA database should contain useful text descriptions for each accession.  Take a look at the lack of information TubercuList provides for its first protein:

Rv0001_dnaA

That’s right.  Nothing!  The Joint Genome Institute databases are quite similar in omitting the description lines. Compare that to what we see in the NCBI and UniProt databases:

NP_214515.1 chromosomal replication initiator protein DnaA [Mycobacterium tuberculosis H37Rv]
sp|P9WNW3|DNAA_MYCTU Chromosomal replication initiator protein DnaA OS=Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) GN=dnaA PE=1 SV=1

That’s much more informative. We’ve got missing data here, too, though. Tuberculosis researchers have grown accustomed to their “Rv numbers” to describe their most familiar genes/proteins, but NCBI and UniProt leave those numbers out of well-characterized genes; the Rv numbers still appear for less well-characterized proteins, such as hypothetical proteins. By comparison, Ensembl includes textual descriptions as well as Rv numbers in a machine-parseable format for every entry:

CCP42723 pep chromosome:ASM19595v2:Chromosome:1:1524:1 gene:Rv0001 transcript:CCP42723 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:dnaA description:Chromosomal replication initiator protein DnaA

On this basis, I believe Ensembl may be the best option for tuberculosis researchers. It is kept up-to-date while TubercuList is not, and it allows researchers to refer back to the old Rv number system in each description.
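
As an illustration of what “machine-parseable” buys us, here is a minimal sketch that pulls the Rv number, gene symbol, and description out of the Ensembl header shown above. It assumes every header follows the same `key:value` layout as that one example, with `description:` as the last field; the function name `parse_ensembl_header` is made up.

```python
def parse_ensembl_header(header):
    """Split an Ensembl-style FASTA header into a dict of fields.

    Assumes the layout of the single example in the post: an accession,
    space-separated key:value tokens, and a free-text description last.
    """
    header = header.lstrip(">").strip()
    main, _, description = header.partition(" description:")
    tokens = main.split()
    fields = {"accession": tokens[0], "description": description.strip()}
    for token in tokens[1:]:
        key, separator, value = token.partition(":")
        if separator:  # skip bare tokens such as "pep"
            fields[key] = value
    return fields


example_header = (
    "CCP42723 pep chromosome:ASM19595v2:Chromosome:1:1524:1 gene:Rv0001 "
    "transcript:CCP42723 gene_biotype:protein_coding "
    "transcript_biotype:protein_coding gene_symbol:dnaA "
    "description:Chromosomal replication initiator protein DnaA"
)
fields = parse_ensembl_header(example_header)
print(fields["gene"], fields["gene_symbol"], fields["description"])
```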

I hope that this view “under the hood” has helped you understand a bit more of the kind of question that occasionally bedevils a bioinformaticist!

With the new year, a new office!

The Division of Molecular Biology and Human Genetics occupies the fourth floor of the FISAN building at the SUN Faculty of Medicine and Health Sciences.  As its research programs have become better funded, substantial numbers of clinical and research staff have been added to its roster.  One practical result of this addition was that I shared an office with three other researchers when I arrived in South Africa in late 2015.  With the start of 2017, however, our division has gained access to office space on the third floor.  I am happy to report that as of last week, I have a new solo office!

This move does not come without regrets, though.  I have become friends with the inhabitants of F416, and my new hallway currently seems quite lonely by comparison.  Sam Sampson is a group leader who came to SUN via the National Research Foundation “South African Research Chairs Initiative,” and she has impressed me with her concentration skills in our busy office.  I also appreciate her thoughtful gift of teaspoons when mine went missing!  I really value Kim Stanley’s friendship; she has been very tolerant of my practical jokes, and occasionally I catch a glimpse of her mischievous sense of humor.  She invests countless hours in the REDCap study databases that undergird much of the research for our division.  Nasiema Allie was the last of the four people in our office to arrive.  Her job is quite critical since she ensures that the BSL3 lab facilities for our division are as safe as they can be.  Why is that a big deal?  Our division emphasizes research in tuberculosis, and we culture Mycobacterium tuberculosis from patient samples.  Some of the strains we recover from patients are resistant to every drug available to treat this disease.  Let’s just say that I don’t store my lunch in the freezers lining the division’s hallways!

Our freezers are equipped with wireless boxes that “phone home” if the temperature inside rises.

My new digs are on the wing extending east from the FISAN entrance.  To be on the third floor means I am several feet closer to the flock of chickens at ground level.  When I open the window (!) of my new office to feel a sweet afternoon breeze, I also get to hear the crowing of the roosters.  Last week we also had the questionable benefit of being closer to the smell of decomposition as cadavers were moved downstairs; FISAN is an Afrikaans abbreviation for “physiology and anatomy!”  That said, the third floor has great accommodations for the bioinformatics and biostatistics students we will be training in SATBBI.  The student chamber we have selected has abundant space, featuring bookshelves, a chalkboard, a bulletin board, and even a sink!  Right outside we have a smaller area we hope to position as a meeting room.

We haven’t reached our final configuration for the desks in the bioinformatics student workspace.

This brings us to my office.  I was one of the first professors to pick out my new home, and I decided on one featuring a blue wall (rather than the beige featured throughout the complex), an intact chalkboard (rather than the scars left where one had been removed), and a ledge underneath its narrow window.  I discovered that the ledge was the perfect height for tucking a cabinet or drawer set from our old furniture upstairs.  They will match the desk that my graduate student and I hauled downstairs from my old office.  The ledge is sturdy enough that I can stand on it to raise my window, so I feel confident that it will house some plants for me soon!

All I need now is a coffee table.

Moving my computers down was a bit more worrisome.  Happily, the LAN port (or “network point,” as they would say here) was already live, though it is a slower 100 Mbps rather than gigabit.  In any case, my Ubuntu Linux file server “Deep Thought” made the transition downstairs without a hiccup.  I recently brought my Intel Core i7 workstation “Alabaster” from home; it connects to the network wirelessly, so I can use a network wire to connect the two computers in my office directly.  Using a gigabit network port exclusively to communicate between the pair means I can use the RAID from Deep Thought almost as though it were a local hard drive in Alabaster.  This may be as good a place as any to mention an act of generosity from Vanderbilt University.  When I decided to move my lab to South Africa, the Department of Biomedical Informatics allowed me to move almost all the computers associated with my laboratory to Stellenbosch University!  It made a real difference to my new division.

What office is complete without a memento or ten?

I have assembled a collection of treasures on my desk that link me to my past.  Probably my oldest memento is a koosh ball that I acquired in high school.  I am very fond of my jar of marbles for my discussions in frequentist biostatistics; I bought these marbles when I was starting as a professor at Vanderbilt from the Moon Marble Company, near Kansas City.  My first Ph.D. graduate student bought me a jade pen holder that I use every day.  My singing bowl from China gets a special place of prominence.  A small, red Buddha was a parting present from the Harkeys, close friends from Nashville.  An analog clock from Vanderbilt reminds me of my friend Bing Zhang, who headed to Houston around the time I moved to Cape Town.  I don’t remember where my Ganesh came from, but he has a reputation for finding the solutions to problems, so he definitely belongs on my desk!

My name placard has moved, I have given up my key to F416, and all my things have migrated downstairs.  Over the weekend, my lovely girlfriend bought me white and colored chalk for my new chalkboard!  Now it’s time for the science to flow from my desk once again.  Wish me luck!

This hallway awaits the rest of its occupants!

 

Five research institutions in one week

My home institution is the Faculty of Medicine and Health Sciences of Stellenbosch University, in the outskirts of Cape Town. Because 60% of my effort is paid by the Medical Research Council, however, I serve as a bioinformatics resource to tuberculosis researchers, regardless of their institution. In one recent week, my duties took me to five different institutions. This post invites you to come along for the ride!

Monday: Tygerberg Campus

The first day of each week generally finds me perched at my computer in an office shared with three colleagues. They are each quite impressive scientists. One is a professor, another is a database administrator, and the newest member is a lab safety manager. All three of them work long hours. We added a whiteboard on the door this week to keep track of where each of us may be found.

What can I say? I like maps.

My chief project at the Tygerberg campus for this week has been establishing a small cluster of computers for the Mycobacteriology Molecular Evolution and Physiology group. The recent loss of a server destroyed some assemblies of genomic sequencing data for them. I brought seven i7 processing nodes from my old lab. We spent much of Monday afternoon consolidating memory to produce four computers with 8GB or more of RAM (even microbial genomes can take a fair amount of RAM for assembly). We installed Ubuntu Linux on the computers since the assembly software requires that operating system. The group promises to keep the processors busy in the coming weeks!

Tuesday: Groote Schuur

The Faculty of Health Sciences for the University of Cape Town is based at Groote Schuur Hospital (this is not pronounceable by an American, but try HREE-ta SKEW). I work on Stellenbosch hours, which means I arrive at UCT long before most visitors. I had two institutions to visit on this Tuesday: the Centre for Proteomics and Genomics Research and the UCT health sciences campus itself. CPGR and I had talked about a large-scale proteomic survey spanning more than one hundred patients, so I brought along a portable hard drive to collect the mass spectrometry data. I have enjoyed talking with their proteomics team, so engaging in a legitimate collaboration with them is a great opportunity.

Just after sunrise, Anzio Road’s intersection with the M4 is quite pretty.

I walked up Anzio Road to the UCT Health Sciences Campus. I used my recently acquired badge (I’ve become an honorary professor at UCT) to card into the campus and then used it again to enter the Institute of Infectious Disease and Molecular Medicine (abbreviated the “IDM”). I had a great meeting with a post-doctoral fellow in SATVI. She has been waiting very patiently for me to analyze cell populations in a large flow cytometry data set associated with IRIS (Immune Reconstitution Inflammatory Syndrome, a severe condition that sometimes results when a person with HIV infection begins anti-retroviral therapy). At long last, we could begin visualizing the results from this huge study!

The IDM feels both modern and classic to me.

I returned to my normal haunt at UCT, the Blackburn Lab. I had already filled my morning with other meetings, but I could still get some updates on projects from his graduate students and post-docs. At two in the afternoon, we had our weekly community proteomics meeting. Once a month, we put on a special two-hour program devoted to a special topic (for example, the next special program discussed strategies for publishing effectively in proteomics). After that meeting ended, I was on the road back to my usual neighborhood.

Wednesday: Tygerberg Campus

Upon my return to my home campus, I had the usual crush of email to field, due to my time at UCT the prior day. Working with so many institutions has led to a proliferation of email accounts; at the moment I monitor two accounts from back in the States plus my Google account, and I’ve picked up accounts for both Stellenbosch University and University of Cape Town. I expect that most messages will come to my Google account or my Stellenbosch account, but manuscript review requests frequently show up elsewhere.

Poking the beast in the bioinformatics cave

I decided to give another afternoon to the mini-cluster. Having four computers running Linux is nice, but wouldn’t it be better if the operators didn’t have to physically connect a monitor to each to use it? Shouldn’t we have some files shared among the computers? I hauled out the network switch and wired them up. The information technology folks registered the MAC addresses of the network adapters for use in the FISAN building where they are located. I installed a big hard drive in the head computer and installed NFS and OpenSSH on it. On the other three computers, I installed nfs-common and OpenSSH, each time configuring them to look to “Beast1” for the /scratch file system. Now users can launch jobs over SSH to the other three computers while logged in from Beast1. I felt that these computers had crossed the threshold to become worth using, and I breathed a sigh of relief.
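
Launching jobs from the head node might look something like the sketch below. Only “Beast1” and the shared /scratch file system come from the setup described above; the worker names Beast2 through Beast4 are hypothetical, as are the helper functions, and the sketch assumes key-based SSH logins are already configured. By default it only builds the commands without running them.

```python
import shlex
import subprocess

WORKERS = ["Beast2", "Beast3", "Beast4"]  # hypothetical host names


def build_ssh_command(host, job, workdir="/scratch"):
    """Build the argument list for running a shell command on a worker via ssh."""
    remote = f"cd {shlex.quote(workdir)} && {job}"
    return ["ssh", host, remote]


def launch(job, hosts=WORKERS, dry_run=True):
    """Build one ssh command per host; run them in turn unless dry_run is set."""
    commands = [build_ssh_command(host, job) for host in hosts]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in launch("hostname"):
        print(" ".join(shlex.quote(part) for part in command))
```

Because every worker mounts the same /scratch, a job started this way can read its input and write its output where the head node can see them.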

Thursday: University of the Western Cape

I have only recently begun working regularly with the Proteomics Unit of the UWC Biotechnology program, but I have really enjoyed the interaction. I have found myself thinking about proteogenomics in a rather different light to support their work with non-model organisms. To identify proteins in Capsicum frutescens, we needed to know which proteins are encoded by the genome of this organism. Happily, we found an RNA-Seq paper that included assembled transcripts that we could employ in database search algorithms for proteomics. More recently the UWC team has been challenging me with a different plant. We found RNA-Seq data, but not the assembly produced by the authors. It has been fun building a team to infer a proper protein catalog for the species.

One of my favorite games to play with my old lab group was “Where’s the Paper?” We would work our way around the table, with each person discussing what he or she was working on. Then we’d try to figure out the best paper that could be produced from that work. That game has come in handy at UWC, as well. We’re trying to find the best manuscripts among the data that this team has been generating. I have faith in these folks!

The Life Sciences Building is easily visible from the adjoining M10: Robert Sobukwe Rd.

UWC is also notable for having some of the best buildings of any campus in the area. For years it languished behind the better-funded UCT and Stellenbosch Universities, but the democratically-elected government since 1994 has spent substantial funds on the campus. The Life Sciences and Chemical Sciences buildings are gorgeous, and they would not look out of place on any American campus. One can add another nice feature of the campus; it sits adjacent to the Cape Flats Nature Reserve. I love the plants in view as I make my way along West Drive.

The cape flats range from marsh to sandy fynbos.

I returned to the Tygerberg campus of Stellenbosch in the afternoon for the first rehearsal of the “Singing Sensations,” a group of academic staff who are preparing three songs for the annual gala. This year’s program will celebrate sixty years of the Stellenbosch Medical School!

Friday: Stellenbosch University, main campus

My Friday morning began at a large conference room on the floor below mine in the FISAN building. The Tuberculosis groups have grown so much that we cannot easily fit into our own seminar room. Three Honours students presented their plans for their research projects to a packed house, and professors peppered them with questions. The presenters did not wilt, happily. For my American readers, I would explain that Honours students have completed their three-year Bachelor’s degrees and are adding another year to prepare them for their chosen graduate program, in this case with Molecular Biology or with Human Genetics at the med school of Stellenbosch University.

I had become excited about seeing the computer cluster in operation again, and I tried to identify which parts I would need to get the other three processors in action again. A few months ago, I had a nasty surprise when I plugged one of the units into the wall. The very loud POP and sizzle had me checking my eyebrows; I needed to flip a switch on the power supply to enable it to use the doubled voltage of South African power outlets. Replacing the ITX power supply on that case seems a bit challenging here in South Africa; the university’s two main parts suppliers do not carry them. I put together a quote that includes a new case and power supply.

The A. I. Perold building on the Stellenbosch main campus

With the morning gone, it was time for me to hit the road for Stellenbosch University’s main campus, which is in (wait for it) Stellenbosch. From the outskirts of Cape Town, the drive lasts less than an hour. I parked near the Perold Building (number 65 on the campus map) and walked up the stairs to the offices of the Dean of Science, Prof Louise Warnich. I visit the main campus every two weeks, usually with my friend and colleague Gerard Tromp, but this was a solo week since he was traveling. I was really grateful that my friend Professor Heinrich Volschenk was available to talk. We have been planning a tutorial session with Laurent Gatto for the end of October. We will teach students how to make use of the R statistical environment and Bioconductor in making sense of their proteomic data. Heinrich and I framed the key topics we want to address with the students. I think it will be quite a good program!

It was a busy week, but quite a rewarding one. I love the diversity of projects that I can influence here.

Making digital images from 35mm slides

As a young man, my father was quite the technologist.  He bought a reel-to-reel tape audio recorder during his service in the U.S. Army, and so I can hear the voices of my grandparents during a long road trip in the 1960s or my parents during their vacation to the South in 1971.  Dad acquired a super-eight film movie camera during that time as well, and he shot movies with it up to the time I was a child.  He even became a camera buff, shooting more than one thousand photographs that were developed as 35mm slides.  As a result, I can draw on an unparalleled archive of images from my family’s history.

As a technologist of a different sort, I have tried to bring this archive into the digital age.  During my college years, I was able to produce an 8mm video camera recording for much of Dad’s movie footage, and I digitized that footage through a miniDV video camera and FireWire cable in later years.  During my graduate school years, I digitized the audio from a few hours of reel-to-reel tape to produce audio CDs.  Oddly, producing high-resolution scans of the 35mm slides has posed the biggest challenge.  Today I can report that we have finally found a way to digitize those amazing boxes of slides!

When I was in graduate school at Seattle (1996-2000), I performed my first experiments with slide scanners.  My friend Elizabeth allowed me to use her HP Photosmart slide scanner, and the resulting images were okay, for the time.  I tried buying an inexpensive slide scanner from another company, and yet the product from the hours of time I invested in using it was fairly disappointing.  In recent years, I purchased a Canon CanoScan 8600F, a flatbed scanner with a lid that can backlight transparent sources.  The images from this flatbed have been pretty nice, since it can operate at 4800 dpi, but scanning even a single slide at this resolution takes a fair amount of time.  I’ve never managed to scan the whole collection with scanners.  I have also found that scanners do not cope very well with the range of brightness that we encountered with the slides; many dark slides simply produced poor quality images in any scanner.

In 1999, I discovered that Canon had produced the FP-100 slide adapter for my Hi-8 video camera, and I acquired one for the princely sum of $120.  Essentially, the FP-100 was a low-temperature lamp, a bracket through which one could move a slide holder (with some wiggle room for positioning), and a ring to attach to the front of the camera.  I was glad that the cool lamp was bright enough to illuminate the darkest images from the collection, since the video camera could adjust its iris to the content of the slide.  Because the video camera could only resolve 480 lines in each image, though, the video images we produced through it were not quite what I wanted.

The entire sandwich of equipment is unwieldy, but it works well enough!

In 2015, I found the missing ingredient.  My Canon EOS-M2 camera employs a prime (non-zoom) lens with an external fitting of 43 mm.  I asked my photographically talented friend Brad Melton for some assistance on how I would connect the 46 mm FP-100 to the prime lens, and he located a “step-up ring” for me.  Would a slide adapter intended for a video camera be usable with a modern mirrorless high-resolution still camera?  We quickly discovered that the camera was unable to focus on the slide images because the lens sat too close to the sensor to focus on an image so near.  I compensated by adding a macro tube between the camera body and the lens.  The entire sandwich of equipment included these elements:

  1. Canon EOS-M2 mirrorless camera
  2. Meike MK-C-AF3B 10mm macro tube
  3. Canon EF-M 22mm f/2 STM prime lens
  4. HeavyStar Dedicated Metal Stepup Ring 43mm-46mm
  5. Canon FP-100 slide adapter.

I was reasonably pleased with the performance, on the whole, but there are certainly some drawbacks. The first is the problem of achieving good focus. Many of these slides now carry a fair amount of dust, and the camera was frequently inclined to auto-focus on the dust rather than the image. The second issue results from the image being so close to the lens; the edges of the slide are considerably farther from the lens than the center. Focusing on one part of the image generally meant that parts of the slide farthest from the focus point would be more than a little fuzzy. This photo gives an example of this behavior:

In the left snapshot, I have selected my mother’s face as the focus point. In the right, I have selected my grandmother’s instead.

I also encountered some degenerate behavior in the focus.  I would occasionally flip the selector over to use manual focus rather than automatic.  After cranking the focus ring all the way to one end of the dial, I would sometimes discover that the camera thought it was being operated by a madman and would force the focus back in the other direction.  When I tried switching to a different macro tube thickness, I was entirely unable to focus, so I simply felt grateful that I could get these images to focus at all!

In some cases, the slide scanner had dealt very poorly with slide images that were quite dark overall or that featured a significant contrast between light and dark portions of a frame.  Happily, the Canon EOS-M2 seemed to handle these contrasts better because it was storing brightness levels in the 14-bit depth afforded by the CR2 raw file format.

IMG_3389

My older brother always wanted to play.

One of the most common claims about 35mm slides is that they are far more resilient to aging than are prints from the same negatives.  Did that hold up?  The image of my father on the mule at the top of this post dates from 1964, around fifty years before this blog post was written.  The hues may be somewhat less vibrant than they were originally, but I doubt very much that a print from 1964 would hold up as well.  In this case, I have cropped to approximately 40% of the original field of view.  The slide below this is from 1965, showing my mother during her graduation ceremony from Northeast Missouri State Teachers College.  Again, I have cropped to about 40% of the original slide (in part to strip away unfocused areas).

IMG_3213

These slides enable me to see my parents in the primes of their lives!

Of course, capturing the original CR2 files for each of a thousand slides is just the beginning.  From here, I will need to export each to a TIFF file, crop the image to a new dimension, possibly apply a noise reduction filter, and export to JPG (the images I included here have not gone through noise reduction, though I did scale down the resolution considerably from the 18 megapixel originals).  That step will take considerable time, but I believe the result will be a far more useful archive for our family history.

I hope that this post will contribute some ideas for how to get your family archives in a more manageable condition!

IMG_3073

Our family

Telkom South Africa: more clowns than a three-ring circus

As a researcher, I need access to data.  As a man far from home, I need network bandwidth for Skype calls.  As a consumer, I demand fair and accurate dealing from the companies with whom I contract.  Telkom South Africa has demonstrated that it is woefully incapable of meeting any of these needs.

I assumed control of an existing Telkom DSL and phone contract when I moved into Turtle House.  I assumed that working with Telkom would be easier than starting an altogether new service in the house, a process that may require multiple weeks.  I even upgraded the service to 4 Mbps and 20 GB/month at the time I assumed the line, working with the friendly folks at the Telkom Direct store at Tyger Valley Shopping Centre.

When I tested the service I had acquired, though, I was appalled.  Their promises of getting my line switched to 4 Mbps in the course of that week were entirely hollow.  I have never seen the service reach as high as 1.5 Mbps, and frequently it dipped below 1 Mbps, a service level that makes even Skype audio conversations quite choppy.

Within ten days, I had gone public with my unhappiness:

The very next day I received notification that my service had finally been switched to 4 Mbps.  I dutifully performed a speed test:

Far from reaching 4 Mbps, I was still in a range that would be unacceptable even for 2 Mbps service.  I became a regular caller of their 10210 customer service line.  On January 26th, the customer service person admitted that my home line was not rated for 4 Mbps and was marked with a maximum speed of 2 Mbps.  She noted that the salesmen at Telkom Direct could see on their computers that my address was not capable of 4 Mbps service; they had sold me a contract that they knew they could not fulfill.

The next morning I was at the Telkom Direct store, brandishing her words.  The same sales agent as before raised his hands in the air, saying that his computer shows great coverage in my area, professing his confusion that I’m not enjoying their great product.  He sadly admitted that his systems don’t show any technical information.  He reduced my 4 Mbps contract to a 2 Mbps contract.  No, he couldn’t tell me when a Telkom technician would be able to visit my home.

On February 3rd, I spoke with the first really helpful representative I have ever reached through that service.  She confirmed that my line was showing a clear fault.  My home required a point-to-point test to figure out where the wiring fault lay, and I was showing erratic DSL sync speeds.  She (like others before her) would mark the fault as requiring “urgent” redress.  I should hear from a line tester rapidly.

Gratingly, the next day I received a text from the company informing me that my service had been restored (though nobody had visited the home or the network exchange box).  The company had taken my “urgent” problem and simply declared it solved.  I was back on the 10210 line again with them that night.  A series of representatives marked my account as urgent, and of course nothing happened.  You can even see their Twitter team making claims that they’ll get me some feedback:

While I was sick at home on February 9th, I called the 10210 number again.  I told them that I was home anyway, so why don’t they try sending a technician?  They promised an “urgent” response.  On February 10th, after I tried unsuccessfully to return to my normal work schedule, I received a few phone calls from a number while I was in a meeting at work.  I called back as soon as I could.

I was astonished to hear that a Telkom line serviceman had been dispatched to my home!  When could I be there to let him in?  I replied “half an hour” and jumped into my car.  He was there five minutes after I arrived, and he set to work.  I repeated the information about the need for a point-to-point test and an erratic sync rate, and he dutifully ignored it.  At one point, he said, “I think I have found your problem,” and he began to pick up my Network Attached Storage device (a redundant hard drive system in active service).  I told him to set that down SLOWLY and pointed out that it was not connected between the wall and the router so therefore it was not the problem.  He ran some tests on the line and claimed they were all clean.  He showed me his work list for the neighborhood; it was full of other phone numbers with failed DSL service.  He said, “until someone deals with the overcrowded exchange, you’ll have this problem.”  I asked when someone would deal with the exchange, and he said he’d make a note (I tried not to roll my eyes).

While he was there, I thought it might be helpful to have a live call with the 10210 people.  I called them again, and a real go-getter answered the phone.  I explained that I was standing next to their line technician, who had just told me that the exchange required repairs.  The customer service fellow replied that the exchange was scheduled for maintenance on Feb. 11th (the next day!) from 11AM to 4PM.  I noted that “maintenance” was not what we needed; it was time for a serious fix to the problematic exchange.  He reported that the scheduled maintenance would be moving all the DSL lines to a new node that would equip them for up to 10 Mbps access.  He provided the line fault reference number (54CWK060216).  I discussed this information with the line tester while the go-getter remained on the phone.

Sadly, the go-getter’s information was entirely false.  My first considered activity upon returning home this evening was to run the network speed tester.  The result was just as disappointing as all the others.  The line was still incapable of reaching 1.5 Mbps, let alone 2 Mbps.  I placed yet another call to 10210.  In total I spent approximately two hours in this call.

I started with a first-line customer service representative.  She reported that no technician had been assigned to my line fault reference number.  How could that be possible, I asked, when I was standing next to the line tester when I had called only the day before?  I pressed her pretty hard to ask about the status of the exchange maintenance that had been slated for today, February 11th.  She reported that that information would not appear in her database.  That twigged a sick feeling in me, and she transferred me to her supervisor for a more authoritative answer.

The supervisor was pretty bothered by what I had to say.  He spent approximately a half hour checking with people around him, but in the end, he reported that nobody who answers the 10210 number would have access to the technical support schedules, so there’s no way that the go-getter had information to support his claim that the exchange would be serviced on February 11th.  The go-getter had simply lied to me since he wouldn’t face any fallout from doing so.

The next problem that arose was that no technician had been attached to my line fault.  Who was the person who had appeared at my house yesterday?  Shouldn’t the line tester have been noted in the system as attached to my line fault?  Shouldn’t his results from the testing be added to their database?  Nobody seemed to have any clue about the multiple failures on this visit.  Welcome to Telkom, the company where the left hand doesn’t know what the right hand is doing!

I explained to the supervisor that I had been misled in a variety of ways by Telkom since the beginning of my dealings with them.  He could see the backlog of my calls to 10210.  Is a customer who calls your service line every other day for two weeks a happy customer?  Not even close.  I told him that Telkom had breached its contract with me by failing to provide the DSL service described in the contract.  I told him that Telkom had knowingly provided me false information.  If they’d told me that due to the financial pinch they simply couldn’t afford to upgrade our exchange box, we’d be in a different category, but instead the go-getter had simply lied to get me off his back.  A contract is binding on both partners, and a contract is not valid when it is with a company that lies to me.

He transferred me to the accounts office.  The woman who answered seemed very confused that I was speaking with her.  Why did I think that the Accounts office could terminate my contract?  I asked her why the Customer Service people on 10210 would have thought her office was the right one.  She explained that I could either send an email to servcancellation@telkom.co.za to cancel my service or I could go back to the Telkom Direct sales people at Tyger Valley Mall.  She then asked if I had spoken to the technical support people yet.  Frankly, I wasn’t sure.  She transferred me over there.

I spoke with technical support for a while, and we returned to the issue of the identity of the line tester.  This time I remembered that my cell phone showed the line tester’s attempts to reach me in the call history.  I provided his cell phone number to Telkom.  I imagine that they would like to know who he is.  Eventually we reached an impasse on the information available to the technical support representative, and he transferred me to his supervisor.  After a few seconds, the line went dead.

Telkom, all you had to accomplish was DSL service of 2 Mbps and provide me factual information when I had questions about my service.  Intentionally providing false information is unacceptable from a person, and it is unacceptable from a company.  I look forward to shredding my contract at the Telkom Direct store tomorrow.  I would rather have no service at all than deal with Telkom’s inadequacies.

Navigating document composition: the SACNASP application

Five days ago, I completed my application to SACNASP, the South African Council for Natural Scientific Professions.  As I mentioned in a prior post, the application spans a fair number of documents.  I thought it might be helpful to walk through some of the tools that I used to assemble the required documents.

Scanners were once luxury items for computer systems, but today we see them on many desktop systems in several guises.  Perhaps the most common is the multifunction printer/fax/copier.  If your computer is attached over a network to a photocopier, it’s probably possible for you to use it for image scanning.  I purchased a CanoScan 8600F several years ago.  It’s been a stalwart for me.  I particularly appreciate that it can scan 35mm slides and negatives, which has helped me in archiving family photographs.  For documents like those in the SACNASP application, I generally scan at a far lower resolution than the device maximum; 300 dpi is just fine.  I needed the scanner to produce images of my diplomas as well as my graduate transcript.

Having some decent image editing software is necessary to field those scans.  I use Paint.NET, a free tool for Microsoft Windows.  I’m using quite basic functions (mostly cropping, resizing, file export, and the occasional layer rotate), so almost any graphics editor you could name would fill the bill.  My usual routine with scanned documents is to drop the resolution by a factor of two (which cuts back some of the scanner noise), crop off any empty space from edges, and then save the image.  I see significant overuse of the JPEG format among my friends.  JPEGs are great when you’re dealing with photos, but they’re not right for most documents because they are better at preserving color than they are at preserving detail.  Instead, I will frequently mash scanned black and white documents to gray scale and then save as GIF.  In the case of my graduate transcript, I needed to preserve the light purple background and the dark purple border, so I mashed the color palette down to 256 colors and saved as GIF.
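
Why does halving the resolution tame scanner noise?  Each output pixel becomes a blend of several input pixels, so random speckle tends to average out.  The sketch below illustrates the idea with plain 2x2 block averaging on a tiny grayscale grid.  It is only an illustration under stated assumptions: the function name and sample values are made up, and Paint.NET's own resampling may well use a different filter.

```python
def halve_resolution(pixels):
    """Average each 2x2 block of a grayscale image (a list of rows) into one pixel."""
    # Ignore a trailing odd row or column so every block is complete.
    height = len(pixels) - len(pixels) % 2
    width = len(pixels[0]) - len(pixels[0]) % 2
    result = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block_sum = (pixels[y][x] + pixels[y][x + 1]
                         + pixels[y + 1][x] + pixels[y + 1][x + 1])
            row.append(round(block_sum / 4))
        result.append(row)
    return result


# Hypothetical 4x4 scan: near-white paper, one dark patch, a little speckle.
noisy_page = [
    [250, 255, 10, 0],
    [255, 248, 5, 9],
    [252, 254, 251, 255],
    [249, 253, 254, 252],
]

small = halve_resolution(noisy_page)
for row in small:
    print(row)
```

The paper pixels, which wandered between 248 and 255 in the input, settle to nearly identical values in the output, and that uniformity is what lets a 256-color palette format represent the page cleanly.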

The two applications I’ve filed so far, to SAQA and SACNASP, have relied heavily on documents being submitted in PDF format.  I’m too cheap to buy Adobe Acrobat Professional, so I installed BullZip instead.  The free software acts like an extra printer for your computer, but instead of printed documents, it supplies PDF files.  Once I had an image ready to go in Paint.NET, I could “print” it from within the software to a PDF file.  BullZip also handled another task for me.  I had downloaded the full copy of my Ph.D. dissertation, but the SACNASP application required only the title page and abstract page.  I “printed” those individual pages from the dissertation PDF to separate PDF files.  I couldn’t find my undergraduate honors thesis on-line, so I retrieved the WordPerfect document from my home office archives and imported it into Microsoft Word (a figure or two suffered in the transition).  I then “printed” just the title page to a one-page PDF.

Great!  At that point, I had each of the required documents in PDF format.  SACNASP, however, did not want each document uploaded separately.  Instead, they wanted my diploma and transcript scans in one document and the honors thesis / dissertation pages in a separate document.  I was unsure how to handle that, at first, and I nearly punted the project to one of the administrators in my former department.  Instead, I did a little reading, and I learned that BullZip will also merge separate PDF documents into a single one.  I had to use the command line, but the result was gorgeous.

On July 13th, my application to SACNASP (along with a fairly substantial application fee on my credit card) was complete!  Over the next week or so, the organization will be contacting the two people I listed as referees at Stellenbosch University.  My referees will create documents that attest to my professional skills.  Hopefully SACNASP will accept me as a new professional member.  Then we can start the process of the critical skills letter.  It will play a significant role in getting my temporary residence permit from the South African Department of Home Affairs!

The compulsive tinkerer

When I am asked why I became a scientist, I have sometimes answered that a blank whiteboard seems to call my name.  There you are, marker in hand, and you can go any direction with your drawings that you can imagine!  I love creating something new, and technology development is an obvious niche for me.  This desire to imagine and then create has also driven my interest in computers.  Of course I like creating new algorithms (especially publishable ones), but building the computer itself is a special joy for me.

My first memory of poking around inside a computer came before we had a computer designed for that purpose.  I was a nearly compulsive programmer on the Commodore VIC-20 my family purchased in 1981.  My parents informed my brother and me that they wouldn’t buy us video games until we could write them ourselves.  They gave my brother the user’s guide on that first night, but I swiped it from him in no time at all.  At first, my code did not stray far from PRINT, FOR, and GOTO, but eventually I found my way to more complex logic.  Because the computer had only 3583 bytes (no, not kilobytes or megabytes) available for programs, I learned to be efficient by necessity!  I spent a large fraction of my free time perched in front of that computer, and after a few years, the keyboard had worn out.  Along the way, we had found another VIC-20 at a garage sale.  I swapped the working keyboard from one unit to the other.  It only required unplugging and reattaching a ribbon cable, but I was hooked!  We bought a memory expansion for the VIC-20, en route to a Commodore Plus/4 (found at a liquidation store in 1986) and then a Commodore 64.

When I was a senior in high school (1992), I shifted to my first Intel PC, a 486DX-50.  Even before I had begun using it, I installed my first upgrade, a Sound Blaster Pro!  I did not intend to listen to bleeps and blurps when real music was possible, and I had gotten spoiled on a Commodore 64 in the meanwhile.  I still have the sound samples my friend Brad and I made while messing around with audio recording software.  Soon I had ventured into video cards and other add-ons.  The fun we had with bitmap editing and video capture would lead in all sorts of directions.

By the time I was a graduate student (1996), I had begun building my own computer systems from parts.  I try to pass that skill along to trainees in my laboratory, and they all seem to take to it quite easily.  We have so many options, these days!  From tiny ITX systems up to server chassis, we can create a system to meet almost any requirement.  I find it hard to imagine creating code for the 1 MHz MOS 6502 that powered my VIC-20, now that I’m accustomed to processors that cycle more than 1000 times as quickly.  I know, though, that those early days set me on a career path that has paralleled the emergence of new technologies.  Would I have thought of bitmaps as a way to handle complex Venn diagram relationships in my Contrast software if I hadn’t spent as much time with graph paper and powers of two while in junior high school?  I don’t think so.
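
For readers curious what the bitmap idea looks like in practice, here is a minimal sketch.  It is not the actual Contrast code; the function name and the sample protein labels are invented for illustration.  Each item gets an integer whose bit i is set when the item belongs to set i, so every distinct integer names exactly one region of the Venn diagram.

```python
from collections import Counter


def venn_regions(sets):
    """Count the items in each Venn region, with each region encoded as a bitmask."""
    membership = {}
    for i, group in enumerate(sets):
        for item in group:
            # Turn on bit i (value 2**i) to record that this item appears in set i.
            membership[item] = membership.get(item, 0) | (1 << i)
    return Counter(membership.values())


# Hypothetical protein identifications from three samples.
sample_a = {"P1", "P2", "P3"}
sample_b = {"P2", "P3", "P4"}
sample_c = {"P3", "P5"}

regions = venn_regions([sample_a, sample_b, sample_c])
for mask in sorted(regions):
    print(f"{mask:03b}: {regions[mask]} item(s)")
```

With three sets there are at most seven non-empty regions (masks 001 through 111), and adding a fourth set only means using one more bit, which is where the powers of two come in.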

Did my parents know how far my childhood hobby would take me?  I think that they guessed as much.  I can hardly imagine how I would have responded if a language like Scratch had been available back then!  I certainly hope that the kids of today will have the opportunity to play in as rich a technological sandbox.