Archive for the 'Soft Hacks' Category

Splitting bioinformatics FASTA files

I keep forgetting where my scripts are in my home directories. Below is my Ruby script to split a large FASTA [1] file into smaller files of N sequences each:

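#!/usr/bin/env ruby
#
# Script: dumpseq.rb
# Description: Parses a BLAST FASTA file and dumps every N sequences to a
#              separate file.
# Usage: dumpseq.rb fasta_file [seqs_per_file]

require 'fileutils'

# N was never defined in the original listing; reading it from the second
# argument (default 1) is an assumption.
N = (ARGV[1] || 1).to_i

sno = 0    # sequences read so far
d = 0      # output files created so far
file = nil

File.open(ARGV[0]) do |fasta_db|
  # Read one record at a time, using "\n>" as the record separator.
  # gets returns nil at end of file, which ends the loop.
  while (x = fasta_db.gets("\n>"))
    # The separator swallows the next record's ">": strip it from the end
    # of this record and put it back on records that lost theirs.
    x = x.sub(/>\z/, "")
    x = ">" + x unless x.start_with?(">")
    if sno % N == 0 # N seqs per file
      file.close unless file.nil?
      dir = sprintf("D%04d000", d / 1000)
      FileUtils.mkdir_p dir
      # short filenames
      fname = sprintf "SEQ%07d.fasta", d
      d += 1
      file = File.new("#{dir}/#{fname}", "w")
    end
    file << x
    sno += 1
  end
end
file.close unless file.nil?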
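
To make the output layout of the script above concrete, here is a minimal sketch of the two ideas it relies on: grouping records N at a time, and deriving a directory and file name from the running file counter. The helper names output_path and chunk_records are hypothetical and not part of dumpseq.rb; only the two sprintf patterns are taken from it.

```ruby
# Hypothetical helper: path of the d-th output file, built with the same
# format patterns as dumpseq.rb (one directory per 1000 files).
def output_path(d)
  dir = format("D%04d000", d / 1000)
  fname = format("SEQ%07d.fasta", d)
  File.join(dir, fname)
end

# Hypothetical helper: split FASTA text into records (a header line that
# starts with ">" plus the sequence lines after it), then group them n at
# a time.
def chunk_records(text, n)
  records = text.each_line
                .slice_before { |line| line.start_with?(">") }
                .map(&:join)
  records.each_slice(n).to_a
end

puts output_path(1234)   # D0001000/SEQ0001234.fasta
```

Working on an in-memory string keeps the sketch short; for a really large FASTA file the streaming loop in the script is the better fit, since it never holds more than one record in memory.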

It’s pretty hackish-looking. But then I found out that BioRuby [2] has wrappers for parsing FASTA files.

[1] http://en.wikipedia.org/wiki/Fasta
[2] http://www.bioruby.org


Why do research in the Philippines?

Because installing grid computing middleware can get you to this:

7th PANDA Grid Workshop, Bohol, Philippines, May 4 - 8, 2009
organised by
Ateneo de Manila University
Sponsored also by EPSRC, IoP, PPARC and the Royal Society of Edinburgh
The aim of the workshop is to bring together grid administrators and software developers in an informal setting, involving open discussions. The focus will include grid maintenance and monitoring and data production with PandaRoot.

Organising committee:

    Rafael P. Saldana (Ateneo)
    Kilian Schwarz (GSI)
    Dan Protopopescu (Glasgow)

Contact person:

Address:

    Holy Name University,
    Lesage and Gallares Streets,
    6300 Tagbilaran City,
    Bohol, Philippines

Let’s look at the itinerary:

Tagbilaran City (May 3, 4, 5)

Metro Centre Hotel and Convention Center
Pres. Carlos P. Garcia Avenue
Tagbilaran City, Bohol
Philippines, 6300
Website: www.metrocentrehotel.com

Panglao Island (May 6, 7, 8, 9, 10)

Bohol Beach Club
Bo. Bolod, Panglao Island, Bohol 6340
Website: www.boholbeachclub.com.ph

Damn, I want to go home!!!


Adding git-svn support from source

Working on workstations where you don’t have root access means either asking support to install software for you or building it from source yourself to get the latest version.

I started using git for code produced in my work. Building it was a simple “./configure; make; make install” series of steps, except for support for accessing subversion repositories: it was looking for the Perl module SVN::Core. Googling it lands you on the Alien::SVN CPAN module page. Its dependencies can be installed with the standard “install Module::Name” invocation in the CPAN shell, but the main package does not install properly in this environment, probably because the tarball ships a Build.PL instead of the standard Makefile.PL. Build.PL generates the Build script, which compiles the subversion library and its bindings and then generates a Makefile from the Makefile.PL in the src/subversion/subversion/bindings/swig/perl/native directory. Below is the output of the script:

[Alien-SVN-1.4.6.0]$ ./Build
Running make
Running make swig-pl-lib
make: Nothing to be done for `swig-pl-lib'.
Running /usr/bin/perl Makefile.PL INSTALLDIRS=site
Writing Makefile for SVN::_Core
Writing Makefile.client for SVN::_Client
Writing Makefile.delta for SVN::_Delta
Writing Makefile.fs for SVN::_Fs
Writing Makefile.ra for SVN::_Ra
Writing Makefile.repos for SVN::_Repos
Writing Makefile.wc for SVN::_Wc
Running make
gcc -c  -I/home/aespinosa/local/include/apr-0
...

The command /usr/bin/perl Makefile.PL INSTALLDIRS=site generates a build environment that installs into /usr. That does not work for a userspace installation, since you do not have permission to write to that directory. So rerun the command as /usr/bin/perl Makefile.PL PREFIX=$USERDIR, where $USERDIR is the destination directory you want to install to.

Now you can successfully clone subversion repositories!


On science productivity

Grid computing infrastructures were made to support execution of science applications at larger scales. One challenge today in running your science on these behemoth systems is the requirement of “griddification” or “supercomputerification”. You need to know how to make the best of your hardware or grid sites in order to orchestrate beautiful workflows and process your science. So a lot of research has been done to create languages such as Swift to make life easier for these domain scientists.

I have been debugging a science application for the last several months to run on petascale (100×10^3++ processors) systems. The main goal of the domain scientist was to process hundreds of thousands of data sequences. I got too carried away with debugging to make the application work and have only looked at 3000 of the set. In other words, not much *real* work has been done.

Now, when debugging, I should always remember the scientists who took pains to measure this data or who can’t get data. (Much like the saying “finish your food because there are millions of hungry children in developing countries”.)


Chicago Startup Factory

The event is a collaboration between the GSB and CS Department. The group hopes to create technology-heavy startups and businesses, unlike what you get when you gather a bunch of pure business people who can’t come up with a business plan other than canned food, a network of juice/shake stands, etc. The speaker for the Startup Factory talk was Adarsh Arora, CEO of Athena Security and Co-Founder of Lisle Technology Partners. I took note of some of his striking ideas about innovating and generating business plans around technology:

  • never sell more than one innovation - his rationale for this was that the market cannot catch up with all of your ideas. I have not thought about this deeply because [1] I have yet to have a really brilliant idea, and [2] most business models I have seen are too caught up in selling this one unique idea that they don’t bother to look at the other types (probably they are bad ideas in the first place).
  • interdisciplinary collaboration - now this is more familiar to my school of thought. As we always say in the Ateneo Innovation Center, today’s problems are so complex that you need to apply every type of paradigm to be able to attack the problem from different angles and come up with a brilliant solution.

Adarsh also discussed four types of companies: [1] wishful thinking (you have enough deep connections to get angel funding), [2] historical precedence - selling technology to improve a process, [3] intuitive jump - pure luck; with the democratization of technology, YouTube and eBay became a big thing even though video sharing and online auctions were almost non-existent web services during their time, and [4] sure technology - you know that there is a need for it in the future (e.g. Y2K “bug”).

Follow-up events to this are an Entrepreneurial Brainstorming Session with GSB and CS students and an introduction to creating applications on the iPhone. Apple’s development platform makes it so easy for anyone to distribute an app and sell it over iTunes (or AppleStore?), enabling you to earn several thousand dollars in a few months.

Oh, and they had free pizza during the talk :)
