Pierre's personal timeline, a place to collect and share things from Pierre's life.
Created by lindenb on Jul 18, 2008
Last updated: 03/10/10 at 07:29 PM
In this post I'll create a MySQL user-defined function (UDF) that answers whether a given GO term is a descendant of another term. This post is not much different from two previous posts I wrote here: MYSQL UDF, trees of data, hierarchy; Mysql user defined function (UDF) for Bioinformatics. Here I built a binary file containing an array of pairs (term, parent_term) of GO identifiers from the XML/RDF file
http://plindenbaum.blogspot.com/2010/02/mysql-user-defined-function-udf-for.html
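The lookup described in the excerpt above can be sketched as follows. This is only an illustration in Python, not the C code of the UDF: it assumes the (term, parent_term) pairs are already loaded in memory and sorted by term, and the names `parents_of` and `is_descendant` are hypothetical.

```python
from bisect import bisect_left

def parents_of(pairs, term):
    """Return the direct parents of `term`.

    `pairs` is a list of (term, parent_term) tuples sorted by term,
    which lets us locate the first matching pair with a binary search.
    """
    i = bisect_left(pairs, (term,))
    found = []
    while i < len(pairs) and pairs[i][0] == term:
        found.append(pairs[i][1])
        i += 1
    return found

def is_descendant(pairs, term, ancestor):
    """True if `ancestor` can be reached from `term` by following parents."""
    stack = [term]
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a term can be reached through several parents
        seen.add(current)
        for parent in parents_of(pairs, current):
            if parent == ancestor:
                return True
            stack.append(parent)
    return False
```

With this definition a term is not reported as a descendant of itself; whether the UDF in the post behaves the same way is not stated in the excerpt.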
In a previous post, I played with Oracle's BerkeleyDB-XML. Here, I used eXist-db, an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
Download & Install
wget http://downloads.sourceforge.net/project/exist/Stable/1.4/eXist-setup-1.4.0-rev10440.jar
java -jar eXist-setup-
http://plindenbaum.blogspot.com/2010/02/exist-open-source-native-xml-database.html
neo4j: "Neo4j is a graph database. It is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables. A graph (mathematical lingo for a network) is a flexible data structure that allows a more agile and rapid style of development." In the current post, I'll use the neo4j API to load a set of pubmed entries and find the shortest
http://plindenbaum.blogspot.com/2010/02/path-from-egonwillighagen-to-jandot.html
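To make the idea of a shortest path between two authors concrete, here is a small breadth-first search sketch. It does not use the neo4j API at all: the graph is assumed to be a plain dictionary mapping each author to the co-authors they share a paper with, and `shortest_path` is a hypothetical helper name.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Return one shortest path from `start` to `goal` as a list, or None.

    `graph` maps a node to an iterable of its neighbours
    (here: an author to the co-authors found in the loaded entries).
    """
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already reached by a path at least as short
            previous[neighbour] = node
            if neighbour == goal:
                # walk back from the goal to the start
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None
```

Breadth-first search is enough here because every co-authorship link has the same weight.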
This post is a quick overview of Freebase for the participants of biohackathon2010 who attended François Belleau's lecture this morning. It was suggested here that Freebase could be used to store and share some predicates for the semantic web. In the current post, I'm going to use CURL to programmatically add a new Namespace in Freebase. OK. First, let's send a MQL query to Freebase and get the
http://plindenbaum.blogspot.com/2010/02/freebase-and-biohackathon-2010.html
In the previous post I showed how to parse dbSNP/XML with libxml. In the current post I'll insert the results into MongoDB using the native Mongo C++ API. Some others have already posted about MongoDB: for example see Jan's, Brad's or Neil's posts. The Boost C++ library is required: my code failed to compile with boost 4.* but it compiled fine with boost 3.9.
Starting mongoDB
> mkdir ~/tmp/MONGODB/
http://plindenbaum.blogspot.com/2010/02/i-really-need-to-sleep-inserting-snps.html
In this post, I'll show how a Fasta file can be used as a source of RDF statements for the Jena API. The DNA sequences in the Fasta file will be used by Jena without any prior transformation: the file will be used as a Graph by Jena by implementing com.hp.hpl.jena.graph.Graph. Here, my example uses a Fasta file but it could have been any kind of input: a SQL database, an XML file, a GFF file, etc...
http://plindenbaum.blogspot.com/2010/02/using-fasta-file-as-source-of-rdf.html
Trying to find the CSS style of an HTML element is a common task for me and I often look in the source of the pages to try to find what can be "this inspiring CSS". So, I've created CSSPopup, a small extension for Firefox. This extension appends a new button in the contextual menu that will print all the CSS selectors of the element that was clicked. For example when I clicked on "Welcome to
http://plindenbaum.blogspot.com/2010/01/what-is-css-style-of-that-html-element.html
Cameron Neylon recently asked on FriendFeed: "Advice request: What is the best approach to exposing a publications list online. Any good tools for generating html/xml/rdfa?". In the comments, it was suggested to use Simile/Exhibit. This gave me the idea to write an XSLT stylesheet to transform a pubmed xml result to an Exhibit file. This XSLT stylesheet is available at: http://code.google.com/p/
http://plindenbaum.blogspot.com/2010/01/transforming-pubmed-to-simileexhibit.html
I'm currently reading Joe Armstrong's "Programming Erlang". Here are a couple of notes about Erlang.
Starting and stopping the Erlang shell
:~> erl
Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.7.2 (abort with ^G)
1> halt().
:~>
Simple Math
Input:
2*(3+4).
PI=3.14159.
R=2.
SURFACE=PI*R*R.
R=3.
Output:
1> 14
2> 3.14159
3> 2
4> 12.56636
##Variables in
http://plindenbaum.blogspot.com/2009/11/playing-with-erlang-i.html
This post is a description of my implementation of Jan Aerts' LocusTree algorithm (I want to thank Jan, our discussion and his comments were a great source of inspiration) based on BerkeleyDB-JE, a Key/Value datastore. This implementation has been used to build a genome browser displaying its data with the SVG format. In brief: splicing each chromosome using a dichotomic approach allows to
http://plindenbaum.blogspot.com/2009/11/java-implementation-of-jan-aerts.html
20th century. 1900: Charles Warren Eaton; 1901: Prinet - Kreutzer So; 1902: Ryabushkin dance.jpg; 1903: Bobrikov by Kustodie; 1904: Bertha Worms - Cançã; 1905: Elizabeth Drexel.jpg; 1906: Friedrich Kallmorgen; 1907: Lawrence Alma-Tadema; 1908: Thomas Moran - Grand; 1909: George Bellows - Les; 1910: Баян.jpg; 1911: Checa Y Sanz Ulpiano; 1912: Compton, 1912, Mount; 1913: M.O.Lowenfeld by Rep; 1914: Antônio Parreiras - ; 1915: Amadeo Modigliani 01; 1916: Gustav
http://plindenbaum.blogspot.com/2009/11/600-years-of-paintings-part-6.html
18th century. 1700: Sebastiano Ricci 030; 1701: Coypel, Antoine - El; 1702: Sir Isaac Newton 170; 1703: Houbraken, Arnold - ; 1704: Sebastiano Ricci 042; 1705: Florinus Astronomy 1; 1706: Sebastiano Ricci 018; 1707: Sebastiano Ricci 012; 1708: Sebastiano Ricci 004; 1709: Giuseppe Maria Cresp; 1710: Christian Ludwig Mar; 1711: Nusplingen Friedhofs; 1712: Jan Dobrogost Krasiń; 1713: Sebastiano Ricci 060; 1714: Allegory on the Peac; 1715: Antoine Watteau
http://plindenbaum.blogspot.com/2009/11/600-years-of-paintings-part-4.html
17th century. 1600: Bernardo Strozzi - S; 1601: Bril, Paul - Feudo d; 1602: 3233 - Milano, Duomo; 1603: Jan Brueghel d.Ä.- G; 1604: Felipe Manuel, Princ; 1605: Peter Paul Rubens 12; 1606: Rottenhammer Hochzei; 1607: Bassot-Saint-Christo; 1608: Alof Louvre.jpg; 1609: ElGreco-HortensioPar; 1610: Gregorythegreat.jpg; 1611: Jan Brueghel the Eld; 1612: Peter Paul Rubens 01; 1613: MariadeMedici11.jpg; 1614: FishersOfMen.jpg; 1615: Peter Paul Rubens 08; 1616: Peter
http://plindenbaum.blogspot.com/2009/11/600-years-of-paintings-part-3.html
16th century. 1500: Pietro Perugino 061.; 1501: Pietro Perugino 051.; 1502: Vittore Carpaccio 00; 1503: Albrecht Dürer 050.j; 1504: Raffaello - Spozaliz; 1505: Lorenzo Lotto 006.jp; 1506: Lucas Cranach d. Ä. ; 1507: Adam et Eve.jpg; 1508: Albrecht Dürer 066b.; 1509: Sibyl of Delphi - Si; 1510: Lorenzo Lotto 025.jp; 1511: Raffael 055.jpg; 1512: TitianFirstAretinoPo; 1513: GaudenzioFerrari Sto; 1514: Vittore Carpaccio 05; 1515: Bernhard Strigel 004; 1516: Hans
http://plindenbaum.blogspot.com/2009/11/600-years-of-paintings-part-2.html
Last year, I described how to use JavaCC, a parser/scanner generator for Java. This weekend, I played with JJTree: JJTree is a preprocessor for JavaCC that inserts parse tree building actions at various places in the JavaCC source. Here I describe how to build a simple expression language to find an object in a simple 'JSON' object only built with arrays, java.util.List, java.util.Map, String,
http://plindenbaum.blogspot.com/2009/11/building-simple-expression-language.html
A short post: I was asked to write a web server to allow people to access their PDFs when they are away from the laboratory. People enter a DOI, a PMID or the URL of the page and the system tries to retrieve the PDF using a set of pre-defined patterns (e.g. the PDF of http://www.pnas.org/content/X/Y/Z is http://www.pnas.org/content/X/Y/Z.full.pdf ). This idea was suggested by Chris Miller on
http://plindenbaum.blogspot.com/2009/11/my-pdfs-anywhere.html
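The pattern-based lookup mentioned in the excerpt above could look roughly like this. It is a sketch under assumptions: only the PNAS rule comes from the post, while the regular expression, the `PATTERNS` table and the `guess_pdf_url` name are made up for the illustration.

```python
import re

# Each rule is (pattern matching the article page, replacement giving the PDF).
# Only the PNAS rule is taken from the post; the regex itself is an assumption.
PATTERNS = [
    (re.compile(r"^(http://www\.pnas\.org/content/[^/]+/[^/]+/[^/.]+)$"),
     r"\1.full.pdf"),
]

def guess_pdf_url(page_url):
    """Return the PDF URL guessed from the page URL, or None if no rule matches."""
    for pattern, replacement in PATTERNS:
        if pattern.match(page_url):
            return pattern.sub(replacement, page_url)
    return None
```

A DOI or a PMID would first have to be resolved to the URL of the article page before such a table can be applied; that step is not shown here.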
This post is about using Apache Velocity (a Java-based template engine) and the Jena RDF library. My aim was to use Velocity to handle the content of one or more RDF stores without compiling, just by using a custom Velocity template. This idea was much inspired by Egon Willighagen's posts where the RDF was handled with a scripting engine embedded in Bioclipse. It also seems that I'm not the
http://plindenbaum.blogspot.com/2009/11/handling-rdf-statements-with-apache.html
The new version of BerkeleyDB 4.8 has been released. This new version of the key/value storage engine comes with a new utility called db_sql. From Oracle: Db_sql is a utility program that translates a schema description written in a SQL Data Definition Language dialect into C code that implements the schema using Berkeley DB. It is intended to provide a quick and easy means of getting started with
http://plindenbaum.blogspot.com/2009/09/dbsql-new-utility-for-berkeleydb-my.html
Deepak Singh and Andrew Su have both already posted on their blogs about it: I'm proud to be the second author of a paper published in the "Database Issue" of Nucleic Acids Research. The Gene Wiki: community intelligence applied to human gene annotation. Jon W. Huss III, Pierre Lindenbaum, Michael Martone, Donabel Roberts, Angel Pizarro, Faramarz Valafar, John B. Hogenesch and Andrew I. Su. Nucleic
http://plindenbaum.blogspot.com/2009/09/from-friendfeed-to-nucleic-acids.html
In my previous post I showed how to call mysql from the XALAN XSLT engine. In the current post, I'll show how a custom function for XALAN can return a new DOM/XML document that will be later used by the XSLT stylesheet. To get a source of data, I'm going to create a key-value database with BerkeleyDB (Java Edition) storing strings (as the keys) and XML documents (as the values). The Database
http://plindenbaum.blogspot.com/2009/09/xalan-part-3-berkeleydbxsltpubmed.html
Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie is open source http://bowtie.cbcb.umd.edu (Langmead B, & al. "Ultrafast and memory-efficient
http://plindenbaum.blogspot.com/2009/09/my-notes-about-bowtie.html
I've resurrected an old Treemap algorithm I described three years ago, and I've used it to plot the activity of the "Life Scientists Room" on FriendFeed. The source of the packing algorithm is available here, and the source of the tool scanning FriendFeed is available here. In the end, here is a clickable treemap of the top 'commentators' in the "Life Scientists Room" of FriendFeed. Two other maps:
http://plindenbaum.blogspot.com/2009/08/treemap-for-friendfeed.html
Jersey is the open source JAX-RS implementation for building RESTful Web services. JAX-RS uses Java annotations to simplify the development and deployment of web service clients and endpoints. In this post I'll describe how I've implemented a naive RESTful web service for storing and querying a DNA database. This code was tested and deployed under NetBeans 6.1. The service is defined in a class
http://plindenbaum.blogspot.com/2009/07/restful-web-service-storing-dna.html
This post follows my previous post, "Ajax/PHP/Mysql/Canvas Drawing a circular genome, my notebook". The problem here is drawing a circular genomic map that might contain a huge amount of data, using an asynchronous method to fetch and display the data. Here, the server returning some JSON data is the same as in the last post but I now use a Java Swing client to fetch and display the data. Here
http://plindenbaum.blogspot.com/2009/07/drawing-circular-genome-chapter-2-java.html
I've been asked to draw a circular map of the genome. Some tools already exist, for example circos, a Perl program. Jan Aerts is also writing pARP, a circular genome browser using Ruby and ruby-processing. My data are stored in a big database and it might take some time before all the data are processed and displayed. So my idea was to call the server with some asynchronous Ajax queries, retrieve the
http://plindenbaum.blogspot.com/2009/07/ajaxphpmysqlcanvas-drawing-circular.html
This post is my notebook for programming with the Spring Framework. (Wikipedia) The Spring Framework is an open source application framework for the Java platform. Central to the Spring Framework is its Inversion of Control container, which provides a consistent means of configuring and managing Java objects using callbacks. The container is responsible for managing object lifecycles: creating
http://plindenbaum.blogspot.com/2009/07/springframeworkbeanfactory-my-notebook.html
A short post. I've just implemented a simple and small SVG renderer. It works fine with simple documents.
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setCoalescing(true);
domFactory.setExpandEntityReferences(true);
domFactory.setIgnoringComments(true);
domFactory.setNamespaceAware(true);
domFactory.setValidating(false);
http://plindenbaum.blogspot.com/2009/07/simple-java-based-svg-renderer.html
Firefox 3.5 includes a new CSS property called -moz-transform. The -moz-transform CSS property lets you modify the coordinate space of the CSS visual formatting model. Using it, elements can be translated, rotated, scaled, and skewed, as this text. I've used this new property to draw a 3D histogram: [3D histogram figure; cell labels and values not recoverable from the extracted text]
http://plindenbaum.blogspot.com/2009/07/3d-histograms-using-css-moz-transform.html
In this post I'll show how Apache Lucene can be programmatically used to index the content of a set of NCBI Gene entries and how to query and retrieve those data. (via Wikipedia:) Apache Lucene is a free/open source information retrieval Java library. It is supported by the Apache Software Foundation. While suitable for any application which requires full text indexing and searching capability,
http://plindenbaum.blogspot.com/2009/07/indexing-and-searching-ncbi-genes-with.html
In a recent short discussion on FriendFeed, Benjamin Good asked me which reporting tools I've used. On my potential list there were: Jasper Reports, Apache PDFBox, and the recent Eclipse BIRT. But the only tool I have used so far to produce a PDF document is an XSL-FO document converted with Apache FOP. XSL-FO is an XML vocabulary for specifying formatting semantics and FOP is a print formatter driven
http://plindenbaum.blogspot.com/2009/07/xsltxsl-fo-fop-pdf.html
A few notes: I've implemented a javascript library to parse RDF (I love re-inventing the wheel, it's always interesting to learn how software and algorithms work). A demo was uploaded here (XUL/Firefox). The RDF syntax is still not fully implemented (e.g. it doesn't support xml:lang, parseType=Literal, etc...). I've also created 3 XSLT stylesheets transforming RDF to ... N3: xsltproc rdf2n3.
http://plindenbaum.blogspot.com/2009/06/rdf-javascript-xsl-stylesheets.html
Just for fun. I've played with the compounds stored in NCBI/Pubchem and I've created an XSLT stylesheet transforming the pubchem/xml format into a SVG figure. The XSLT stylesheet is available here: http://code.google.com/p/lindenb/source/browse/trunk/src/xsl/pubchem2svg.xsl This xml format was new to me, so feel free to tell me if I've missed something... Here are two examples: Aspirin, Chol-SdC10. The
http://plindenbaum.blogspot.com/2009/06/fun-with-svg-ncbipubchemxslt-svg.html
In this post I describe how I used XProc, the XML "pipeline language", to create a workflow of XML data calling the NCBI for some SNPs and building an HTML table describing those markers. W3C: XProc: (the) XML Pipeline Language, (is) a language for describing operations to be performed on XML documents. An XML Pipeline specifies a sequence of operations to be performed on zero or more XML documents.
http://plindenbaum.blogspot.com/2009/05/xml-pipelines-xproc-for-bioinformatics.html
Here, I describe my experience with XForms: (W3C) XForms is an XML application that represents the next generation of forms for the Web. By splitting traditional XHTML forms into three parts (XForms model, instance data, and user interface), it separates presentation from content, allows reuse, gives strong typing (reducing the number of round-trips to the server), as well as offering device
http://plindenbaum.blogspot.com/2009/05/xforms-for-bioinformatics-my-notebook.html
In this post I describe how to deploy a Web Service in the GlassFish web server and how to use it via the Taverna workflow engine. Server side: Classes. The JAX-WS API (the Java API for Web Services) was used here. Our Web Service will be designed to: (1) find the position of a SNP from its name; (2) find the SNPs in a given region. First of all, a simple POJO (Plain Old Java Object) for a SNP (name, chromosome,
http://plindenbaum.blogspot.com/2009/05/webservicesjaxws-for-snp-glassfish.html
This post is about LSID (the Life Science Identifier) and was inspired by the recent activity of Roderic Page on Twitter and by Roderic's paper "LSID Tester, a tool for testing Life Science Identifier resolution services". OK. At the beginning, there is an LSID: urn:lsid:ubio.org:namebank:11815 Here, ubio.org is the authority. It is followed by a database and an id. We need to resolve this authority to find
http://plindenbaum.blogspot.com/2009/04/resolving-lsid-my-notebook.html
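The structure of the LSID quoted in the excerpt above (authority, then database, then id) can be made explicit with a small parser. This is a hedged sketch: it only splits the identifier and does not resolve the authority, the `Lsid` type and `parse_lsid` function are hypothetical names, and the optional trailing revision field is accepted on the assumption that the identifier follows the usual urn:lsid:authority:namespace:object[:revision] layout.

```python
from typing import NamedTuple, Optional

class Lsid(NamedTuple):
    authority: str           # e.g. "ubio.org"
    namespace: str           # the "database" part, e.g. "namebank"
    object_id: str           # the id inside that database
    revision: Optional[str]  # optional sixth field, None when absent

def parse_lsid(text):
    """Split an LSID string into its components, or raise ValueError."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("expected urn:lsid:authority:namespace:object[:revision]")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: " + text)
    if any(part == "" for part in parts[2:]):
        raise ValueError("empty component in: " + text)
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)
```

For the example from the post, `parse_lsid("urn:lsid:ubio.org:namebank:11815")` gives the authority `ubio.org`, the namespace `namebank` and the id `11815`.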
This post is about Consequences, a tool finding the consequences of a set of mutations mapped on the human genome. It was motivated by a recent post on FriendFeed, where Daniel MacArthur asked: “Given a list of human b36 coordinates for a list of genic SNPs (most not in dbSNP), what would be the quickest way to get a list of the genes they're found in and, if possible, the amino acid position they would
http://plindenbaum.blogspot.com/2009/04/consequences-snp-cdna-proteins-etc.html
BACKGROUND: The monogenic disease osteogenesis imperfecta (OI) is due to single mutations in either of the collagen genes ColA1 or ColA2, but within the same family a given mutation is accompanied by a wide range of disease severity. Although this phenotypic variability implies the existence of modifier gene variants, genome wide scanning of DNA from OI patients has not been reported. Promising genome wide marker-independent physical methods for identifying disease-related loci have lacked robustness for widespread applicability. Therefore we sought to improve these methods and demonstrate their performance to identify known and novel loci relevant to OI. RESULTS: We have improved methods for enriching regions of identity-by-descent (IBD) shared between related, afflicted individuals. The extent of enrichment exceeds 10- to 50-fold for some loci. The efficiency of the new process is shown by confirmation of the identification of the Col1A2 locus in osteogenesis imperfecta patients from Amish families. Moreover the analysis revealed additional candidate linkage loci that may harbour modifier genes for OI; a locus on chromosome 1q includes COX-2, a gene implicated in osteogenesis. CONCLUSION: Technology for physical enrichment of IBD loci is now robust and applicable for finding genes for monogenic diseases and genes for complex diseases. The data support the further investigation of genetic loci other than collagen gene loci to identify genes affecting the clinical expression of osteogenesis imperfecta. The discrimination of IBD mapping will be enhanced when the IBD enrichment procedure is coupled with deep resequencing.
http://www.ncbi.nlm.nih.gov/pubmed/19331686
OK, after Scifoo 2007 (http://plindenbaum.blogspot.com/2007/07/scifoo-07-anxiety-from-homebody.html), here are my apprehensions for BioHackathon 2009: you know, I'm lost and anxious when I cannot see the Peripherique ;-)
http://plindenbaum.blogspot.com/2009/03/few-nightmares-before-biohackathon-2009.html
About one year ago, I wrote a lightweight Java parser for RDF based on the Streaming API for XML (StAX). It is far from being perfect as, for example, it does not handle reified statements, xml:base, ... but it is small (24K) and works fine with most RDF files. Inspired by the XML SAX parsers, this RDF parser doesn't keep the statements in memory but calls a method "found" each time a triple is
http://plindenbaum.blogspot.com/2009/03/lightweight-java-parser-for-rdf.html
In this post, I present my (brute-force/quick n'dirty) solution to the recent 'String Challenge' submitted by Thomas Mailund on his blog: http://www.mailund.dk/index.php/2009/03/02/a-string-algorithms-challenge/. Briefly, here is the problem: Given an input string X, an integer k and a frequency f, report all k-mers that occur with frequency higher than f. Expect the length of X to be from a few
http://plindenbaum.blogspot.com/2009/03/string-challenge-my-brute-force.html
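The problem statement above can be illustrated with a very small brute-force sketch. It is not the solution from the post: it simply counts every k-mer in memory, and it assumes that f is an absolute number of occurrences (the excerpt does not say whether f is a count or a ratio). The name `frequent_kmers` is hypothetical.

```python
from collections import Counter

def frequent_kmers(x, k, f):
    """Return {k-mer: count} for every k-mer of `x` occurring more than `f` times.

    Overlapping occurrences are counted, and all distinct k-mers are
    held in memory at once, so this only suits moderately sized inputs.
    """
    if k <= 0 or k > len(x):
        return {}
    counts = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > f}
```

For example, `frequent_kmers("abababa", 2, 2)` reports both `ab` and `ba`, each seen three times.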
In this post I describe how I mapped a list of genes involved in the Translational process on a Hilbert Curve. This post was mostly inspired by Gareth Palidwor's recent post titled "Mapping genomes to a Hilbert Curve", see http://www.palidwor.com/blog/?p=123 (and yes, Paulo is right: a blog can be a source of inspiration). Wikipedia: A Hilbert curve is a continuous fractal space-filling curve
http://plindenbaum.blogspot.com/2009/02/genes-for-translation-mapped-on-hilbert.html
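To show what mapping onto a Hilbert curve means in practice, here is a sketch of the classic conversion from a position d along the curve to (x, y) coordinates in an n by n grid. It is the well-known iterative algorithm, not the code of the post, and it assumes that each gene has already been reduced to a single index d (for instance a scaled genomic position), which is my assumption rather than something stated in the excerpt.

```python
def hilbert_d2xy(n, d):
    """Convert index `d` (0 <= d < n*n) to (x, y) on a Hilbert curve.

    `n` is the side of the grid and must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # rotate/flip the quadrant so the sub-curves join end to end
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

Two indices that are close along the curve stay close in the plane, which is why neighbouring genomic positions end up drawn next to each other.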
Goodbye Albert Barillé (1921-2009), and thank you for everything. Video: "Darwin et l'évolution partie 1", uploaded by dimi1000
http://plindenbaum.blogspot.com/2009/02/au-revoir-albert-barille.html
This post is about a new extension for MediaWiki (the wiki engine of Wikipedia, written in PHP). This was the first extension I wrote: this extension adds a new custom tag and it simply displays a DNA sequence. Here is a screenshot of this extension installed in my local MediaWiki. The source code for this extension is available here: http://code.google.com/p/lindenb/source/browse/trunk/
http://plindenbaum.blogspot.com/2009/01/extension-for-mediawiki-displaying-dna.html
This post is about how I wrote a Java plugin for the workflow engine KNIME (http://www.knime.org). This plugin reads a FASTA file containing one or more sequences and transforms it into a table containing two columns: one for the name of the sequence and the other for the sequence itself. In the last weeks, I've been looking for a workflow engine that could be easily handled by the members of my
http://plindenbaum.blogspot.com/2008/12/knimeorg-creating-new-source-node.html
In a previous post I described how I implemented a genetic algorithm finding the best set of colored triangles to re-create an image. I've just changed the output of the program: it now saves the output as a dynamic SVG picture. Watch the creation of the picture here: [embedded frame not preserved]
http://plindenbaum.blogspot.com/2008/12/genetic-algorithm-with-darwins-face.html
This post is about the ONSolubility project (for references, search FriendFeed for Solubility) and about how I've used Egon's code to create a web service to query the solubility data. Egon has already done a great job by using the Google Java spreadsheet API to download Jean-Claude's Solubility data. On his side, Rajarshi Guha wrote an HTML page querying those data using the Google
http://plindenbaum.blogspot.com/2008/11/web-service-for-onsolubility.html
This post covers my experience with the IntAct API at EBI. IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available. This web service is invoked for searching binary interactions, it is described (but not documented...) as a WSDL file at
http://plindenbaum.blogspot.com/2008/10/ebiintact-web-service-api-my-notebook.html
In a previous post I described how I generated some wrappers in Java to map the tables of the mysql database at the UCSC, and I wrote a tool to get the data about a set of SNPs (cytoband, genes, hapmap...). Today I was asked if I could transform this application into a GUI (the fear of the infamous command line.. again...). It was straightforward to embed my code into an interactive software. I
http://plindenbaum.blogspot.com/2008/10/what-is-in-list-of-snp-again-but-gui.html
Today was my first day as a bioinformatician at the Center for the Study of Human Polymorphisms (CEPH http://www.cephb.fr/en/cephdb) and I want to thank my former colleagues Christine K and Philippe Gesnouin (philguess on twitter/FF) who helped me to find this position. It's a short term contract (one year). The CEPH is located in Paris near the St-Louis Hospital and the "Place de la République
http://plindenbaum.blogspot.com/2008/09/im-not-looking-for-job-anymore-welcome.html

