{"id":722,"date":"2015-01-13T20:00:58","date_gmt":"2015-01-13T20:00:58","guid":{"rendered":"http:\/\/stg-blogs.bmj.com\/jmg\/?p=722"},"modified":"2015-01-13T20:00:58","modified_gmt":"2015-01-13T20:00:58","slug":"seqhbase-a-big-data-toolset-for-family-based-sequencing-data-analysis","status":"publish","type":"post","link":"https:\/\/stg-blogs.bmj.com\/jmg\/2015\/01\/13\/seqhbase-a-big-data-toolset-for-family-based-sequencing-data-analysis\/","title":{"rendered":"SeqHBase: a big data toolset for family-based sequencing data analysis"},"content":{"rendered":"<p>High-throughput sequencing technologies are now increasingly used to find disease genes, but it is difficult to infer biological insights from massive amounts of data in a short period of time. We developed a software framework called SeqHBase to help quickly identify disease genes. SeqHBase is built on the Apache <a href=\"http:\/\/hadoop.apache.org\/\">Hadoop<\/a> and <a href=\"http:\/\/hbase.apache.org\/\">HBase<\/a> infrastructure, which works in a distributed and parallel manner over multiple data nodes. Its input includes coverage information for 3 billion sites, over 3 million variants, and their associated functional annotations for each genome. With 20 data nodes, SeqHBase took about 5 seconds to analyze whole-exome sequencing data for a family quartet and approximately 1 minute to analyze whole-genome sequencing data for a 10-member family. We demonstrated SeqHBase\u2019s high efficiency and scalability with several real sequencing data sets. (By Min He, Ph.D., <a href=\"http:\/\/jmg.bmj.com\/content\/early\/2015\/01\/13\/jmedgenet-2014-102907\">http:\/\/jmg.bmj.com\/content\/early\/2015\/01\/13\/jmedgenet-2014-102907<\/a>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>High-throughput sequencing technologies are now increasingly used to find disease genes, but it is difficult to infer biological insights from massive amounts of data in a short period of time. 
We developed a software framework called SeqHBase to help quickly identify disease genes. SeqHBase is built on the Apache Hadoop and HBase infrastructure, which works [&#8230;]<\/p>\n<p><a class=\"btn btn-secondary understrap-read-more-link\" href=\"https:\/\/stg-blogs.bmj.com\/jmg\/2015\/01\/13\/seqhbase-a-big-data-toolset-for-family-based-sequencing-data-analysis\/\">Read More&#8230;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-722","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/stg-blogs.bmj.com\/jmg\/wp-json\/wp\/v2\/posts\/722","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/stg-blogs.bmj.com\/jmg\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/stg-blogs.bmj.com\/jmg\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/stg-blogs.bmj.com\/jmg\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/stg-blogs.bmj.com\/jmg\/wp-json\/wp\/v2\/comments?post=722"}],"version-history":[{"count":0,"href":"https:\/\/stg-blogs.bmj.com\/jmg\/wp-json\/wp\/v2\/posts\/722\/revisions"}],"wp:attachment":[{"href":"https:\/\/stg-blogs.bmj.com\/jmg\/wp-json\/wp\/v2\/media?parent=722"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/stg-blogs.bmj.com\/jmg\/wp-json\/wp\/v2\/categories?post=722"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/stg-blogs.bmj.com\/jmg\/wp-json\/wp\/v2\/tags?post=722"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}