Eureka Genomics: FAQ

Frequently Asked Questions

Next-Gen Genotyping (NGG)

Q: What is the capacity of the assay?

A: The NGG is suitable for querying hundreds of loci for hundreds to thousands of samples. At this time, EG has validated the assay with 400 SNPs run in parallel for over 1,500 samples. Maximum capacity depends on the sequencing platform; please contact EG to discuss the details of your project and determine the assay capacity on your platform.
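As a rough illustration of why capacity depends on the platform, the reads a run delivers can be divided by the number of sample-locus combinations. The sketch below assumes about 25 million reads (the per-lane Illumina figure quoted under Sequencing Technology) and a perfectly even split across samples and loci. The function name is hypothetical, and the depth EG actually requires per locus is not stated in this FAQ.

```python
def reads_per_sample_locus(total_reads: int, n_loci: int, n_samples: int) -> float:
    """Average reads per sample x locus combination, assuming an even split."""
    if n_loci <= 0 or n_samples <= 0:
        raise ValueError("n_loci and n_samples must be positive")
    return total_reads / (n_loci * n_samples)


# Validated configuration from this FAQ: 400 SNPs x 1,500 samples.
# 25 million reads is an assumed run size, not a platform specification.
depth = reads_per_sample_locus(total_reads=25_000_000, n_loci=400, n_samples=1_500)
print(f"{depth:.1f} reads per sample-locus")  # 41.7
```

Doubling the number of loci or samples halves the average depth, which is why the same panel can fit on one platform but not another.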

Q: What sequencing platforms are compatible with the assay?

A: Eureka Genomics NGG is platform agnostic and is compatible with any next-generation sequencing (NGS) platform, including Illumina’s GAIIx, MiSeq and HiSeq and Life Technologies’ Ion Torrent PGM.

Q: How much DNA is required for the assay?

A: The DNA requirement for the assay depends on the size of the reaction plate being used and the number of loci in the assay. For 100 loci in 96-well plates, 200 ng of DNA is required. For 100 loci in 384-well plates, 40 ng of DNA is required.

Q: Can you tell me what types of research questions are suitable for the assay?

A: The NGG enables broad profiling of single nucleotide polymorphisms (SNPs), copy number variations (CNVs), indels, gene expression and epigenetic events such as methylation. The assay can be used to determine presence or absence, identify contamination or measure the percentage of a target present in a sample. Because each sample is uniquely barcoded, different questions can be examined in a single library.

Q: How does the assay work?

A: Each locus of interest is interrogated with a probe triplet, which contains a unique interrogation portion, a common PCR primer site and an allele barcode. Many triplets can be mixed in a single probe blend. After the sample DNA and probe blend are mixed and hybridized, a ligase is added to join any adjacent left and right probes. PCR amplification then proceeds from primers that carry a sample ID barcode unique to each sample together with the common PCR primer site. Adding a sample ID barcode on both sides multiplies the number of sample ID combinations available from a limited set of left and right PCR primers. The PCR primers also contain the sequences required for high-throughput sequencing (HTS) data generation. Each usable sequence read therefore contains an allele (SNP) barcode, a locus barcode and a sample ID barcode, and the reads are sorted into bins by sample, allele and locus based on these barcodes. The number of reads in each bin is then analyzed to determine the probability of each genotype for a given sample and locus, and a genotype is assigned according to the confidence level of the call established by the statistical model.
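The binning step can be sketched as follows. Where each barcode sits in the read and the actual barcode sequences are not given here, so this minimal sketch assumes the sample, locus and allele barcodes have already been extracted from each read. The barcode tables and the bin_reads function are hypothetical.

```python
from collections import Counter

# Hypothetical barcode tables; real NGG barcode sequences are not published here.
SAMPLE_BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}
LOCUS_BARCODES = {"GGAA": "locus_1", "CCTT": "locus_2"}
ALLELE_BARCODES = {"AA": "A", "CC": "B"}


def bin_reads(parsed_reads):
    """Count reads per (sample, locus, allele) bin.

    Each item is a (sample_barcode, locus_barcode, allele_barcode) tuple
    already extracted from a read. Reads with any unrecognised barcode
    are counted as unassigned rather than binned.
    """
    bins = Counter()
    unassigned = 0
    for sample_bc, locus_bc, allele_bc in parsed_reads:
        try:
            key = (SAMPLE_BARCODES[sample_bc],
                   LOCUS_BARCODES[locus_bc],
                   ALLELE_BARCODES[allele_bc])
        except KeyError:
            unassigned += 1
            continue
        bins[key] += 1
    return bins, unassigned


reads = [("ACGT", "GGAA", "AA"), ("ACGT", "GGAA", "CC"), ("ACGT", "GGAA", "AA"),
         ("TGCA", "CCTT", "CC"), ("NNNN", "GGAA", "AA")]
bins, unassigned = bin_reads(reads)
print(bins[("sample_1", "locus_1", "A")], unassigned)  # 2 1
```

The per-bin counts are the allele A and allele B read numbers that the genotype model then works from.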

Q: What is the success rate for probe development?

A: For the first iteration of an assay design, typically 80-85% of probes are successful. Subsequent design iterations can improve probe compatibility and increase the number of targets available for screening.

Q: What types of data output will I get from the assay?

A: For every project, results can include the sample, locus, allele A and allele B read counts, the probability of each genotype and the EG genotype call (AA, AB, BB or no call).
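EG's statistical model is not described in this FAQ. As a stand-in, the sketch below uses a simple binomial model: with a flat prior over AA, AB and BB, each read is assumed to report allele A with probability 1 - e, 0.5 or e respectively, where e is an assumed per-read error rate. The error rate, the minimum read count and the 0.99 probability cutoff are illustrative assumptions, not EG's settings.

```python
import math


def genotype_probabilities(a_reads: int, b_reads: int, error_rate: float = 0.01) -> dict:
    """Posterior probability of AA, AB and BB under a simple binomial model (flat prior)."""
    if not 0 < error_rate < 0.5:
        raise ValueError("error_rate must be between 0 and 0.5")
    # Probability that a single read shows allele A under each genotype.
    p_a = {"AA": 1 - error_rate, "AB": 0.5, "BB": error_rate}
    # Log-likelihoods; the binomial coefficient is shared and cancels out.
    log_lik = {g: a_reads * math.log(p) + b_reads * math.log(1 - p) for g, p in p_a.items()}
    top = max(log_lik.values())
    weights = {g: math.exp(v - top) for g, v in log_lik.items()}  # numerically stable
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def call_genotype(a_reads: int, b_reads: int, min_prob: float = 0.99, min_reads: int = 10) -> str:
    """Return 'AA', 'AB', 'BB' or 'no call'."""
    if a_reads + b_reads < min_reads:
        return "no call"
    probs = genotype_probabilities(a_reads, b_reads)
    best = max(probs, key=probs.get)
    return best if probs[best] >= min_prob else "no call"


print(call_genotype(40, 1))   # AA
print(call_genotype(22, 19))  # AB
print(call_genotype(2, 3))    # no call (too few reads)
```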

Q: Is the assay available as a product or a service?

A: The NGG is available as both a product and a service. We offer standard panels for a variety of applications in animal health, plant science and clinical research. When purchased as a product, the assay can be run in your lab on your NGS platform or outsourced to a commercial partner of your choosing. If you plan to outsource the sequencing, we encourage you to have EG run the assay for you. You can either choose the full service (you send EG extracted DNA and EG does the rest) or purchase the kit, prepare your samples and have EG perform the sequencing and send you the results.

Q: If I run the assay in my lab, what do I need to do prior to sequencing?

A: If you purchase the NGG from us as a commercial product, we will send you a kit that contains the necessary index plate(s) for your NGS platform along with the probe blend for your specific panel. Universal reagents, such as polymerase and ligase, may be ordered through your regular provider. Once the reagents arrive, the steps of the assay are as follows:

  • Add the Eureka Genomics probe blend to the DNA from your samples in a 96-well (or 384-well) plate and allow hybridization to occur.
  • Following hybridization, add Taq DNA ligase and perform the ligation.
  • Transfer a portion of each ligation reaction to a new 96- or 384-well PCR plate and add our indices.
  • Add your preferred Taq polymerase and perform PCR.
  • Pool the PCR products into a collection plate and clean them up with the PCR clean-up kit of your choice.
  • Perform QC on the pooled library and send it for sequencing.
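The index plate layout shipped with the kit is not described in this FAQ. One common way to obtain the multiplicative number of sample ID combinations mentioned in "How does the assay work?" is to assign one left primer per plate row and one right primer per plate column. The sketch below assumes that scheme; the primer names (L01, R01, ...) and the plate_index_map function are hypothetical.

```python
import string


def plate_index_map(rows: int = 8, cols: int = 12) -> dict:
    """Map each well (e.g. 'A1') to a hypothetical (left_index, right_index) pair.

    Assumes one left primer per row and one right primer per column, so
    rows + cols primers give rows * cols unique combinations.
    """
    row_labels = string.ascii_uppercase[:rows]
    return {
        f"{r}{c}": (f"L{ri + 1:02d}", f"R{c:02d}")
        for ri, r in enumerate(row_labels)
        for c in range(1, cols + 1)
    }


plate96 = plate_index_map()          # 8 left + 12 right primers
plate384 = plate_index_map(16, 24)   # 16 left + 24 right primers
print(len(plate96), plate96["H12"])  # 96 ('L08', 'R12')
print(len(plate384), len(set(plate384.values())))  # 384 384
```

With this layout, 20 primers label a full 96-well plate and 40 primers label a 384-well plate, each well receiving a unique pair.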

Q: What is the anticipated timeline for developing an NGG panel, testing and delivery of results?

A: For a custom design project, the anticipated timeline to completion is approximately 6 to 8 weeks. For testing samples against a standard panel already offered by Eureka Genomics, the expected timeline for delivery of results is approximately 2 weeks.

Q: Is the assay design fixed or flexible? In other words, can I easily add or remove targets from a design?

A: For custom NGG panel design, EG can easily add or remove probes based on test results. While there would be an additional charge for optimization following probe changes, this charge is negligible compared to the initial validation (unless a significant number of probes are added or replaced requiring, essentially, an entirely new design). If you have specific questions regarding assay flexibility, please contact EG for more information.

Q: What types of samples is the assay compatible with?

A: The assay is compatible with virtually any sample type from animals, plants or humans as long as the minimum amount of DNA required for the assay can be isolated.

Sequencing Technology

Q: Can you give me an overview of the Illumina sequencing technology?

A: Illumina sequencing technology relies on clusters of identical molecules attached to the surface of a flow cell. The clusters are generated by clonal expansion (in an Illumina cluster station or cBot) of individual ssDNA molecules hybridized to the flow cell. Illumina uses sequencing by synthesis, in which fluorophore-labeled bases are added and read one cycle at a time. Each base is read by microscopically imaging the surface of the flow cell after it is added.

Q: What is the geometry of an Illumina flow cell?

A: Flow cells for the Illumina machines consist of 8 individual lanes, and each lane consists of 120 tiles that are imaged individually after the addition of each base. We normally obtain 250,000-300,000 raw clusters per tile on average, of which 70-90% pass filter. The filter passes clusters that have an acceptable size and color intensity and are not too close to other clusters.
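The tile and cluster figures above imply a per-lane range that can be checked with simple arithmetic. The sketch assumes single-end reads, where each pass-filter cluster yields one read, so the result can be compared with the roughly 25 million reads per lane quoted in the next answer. The function name is hypothetical.

```python
def pf_clusters_per_lane(tiles=120, raw_per_tile=(250_000, 300_000), pf_fraction=(0.70, 0.90)):
    """Return the (low, high) range of pass-filter clusters for one lane."""
    low = tiles * raw_per_tile[0] * pf_fraction[0]
    high = tiles * raw_per_tile[1] * pf_fraction[1]
    return round(low), round(high)


print(pf_clusters_per_lane())  # (21000000, 32400000)
```

The 21-32 million range brackets the typical figure of about 25 million reads per lane.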

Q: What is a tile?

A: A tile is a square portion of a flow cell lane that is imaged as a single image during sequence data generation. There is no physical barrier between neighboring tiles.

Q: How much data is generated in a sequencing run?

A: Illumina machines generate 8 lanes of sequence data per flow cell. At Eureka Genomics we generate around 25 million reads per lane, sometimes more, sometimes less. The amount of data generated depends on the type of run: for example, a 36-cycle single-end run can generate 25 million reads x 36 bases = 900 Mb (megabases) of data per lane, or more, depending on the number of reads obtained. Longer reads generate more data.
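The yield arithmetic can be written out as below. The sketch assumes that for paired-end runs the 25 million figure counts clusters, each producing two reads of the stated length; this doubling is an assumption, and real yields vary from run to run as the answer notes. The function name is hypothetical.

```python
def run_yield_bases(clusters_per_lane: int, read_length: int,
                    lanes: int = 8, paired_end: bool = False) -> int:
    """Total bases: clusters x read length x reads per cluster x lanes."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters_per_lane * read_length * reads_per_cluster * lanes


per_lane = run_yield_bases(25_000_000, 36, lanes=1)
print(per_lane / 1e6, "Mb per lane")                              # 900.0 Mb per lane
print(run_yield_bases(25_000_000, 36) / 1e9, "Gb per flow cell")  # 7.2 Gb per flow cell
```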

Q: How many reads should I expect from an indexed sample?

A: Theoretically, if n barcoded samples are pooled in an equimolar mix, each sample will receive 1/n of the reads generated per lane. In practice, the number of reads per sample is influenced by the randomness of library capture onto the flow cell, variability in quantification and pipetting, and the fact that the efficiency of cluster generation depends on the sample. Because of all these variables, some samples in an indexed pool may yield extremely low numbers of reads.
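The effect of pooling imbalance can be illustrated with a small simulation. The sketch assumes that each sample's share of reads is proportional to its share of the pool and that reads are assigned at random; it models quantification and pipetting error only, not sample-dependent cluster efficiency. The function names and the under-loaded sample are hypothetical.

```python
import random
from collections import Counter


def expected_reads(total_reads, fractions):
    """Expected reads per sample given each sample's relative amount in the pool."""
    s = sum(fractions)
    return [total_reads * f / s for f in fractions]


def simulate_reads(total_reads, fractions, seed=0):
    """Randomly assign reads to samples in proportion to their pool fractions."""
    rng = random.Random(seed)
    draws = rng.choices(range(len(fractions)), weights=fractions, k=total_reads)
    counts = Counter(draws)
    return [counts[i] for i in range(len(fractions))]


# 12 samples intended to be equimolar; sample 12 was accidentally under-loaded.
fractions = [1.0] * 11 + [0.2]
print([round(x) for x in expected_reads(120_000, fractions)])  # ~10,714 each, ~2,143 for sample 12
print(simulate_reads(120_000, fractions))                      # random counts around those values
```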

Q: What percentage of adaptor sequence can I expect?

A: The percentage of adaptor sequence depends on the library; for DNA-based libraries prepared by Eureka Genomics, adaptor contamination is normally less than 0.1%.
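This FAQ does not say how adaptor content is measured. A minimal sketch, assuming a simple exact-match search for AGATCGGAAGAGC, the 13-base prefix shared by standard Illumina TruSeq adapters: it will miss adapters truncated at the end of a read or containing sequencing errors, which dedicated trimming tools handle with partial matching. The function name is hypothetical.

```python
# First 13 bases shared by standard Illumina TruSeq adapters.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(reads, adapter=ADAPTER_PREFIX):
    """Fraction of reads containing the full adapter prefix (exact match only)."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for seq in reads if adapter in seq)
    return hits / len(reads)


reads = ["ACGTACGTACGTACGTACGT"] * 999 + ["ACGTAGATCGGAAGAGCACA"]
print(f"{adapter_fraction(reads):.2%}")  # 0.10%
```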

Q: How many samples can be run at once on a flow cell?

A: Several samples can be run at once on each lane of a flow cell if they are labeled with unique sequence tags (barcodes). With kits containing 12 unique barcodes, up to 96 samples (12 per lane across 8 lanes) can be sequenced at once on a flow cell. However, indexes can be custom designed, and there is no upper limit to the number of indexes that can be used.

Q: Can you use all 8 lanes on a flow cell for generating sequencing reads?

A: Normally one lane on the flow cell is reserved for a sequencing control. However, if customers are confident in their libraries and wish to generate data on all lanes of the flow cell by omitting the sequencing control, this can be arranged. Please specify this when requesting a quote.

Q: How much sequencing data do I need? What kind of reads do I need?

A: The amount of data and the type of reads depend on the project. A general guideline for de novo assembly is to generate enough data to obtain an average coverage of 30x. For example, most draft de novo bacterial sequencing projects can be completed with 300 Mb of data (60x coverage) if the bacterial genome is smaller than 5 Mb. The amount of data needed also depends on the type of data. For example, for the same amount of data, paired-end reads produce an assembly with longer contigs, while single-end reads produce an assembly with fewer errors per base but shorter contigs. Contact us for details.
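The coverage guideline reduces to one division: average coverage equals total sequenced bases divided by genome size. The sketch below reproduces the bacterial example; it does not account for reads removed by quality filtering or for uneven coverage across the genome, and the function names are hypothetical.

```python
def coverage(data_bases: float, genome_size: float) -> float:
    """Average fold coverage = total sequenced bases / genome size."""
    return data_bases / genome_size


def data_needed(target_coverage: float, genome_size: float) -> float:
    """Bases required to reach a target average coverage."""
    return target_coverage * genome_size


print(coverage(300e6, 5e6))        # 60.0 (300 Mb on a 5 Mb genome)
print(data_needed(30, 5e6) / 1e6)  # 150.0 (Mb needed for 30x on a 5 Mb genome)
```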

Q: What if I only need a little bit of data? Can you accommodate this?

A: We can accommodate such needs, with limitations. Next Generation Sequencing (NGS, also known as High Throughput Sequencing or HTS) platforms produce large amounts of data, and it is impractical to use an NGS platform to generate small amounts of data. In general, Eureka Genomics can accommodate requests for as little as 100 Mb of data per sample, or less, depending on the type of run, the number of samples and other factors. Contact us for details.

Samples

Q: Can you process low quantity DNA samples?

A: Yes, Eureka Genomics can prepare libraries and generate sequencing data from samples that have as little as 1 ng of purified DNA without a whole-genome amplification step; this helps reduce amplification bias before library prep. Eureka Genomics cannot guarantee results with low sample quantity.

Q: Can you process low quantity RNA samples?

A: The requirements for RNA samples vary depending on the sample (enriched or not, intact or fragmented, etc.). Contact us for details.

Q: What does Eureka Genomics recommend for purifying RNA species shorter than 70 bases?

A: Isopropanol precipitation with added glycogen, followed by freezing at -70 °C, is likely the best way to capture RNA fragments of 25 bases or longer. The glycogen also makes the pellet easy to see.

Q: Do you sequence ChIP-Seq and MeDIP-Seq?

A: Eureka Genomics can prepare sequencing libraries and generate sequencing data from ChIP-enriched and MeDIP-enriched DNA samples supplied by the client. Eureka Genomics does not provide ChIP or MeDIP DNA enrichment at this time. Contact us for details.

Q: Can you prepare strand-specific RNA-Seq libraries?

A: Eureka Genomics is currently developing strand-specific RNA-Seq. Contact us for details.

Q: Should I submit ready-made libraries or use the Eureka Genomics library prep service?

A: We prefer to work with libraries generated in-house, to have better control of the samples. However, libraries prepared by customers with experience in library prep can perform as well as libraries prepared by Eureka Genomics. Factors influencing this decision include cost, availability of library prep facilities, prior experience, required turnaround time (TAT), etc. Our experts can work with you to decide which choice is best. Contact us for details.

Q: Why do the library preparation prices vary so much?

A: Library prep prices vary with the amount and type of sample, and some libraries are more complicated to generate and have higher failure rates than others.

Q: Is the quality of my sample adequate?

A: Quality requirements for samples are listed on our website. However, if you can't find the quality requirements for your sample type, if you analyzed your sample by other methods, or if you have doubts, contact us.

Q: What should I do if my sample does not meet the Eureka Genomics quality requirements?

A: Eureka Genomics experts can recommend methods for sample purification, or the sample can be further purified at Eureka Genomics at extra cost. If the sample quality cannot be improved, we will attempt to generate libraries and sequence data from low quality samples at the customer's express request. However, Eureka Genomics cannot guarantee sequencing results with poor quality samples.

Q: What does normal library QC look like?

A: A good library, when analyzed on a Bioanalyzer or by a similar method, will show one narrow DNA peak centered at ~340 bp. Libraries with different profiles, such as multiple peaks, a main peak significantly shorter or longer than this, or broad peaks, are problematic. Eureka Genomics will generate sequencing data from good quality libraries, and at the express request of the client we will attempt to generate sequence data from low quality libraries. Eureka Genomics does not guarantee sequencing results from poor quality libraries. Contact us for details.

Q: Do you accept samples from overseas?

A: Yes. Samples must be shipped and received frozen. Sequencing projects for overseas customers must be prepaid in full before initiating sequence data generation. Contact us for details.

Q: Are you interested in collaborations on grants or research projects?

A: Yes. Eureka Genomics would be happy to participate in your project. Eureka Genomics is also interested in co-authoring publications that result from such projects.

Q: What is the turnaround time (TAT) for a sequencing project?

A: The design of sequencing projects varies widely and influences TAT. Library construction at Eureka Genomics adds to the TAT, but tends to produce better overall results than customer-prepared libraries. Common read types have shorter TATs than less common ones (please refer to the table of common reads). Providing enough samples to fill a whole flow cell generally reduces TAT, as there is no need to wait for other samples before starting the run; at the same time, a large number of samples (more than one flow cell) can have a longer TAT than a few samples. Generally, the TAT is 4-6 weeks for most projects, but it may be shorter or longer. Contact us for details.

Q: Do you offer expedited services?

A: Yes. Depending on the project details, we can sometimes achieve a TAT of 3 days. Special pricing applies for expedited sequencing. Contact us for details.

Q: What is the difference between single end and paired end reads?

A: Single-end (SE) reads are generated from one end of library fragments; paired-end (PE) reads consist of pairs of reads, where each pair corresponds to the opposite ends of a library fragment. The advantage of PE sequence data is that the two reads of a pair are expected to come from the same DNA fragment, so they are linked in the biological sample and the approximate distance between them is known; PE reads are very useful for de novo genome assemblies. Also see: SIPES and LIPES.

Q: What is the difference between SIPES and LIPES?

A: SIPES (Short Insert Paired-End Sequencing) and LIPES (Long Insert Paired-End Sequencing) differ in the DNA insert length: SIPES generates reads separated by 150-400 bp, while LIPES generates reads separated by 2-10 kb. Also, SIPES predominantly generates reads that point inward (towards each other), while LIPES predominantly generates reads that point outward (away from each other). SIPES and LIPES libraries are made by different processes, and LIPES libraries are more complicated to make. In both cases the insert length can be specified by the customer (within limits). Both SIPES and LIPES are useful in de novo assembly projects.
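Given aligned read pairs, orientation and span can be checked against the two library types. The sketch treats the ranges above as the span of the pair on the reference (an assumption, since "separated by" could also mean the gap between the reads) and classifies a pair from hypothetical alignment records; real data would come from an aligner's output. The Alignment type and the function names are hypothetical.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    start: int     # leftmost mapped position (0-based)
    length: int    # aligned length in bases
    reverse: bool  # True if the read maps to the reverse strand


def pair_orientation(r1: Alignment, r2: Alignment) -> str:
    """'inward' if the reads point toward each other, 'outward' if away."""
    left, right = sorted((r1, r2), key=lambda a: a.start)
    if not left.reverse and right.reverse:
        return "inward"
    if left.reverse and not right.reverse:
        return "outward"
    return "same-strand"


def span(r1: Alignment, r2: Alignment) -> int:
    """Distance from the leftmost to the rightmost aligned base of the pair."""
    start = min(r1.start, r2.start)
    end = max(r1.start + r1.length, r2.start + r2.length)
    return end - start


def classify_pair(r1, r2, sipes=(150, 400), lipes=(2_000, 10_000)) -> str:
    o, s = pair_orientation(r1, r2), span(r1, r2)
    if o == "inward" and sipes[0] <= s <= sipes[1]:
        return "SIPES-like"
    if o == "outward" and lipes[0] <= s <= lipes[1]:
        return "LIPES-like"
    return "unexpected"


print(classify_pair(Alignment(1000, 100, False), Alignment(1200, 100, True)))  # SIPES-like
print(classify_pair(Alignment(5000, 100, True), Alignment(9900, 100, False)))  # LIPES-like
```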

Q: How are Nextera vs Illumina libraries different?

A: The Nextera protocol fragments DNA and adds adaptors simultaneously, while the Illumina protocol requires sequential DNA fragmentation, end repair, A-tailing and adaptor ligation. Because of the multiple steps and the DNA purification required between them, the Illumina protocol takes longer in total, requires more hands-on time and needs a higher DNA input because DNA is lost at each step. However, the mechanical DNA fragmentation methods used with the Illumina protocol have less sequence bias than the enzymatic DNA fragmentation method used by the Nextera protocol. Additionally, libraries prepared with the Nextera protocol require special sequencing primers, so it is imperative that the library prep protocol used be communicated to Eureka Genomics before sequence data generation. A Nextera library will produce no data if Illumina primers are used.

Q: I made my libraries with NuGen - can you make sequence data?

A: Yes. Whenever supplying ready-made libraries, you are required to specify the library prep method used and the primer sequences at the time of sample submission.

Q: I made my libraries with Nextera - can you make sequence data?

A: Yes. Whenever supplying ready-made libraries, you are required to specify the library prep method used and the primer sequences at the time of sample submission.

Q: The sequence data from my LIPES seems wrong - what is going on?

A: LIPES sequence data generation can fail in many ways. One common problem with LIPES is the generation of chimeric reads (continuous reads that map to two different regions of the reference genome). This is an inherent problem of library generation; it can be minimized but not avoided. The proportion of chimeric reads depends on read length and library insert size. Please contact us if you experience other problems with LIPES data.
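One way to estimate the proportion of chimeric reads is from aligner output in SAM format: aligners such as BWA-MEM report a read whose parts map to different regions as a primary alignment carrying an SA tag plus supplementary alignments (FLAG 0x800). The minimal sketch below assumes such SAM text and counts primary mapped reads that carry an SA tag. It is not EG's QC procedure, and what counts as chimeric depends on the aligner and its settings. The function name and the example records are hypothetical.

```python
def chimeric_fraction(sam_lines):
    """Fraction of primary mapped reads that carry an SA (supplementary alignment) tag."""
    primary = chimeric = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (0x4 | 0x100 | 0x800):  # skip unmapped, secondary, supplementary
            continue
        primary += 1
        # Optional tags start at the 12th column.
        if any(tag.startswith("SA:Z:") for tag in fields[11:]):
            chimeric += 1
    return chimeric / primary if primary else 0.0


seq, qual = "A" * 50, "I" * 50
sam = [
    "@HD\tVN:1.6",
    f"r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t{seq}\t{qual}",
    f"r2\t0\tchr1\t300\t60\t30M20S\t*\t0\t0\t{seq}\t{qual}\tSA:Z:chr1,9000,+,30S20M,60,0;",
    f"r2\t2048\tchr1\t9000\t60\t30H20M\t*\t0\t0\t{seq[30:]}\t{qual[30:]}\tSA:Z:chr1,300,+,30M20S,60,0;",
]
print(chimeric_fraction(sam))  # 0.5
```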

Bioinformatics

Q: Do you offer library prep or bioinformatics services only, or do I have to use the sequencing services you offer?

A: Eureka Genomics offers any combination of services the customer needs. Eureka Genomics can extract DNA and/or RNA, prepare libraries and send them back to the customer, provide bioinformatics services on data provided by the customer, sequence customer-prepared libraries etc. Contact us for details.

Q: I like that Eureka Genomics lists the top 50 most common sequences, but I need the top 500; can EG do this?

A: Listing the 50 most frequent sequences is the default setup, but the number can be increased on request. Please specify how many of the most common sequences you need.
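The pipeline EG uses is not described, but collapsing identical read sequences and ranking them is a straightforward way to produce such a list. A minimal sketch using Python's collections.Counter, where n is the number of sequences requested (for example 500); the function name is hypothetical.

```python
from collections import Counter


def top_sequences(reads, n=50):
    """Return the n most frequent read sequences with their counts."""
    return Counter(reads).most_common(n)


reads = ["ACGT"] * 5 + ["TTTT"] * 3 + ["GGCC"] * 2
print(top_sequences(reads, n=2))  # [('ACGT', 5), ('TTTT', 3)]
```

Requesting more sequences than there are distinct ones simply returns all of them.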

Q: I need a different moving window size. Is this doable?

A: The size of the moving window can be specified by the client. The default value is 500 bp.
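The FAQ does not say which quantity is averaged in the moving window. The sketch below assumes per-base read depth and uses the 500 bp default. The step parameter (equal to the window size for non-overlapping windows, smaller for a sliding window) and the function name are hypothetical.

```python
from itertools import accumulate


def window_means(depths, window=500, step=None):
    """Mean per-base depth in windows of `window` bases, as (start, mean) pairs.

    step defaults to window (non-overlapping windows); a smaller step slides
    the window. A final partial window shorter than `window` is dropped.
    """
    step = step or window
    prefix = [0, *accumulate(depths)]  # prefix sums give each window sum in O(1)
    return [
        (start, (prefix[start + window] - prefix[start]) / window)
        for start in range(0, len(depths) - window + 1, step)
    ]


depths = [10] * 500 + [0] * 500 + [20] * 500
print(window_means(depths))            # [(0, 10.0), (500, 0.0), (1000, 20.0)]
print(window_means(depths, step=250))  # adds overlapping windows at 250 and 750
```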

Q: I suspect that the mutation that I am interested in is caused by a three base deletion; can EG help me find it?

A: By comparing the generated sequence data with the reference sequence, Eureka Genomics can identify SNPs, short indels of 1 bp, large deletions and coverage gaps; three-base deletions will not be identified by this analysis. However, custom bioinformatics analysis can identify complex polymorphisms such as three-base deletions, if requested. Contact us for details.
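A custom search for three-base deletions can start from the CIGAR strings of aligned reads, where a deletion from the reference appears as a D operation. The minimal sketch below assumes SAM-style alignments with 1-based positions and reports the reference position and length of each deletion. It works per read, so a real analysis would also need to aggregate support across reads and account for aligners placing the same deletion at different positions within repeats. The function name is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def find_deletions(ref_start, cigar, min_len=1):
    """Return (reference_position, length) for each deletion in a CIGAR string.

    ref_start is the 1-based leftmost reference position (the SAM POS field).
    """
    deletions = []
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "D" and length >= min_len:
            deletions.append((pos, length))
        if op in "MDN=X":  # these operations consume reference bases
            pos += length
    return deletions


print(find_deletions(1000, "45M3D55M", min_len=3))  # [(1045, 3)]
```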

Q: In bacteria, either a mutation is there or it is not; why is the SNP detection threshold set at 30%?

A: Bacterial samples are thought of as clonal, but often they are not. SNPs can arise in a bacterial population either in a patient sample or during culture, and such SNPs can be present at any abundance level. Contact us if the SNPs you are looking for are expected to be present at lower than 30% abundance.
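The threshold itself is a simple ratio of reads supporting the variant to total reads at the position. A minimal sketch, assuming the default 30% cutoff described above and an optional lower value for low-abundance SNPs; EG's other calling criteria, such as minimum depth or base quality, are not described here and are not modeled. The function names are hypothetical.

```python
def allele_frequency(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads at a position that support the alternate allele."""
    total = alt_reads + ref_reads
    return alt_reads / total if total else 0.0


def is_reported(alt_reads: int, ref_reads: int, threshold: float = 0.30) -> bool:
    """Report a SNP when the alternate allele reaches the detection threshold."""
    return allele_frequency(alt_reads, ref_reads) >= threshold


print(is_reported(12, 28))                 # True (30% of 40 reads)
print(is_reported(8, 32))                  # False (20%)
print(is_reported(8, 32, threshold=0.10))  # True with a lower custom threshold
```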

Q: I need my data in a special format - can you do this?

A: We can provide sequencing data in several formats. Contact us for details.