Agos is an alignment-free approach for detecting Argonaute-binding WG/GW domains in protein sequences. The identification method is based on the characteristic amino acid composition of the domain. An initial dataset of experimentally confirmed AGO-binding proteins was used to derive a two-parameter scoring system that helps to discriminate genuine GW repeat proteins.
The dos scoring table contains a value for each of the 400 possible dipeptides and reflects compositional differences between the domain and plant and animal proteomes. The dos score is used to detect domain boundaries in a given protein.
The ics score (internal composition score) accurately represents the amino acid composition within WG/GW domains. Together, the two scoring systems measure the degree of compositional compatibility of newly detected domains with the already-known WG/GW domains.
The procedure for identifying domain boundaries has been described previously (Karlowski et al., 2010). Briefly, the algorithm uses each WG/GW motif location as a starting point. Progressing in both directions, it calculates a cumulative score for each position using values from the dos scoring matrix. Domain extension is terminated when the calculated linear progression score for the current position drops from its last maximum below the value given by the dec threshold. Finally, overlapping domains are joined, and both the dos and ics scores are calculated for the assembled domain.
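The boundary search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the server's implementation: the function names, the toy dos table, the `default` penalty for dipeptides missing from that table, and the reading of the dec threshold as "stop once the cumulative score has fallen more than dec below its last maximum" are all hypothetical. The real dos matrix holds a value for each of the 400 dipeptides, and the dos and ics scoring of the assembled domain is not reproduced here.

```python
import re

def find_motifs(seq):
    """Return the start index of every WG or GW dipeptide (overlaps included)."""
    return [m.start() for m in re.finditer(r"(?=WG|GW)", seq)]

def extend(seq, start, step, dos, dec, default=-1.0):
    """Walk from `start` in direction `step` (+1 or -1) and return the position
    of the last cumulative-score maximum reached before the score falls more
    than `dec` below that maximum (assumed reading of the dec threshold)."""
    score = best = 0.0
    pos = best_pos = start
    while 0 <= pos + step < len(seq):
        left = min(pos, pos + step)              # dipeptide read left to right
        score += dos.get(seq[left:left + 2], default)
        pos += step
        if score >= best:
            best, best_pos = score, pos
        elif best - score > dec:
            break
    return best_pos

def merge(spans):
    """Join overlapping (start, end) spans; both ends are inclusive."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def detect_domains(seq, dos, dec):
    """Extend every WG/GW motif in both directions, then merge the results."""
    spans = [(extend(seq, i, -1, dos, dec), extend(seq, i + 1, +1, dos, dec))
             for i in find_motifs(seq)]
    return merge(spans)

# Toy values for illustration only; these are NOT taken from the real dos table.
TOY_DOS = {"GW": 2.0, "WG": 2.0, "GS": 1.0, "SG": 1.0}
print(detect_domains("AAAAGWGSGWAAAA", TOY_DOS, dec=1.5))
```

With the toy table, the three motifs in the example sequence all extend to the same region, so the merge step returns a single domain given as zero-based, inclusive coordinates.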
Paste your DNA or protein sequence in FASTA format into the text box. Only one sequence can be submitted at a time. A DNA sequence is translated in all six reading frames before the domain detection algorithm is applied. The query protein sequence must contain at least one WG or GW motif to be processed further. The user will also be asked to provide a valid, non-commercial email address before submitting a job to the server. Once the analysis has started, the data will be posted to a WG/GW protein identification pipeline that screens for all regions containing WG/GW domains.
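For readers who want to pre-check a query locally, the input rules above (a single FASTA record, six-frame translation of DNA, at least one WG or GW motif) can be approximated as follows. This is a sketch under assumptions: the function names are hypothetical, the standard genetic code is assumed, and the handling of stop codons (`*`), ambiguous codons (`X`) and trailing partial codons is a guess, since the server's behavior in those cases is not documented here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def read_single_fasta(text):
    """Return (header, sequence); reject input without exactly one record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    headers = [ln for ln in lines if ln.startswith(">")]
    if len(headers) != 1 or not lines[0].startswith(">"):
        raise ValueError("expected exactly one FASTA record")
    return lines[0][1:], "".join(lines[1:])

def reverse_complement(dna):
    """Reverse-complement an uppercase DNA string; other symbols pass through."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]

def has_wg_motif(protein):
    """True if the protein contains at least one WG or GW dipeptide."""
    return "WG" in protein or "GW" in protein

header, dna = read_single_fasta(">q1 test\nGGATGG\nGGA\n")
candidates = [frame for frame in six_frames(dna) if has_wg_motif(frame)]
print(header, candidates)
```

Only frames that contain a WG or GW motif are kept as candidates, which mirrors the requirement stated above for protein queries.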
Once the analysis is complete, the results will be directly displayed in the user’s web browser.
Output is separated into three categories: 'Data info', 'WG/GW domains' and 'Sequence'.
(1) The 'Data info' field provides basic information about the query sequence (id, description, length and number of WG/GW motifs), as well as a graphical view of the protein with marked positions for all detected potential WG/GW regions.
The quality of the GW domain predictions is color coded: green blocks indicate domains that passed both statistical threshold values (dos and ics) and may be considered putative AGO-binding sites; yellow blocks label sequence regions that passed only the dos score threshold; red blocks correspond to regions with very low compositional compatibility with the GW AGO-binding domain.
(2) The 'WG/GW domains' field provides more detailed, textual information. It is shown in the form of a table containing all of the domains identified and sorted according to the start position in the protein. For each domain, the index number, the start and stop positions in the query sequence, the length of the domain, the number of WG/GW motifs, dos score, p-value and ics score are shown.
(3) The 'Sequence' section displays the full-length query sequence.
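The color coding used in the graphical view of the 'Data info' field amounts to a small decision rule, sketched below. The function name is hypothetical, and the threshold values themselves are not given in this text, so the sketch takes the two pass/fail outcomes as inputs. Treating a region that passes only the ics threshold as red is an assumption.

```python
def block_color(passed_dos: bool, passed_ics: bool) -> str:
    """Map the two threshold outcomes to the block color in the protein view."""
    if passed_dos and passed_ics:
        return "green"   # putative AGO-binding site
    if passed_dos:
        return "yellow"  # passed only the dos score threshold
    return "red"         # low compositional compatibility (assumed for ics-only too)

print(block_color(True, True), block_color(True, False), block_color(False, False))
```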
Moving the cursor over a match block in the graphical protein view will highlight its position in the full-length protein sequence as well as the corresponding row in the 'WG/GW domains' table. Highlighted regions are preserved as long as the user doesn't move the cursor over another block.
The plain text output provides no links but is optimal for copy/pasting. It corresponds to the 'WG/GW domains' field, with values separated by tabs.
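Because the plain text output is tab-separated, it can be loaded with a few lines of Python. The sketch below assumes that the columns appear in the order listed for the 'WG/GW domains' table (index, start, stop, length, number of WG/GW motifs, dos score, p-value, ics score) and that any header or blank line can be skipped. The `Domain` type, the function name and the sample row are hypothetical, and the sample values are made up for illustration.

```python
import csv
import io
from typing import NamedTuple

class Domain(NamedTuple):
    index: int
    start: int
    stop: int
    length: int
    motifs: int
    dos: float
    p_value: float
    ics: float

def parse_domains(text):
    """Parse tab-separated rows; skip blank lines and any non-numeric header."""
    domains = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) != 8 or not row[0].strip().isdigit():
            continue
        ints = [int(v) for v in row[:5]]
        floats = [float(v) for v in row[5:]]
        domains.append(Domain(*ints, *floats))
    return domains

# Made-up sample row, not real server output.
sample = "1\t10\t60\t51\t4\t12.5\t0.001\t3.2\n"
for d in parse_domains(sample):
    print(d.index, d.start, d.stop, d.motifs, d.dos, d.p_value, d.ics)
```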