Five Firmicutes encode scaffolding proteins and CDCs but no recognizable SLH domains, a key feature of cell-surface anchoring proteins.
Cellulosomes have been observed anchored to the cell surface in Clostridium cellulolyticum [22], Clostridium cellulovorans [42] and Ruminococcus flavefaciens [7], but the detailed mechanisms remain unknown. The cellulosomes in Clostridium acetobutylicum and Clostridium josui may also be linked to the cell surface through some unknown mechanism. Our analysis suggests that the domain of unknown function DUF291 (PF03442) might be involved in attaching these cellulosomes to the cell surface. We predicted the 3D structure of the first DUF291 domain in the scaffolding protein Q977Y4 of the Clostridium acetobutylicum glydrome, as shown in Figure 5. The first template (1EHX) has no evident functional implication,
while the second one (1CS6) is involved in cell adhesion [43, 44]. The two predicted structures of the DUF291 domain are similar to each other, with an RMSD of ~2.7 Å and a TM-score of 0.6 as computed by TM-align [45, 46].

Figure 5 Top two predicted structures of the first DUF291 (PF03442) domain of the scaffolding protein Q977Y4 of the Clostridium acetobutylicum glydrome, with templates 1ehxa and 1cs6a, respectively.

We collected 41 proteins that are encoded in the same operons as components of the Clostridium acetobutylicum glydrome but are not in our GASdb. Sixteen of these proteins cover the following functional categories: binding (GO:0005488), catalytic activity (GO:0003824) and transporter activity (GO:0005215); the remaining 25 are hypothetical or uncharacterized proteins. Only five proteins were annotated as being involved in glycosyl hydrolysis, e.g. carbohydrate binding (GO:0030246) or hydrolase activity (GO:0016787). Three of the five proteins missed by our GASdb, i.e. Q97EZ1, Q97FI9 and Q97TI3, do not
have recognizable Pfam domains related to glycosyl hydrolysis. Q97TP4 is annotated as an esterase (CE family 4). The cellulosome integrating protein Q97KK4 has only one Cohesin domain, which occupies ~77.35% (140/181) of its total length, and might have been inactivated by domain deletion.

In general, the glycosyl hydrolases and the cellulosome components attack the biomass after they are secreted outside the cells and properly assembled [23, 47], and hence we would expect them to have signal peptides. However, the majority of the annotated glycosyl hydrolases do not have any signal peptides, based on the predictions of SignalP 3.0 [13, 14]. We found that over 65% of WGHs across all organisms except Eukaryota do not have predicted signal peptides, suggesting that these proteins may use a novel secretion mechanism.

The ratio between the numbers of WGHs and FACs in a glydrome tends to be no more than 30. We calculated this ratio for each glydrome in a genome or metagenome with at least 1,000 proteins and at least one FAC and one WGH.
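
The inclusion filter and ratio described above can be illustrated with a minimal Python sketch. It assumes that the total protein count and the numbers of WGHs and FACs are already available for each glydrome. The `Glydrome` record, its field names, the function `wgh_fac_ratio` and the example counts are all hypothetical and are not taken from the analysis reported here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Glydrome:
    """Hypothetical per-genome summary; field names are illustrative only."""
    name: str
    n_proteins: int  # total proteins in the genome or metagenome
    n_wgh: int       # number of WGHs in the glydrome
    n_fac: int       # number of FACs in the glydrome


def wgh_fac_ratio(g: Glydrome, min_proteins: int = 1000) -> Optional[float]:
    """Return WGH/FAC, or None if the glydrome fails the inclusion filter."""
    # Filter from the text: at least 1,000 proteins, one FAC and one WGH.
    if g.n_proteins < min_proteins or g.n_fac < 1 or g.n_wgh < 1:
        return None
    return g.n_wgh / g.n_fac


# Made-up counts for illustration only (not data from the study).
examples = [
    Glydrome("genome_A", 3500, 40, 4),
    Glydrome("genome_B", 800, 12, 2),   # excluded: fewer than 1,000 proteins
    Glydrome("genome_C", 5200, 25, 0),  # excluded: no FAC
]

# Keep only the glydromes that pass the filter.
ratios = {}
for g in examples:
    r = wgh_fac_ratio(g)
    if r is not None:
        ratios[g.name] = r

print(ratios)
```

Returning `None` for excluded glydromes, rather than a sentinel ratio, keeps them out of any downstream summary of the ratio distribution.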