Genomic DNA extraction and sequencing
Aw and P10 seeds were germinated on half-strength Murashige and Skoog (MS) medium with agar and on filter paper moistened with water, respectively. The seedlings were grown in a soil/sand mixture under greenhouse conditions for 5 weeks, with the final 48 h in darkness. From each plant, multiple leaf samples exceeding 1.5 g were harvested and immediately flash-frozen in liquid nitrogen. One leaf sample per plant was ground into a fine powder with mortar and pestle while still frozen, whereas a corresponding sample from the same plant was preserved at −80 °C.
High molecular weight (HMW) DNA was extracted following the workflow for HMW DNA extraction for third-generation sequencing38 using the Genomic-tip 100/G kit (Qiagen), with the following modifications. For pearl millet, we used 1.5 to 2 g of ground tissue, exceeding the standard recommendation of 1 g per four 100/G columns. Despite the protocol’s specific warning against shaking the 50 mL tubes during the lysis step, we found it necessary to occasionally shake the tubes gently to disrupt the formation of tissue clots. The elution step was extended to up to 4 h instead of the 1 h specified in the protocol.
Quality control of HMW DNA was performed using a FEMTO Pulse (Agilent) with the gDNA 165 kb kit (FP-1002-0275) and a separation time of 70 min. Both Aw and P10 samples contained a high amount of genomic DNA with a length above the cut-off of 50 kb. Quantification was conducted with a Qubit assay using the Qubit dsDNA BR assay kit (Thermo Fisher Scientific) yielding 288 ng/µL for Aw and 922 ng/µL for P10.
The HMW DNA was sheared to 20 kb and processed for PacBio HiFi sequencing by the KAUST Bioscience Core Lab. Each sample was sequenced on a PacBio Sequel II using three SMRT cells. The total throughput exceeded 30 Gb per SMRT cell for P10 and 35 Gb per SMRT cell for Aw. The median read length was 15 kb per cell for P10 and 16 kb per cell for Aw.
Omni-C™ library construction and sequencing
The Omni-C™ library was prepared using the Dovetail® Omni-C™ Kit for plant tissues according to the manufacturer’s protocol. Chromatin was fixed with disuccinimidyl glutarate (DSG) and formaldehyde in the nuclei of dark-treated young leaves. The cross-linked chromatin was digested in situ with DNase I. After digestion, the cells were lysed with SDS to extract chromatin fragments, which were then bound to Chromatin Capture Beads. The chromatin ends were repaired and ligated to a biotinylated bridge adapter, followed by proximity ligation of adapter-containing ends. After proximity ligation, the crosslinks were reversed, the associated proteins were degraded, and the DNA was purified. The purified DNA was then converted into a sequencing library using Illumina-compatible adapters. Biotin-containing fragments were isolated with streptavidin beads prior to PCR amplification. The two libraries were sequenced on an Illumina MiSeq platform to generate >214 and >269 million 2 × 150 bp read pairs for Aw and P10, respectively.
Genome assembly
PacBio HiFi reads were assembled using hifiasm39 (v17.6) with default parameters (https://github.com/chhylp123/hifiasm/) to generate primary contig assemblies. Subsequently, the pseudomolecules were constructed by integrating the Omni-C™ read data using Juicer40 (v2; https://github.com/aidenlab/juicer) and the 3D-DNA pipeline41 (https://github.com/aidenlab/3d-dna). First, to generate the Hi-C contact maps for the P10 and Aw genomes, Omni-C™ Illumina short reads were processed with juicer.sh (parameters: -s none --assembly). The resulting output file “merged_nodups.txt” and the primary assembly were then used to produce an assembly with 3D-DNA (using run-asm-pipeline.sh with the -r 0 parameter). Juicebox42 (v2.14.00) was employed to visualize the Hi-C contact matrix alongside the assembly and to manually curate the assembly. The orientation and order of each pseudomolecule were defined by dot-plot comparison using chromeister43 (https://github.com/estebanpw/chromeister) against the pearl millet genotype Tift 23D2B1-P1-P56. All remaining contigs not anchored to the pseudomolecules were concatenated into “unanchored chromosomes”. The final Hi-C contact maps and assemblies were saved using run-asm-pipeline-post-review.sh from the 3D-DNA pipeline.
RNA extraction and sequencing
Seeds of Aw and P10, descended from the sequenced individuals, were germinated as described above. The seedlings were transferred to 50 mL hydroponics tubes and grown in Hoagland solution under four conditions: unmodified (+P), low phosphate (lowP; 1% of normal P), low phosphate with MP3 (1.0 μM), and low phosphate with an acetone-only mock treatment. The MP3 and acetone mock treatments were applied only during the final 6 h before harvesting. The roots of the +P and lowP plants, as well as 3-day-old seedlings and a flowering inflorescence, were sampled for Iso-Seq.
Roots and shoot stubs were collected separately from each hydroponic growth treatment for both Aw and P10, with samples pooled from three plants at each collection and four such biological replicates obtained.
All samples were flash-frozen in liquid nitrogen and ground to a fine powder in a sterilized mortar and pestle. Then, 100 mg of each sample was set aside for RNA extraction. RNA extraction was performed using an RSC 48 RNA extraction robot (Maxwell) and the Maxwell RSC Plant RNA kit (Promega). Although the RNA yield for some of the 64 samples was low, ranging from 36 ng/µL to 260 ng/µL as measured by NanoDrop, the purity was consistently high. The RNA integrity number (RIN) scores ranged from 8 to 10 for over 92% of the samples, with an average RIN of 9.0.
The extracted RNA was submitted to the KAUST Bioscience Core Lab for Iso-Seq sequencing. Samples from five different tissues for both Aw and P10 were tagged, multiplexed onto one SMRT cell each, and sequenced on the PacBio Sequel II platform.
Samples for RNA-Seq were sent to Novogene (Singapore) for mRNA library preparation and 150 bp paired-end sequencing on Illumina’s NovaSeq 6000 platform, targeting a throughput of 12 Gb of data per sample. The returned data were consistently of high quality, with the percentage of reads scoring a Phred value over 30 (indicating a base error below 0.1%) exceeding 90% for each sample.
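The relationship between a Phred score and the base-call error probability quoted above is P = 10^(−Q/10). The following minimal sketch illustrates it; the function names and the example quality values are illustrative assumptions and are not part of the sequencing provider's pipeline.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above(qualities, threshold=30):
    """Fraction of quality scores that reach the threshold (Q30 by default)."""
    if not qualities:
        return 0.0
    return sum(1 for q in qualities if q >= threshold) / len(qualities)


if __name__ == "__main__":
    # Q30 corresponds to an error probability of 0.001, i.e. 0.1%.
    print(phred_to_error_probability(30))
    # Hypothetical quality values, for illustration only.
    print(fraction_at_or_above([10, 30, 35, 40]))
```

A score of 30 therefore corresponds to one expected error per 1000 base calls, which is the 0.1% figure given in the text.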
Transposable element identification and quantification
Transposable elements (TEs) were identified by searching the genome assemblies with the Extensive de novo TE Annotator pipeline EDTA44 (version 2.0), run under default settings. Due to the high incidence of false positives in the prediction of helitrons, representatives of this class of TEs were removed from the final EDTA output. The TE library was then employed to mask the two genome assemblies and to quantify the TE content using RepeatMasker (http://www.repeatmasker.org/), run under default parameters with the exception of the -qq option.
Genome annotation
We annotated the two genomes using the MAKER pipeline45 v3.01.03. For a detailed breakdown of the genome annotation, please refer to the supplementary materials (Supplementary Fig. 21) or the project’s GitHub page (https://github.com/mjfi2sb3/millet-genome-annotation). First, we prepared the necessary transcriptomic and homology data to inform and support the prediction in the MAKER workflow. We began by preprocessing the Iso-Seq data following PacBio’s recommended workflow using SMRT tools v11.0 (https://www.pacb.com/support/software-downloads/). The product of this step was a set of high-quality full-length isoforms for each submitted sample. Details regarding the preprocessing of RNA-seq data can be found in another section of this manuscript. For homology evidence, we incorporated the manually curated UniProt Swiss-Prot database46 (downloaded in Nov 2022), along with published protein annotations for Cenchrus americanus21 and Cenchrus purpureus47 (elephant grass). Additionally, we included NCBI annotations for Setaria viridis (green millet), Setaria italica (foxtail millet) and Sorghum bicolor (sorghum).
Subsequently, we processed the repeat-masked genome assemblies through the MAKER pipeline. The workflow calls an array of tools, including NCBI BLAST tools48 v2.2.28+, Exonerate49 v2.2.0, Augustus50 v3.2.3 and tRNAscan-SE51 v2.0. We ran MAKER’s workflow primarily with the default values, except for alt_splice=1 and always_complete=1. Iso-Seq and RNA-seq transcripts were aligned to their respective assemblies using NCBI blastn, and these alignments were subsequently refined using Exonerate.
The protein evidence was aligned and refined using NCBI blastp and Exonerate, respectively. Subsequently gene structure prediction was performed using Augustus with the species parameter set to Zea mays. EST and protein hints were created using alignments obtained in the previous step. MAKER was then used to assess the predicted genes, correct some of the predictions, add isoform information and calculate quality scores (AED scores).
In the final step, we divided the predicted gene models into High Confidence (HC) and Low Confidence (LC) categories using four strategies: 1) based on EST evidence (MAKER’s quality index scores as well as alignment-based filtering); 2) annotating with the KEGG database52; 3) annotating using InterProScan53 v5; and 4) annotating against the UniProt Swiss-Prot database.
RNA-Seq data mapping onto P10 and Aw genomes
To map the RNA-Seq reads from each experiment using the Spliced Transcripts Alignment to a Reference (STAR) software54, we first created an index of each of the P10 and Aw genome assemblies. For each RNA-Seq sample, the paired-end fastq data were then mapped onto the corresponding genome assembly using STAR with the option “--outSAMstrandField intronMotif”. Subsequently, we assembled the transcripts for each RNA-Seq sample with StringTie55 using the BAM files generated in the previous alignment step. For each genome, we merged all transcripts from individual experiments using the StringTie merge option to produce a non-redundant set of transcripts.
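The mapping and assembly steps above can be sketched as a small set of command builders. This is a minimal sketch and not the authors' actual script: only the “--outSAMstrandField intronMotif” option and the use of StringTie merge come from the text, while the file names, thread count, and the choice of coordinate-sorted BAM output are assumptions made for illustration. The functions only assemble argument lists and do not run anything.

```python
from typing import List


def star_index_cmd(genome_fasta: str, index_dir: str, threads: int = 8) -> List[str]:
    """Build the STAR command that indexes one genome assembly."""
    return ["STAR", "--runMode", "genomeGenerate",
            "--genomeDir", index_dir,
            "--genomeFastaFiles", genome_fasta,
            "--runThreadN", str(threads)]


def star_align_cmd(index_dir: str, fastq_r1: str, fastq_r2: str,
                   out_prefix: str, threads: int = 8) -> List[str]:
    """Build the STAR command that maps one paired-end sample."""
    return ["STAR", "--genomeDir", index_dir,
            "--readFilesIn", fastq_r1, fastq_r2,
            # Option stated in the text; adds the XS strand attribute used by StringTie.
            "--outSAMstrandField", "intronMotif",
            # Assumption: sorted BAM output, which StringTie requires as input.
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", out_prefix,
            "--runThreadN", str(threads)]


def stringtie_assemble_cmd(bam: str, out_gtf: str) -> List[str]:
    """Build the StringTie command that assembles transcripts for one sample."""
    return ["stringtie", bam, "-o", out_gtf]


def stringtie_merge_cmd(gtf_list_file: str, merged_gtf: str) -> List[str]:
    """Build the StringTie merge command for all per-sample GTF files of one genome."""
    return ["stringtie", "--merge", "-o", merged_gtf, gtf_list_file]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(star_index_cmd("P10.fasta", "P10_index")))
    print(" ".join(star_align_cmd("P10_index", "s1_R1.fq", "s1_R2.fq", "s1_")))
    print(" ".join(stringtie_assemble_cmd("s1_Aligned.sortedByCoord.out.bam", "s1.gtf")))
    print(" ".join(stringtie_merge_cmd("P10_gtfs.txt", "P10_merged.gtf")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed; gzip-compressed fastq input would additionally need STAR's `--readFilesCommand` option.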
Determining differential expression
In all two-way comparisons, we used the R package edgeR56 with default settings for differential expression analysis. We first filtered out genes for which more than two of the eight replicates had a count per million (cpm) ≤ 0.5. We then performed the differential expression analysis using Fisher’s exact test, and the p-values were adjusted for multiple testing using the Benjamini-Hochberg method.
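The filtering rule and the multiple-testing correction described above can be illustrated outside R. The sketch below is a plain-Python illustration under stated assumptions, not the edgeR implementation: cpm is computed from raw library sizes without edgeR's normalization factors, and the function names are hypothetical.

```python
def counts_per_million(counts, library_sizes):
    """Scale raw counts of one gene by each replicate's library size."""
    return [c / n * 1e6 for c, n in zip(counts, library_sizes)]


def keep_gene(counts, library_sizes, min_cpm=0.5, max_low_replicates=2):
    """Keep a gene unless more than `max_low_replicates` replicates have cpm <= min_cpm."""
    cpm = counts_per_million(counts, library_sizes)
    n_low = sum(1 for value in cpm if value <= min_cpm)
    return n_low <= max_low_replicates


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    sizes = [1_000_000] * 8  # hypothetical library sizes
    print(keep_gene([0, 0, 0, 5, 5, 5, 5, 5], sizes))  # three low replicates: removed
    print(keep_gene([0, 0, 5, 5, 5, 5, 5, 5], sizes))  # two low replicates: kept
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With eight replicates per comparison, the rule keeps a gene only when at least six replicates exceed 0.5 cpm.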
Gene identification and phylogeny
Homologs of known strigolactone biosynthetic pathway enzymes were identified through tblastn searches on the Aw and P10 genome assemblies using Persephone (persephonesoft.com). Phylogenetic trees of the protein families were constructed in Geneious v2023.2.1 (Biomatters) using muscle v5.1 based on the PPP alignment algorithm. The consensus trees were constructed using the neighbor-joining method, relying on the Jukes-Cantor model, and were supported by 1000 bootstrap replicates.
Screening a diverse collection of pearl millet accessions
We acquired a panel of 10 sequenced pearl millet accessions23 from the U.S. National Plant Germplasm System. We screened the genomes of these accessions for the presence of the CLAMT region genes and the four flanking genes using tblastn, noting both presence and sequence similarity at the protein level (Supplementary Table 5). Seedlings of the panel were genotyped through PCR using the Phire Plant Direct kit (Thermo Fisher Scientific) directly on leaf extracts. For two accessions, PI527388 and PI186338, we detected the presence of the CLAMT fragment through genotyping, despite its absence in their genome assemblies. Consequently, we excluded these two lines from subsequent experiments.
Strigolactone collection, extraction, and measurements
Pearl millet seedlings were cultivated under controlled conditions with a day/night temperature of 28/22 °C. The seeds were surface-sterilized in a 50% sodium hypochlorite solution for 10 min and rinsed with sterile water. They were then placed in magenta boxes containing half-strength MS medium and allowed to germinate in darkness for 24 h. Following this period, they were incubated in a Percival chamber for 4 days. The germinated seedlings were then transferred into the soil for phenotyping, to sand for SL detection, or to a hydroponic system for Striga bioassays.
Analysis of SLs in root exudates was conducted using a previously published protocol57. In summary, 1 L of root exudates, spiked with 20 ng GR24, was collected and applied to a C18-Fast Reversed-Phase SPE column (500 mg/3 mL; GracePure™) pre-conditioned with 3 mL of methanol and 3 mL of water. The column was then washed with 3 mL of water, and SLs were eluted with 5 mL of acetone. The SL fraction was concentrated to approximately 1 mL of aqueous solution and subsequently extracted using 1 mL of ethyl acetate. Then, 750 µL of the SL-enriched organic phase was dried under a vacuum. The residue was reconstituted in 100 μL of acetonitrile:water (25:75, v/v) and filtered through a 0.22 μm filter for LC-MS/MS analysis.
For SL extraction from N. benthamiana leaves, samples were ground to a powder in liquid nitrogen using a mortar and pestle. About 300 mg of powder was weighed out and transferred to an 8 mL brown glass vial, to which 2 mL of cold ethyl acetate was added. After vortexing, sonication and centrifugation at 3300 g, the supernatant was transferred into an 8 mL glass vial. The pellet was extracted once more, and the supernatants were combined and dried in a SpeedVac. After drying, the residue was dissolved in 50 μL of ethyl acetate and 2 mL of hexane. Further purification was performed using a silica gel SPE column (500 mg/3 mL) preconditioned with 3 mL of ethyl acetate and 3 mL of hexane. After washing with 3 mL of hexane, SLs were eluted in 3 mL of ethyl acetate and evaporated to dryness under vacuum.
SL identification was performed using a UHPLC-Orbitrap ID-X Tribrid Mass Spectrometer (Thermo Fisher Scientific) equipped with a heated electrospray ionization source. Chromatographic separation was achieved using Hypersil GOLD C18 Selectivity HPLC Columns (150 × 4.6 mm; 3 μm; Thermo Fisher Scientific). The mobile phase comprised water (A) and acetonitrile (B), each containing 0.1% formic acid. A linear gradient was applied as follows (flow rate, 0.5 mL/min): 0–15 min, 25–100% B, followed by washing with 100% B, and a 3-min equilibration with 25% B. The injection volume was 10 μL, and the column temperature was consistently maintained at 35 °C. The MS conditions included: positive mode; spray voltage of 3500 V; sheath gas flow rate of 60 arbitrary units; auxiliary gas flow rate of 15 arbitrary units; sweep gas flow rate of 2 arbitrary units; ion transfer tube temperature of 350 °C; vaporizer temperature of 400 °C; S-lens RF level of 60; resolution of 120000 for MS; stepped HCD collision energies of 10, 20, 30, 40, and 50%; and a resolution of 30000 for MS/MS. The mass accuracy of identified compounds, with a mass tolerance of ± 5 ppm, is presented in Supplementary Table 6. All data were acquired using Xcalibur software version 4.1 (Thermo Fisher Scientific).
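The ± 5 ppm mass tolerance mentioned above compares an observed m/z with its theoretical value on a relative scale. The sketch below shows the usual calculation; the function names and the example m/z values are hypothetical and are not taken from Supplementary Table 6.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tolerance_ppm: float = 5.0) -> bool:
    """True when the absolute mass error does not exceed the tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tolerance_ppm


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    theoretical = 347.1490
    for observed in (347.1500, 347.1520):
        print(observed, round(ppm_error(observed, theoretical), 2),
              within_tolerance(observed, theoretical))
```

At m/z 347, a 5 ppm window corresponds to an absolute deviation of roughly 0.0017.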
SLs were quantified using LC-MS/MS with a UHPLC-Triple-Stage Quadrupole Mass Spectrometer (Thermo Fisher Scientific Altis™). Chromatographic separation was achieved on a Hypersil GOLD C18 Selectivity HPLC Column (150 mm × 4.6 mm; 3 μm; Thermo Fisher Scientific), utilizing a mobile phase comprising water (A) and acetonitrile (B), each with 0.1% formic acid. The linear gradient was as follows (flow rate, 0.5 mL/min): 0–15 min, 25–100% B, followed by washing with 100% B, and a 3-min equilibration with 25% B. The injection volume was 10 μL, and the column temperature was consistently maintained at 35 °C. The MS parameters included: positive ion mode; H-ESI ion source; ion spray voltage of 5000 V; sheath gas flow rate of 40 arbitrary units; aux gas flow rate of 15 arbitrary units; sweep gas flow rate of 20 arbitrary units; ion transfer tube gas temperature of 350 °C; vaporizer temperature of 350 °C; collision energy of 17 eV; CID gas at 2 mTorr; and a Q1/Q3 mass with a full-width half maximum (FWHM) value of 0.4 Da. The characteristic Multiple Reaction Monitoring (MRM) transitions (precursor ion → product ion) were 347.14 → 97.02, 347.14 → 233.1, 347.14 → 205.1 for Oro; 389.15 → 97.02, 411.1 → 97.02, 389.15 → 233.1 for Oro Ace; 347.18 → 97.02, 347.18 → 287.1, 347.18 → 315.1, 347.18 → 329.14 for MeCLA; 299.09 → 185.06, 299.09 → 157.06, 299.09 → 97.02 for GR24; 359.14 → 97.02, 359.14 → 345.1, 359.14 → 299.1 for PL1; 377.15 → 97.02, 377.15 → 359.1, 377.15 → 249.1 for PL2; 375.14 → 97.02, 375.14 → 343.1, 375.14 → 247.1 for PL3; 452.19 → 97.02, 452.19 → 375.1, 452.19 → 315.1 for PL4.
SL collection and fractioning
Analysis of SLs in root exudates followed the protocol by Wang et al.57. In summary, 1 L of collected root exudates was extracted using a C18-Fast Reversed-Phase SPE column (500 mg/3 mL; GracePure™), which had been pre-conditioned with 3 mL of methanol and 3 mL of water. The column was then washed with 3 mL of water, and SLs were eluted with 5 mL of acetone. The SL fraction was concentrated to approximately 1 mL of aqueous solution and then extracted with 1 mL of ethyl acetate. Then, 750 μL of the SL-enriched organic phase was dried under vacuum. Concentrated SL extracts of root exudates obtained from 12 replicates (~12 L) were dissolved in 1.5 mL EtOAc/2 mL hexane and subjected to silica gel column chromatography (SPE column, 60 g/50 mL) with a stepwise elution of hexane/EtOAc (100:0–0:100, 10% steps, 3 mL in each step) to yield 11 fractions (A–K). Then, 1 mL of each fraction was subjected to LC-MS analysis to monitor potential SLs and was tested in the Striga bioassay.
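The stepwise elution above runs from 100:0 to 0:100 hexane/EtOAc in 10% increments, which gives 11 solvent compositions. The short sketch below enumerates them; assigning the labels A to K in elution order is an assumption made for illustration.

```python
import string


def stepwise_gradient(step_percent: int = 10):
    """List (label, % hexane, % EtOAc) for each step from 100:0 to 0:100."""
    fractions = []
    for index, etoac in enumerate(range(0, 101, step_percent)):
        label = string.ascii_uppercase[index]  # assumed labelling: A, B, C, ...
        fractions.append((label, 100 - etoac, etoac))
    return fractions


if __name__ == "__main__":
    for label, hexane, etoac in stepwise_gradient():
        print(f"Fraction {label}: hexane/EtOAc {hexane}:{etoac}")
```

Counting both end points (100:0 and 0:100) is what makes 11 fractions rather than 10.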
Striga germination bioassays
The Striga germination bioassays were conducted following a previously described procedure58. In summary, Striga seeds were surface-sterilized with 50% diluted commercial bleach for 5 min. Then, they were dried and uniformly spread (approximately 50–100 seeds per disc) on 9 mm glass-fiber filter paper discs. Subsequently, 12 seed-laden discs were placed in a 9 cm Petri dish containing a Whatman filter paper moistened with 3.0 mL of sterilized Milli-Q water. The dishes were sealed with parafilm and incubated at 30 °C for 10 days for pre-conditioning. After pre-conditioning, the Striga seeds were treated with SLs from root exudates of various pearl millet lines and incubated again at 30 °C for 24 h. Then, germinated and total seeds were scanned and counted using SeedQuant59, and the percentage of germination was calculated.
Striga emergence under greenhouse pot conditions
The millet lines were tested for Striga infection in pots in a greenhouse. Approximately 2.0 L of blank soil, a 1:3 mixture of sand and Stender soil (Basissubstrat), was placed at the base of an 8.0 L perforated plastic pot. Subsequently, approximately 40,000 Striga seeds, equating to roughly 100 mg, were evenly distributed within 5.0 L of soil mixture and layered atop the blank soil in the pot. The Striga seeds within each pot underwent a pre-conditioning period of 10 days at 30 °C, with light irrigation maintained under greenhouse conditions. Following this, a single 10-day-old seedling was planted centrally in each pot. The millet plants were cultivated under standard growth conditions, with a temperature of 30 °C and 65% RH. Striga emergence was monitored and recorded for each pot at 70 days post-millet sowing.
Transient expression in Nicotiana benthamiana leaf
For CLAMT1a and CLAMT1b, the correct transcripts were identified from the annotation, while the ambiguity in CLAMT1c was addressed by selecting the most likely predicted transcript (CLAMT1c-Iso1) and generating an alternative with FGENESH+ (Softberry) (CLAMT1c-Iso2). To generate pearl millet CLAMT plasmids for transient expression in Nicotiana benthamiana, the full-length cDNAs of CLAMT1b, CLAMT1a, CLAMT1c-Iso1 and CLAMT1c-Iso2 (Supplementary Table 7) were amplified with Phusion polymerase (New England Biolabs) from cDNA (CLAMT1b) or synthesized fragments (CLAMT1a and CLAMT1c; Azenta Life Sciences) using the primers indicated in Supplementary Table 8. The PCR products were purified and sequenced. Following Sanger sequencing, the gene sequences were amplified using primers with suitable restriction enzyme sites. The resulting fragments were digested and ligated into the linearized entry vector pIV1A_2.1, which includes the CaMV35S promoter (www.pri.wur.nl/UK/products/ImpactVector/).
After sequence confirmation of the pIV1A_2.1 entry clones, Gateway LR clonase II enzyme mix (Invitrogen) reactions were performed to transfer the fragments into the pBinPlus binary vector60, generating p35S:PBIN-CLAMT1b, p35S:PBIN-CLAMT1a, p35S:PBIN-CLAMT1c-iso1 and p35S:PBIN-CLAMT1c-iso2. Additionally, we cloned the Arabidopsis Atmax1 and Atclamt cDNAs into the same binary vector, pBinPlus, for transient expression in N. benthamiana.
The binary vectors harboring the various genes were introduced into Agrobacterium tumefaciens strain AGL0 via electroporation. Positive clones were cultured at 28 °C at 220 rpm for 2 days in LB medium supplemented with 50 mg/L kanamycin and 35 mg/L rifampicin. Cells were collected by centrifugation for 15 min at 3300 g and room temperature. They were then resuspended in 10 mM MES-KOH buffer (pH 5.7) with 10 mM MgCl2 and 100 mM acetosyringone (4′-hydroxy-3′,5′-dimethoxyacetophenone; Sigma) to achieve a final OD600 of 0.5. The suspension was incubated with gentle rolling at 22 °C for 2–4 h. For the various gene combinations, equal concentrations of Agrobacterium strains carrying different constructs were mixed, using strains with empty vectors to compensate for gene dosage in each combination. Additionally, an Agrobacterium strain containing a gene for the TBSV P19 protein was included to enhance protein production by inhibiting gene silencing. N. benthamiana plants were cultivated in soil pots in a greenhouse under a 14 h light/10 h dark cycle at 25 °C and 22 °C, respectively. Combinations of constructs in Agrobacterium were infiltrated into the leaves of 5-week-old N. benthamiana plants using a 1-mL syringe. Leaves at the same developmental stage were selected to reduce variability. For each gene combination, two to three leaves per plant were infiltrated, with three plants serving as individual biological replicates. The bacterial suspension was gently injected into the abaxial side of the leaf to ensure distribution throughout the entire leaf area. Six days post-infiltration, the leaves were collected for subsequent analysis.
Analysis of resequencing data
We downloaded 1036 sequence read archives (SRA) from NCBI SRA study SRP063925 and converted them to fastq files using sratools61 v3.0.7. We mapped the paired-end reads to our reference P10 assembly using the bwa-mem2 v2.2.1 mem subcommand with default parameters62. We then extracted read mappings that fall within the region of interest (ROI). We estimated the mean coverage for the ROI and that of the harboring chromosome using the samtools63 v1.16.1 subcommand coverage with a minimum MAPQ score of 15. We calculated the coverage ratio (mean coverage of chromosome / mean coverage of ROI) as a proxy for the presence or absence of the ROI and plotted these ratios using Python. A detailed breakdown of the command workflow (Supplementary Fig. 22) is available on our GitHub page (https://github.com/mjfi2sb3/millet-genome-annotation).
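The coverage-ratio step can be sketched as follows. This is a minimal illustration and not the workflow from the GitHub repository: it assumes that "mean coverage" corresponds to the `meandepth` column of the tab-delimited `samtools coverage` output, that the ROI and the whole chromosome were summarized in two separate runs, and that all names and numbers in the example are hypothetical.

```python
def mean_depth(coverage_output: str) -> float:
    """Return the 'meandepth' value from the first data row of `samtools coverage` output."""
    lines = [line for line in coverage_output.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split("\t")
    first_row = lines[1].split("\t")
    return float(first_row[header.index("meandepth")])


def coverage_ratio(chromosome_depth: float, roi_depth: float) -> float:
    """Ratio of chromosome mean coverage to ROI mean coverage, as defined in the text."""
    if roi_depth == 0:
        return float("inf")  # no reads over the ROI at all
    return chromosome_depth / roi_depth


if __name__ == "__main__":
    header = "\t".join(["#rname", "startpos", "endpos", "numreads", "covbases",
                        "coverage", "meandepth", "meanbaseq", "meanmapq"])
    # Hypothetical output rows, for illustration only.
    chrom_out = header + "\n" + "\t".join(
        ["chr2", "1", "250000000", "16500000", "240000000", "96.0", "10.0", "36.1", "52.3"])
    roi_out = header + "\n" + "\t".join(
        ["chr2", "1000001", "1100000", "330", "40000", "40.0", "0.5", "35.8", "31.0"])
    print(coverage_ratio(mean_depth(chrom_out), mean_depth(roi_out)))
```

Under this definition, a ratio near 1 suggests that the ROI is covered like the rest of the chromosome, whereas a large ratio suggests that the ROI is absent or poorly covered in that accession.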
Mycorrhizal colonization of P10 and Aw
P10 and Aw were cultivated in sand and inoculated with approximately 1,000 sterile spores of Rhizophagus irregularis (DAOM 197198, Agronutrition, Labège, France). They were watered twice weekly, alternating between tap water and a modified Long-Ashton (LA) solution containing 3.2 μM Na2HPO4·12 H2O.
All plants were sampled at 45 days post-inoculation (dpi), corresponding to the late stage of mycorrhization. To evaluate the level of mycorrhization, we performed a morphological analysis according to Trouvelot et al.64. Moreover, we conducted qRT-PCR assays to assess the expression levels of a fungal housekeeping gene (RiEF), a fungal gene preferentially expressed in the intraradical structures (RiPEIP1)65, and two plant AM marker phosphate transporter genes (PtH1.9; EcPt4)66,67. We used alpha-tubulin (TUA)68 as the plant reference gene.
Statistical analysis
Data are presented as mean ± SD. Statistical significance was determined by the two-tailed unpaired Student’s t test or by one-way ANOVA with Tukey’s multiple comparison test, using a significance threshold of P < 0.05. All statistical analyses were performed using Prism 9 (GraphPad).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.