https://www.ncbi.nlm.nih.gov/datasets/g ... 04526295.1
The download is self-explanatory. The only thing you need to do is convert the file content to all upper case and remove the five textual separator lines, which look like:
">CP038189.1 ...."
Or:
If you can give me an FTP address, I could upload the file there.
Additional info:
-The upper/lower case distinction is for our later use: lower case marks "repeats" found by the preprocessing of this file version, upper case marks the so-called "non-repeating" sequences. For evaluating our core-counting process, just put the whole file in one case.
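The preprocessing described above (drop the ">" separator lines, put everything in upper case) could be sketched in Python as follows. This is only a minimal illustration, not a fixed part of our tool chain: the function name normalize_fasta is made up here, and it assumes the separators are ordinary FASTA header lines starting with ">" and that line breaks inside the sequence should be kept.

```python
import io


def normalize_fasta(src, dst):
    """Copy FASTA text from src to dst without header lines, in upper case.

    Assumption: every separator is a line starting with ">".
    Returns (number of header lines removed, number of bases written).
    """
    headers = 0
    bases = 0
    for line in src:
        if line.startswith(">"):
            headers += 1          # separator line such as ">CP038189.1 ...."
            continue
        seq = line.strip().upper()
        if seq:                   # skip empty lines
            dst.write(seq)
            dst.write("\n")
            bases += len(seq)
    return headers, bases


if __name__ == "__main__":
    # Tiny made-up input, only to show the call.
    demo = io.StringIO(">CP038189.1 demo\nacgtACGT\nnnAC\n>second\nggcc\n")
    out = io.StringIO()
    print(normalize_fasta(demo, out))
    print(out.getvalue(), end="")
```

For the real file, pass two file objects opened in text mode instead of the StringIO objects. The returned header count is a quick check that exactly five separators were removed.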
Alternative download-sites:
https://parasite.wormbase.org/Caenorhab ... nfo/Index/
https://www.ncbi.nlm.nih.gov/genome/41
The download site for human genome files is:
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/
Then: "latest", then "hg38.fa.gz"
A single human chromosome is between roughly 50 and 250 million bases, which is in the ballpark of the C. elegans genome (about 100 million). For our current purposes it doesn't matter which data we use. What to choose for the features of a real program depends on a variety of intentions, accompanying files, policies and sources.
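To check such sizes on whatever file was actually downloaded, the bases per sequence can be counted directly. The following is a rough sketch with made-up function names. It assumes a plain or gzip-compressed FASTA file in which each sequence starts with a ">" header line, and it uses the first word of the header as the sequence name.

```python
import gzip


def sequence_lengths(lines):
    """Return {sequence name: base count} for an iterable of FASTA text lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""   # first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)         # counts upper and lower case alike
    return lengths


def sequence_lengths_from_file(path):
    """Same as sequence_lengths, reading a .fa or .fa.gz file from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sequence_lengths(handle)
```

Calling sequence_lengths_from_file("hg38.fa.gz") would then give one entry per sequence in that file. The file is read line by line, so it does not have to fit into memory.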
Be aware that there are several consortia and database providers with different scientific goals, offering the same genome at different stages and dates.
@Wilbert:
I remember what you mentioned regarding genome sizes. Give me a bit of time to express my thoughts on that more precisely.
(The first would be: if I can have an x64 register, I'll take that; all smaller sizes are included...)
