Genomic sequences matches
Posted: Sat Mar 28, 2020 12:41 pm
by Psychophanta
There are some reports around that the current SARS-CoV-2 genome sequence has several stretches in common with HIV sequences.
To help locate the supposedly coincident sequences, here is the supposed complete SARS-CoV-2 genome sequence (this one from Japan), along with the complete sequences of HIV-1 and HIV-2.
I have some handy tools for working with plain text that cover a lot of tasks, but none that does this simple one (and I did not find one on the forum).
The task is simple: locate every identical sequence of at least 'n' bytes (a configurable parameter) shared by two given ASCII (or Unicode, or any other) texts.
SARS-CoV-2:
https://www.ncbi.nlm.nih.gov/nuccore/LC528232
HIV-1:
https://www.ncbi.nlm.nih.gov/nuccore/MN692147.1
HIV-2:
https://www.ncbi.nlm.nih.gov/nuccore/AF082339.1
I will do it when I have enough time, and some of you may do it too. My version will also be able to handle files larger than the amount of RAM in the system.
I am just posting this here in case it interests you or makes you curious.
Re: Genomic sequences matches
Posted: Tue Mar 31, 2020 12:25 pm
by Olliv
There are two roots: S and L.
Why is only one root displayed?
This just shows it is caused by natural selection.
Re: Genomic sequences matches
Posted: Thu Apr 02, 2020 5:16 pm
by Psychophanta
@Olliv, are there two roots in RNA as well?
I know nothing about genetics, but it is no doubt an awesome field of knowledge.
Well, in fact the goal is not just to deal with RNA sequences, but to locate coincident sequences in any large sequence of data of any type (a file, for example).
I have thought of other algorithms for this, but writing correct code for them takes some time, and it is not my priority for now.
So here is the silliest algorithm, the one I wrote, which works but is silly and slooooow:
Code: Select all
; This program uses the dumbest possible algorithm to do the following:
; find every matching sequence between one file, 'fil0$', and another, 'fil1$'.
; The MINIMUM LENGTH of the sequence is given as input.
; Let's see if someone comes up with an algorithm, at least a half-decent one, for this.
Procedure.q leedato(fil,*store.ascii,tamano.u)
  ; Reads 'tamano' letters (A-Z, a-z) from file 'fil' into *store, skipping every other character.
  ; Returns the file position just after the first letter read, or 0 if the end of the file comes first.
  Protected dato.a,n.u,punto.q=0
  For n=0 To tamano-1
    If Eof(fil):ProcedureReturn 0:EndIf
    dato.a=ReadAsciiCharacter(fil)
    While dato<'A' Or dato>'z' Or (dato>'Z' And dato<'a')
      If Eof(fil):ProcedureReturn 0:EndIf
      dato.a=ReadAsciiCharacter(fil)
    Wend
    PokeA(*store.ascii+n,dato.a):If n=0:punto.q=Loc(fil):EndIf
  Next
  ProcedureReturn punto.q
EndProcedure

fil0$="D:\Genoma SARS-CoV-2.txt"
; fil1$="D:\Genoma VIH-1.txt"
fil1$="D:\Genoma VIH-2.txt"
ReadFile(0,fil0$,#PB_Ascii)
ReadFile(1,fil1$,#PB_Ascii)
tamanobloque.u=10; <- minimum size of string to find
*almacen.ascii=AllocateMemory(1024,#PB_Memory_NoClear)
*almacen2.ascii=AllocateMemory(1024,#PB_Memory_NoClear)
posinit0.q=18937:posinit1.q=9862; <- starting positions in each file
pos0.q=posinit0.q:pos1.q=posinit1.q
OpenConsole("encuentros",#PB_Ascii)
FileSeek(0,posinit0.q,#PB_Absolute)
; Outer loop: take one block of 'tamanobloque' letters from file 0 at a time
While Eof(0)=0
  pos0=leedato(0,*almacen.ascii,tamanobloque.u)
  If pos0=0:Break:EndIf
  FileSeek(1,posinit1.q,#PB_Absolute)
  matchesinfil1.q=0
  ; Inner loop: compare that block against every block of file 1
  While Eof(1)=0
    pos1=leedato(1,*almacen2.ascii,tamanobloque.u)
    If pos1=0:Break:EndIf
    If CompareMemory(*almacen.ascii,*almacen2.ascii,tamanobloque.u):matchesinfil1.q+1
      ; The blocks match: extend the match character by character.
      ; The extension stops at the first difference or the first non-letter (a line break, for example).
      pushpos0.q=Loc(0)
      n.u=0
      dato.a=ReadAsciiCharacter(0)
      dato1.a=ReadAsciiCharacter(1)
      If Eof(0) Or Eof(1):Break 2:EndIf
      While dato=dato1 And (dato>='A' And dato<='z') And (dato<='Z' Or dato>='a')
        PokeA(*almacen.ascii+tamanobloque.u+n,dato)
        PokeA(*almacen2.ascii+tamanobloque.u+n,dato1)
        n.u+1
        dato.a=ReadAsciiCharacter(0)
        dato1.a=ReadAsciiCharacter(1)
        If Eof(0) Or Eof(1):Break 3:EndIf
      Wend
      If n:Beep_(444+n*100,100):EndIf ; Windows API beep, higher pitch for longer extensions
      FileSeek(0,pushpos0,#PB_Absolute)
      pos1=Loc(1)-1
      PrintN("Match#: "+Str(matchesinfil1.q)+", Pos in SARS-CoV-2: "+Hex(pos0-posinit0)+", Pos in VIH-2: "+Hex(pos1-posinit1)+", String: "+PeekS(*almacen.ascii,tamanobloque.u+n.u,#PB_Ascii))
    EndIf
    ; Move file 1 one letter past the start of the block just read, or past the match just reported
    FileSeek(1,pos1,#PB_Absolute)
  Wend
  If matchesinfil1.q:pos0+tamanobloque.u+matchesinfil1.q:EndIf
  FileSeek(0,pos0,#PB_Absolute)
Wend
Beep_(811,1200)
PrintN("End of data reached. Press return to exit"):Input()
CloseConsole()
FreeMemory(*almacen2.ascii)
FreeMemory(*almacen.ascii)
CloseFile(1)
CloseFile(0)
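For comparison, here is a rough sketch of one less naive approach. It is not the algorithm announced above, only an illustration under some assumptions: both sequences already sit in memory as plain strings containing nothing but the letters to compare, and the names FindCommonRuns and SeedPositions are made up for this example. The idea is to index every window of 'minLen' characters of the first text in a map, then walk the second text once and extend each hit to the right.

```purebasic
; Sketch only: seed-and-extend search for exact common runs of at least minLen characters.
; FindCommonRuns and SeedPositions are hypothetical names, not taken from the thread.
EnableExplicit

Structure SeedPositions
  List pos.i()                      ; 1-based start positions of one window in a$
EndStructure

Procedure.i FindCommonRuns(a$, b$, minLen.i)
  Protected NewMap seeds.SeedPositions()
  Protected i, j, runLen, count
  Protected lenA = Len(a$), lenB = Len(b$)
  Protected key$

  If minLen < 1 : ProcedureReturn 0 : EndIf

  ; 1) Index every window of minLen characters of a$ by its text (map keys are case-sensitive)
  For i = 1 To lenA - minLen + 1
    key$ = Mid(a$, i, minLen)
    If FindMapElement(seeds(), key$) = 0
      AddMapElement(seeds(), key$)
    EndIf
    AddElement(seeds()\pos())
    seeds()\pos() = i
  Next

  ; 2) Walk b$ once and look each of its windows up in the index
  For j = 1 To lenB - minLen + 1
    If FindMapElement(seeds(), Mid(b$, j, minLen))
      ForEach seeds()\pos()
        i = seeds()\pos()
        ; Skip hits that only continue a run already reported one position earlier
        If i > 1 And j > 1 And Mid(a$, i - 1, 1) = Mid(b$, j - 1, 1)
          Continue
        EndIf
        ; 3) Extend the hit to the right for as long as both texts agree
        runLen = minLen
        While i + runLen <= lenA And j + runLen <= lenB And Mid(a$, i + runLen, 1) = Mid(b$, j + runLen, 1)
          runLen + 1
        Wend
        count + 1
        PrintN("Match#: " + Str(count) + ", pos in A: " + Str(i) + ", pos in B: " + Str(j) + ", string: " + Mid(a$, i, runLen))
      Next
    EndIf
  Next

  ProcedureReturn count
EndProcedure

; Tiny demo with made-up strings
OpenConsole()
FindCommonRuns("TTACGTACGGA", "CCACGTACGTT", 5)
PrintN("Press return to exit") : Input()
CloseConsole()
```

The second text is read only once instead of once per block of the first text, at the price of keeping the whole index of the first text in memory, so as written this does not cover files larger than RAM. Positions are 1-based character offsets within the strings, not file offsets as in the program above.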
Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 5:26 am
by Olliv
Psychophanta wrote: @Olliv, are there two roots in RNA as well?
There is no DNA, just RNA.
You produce RNA from your DNA in order to make your DNA again in the next generation of cells.
Retrovirions replace a part of your RNA to modify your cell's DNA so that it produces the virus's own RNA and capsids, which lets the virus multiply. A capsid is the protective coat of the virus. The changes to human DNA also allow the virus to prevent antibody production.
It is a delicate and fragile period, without any data about cancerous effects on asymptomatic bodies... But humankind can certainly win. Remember London under the V-2 bombing: they won.
Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 1:29 pm
by Psychophanta
About this so-called SARS-CoV-2, there are unanswered questions:
Every country has its geneticists and virologists who work for the main administrations.
It is known that the complete genome of any virus can be sequenced within a few hours or a few days.
The global media confuses people every day, because some outlets say the virus has a 100% natural origin and others say it is synthetic or semi-synthetic.
So it seems very clear that all the governments know the truth.
Then: why do they not tell people the truth about the origin of the virus and other technical details?
Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 3:07 pm
by Lord
Don't start a conspiracy theory.
It is known that the virus originated from bats.
Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 4:42 pm
by Psychophanta
Lord wrote: Don't start a conspiracy theory.
It is known that the virus originated from bats.
Sorry, but firstly:
My comment had NOTHING to do with your answer, nor with conspiracy, because the statement "the virus originated from bats" IS officially a hypothesis. Just for your info.
Secondly:
Conspiracy is a fact, not a theory. Man, how about reading a little instead of repeating things without thinking?
There are plenty of works within your reach about conspiracy facts, going back at least to Sumer.
"The Prince", by Niccolò Machiavelli, is one that comes to mind right now, but there are lots.
Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 5:23 pm
by Kiffi
Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 8:35 pm
by Olliv
I do not know what to say, Psychophanta. I think I have said all I can in a simple explanation of the retroviral mechanism.
The conspiracy question is always worth keeping in mind, but a conspiracy answer does not exist without a well-detailed analysis.
We often find an excellent point of view in the analysis, excellent enough to put a stop to conspiracy conclusions...
Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 9:19 pm
by mk-soft
In conspiracy theories, facts don't help either. They are always ignored.
My thought:
This is the first wave of an alien invasion.

Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 9:32 pm
by idle
mk-soft wrote: In conspiracy theories, facts don't help either. They are always ignored.
My thought:
This is the first wave of an alien invasion.

and we're all packaged up in our homes ready for harvesting!
There's nothing wrong with looking at it, but searching only for exact matches is not enough.
It might be better to try BLAST:
https://blast.ncbi.nlm.nih.gov/Blast.cgi
Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 10:17 pm
by Josh
Oh folks, don't act like that. Psychophanta just forgot to set the irony tags

Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 10:42 pm
by Olliv
Here in France, a quick blood test to check whether a person has antibody defenses is the subject of an authorization request submitted to the government.
What about your own countries?
@Idle
I can access the BLAST site. I also saw that it has APIs. Tempted by a PureBlast?

Re: Genomic sequences matches
Posted: Fri Apr 03, 2020 11:04 pm
by idle
This site is perhaps interesting, it shows all the strains:
https://nextstrain.org/ncov
@Olliv
I think you can also download the BLAST executables and database and run it locally.
Re: Genomic sequences matches
Posted: Sat Apr 04, 2020 3:37 am
by Olliv
I have had a TV for 13 years. I will try to repair the computer first!