PureBasic Forums - English

Posted: **Thu Apr 17, 2014 4:11 pm**

Hi all,
I'm still working on a LinkGrammar (http://www.abisource.com/projects/link-grammar/) wrapper for PB, and I've run into a rather interesting issue.
Here's what I have so far: https://dl.dropboxusercontent.com/u/287 ... 3.zip?dl=1

The package includes an x86 DLL and static library, the include files and a basic example.

I'm not sure why, but LinkGrammar is unable to find the specified dictionary (see ln. 7 in lg_test.pb). The example is compiled in Unicode mode.
What I've managed to find out is that these functions, particularly Dictionary_Set_Data_Dir and possibly Dictionary_Create_Lang, expect a UTF-8 string. However, if I pass a data directory to Dictionary_Set_Data_Dir and read it back with a Peek, the string is cut off right before the colon, e.g. "C:\test" becomes just "C".
If no path is specified, a Peek returns the correct program directory.

I suspected that the dictionary pointer is null because PB does not convert strings to UTF-8 when they are passed in a function call, but "en" should be the same in both Unicode and UTF-8... I am really out of ideas.
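
A minimal, self-contained sketch of the truncation described above, assuming the program is compiled in Unicode mode, where a plain .s parameter hands the C side PureBasic's native UTF-16 string. The variable names are hypothetical and no LinkGrammar function is called; the buffer is simply read back the way a C function reading a char* would, stopping at the first zero byte. Since every ASCII character is stored in UTF-16 as two bytes with a zero high byte, "C:\test" is seen as "C", and "en" would likewise arrive as just "e".

```purebasic
; Sketch: what a char*-based C function "sees" when handed a UTF-16 string.
path$ = "C:\test"

; Write the path as UTF-16 (what a plain .s parameter passes in Unicode mode)
*utf16 = AllocateMemory(StringByteLength(path$, #PB_Unicode) + 2)
PokeS(*utf16, path$, -1, #PB_Unicode)

; Read it byte-wise as UTF-8: "C" is stored as $43 $00, so reading stops after "C"
seenByC$ = PeekS(*utf16, -1, #PB_UTF8)
Debug seenByC$                      ; C

; Write the same path as UTF-8 (what a .p-utf8 parameter passes)
*utf8 = AllocateMemory(StringByteLength(path$, #PB_UTF8) + 1)
PokeS(*utf8, path$, -1, #PB_UTF8)
fullPath$ = PeekS(*utf8, -1, #PB_UTF8)
Debug fullPath$                     ; C:\test

FreeMemory(*utf16)
FreeMemory(*utf8)
```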

If anyone could take a look at this and we could fix it, the PB community would gain a quite useful language-parsing library, IMHO.
Thanks in advance for any help!

Erion

Posted: **Thu Apr 17, 2014 8:23 pm**

Are you using the appropriate pseudo-types and/or Unicode versions of commands when handling strings?

I only took a quick glance, but it didn't seem like it.

Posted: **Thu Apr 17, 2014 8:32 pm**

Hi,
Thanks for your reply!

I modified the two functions, replacing .s with .p-unicode. Unfortunately, nothing changed.

Shouldn't PB convert the parameters automatically, though, especially if the executable is compiled in Unicode mode? Just wondering...

Erion

Posted: **Thu Apr 17, 2014 8:43 pm**

Honestly, don't know. I've never compiled a unicode application before.

There is also PeekU(); maybe try that?
One of the Peek commands also has a flag to specify Unicode, ASCII, etc.

Try looking up "Unicode" in the help CHM and see if you find anything useful.

Syntax

Text$ = PeekS(*MemoryBuffer [, Length [, Format]])
Description

Reads a string from the specified memory address.
Parameters

*MemoryBuffer The address to read from.
Length (optional) The maximum number of characters to read. If this parameter is not specified or -1 is used then there is no maximum. The string is read until a terminating null-character is encountered or the maximum length is reached.
Format (optional) The string format to use when reading the string. This can be one of the following values:
#PB_Ascii : Reads the strings as ascii
#PB_UTF8 : Reads the strings as UTF8
#PB_Unicode: Reads the strings as unicode

The default is #PB_Unicode if the program is compiled in unicode mode and #PB_Ascii otherwise.

Return value

Returns the read string.
See Also

PokeS(), MemoryStringLength(), CompareMemoryString(), CopyMemoryString()
Supported OS

All
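
Following the Format parameter described above, here is a minimal sketch of reading a char* that a C library returns (for instance a data-directory path). The buffer is built locally so the example runs without LinkGrammar; the names are hypothetical, and it assumes Unicode mode, where PeekS() would otherwise default to #PB_Unicode. #PB_Ascii only works by accident for plain ASCII text: a non-ASCII character such as ë takes two bytes in UTF-8 and comes back as two characters.

```purebasic
; Stand-in for a UTF-8 char* returned by a C library (built here for the demo)
text$ = "C:\Users\Zo" + Chr($EB)   ; "C:\Users\Zoë"
*cString = AllocateMemory(StringByteLength(text$, #PB_UTF8) + 1)
PokeS(*cString, text$, -1, #PB_UTF8)

utf8Read$  = PeekS(*cString, -1, #PB_UTF8)   ; decodes UTF-8: matches text$
asciiRead$ = PeekS(*cString, -1, #PB_Ascii)  ; one character per byte
Debug utf8Read$
Debug Len(asciiRead$) - Len(text$)           ; 1 (the two UTF-8 bytes of ë became two characters)

FreeMemory(*cString)
```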

Posted: **Fri Apr 18, 2014 3:57 pm**

Hi,

Yes. As I wrote in the first post, if I peek a path that I set using Dictionary_Set_Data_Dir, it returns only up to the first colon. PeekU returns only the first unicode character (2 bytes) of a buffer, so unfortunately it won't work here.
What baffles me is why the dictionaries are not found even with the default (unspecified) dictionary path, which is the program's directory. Could it be because PB is not converting the arguments to UTF-8, while the functions expect a UTF-8 path and language string?
I might end up contacting the LinkGrammar people, but I thought I'd ask here first, in case it's related to PB.

In case this helps, here's an official example http://abiword.com/projects/link-grammar/api/index.html
This is roughly the same as what I have in the LinkGrammar archive, except my *dict pointer is always null.

Erion

Posted: **Fri Apr 18, 2014 5:34 pm**

Hi erion,

The following files are missing from your LinkGrammar package:
- msys-1.0.dll
- msys-regex-1.dll

Addressing your original question:
- change the following (lg_test.pb):
-- remove Unicode from "Compiler Options"
-- remove backslash from "data": Dictionary_Set_Data_Dir(GetPathPart(ProgramFilename())+"data")
-- change the error MessageRequester:
--- MessageRequester("Error", PeekS(Dictionary_Get_Data_Dir()))
- change to the following (link-includes.pbi):
-- linkgrammar_get_dict_version(Dictionary.p-utf8)
-- dictionary_create_lang(lang.p-utf8)
-- dictionary_set_data_dir(path.p-utf8)

*** This will only show a new set of problems ***

Note (missing from your example):
- dict : dictionary Structure
-- required Sub-Structures
- sent : Sentence Structure
-- required Sub-Structures
- link : linkage Structure
-- required Sub-Structures
- etc.

Just a suggestion: Take what you've already done - Start from scratch - Build the includes - Test each step.
- begin with the file [ api-structures.h ] from the download [ link-grammar-5.0.5 ]

I started the following... Structures still need work, but it's how I would begin:
- simple example: http://abiword.com/projects/link-gramma ... l#example1
- break the code into separate includes only after you have a working example
- add Constants, Structures, Macros, Functions only as needed, until you have a working example


Enumeration ConstituentDisplayStyle
  #NO_DISPLAY               = 0
  #MULTILINE                = 1
  #BRACKET_TREE             = 2
  #SINGLE_LINE              = 3
  #MAX_STYLES               = 3
EndEnumeration

Structure Dictionary Align #PB_Structure_AlignC
  
EndStructure

Structure Sentence Align #PB_Structure_AlignC
  
EndStructure

Structure Linkage Align #PB_Structure_AlignC
  
EndStructure

Structure Resources Align #PB_Structure_AlignC
  max_parse_time.l
  max_memory.l
  time_when_parse_started.d
  space_when_parse_started.l
  when_created.d
  when_last_called.d
  cumulative_time.d
  memory_exhausted.l
  timer_expired.l
EndStructure

Structure Cost_Model Align #PB_Structure_AlignC
	type.l
	*compare_fn
EndStructure

Structure Parse_Options Align #PB_Structure_AlignC
  verbosity.l
  *debug
  *test
  use_sat_solver.b
  use_viterbi.b
  linkage_limit.l
  disjunct_cost.d
  min_null_count.l
  max_null_count.l
  null_block.l
  islands_ok.b
  twopass_length.l
  max_sentence_length.l
  short_length.l
  all_short.b
  use_spell_guess.b
  repeatable_rand.b
  cost_model.Cost_Model
  resources.Resources
  display_short.b
  display_word_subscripts.b
  display_link_subscripts.b
  display_walls.b
  allow_null.b
  use_cluster_disjuncts.b
  echo_on.b
  batch_mode.b
  panic_mode.b
  screen_width.l
  display_on.b
  display_postscript.b
  display_constituents.l
  display_bad.b
  display_disjuncts.b
  display_links.b
  display_morphology.b
  display_senses.b
EndStructure

ImportC "liblink-grammar-5.lib"
  dictionary_create_lang(lang.p-utf8)
  dictionary_set_data_dir(path.p-utf8)
  sentence_create(input_string.p-utf8, *dict)
  parse_options_create()
  sentence_split(*sent, *opts)
  sentence_parse(*sent, *opts)
  linkage_create(index, *sentSentence, *opts)
  linkage_print_diagram(*linkage)
EndImport

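dictionary_set_data_dir("data")          ; returns nothing, it only sets the dictionary search path
*dict = dictionary_create_lang("en")     ; returns the Dictionary handle, or 0 if it cannot be loaded

If *dict
  *sent = sentence_create("My dog likes dog food.", *dict)
  *options.Parse_Options = parse_options_create()
  sentence_split(*sent, *options)
  If sentence_parse(*sent, *options) > 0 ; number of linkages found
    *link = linkage_create(0, *sent, *options)
    Debug PeekS(linkage_print_diagram(*link), -1, #PB_UTF8) ; the diagram comes back as a UTF-8 char*
  EndIf
EndIf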

Posted: **Sat Apr 19, 2014 11:50 am**

Hi JHPJHP,

Huge thanks for your help!

JHPJHP wrote:The following files are missing from your LinkGrammar package:
- msys-1.0.dll
- msys-regex-1.dll

I noticed that the regex library was missing right after I uploaded the package to Dropbox, but Windows did not complain, so I assumed it would work anyway. Interestingly, msys-1.0.dll did not show up when I checked with Dependency Walker.

According to your code, it seems that you do have to specify the argument type. However, the Import : EndImport section of the PB manual says:

The compiler will automatically converts the strings to unicode when needed.

It seems that this is not entirely automatic, at least not when it comes to Unicode/UTF-8.
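
A minimal sketch of where the conversion does and does not happen, assuming Unicode mode. FakeSetDataDir(), ProtoPlain and ProtoUTF8 are hypothetical stand-ins, not LinkGrammar API: a C-convention procedure plays the role of a function taking a const char *, and the same address is called through two prototypes. With a plain .s parameter the native UTF-16 string is passed unchanged; only the .p-utf8 pseudotype makes PB build a temporary UTF-8 copy for the duration of the call.

```purebasic
; Hypothetical stand-in for a C function taking a UTF-8 "const char *",
; such as dictionary_set_data_dir(). It decodes whatever bytes it gets as UTF-8.
Global received$

ProcedureC FakeSetDataDir(*path)
  received$ = PeekS(*path, -1, #PB_UTF8)  ; copy now: a pseudotype buffer is temporary
EndProcedure

; The same function declared twice: plain string vs. UTF-8 pseudotype
PrototypeC ProtoPlain(path.s)        ; Unicode mode: passes the native UTF-16 string
PrototypeC ProtoUTF8(path.p-utf8)    ; PB converts to a temporary UTF-8 buffer

SetDirPlain.ProtoPlain = @FakeSetDataDir()
SetDirUTF8.ProtoUTF8   = @FakeSetDataDir()

SetDirPlain("C:\test")
plain$ = received$                   ; "C" when compiled in Unicode mode
SetDirUTF8("C:\test")
utf8$ = received$                    ; "C:\test"

Debug plain$
Debug utf8$
```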

JHPJHP wrote:Note (missing from your example):
- dict : dictionary Structure
-- required Sub-Structures
- sent : Sentence Structure
-- required Sub-Structures
- link : linkage Structure
-- required Sub-Structures
- etc.

I did not find a structure for these in the sources when I searched for struct dict, struct link, etc.

Once again, thank you very much for your help!

Erion

Posted: **Sat Apr 19, 2014 7:08 pm**

To add to what I have said previously:
I compiled LG 5.06 https://dl.dropboxusercontent.com/u/287 ... 6.zip?dl=1
The missing structures should now be included, along with JHPJHP's example as test.pb.
The problem is still with dictionary_create_lang: invalid memory read at address 16. Whether or not I set a data directory, LinkGrammar is unable to open the specified dictionary.
JHPJHP's example behaves exactly the same.

Erion

Posted: **Tue Apr 22, 2014 8:30 pm**

Hi erion,

I'm not sure if you're still looking for a solution...

A couple observations:
- missing the Sentence Structure
- link-includes.pbi: there are Enumeration and Structure declarations inside the ImportC declaration
-- putting them outside allows the GUI debugger to work correctly

Something to try:

While working on the OpenCV framework, and because PureBasic for the most part doesn't allow values from a function to be returned directly to a structure (other than through a pointer or using ASM), I found that the following worked for me. It may provide a solution for you, or a variation of it might.

Standard (Function returns *dict pointer):

ImportC "..\liblink-grammar-5.lib"
  dictionary_create_lang(lang.p-utf8)
EndImport
Global *dict.Dictionary = Dictionary_Create_Lang("en")

Alternate (@dict pointer included in the Function):

ImportC "..\liblink-grammar-5.lib"
  dictionary_create_lang(*dict, lang.p-utf8)
EndImport
Global dict.Dictionary
Dictionary_Create_Lang(@dict, "en")

Posted: **Tue Apr 22, 2014 10:04 pm**

Hi,

Thank you! Passing a pointer to the structure (the alternate version) solved the "invalid memory read at address 16" error. Unfortunately, the *dict pointer is still zero, which seems to indicate that LinkGrammar is unable to find the appropriate dictionary.

What is even more frustrating is that I have no idea if this is an issue on PB's end, or a LinkGrammar bug.
To add to this, PB is able to find the function names whether I use Import or ImportC.

Erion

Yet another LinkGrammar Wrapper issue