Thanh Han The wrote:
> I am working on a wrapper in C so that applications can call pdftex via a library call. Can you please have a look at the proposed API and comment on it if you find a potential problem?
Trying to do something like this cross-platform is very difficult because not all systems have the necessary underlying functionality to do it right.

For instance, the init_pdftex_data interface in your proposal takes the working directory as a string parameter. That's a security problem. Since there is no guarantee that anything holds a reference (open file descriptor etc.) to the directory, somebody might change the directory (rename an existing one) and the TeX run ends up overwriting other files. Or, more likely, a part of the path name is changed (symlink attack). The only way to guarantee that the directory the caller intends to use is indeed the one used is to pass in a file descriptor. In the POSIX world this is no problem: the file descriptor is inherited through the fork() call, and before the exec() call for pdftex you call fchdir(fd). Here is where you'll run into problems, since not all systems can implement this.

I assume your 'run_pdftex' interface is synchronous. IMO it is required to have an asynchronous version as well, i.e., a version where you initiate the run and then later independently query and, if necessary, wait for the result. The reason is obvious: the program can do work of its own while TeX is running. Parallelism is extremely important going forward.

And an implementation detail: _never_ expose data structures unless it is really, *REALLY* needed. I'm talking here about the pdftex_data_struct, of course. Direct access to any of its members from user code is in no way performance critical. The initiated TeX runs are so expensive in terms of execution time that any memory allocation performed is completely negligible. So I propose to make the structure completely opaque, i.e., in the public header only have

  typedef struct pdftex_data_struct pdftex_data_t;

(I renamed the struct as well; _t is often used to indicate type names). Then change the init_pdftex_data() function to take a pdftex_data_t** parameter.
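A minimal POSIX-only sketch of the descriptor-based working directory combined with an asynchronous start/query/wait interface might look like this. The names pdftexlib_run_t, pdftexlib_run_start(), pdftexlib_run_poll() and pdftexlib_run_wait() are hypothetical (they merely follow the prefix convention suggested later in this message); nothing here is an existing pdfTeX API. The caller is assumed to have opened the directory once, e.g. with open(path, O_RDONLY | O_DIRECTORY), and passes only the descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle for one asynchronous run. */
typedef struct {
    pid_t pid;
    int done;     /* nonzero once the child has been reaped */
    int status;   /* raw status from waitpid(), valid once done */
} pdftexlib_run_t;

/* Start argv[0] (e.g. "pdftex") with the working directory given by dir_fd.
   Returns 0 on success or an errno value on failure. Does not wait. */
static int pdftexlib_run_start(pdftexlib_run_t *run, int dir_fd, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return errno;
    if (pid == 0) {
        /* Child: dir_fd was inherited across fork(). fchdir() does no path
           lookup, so a renamed directory or a swapped-in symlink cannot
           redirect the run. */
        if (fchdir(dir_fd) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);   /* only reached if exec failed */
    }
    run->pid = pid;
    run->done = 0;
    run->status = 0;
    return 0;
}

/* Non-blocking query: 1 if finished, 0 if still running, -1 on error. */
static int pdftexlib_run_poll(pdftexlib_run_t *run)
{
    if (run->done)
        return 1;
    pid_t r = waitpid(run->pid, &run->status, WNOHANG);
    if (r == 0)
        return 0;
    if (r < 0)
        return -1;
    run->done = 1;
    return 1;
}

/* Blocking wait: the child's exit code, or -1 if it did not exit normally. */
static int pdftexlib_run_wait(pdftexlib_run_t *run)
{
    while (!run->done) {
        pid_t r = waitpid(run->pid, &run->status, 0);
        if (r == run->pid)
            run->done = 1;
        else if (r < 0 && errno != EINTR)
            return -1;
    }
    return WIFEXITED(run->status) ? WEXITSTATUS(run->status) : -1;
}
```

A caller would start the run with something like argv = {"pdftex", "paper.tex", NULL}, continue with its own work, and call pdftexlib_run_poll() periodically or pdftexlib_run_wait() once the result is needed. The synchronous interface is then just start followed by wait. Note that this relies on fork() and fchdir(), which is exactly the part that does not carry over to every system.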
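The opaque-handle proposal, together with the allocating init function, error codes and error-string function discussed in this message, could be sketched as follows. It assumes the library returns status codes and offers a function that maps them to strings instead of printing and exiting. The enum values, pdftexlib_data_free(), and the struct members are hypothetical illustrations; the point is that only the typedef, the enum and the prototypes would appear in the public header.

```c
#include <stdlib.h>

/* ---- public header: the struct layout is NOT visible here ---- */
typedef struct pdftexlib_data_struct pdftexlib_data_t;   /* opaque */

typedef enum {
    PDFTEXLIB_OK = 0,
    PDFTEXLIB_ENOMEM,
    PDFTEXLIB_EINVAL,
    PDFTEXLIB_ERUN
} pdftexlib_status_t;

pdftexlib_status_t pdftexlib_data_init(pdftexlib_data_t **datap, int dir_fd);
void pdftexlib_data_free(pdftexlib_data_t *data);
const char *pdftexlib_error(pdftexlib_status_t status);

/* ---- private implementation: free to change without breaking callers ---- */
struct pdftexlib_data_struct {
    int dir_fd;       /* working directory as a descriptor, not a path */
    char *jobname;    /* hypothetical example member */
};

pdftexlib_status_t pdftexlib_data_init(pdftexlib_data_t **datap, int dir_fd)
{
    if (datap == NULL)
        return PDFTEXLIB_EINVAL;
    *datap = NULL;                       /* NULL on every failure path */
    if (dir_fd < 0)
        return PDFTEXLIB_EINVAL;
    pdftexlib_data_t *d = calloc(1, sizeof *d);   /* jobname starts as NULL */
    if (d == NULL)
        return PDFTEXLIB_ENOMEM;
    d->dir_fd = dir_fd;
    *datap = d;
    return PDFTEXLIB_OK;
}

void pdftexlib_data_free(pdftexlib_data_t *data)
{
    if (data != NULL) {
        free(data->jobname);
        free(data);
    }
}

/* Returns a message the caller can show on a terminal, in a dialog, etc. */
const char *pdftexlib_error(pdftexlib_status_t status)
{
    switch (status) {
    case PDFTEXLIB_OK:     return "success";
    case PDFTEXLIB_ENOMEM: return "out of memory";
    case PDFTEXLIB_EINVAL: return "invalid argument";
    case PDFTEXLIB_ERUN:   return "pdftex run failed";
    }
    return "unknown error";
}
```

Because callers only ever hold a pdftexlib_data_t *, members can be added, removed or reordered later without recompiling user code, and no exit() call is hidden in the library.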
init_pdftex_data() will then allocate the memory for the structure itself. If allocation fails, the pointer variable pointed to by the parameter is set to NULL; otherwise it is set to the newly allocated memory. Error handling after returning from init_pdftex_data() has to handle this case. (BTW: why not return an error code rather than just success/failure information from the functions? Then you don't have to pass a pointer to the tmp variable to pds_print_error.) Anyway, if you make this change, the information about the struct is completely encapsulated in your code. This is important for maintainability since it gives you the opportunity to change the implementation as much as you want as long as the function interfaces remain the same.

About pds_print_error_and_exit: such an interface is usually not useful except in tiny programs. Assume you write a graphical shell for TeX. You don't want to terminate the program after a failed run; the user should be able to fix the problems and rerun. What is needed, though, is the ability to show an error string. So what is probably needed is a function which returns an error string that can be printed in the appropriate way (on a terminal, in a dialog box, whatever).

About the interface naming: C's flat namespace is crowded. To minimize the risk of conflicts you should standardize on a common prefix for all function and type names and stick with it. E.g.,

  pdftex_data_struct        -> pdftexlib_data_struct
  init_pdftex_data          -> pdftexlib_data_init
  pds_print_error_and_exit  -> pdftexlib_error
  run_pdftex                -> pdftexlib_run

You get the idea.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖