Original post is here: eklausmeier.goip.de
This blog uses MD4C to convert Markdown to HTML. So far I have used PHP FFI to link PHP with the MD4C C library. FFI stands for "Foreign Function Interface"; it allows calling C functions from PHP without writing a PHP extension. Using FFI is very easy.
Previous profiling measurements with XHProf and PHPSPY indicated that handling the return value from MD4C via FFI::string() takes some time. So I replaced FFI with a "real" PHP extension and measured again. Result: no difference between FFI and the PHP extension. So the profiling measurements were misleading.
Also the following claim in the PHP manual is downright false:
"it makes no sense to use the FFI extension for speed; however, it may make sense to use it to reduce memory consumption."
Nevertheless, writing a PHP extension was a good exercise to keep my acquaintance with the PHP development ecosystem up to date. I had already written a COBOL to PHP and an IMS/DC to PHP extension.
Literature on writing PHP extensions:
- Sara Golemon: Extending and Embedding PHP, Sams Publishing, 2006, xx+410 p.
- PHP Internals: Zend extensions
- https://github.com/dstogov/php-extension
The PHP extension code is on GitHub: php-md4c.
1. Walk through the C code. For this simple extension there is no need for a separate header file.
The extension starts with the basic includes for PHP, for phpinfo(), and for MD4C:
// MD4C extension for PHP: Markdown to HTML conversion

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <php.h>
#include <ext/standard/info.h>
#include <md4c-html.h>
The following code is directly from the FFI part php_md4c_toHtml.c:
struct membuffer {
    char* data;
    size_t asize;   // allocated size = max usable size
    size_t size;    // current size
};
The following routines are also almost the same as in the FFI case, except that memory allocation uses safe_pemalloc() instead of native malloc(). In our case this doesn't make any difference.
static void membuf_init(struct membuffer* buf, MD_SIZE new_asize) {
    buf->size = 0;
    buf->asize = new_asize;
    if ((buf->data = safe_pemalloc(buf->asize,sizeof(char),0,1)) == NULL)
        php_error_docref(NULL, E_ERROR, "php-md4c.c: membuf_init: safe_pemalloc() failed with asize=%ld.\n",(long)buf->asize);
}
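As far as I understand the safe_ allocators, safe_pemalloc(nmemb, size, offset, persistent) computes nmemb * size + offset and refuses to allocate if that computation would overflow. The following is a minimal plain C sketch of such a check; checked_alloc_size() is a made-up name for illustration and is not part of the PHP API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes nmemb * size + offset.
 * Returns 0 on success, -1 if the result would not fit into size_t. */
int checked_alloc_size(size_t nmemb, size_t size, size_t offset, size_t *result) {
    if (size != 0 && nmemb > (SIZE_MAX - offset) / size)
        return -1;                      /* nmemb * size + offset would overflow */
    *result = nmemb * size + offset;
    return 0;
}
```

For the fixed 16 MB request in this extension the check can never trigger, which is why the choice of allocator makes no difference here.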
The next routine uses safe_perealloc() instead of realloc().
static void membuf_grow(struct membuffer* buf, size_t new_asize) {
    // element count first, element size second, as in membuf_init()
    buf->data = safe_perealloc(buf->data, new_asize, sizeof(char), 0, 1);
    if (buf->data == NULL)
        php_error_docref(NULL, E_ERROR, "php-md4c.c: membuf_grow: safe_perealloc() failed, new_asize=%ld.\n",(long)new_asize);
    buf->asize = new_asize;
}
The rest is identical to FFI.
static void membuf_append(struct membuffer* buf, const char* data, MD_SIZE size) {
    if (buf->asize < buf->size + size)
        membuf_grow(buf, buf->size + buf->size / 2 + size);
    memcpy(buf->data + buf->size, data, size);
    buf->size += size;
}

static void process_output(const MD_CHAR* text, MD_SIZE size, void* userdata) {
    membuf_append((struct membuffer*) userdata, text, size);
}

static struct membuffer mbuf = { NULL, 0, 0 };
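To try the growth policy outside of PHP, here is a minimal sketch of the same buffer with plain malloc() and realloc(), roughly as in the FFI case. The int return codes and membuf_free() are my additions for this illustration; the extension itself reports errors via php_error_docref().

```c
#include <stdlib.h>
#include <string.h>

struct membuffer {
    char  *data;
    size_t asize;   /* allocated size = max usable size */
    size_t size;    /* current size */
};

/* Plain libc variant; returns 0 on success, -1 if malloc() fails. */
int membuf_init(struct membuffer *buf, size_t new_asize) {
    buf->size = 0;
    buf->asize = new_asize;
    buf->data = malloc(new_asize);
    return buf->data != NULL ? 0 : -1;
}

/* Appends size bytes, growing the buffer first if they do not fit. */
int membuf_append(struct membuffer *buf, const char *data, size_t size) {
    if (buf->asize < buf->size + size) {
        /* same policy as above: 1.5 times the current size plus the new chunk */
        size_t new_asize = buf->size + buf->size / 2 + size;
        char *p = realloc(buf->data, new_asize);
        if (p == NULL) return -1;       /* the old block is still valid */
        buf->data = p;
        buf->asize = new_asize;
    }
    memcpy(buf->data + buf->size, data, size);
    buf->size += size;
    return 0;
}

/* Releases the storage and resets the bookkeeping. */
void membuf_free(struct membuffer *buf) {
    free(buf->data);
    buf->data = NULL;
    buf->asize = buf->size = 0;
}
```

Starting with 4 bytes and appending two chunks of 5 bytes each, the allocated size goes from 4 to 5 and then to 12. With the 16 MB initial size used in the extension, growing should rarely happen at all.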
Now we come to something PHP specific. We encapsulate the C function into PHP_FUNCTION. Furthermore, the arguments of the routine are parsed with ZEND_PARSE_PARAMETERS_START(1, 2). This routine must have at least one argument. It might have an optional second argument. That is what is meant by (1,2). The return string is allocated via estrndup(). In the FFI case we just return a pointer to a string.
/* {{{ string md4c_toHtml( string $markdown, [ int $flag ] )
 */
PHP_FUNCTION(md4c_toHtml) { // return HTML string
    char *markdown;
    size_t markdown_len;
    int ret;
    long flag = MD_DIALECT_GITHUB | MD_FLAG_NOINDENTEDCODEBLOCKS;

    ZEND_PARSE_PARAMETERS_START(1, 2)
        Z_PARAM_STRING(markdown, markdown_len)
        Z_PARAM_OPTIONAL Z_PARAM_LONG(flag)
    ZEND_PARSE_PARAMETERS_END();

    if (mbuf.asize == 0) membuf_init(&mbuf,16777216); // =16MB

    mbuf.size = 0; // prepare for next call
    ret = md_html(markdown, markdown_len, process_output,
        &mbuf, (MD_SIZE)flag, 0);
    membuf_append(&mbuf,"\0",1); // make it a null-terminated C string, so PHP can deduce length
    if (ret < 0) {
        // sizeof() counts the terminating NUL, hence the -1
        RETVAL_STRINGL("<br>- - - Error in Markdown - - -<br>\n",sizeof("<br>- - - Error in Markdown - - -<br>\n") - 1);
    } else {
        RETVAL_STRING(estrndup(mbuf.data,mbuf.size));
    }
}
/* }}}*/
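The interplay between the renderer, the output callback, and the reused buffer can be tried without PHP and without MD4C. The following is only a sketch: fake_render() is a made-up stand-in for md_html() that emits its output in three chunks, and output_cb, outbuf, collect() and render_to_string() are hypothetical names as well. A small fixed buffer replaces the growable one to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Callback type shaped like the one md_html() calls (simplified types). */
typedef void (*output_cb)(const char *text, unsigned size, void *userdata);

/* Hypothetical stand-in for md_html(): delivers the result piece by piece. */
static int fake_render(const char *input, unsigned len, output_cb out, void *userdata) {
    out("<p>", 3, userdata);
    out(input, len, userdata);
    out("</p>\n", 5, userdata);
    return 0;
}

struct outbuf { char data[256]; size_t size; };

/* Same role as process_output(): append each chunk to the buffer. */
static void collect(const char *text, unsigned size, void *userdata) {
    struct outbuf *b = userdata;
    if (b->size + size > sizeof b->data - 1) return;   /* demo only: drop overflow */
    memcpy(b->data + b->size, text, size);
    b->size += size;
}

static struct outbuf out;   /* lives across calls, like mbuf */

/* Returns a NUL-terminated string that stays valid until the next call. */
const char *render_to_string(const char *markdown) {
    out.size = 0;                                   /* rewind, keep the storage */
    if (fake_render(markdown, (unsigned)strlen(markdown), collect, &out) < 0)
        return NULL;
    out.data[out.size] = '\0';                      /* one byte is always reserved */
    return out.data;
}
```

Here the caller gets a pointer into the static buffer, much like in the FFI case. The extension instead hands PHP its own copy of the string, so the buffer can safely be overwritten by the next call.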
The following two PHP extension specific functions are just for initialization and shutdown. A diagram in PHP Internals shows the sequence of initialization and shutdown.
Init: Do nothing.
/* {{{ PHP_MINIT_FUNCTION
 */
PHP_MINIT_FUNCTION(md4c) { // module initialization
    //REGISTER_INI_ENTRIES();
    //php_printf("In PHP_MINIT_FUNCTION(md4c): module initialization\n");

    return SUCCESS;
}
/* }}} */
Shutdown: Free the output buffer, if it was allocated.
/* {{{ PHP_MSHUTDOWN_FUNCTION
 */
PHP_MSHUTDOWN_FUNCTION(md4c) { // module shutdown
    if (mbuf.data) pefree(mbuf.data,1);
    return SUCCESS;
}
/* }}} */
The following function prints out information when called via phpinfo().
/* {{{ PHP_MINFO_FUNCTION
 */
PHP_MINFO_FUNCTION(md4c) {
    php_info_print_table_start();
    php_info_print_table_row(2, "MD4C", "enabled");
    php_info_print_table_row(2, "PHP-MD4C version", "1.0");
    php_info_print_table_row(2, "MD4C version", "0.5.2");
    php_info_print_table_end();
}
/* }}} */
The output looks like this:
The following describes the argument list and the function table.
/* {{{ arginfo
 */
ZEND_BEGIN_ARG_INFO(arginfo_md4c_test, 0)
ZEND_END_ARG_INFO()

ZEND_BEGIN_ARG_INFO(arginfo_md4c_toHtml, 1)
    ZEND_ARG_INFO(0, str)
    ZEND_ARG_INFO_WITH_DEFAULT_VALUE(0, flag, "MD_DIALECT_GITHUB | MD_FLAG_NOINDENTEDCODEBLOCKS")
ZEND_END_ARG_INFO()
/* }}} */

/* {{{ php_md4c_functions[]
 */
static const zend_function_entry php_md4c_functions[] = {
    PHP_FE(md4c_toHtml, arginfo_md4c_toHtml)
    PHP_FE_END
};
/* }}} */
The zend_module_entry is somewhat classical. All the above is configured here.
/* {{{ md4c_module_entry
 */
zend_module_entry md4c_module_entry = {
    STANDARD_MODULE_HEADER,
    "md4c",                 // Extension name
    php_md4c_functions,     // zend_function_entry
    NULL, //PHP_MINIT(md4c), // PHP_MINIT - Module initialization
    PHP_MSHUTDOWN(md4c),    // PHP_MSHUTDOWN - Module shutdown
    NULL,                   // PHP_RINIT - Request initialization
    NULL,                   // PHP_RSHUTDOWN - Request shutdown
    PHP_MINFO(md4c),        // PHP_MINFO - Module info
    "1.0",                  // Version
    STANDARD_MODULE_PROPERTIES
};
/* }}} */
This seemingly innocent-looking statement is important: without it you will get PHP Startup: Unable to load dynamic library.
#ifdef COMPILE_DL_MD4C
# ifdef ZTS
ZEND_TSRMLS_CACHE_DEFINE()
# endif
#endif
ZEND_GET_MODULE(md4c)
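Why is this statement needed? To my understanding, when PHP loads a shared extension it looks up a function named get_module in the library and calls it to obtain the address of the module entry. ZEND_GET_MODULE(md4c) provides exactly that function. A minimal sketch of the idea in plain C follows; struct module_entry is a drastically simplified, hypothetical stand-in for zend_module_entry, and the _demo names are made up.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for zend_module_entry. */
struct module_entry {
    const char *name;
    const char *version;
};

static struct module_entry md4c_module_entry_demo = { "md4c", "1.0" };

/* Roughly what ZEND_GET_MODULE(md4c) generates: an exported function
 * returning the address of the module entry. */
struct module_entry *get_module_demo(void) {
    return &md4c_module_entry_demo;
}
```

In the real extension the exported function is called get_module. If it is missing, PHP has no way to find md4c_module_entry, hence the startup error.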
2. M4 config file.
The PHP extension requires a config.m4 file.
dnl config.m4 for php-md4c extension
PHP_ARG_WITH(md4c, [whether to enable MD4C support],
[  --with-md4c[[=DIR]]     Enable MD4C support.
                          DIR is the path to MD4C install prefix])
if test "$PHP_MD4C" != "no"; then
    AC_MSG_CHECKING([for md4c headers])
    for i in "$PHP_MD4C" "$prefix" /usr /usr/local; do
        if test -r "$i/include/md4c-html.h"; then
            PHP_MD4C_DIR=$i
            AC_MSG_RESULT([found in $i])
            break
        fi
    done
    if test -z "$PHP_MD4C_DIR"; then
        AC_MSG_RESULT([not found])
        AC_MSG_ERROR([Please install md4c])
    fi
    PHP_ADD_INCLUDE($PHP_MD4C_DIR/include)
    dnl recommended flags for compilation with gcc
    dnl CFLAGS="$CFLAGS -Wall -fno-strict-aliasing"
    export OLD_CPPFLAGS="$CPPFLAGS"
    export CPPFLAGS="$CPPFLAGS $INCLUDES -DHAVE_MD4C"
    AC_CHECK_HEADERS([md4c.h md4c-html.h], [], AC_MSG_ERROR(['md4c.h' header not found]))
    #AC_CHECK_HEADER([md4c-html.h], [], AC_MSG_ERROR(['md4c-html.h' header not found]))
    PHP_SUBST(MD4C_SHARED_LIBADD)
    PHP_ADD_LIBRARY_WITH_PATH(md4c, $PHP_MD4C_DIR/$PHP_LIBDIR, MD4C_SHARED_LIBADD)
    PHP_ADD_LIBRARY_WITH_PATH(md4c-html, $PHP_MD4C_DIR/$PHP_LIBDIR, MD4C_SHARED_LIBADD)
    export CPPFLAGS="$OLD_CPPFLAGS"
    PHP_SUBST(MD4C_SHARED_LIBADD)
    AC_DEFINE(HAVE_MD4C, 1, [ ])
    PHP_NEW_EXTENSION(md4c, md4c.c, $ext_shared)
fi
3. Compiling. Run
phpize
./configure
make
Symbols are as follows:
$ nm md4c.so
0000000000002160 r arginfo_md4c_test
0000000000003d00 d arginfo_md4c_toHtml
                 w __cxa_finalize@GLIBC_2.2.5
00000000000040a0 d __dso_handle
0000000000003dc0 d _DYNAMIC
                 U _emalloc
                 U _emalloc_64
                 U _estrndup
00000000000016c8 t _fini
                 U free@GLIBC_2.2.5
00000000000016c0 T get_module
0000000000003fe8 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000000021c8 r __GNU_EH_FRAME_HDR
0000000000001000 t _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000004180 b mbuf
00000000000040c0 D md4c_module_entry
                 U md_html
                 U memcpy@GLIBC_2.14
                 U php_error_docref
                 U php_info_print_table_end
                 U php_info_print_table_row
                 U php_info_print_table_start
0000000000003d60 d php_md4c_functions
                 U php_printf
0000000000001640 t process_output
0000000000001234 t process_output.cold
                 U _safe_malloc
                 U _safe_realloc
                 U __stack_chk_fail@GLIBC_2.4
                 U strlen@GLIBC_2.2.5
0000000000004168 d __TMC_END__
                 U zend_parse_arg_long_slow
                 U zend_parse_arg_str_slow
                 U zend_wrong_parameter_error
                 U zend_wrong_parameters_count_error
                 U zend_wrong_parameters_none_error
. . .
0000000000001380 T zif_md4c_toHtml
00000000000011cf t zif_md4c_toHtml.cold
0000000000001175 T zm_info_md4c
0000000000001350 T zm_shutdown_md4c
00000000000016b0 T zm_startup_md4c
4. Installing on Arch Linux. Copy the md4c.so library to /usr/lib/php/modules as root:
cp modules/md4c.so /usr/lib/php/modules
Finally, activate the extension in php.ini:
extension=md4c
5. Notes on Windows. On Linux we use the installed MD4C library. As noted in Installing Simplified Saaze on Windows 10 #2 it is advisable to amalgamate all MD4C source files into a single file for easier compilation.