MD4C PHP Extension

· klm's blog

How to write a PHP extension. This covers writing an extension for the Markdown library MD4C.

Original post is here: eklausmeier.goip.de

This blog uses MD4C to convert Markdown to HTML. So far I used PHP FFI to link PHP with the MD4C C library. FFI stands for "Foreign Function Interface"; it allows calling C functions from PHP without writing a PHP extension. Using FFI is very easy.

Previous profiling measurements with XHProf and PHPSPY indicated that the handling of the return value from MD4C via FFI::String takes some time. So I replaced FFI with a "real" PHP extension and measured again. Result: No difference between FFI and PHP extension. So the profiling measurements were misleading.

Also the following claim in the PHP manual is downright false:

"it makes no sense to use the FFI extension for speed; however, it may make sense to use it to reduce memory consumption."

Nevertheless, writing a PHP extension was a good exercise to keep my acquaintance with the PHP development ecosystem up to date. I had already written a COBOL to PHP and an IMS/DC to PHP extension:

  1. PHP extension seg-faulting
  2. IMS/DC MFS To PHP

Literature on writing PHP extensions:

  1. Sara Golemon: Extending and Embedding PHP, Sams Publishing, 2006, xx+410 p.
  2. PHP Internals: Zend extensions
  3. https://github.com/dstogov/php-extension

The PHP extension code is in GitHub: php-md4c.

1. Walk through the C code. For this simple extension there is no need for a separate header file. The extension starts with basic includes for PHP, for the phpinfo(), and for MD4C:

// MD4C extension for PHP: Markdown to HTML conversion

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <php.h>
#include <ext/standard/info.h>
#include <md4c-html.h>

The following code is directly from the FFI part php_md4c_toHtml.c:

struct membuffer {
	char* data;
	size_t asize;	// allocated size = max usable size
	size_t size;	// current size
};

The following routines are also almost the same as in the FFI case, except that memory allocation is using safe_pemalloc() instead of native malloc(). In our case this doesn't make any difference.

static void membuf_init(struct membuffer* buf, MD_SIZE new_asize) {
	buf->size = 0;
	buf->asize = new_asize;
	if ((buf->data = safe_pemalloc(buf->asize,sizeof(char),0,1)) == NULL)
		php_error_docref(NULL, E_ERROR, "php-md4c.c: membuf_init: safe_pemalloc() failed with asize=%ld.\n",(long)buf->asize);
}

Next routine uses safe_perealloc() instead of realloc().

static void membuf_grow(struct membuffer* buf, size_t new_asize) {
	// element size is sizeof(char), not sizeof(char*): we store bytes, not pointers
	buf->data = safe_perealloc(buf->data, new_asize, sizeof(char), 0, 1);
	if (buf->data == NULL)
		php_error_docref(NULL, E_ERROR, "php-md4c.c: membuf_grow: safe_perealloc() failed, new_asize=%ld.\n",(long)new_asize);
	buf->asize = new_asize;
}
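
The "safe" in safe_pemalloc() and safe_perealloc() refers to the size computation: the request is nmemb * size + offset, and the allocation is refused if that arithmetic overflows. A minimal standalone sketch of the idea in plain C follows. checked_alloc_size() is a hypothetical helper made up for illustration, not part of the PHP API, and PHP's actual implementation differs in detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute nmemb * size + offset into *out.
 * Returns false if the result would not fit into a size_t. */
static bool checked_alloc_size(size_t nmemb, size_t size, size_t offset, size_t *out) {
	if (size != 0 && nmemb > (SIZE_MAX - offset) / size)
		return false;	/* nmemb * size + offset would overflow */
	*out = nmemb * size + offset;
	return true;
}
```

For the 16 MB buffer used in this extension the check passes trivially; it only matters when a size comes from untrusted input.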

The rest is identical to FFI.

static void membuf_append(struct membuffer* buf, const char* data, MD_SIZE size) {
	if (buf->asize < buf->size + size)
		membuf_grow(buf, buf->size + buf->size / 2 + size);
	memcpy(buf->data + buf->size, data, size);
	buf->size += size;
}

static void process_output(const MD_CHAR* text, MD_SIZE size, void* userdata) {
	membuf_append((struct membuffer*) userdata, text, size);
}

static struct membuffer mbuf = { NULL, 0, 0 };
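
To try the buffer logic outside PHP, the same pattern can be sketched with plain realloc(). This is a sketch under assumptions: the names sbuf, sbuf_append(), collect() and fake_render() are made up for illustration. fake_render() merely stands in for a renderer that, like md_html(), delivers its output in several chunks through a callback. Error handling is reduced to a return code.

```c
#include <stdlib.h>
#include <string.h>

struct sbuf {
	char *data;
	size_t asize;	/* allocated size */
	size_t size;	/* bytes in use */
};

/* Append n bytes, growing to size + size/2 + n when the buffer is too small. */
static int sbuf_append(struct sbuf *b, const char *src, size_t n) {
	if (b->asize < b->size + n) {
		size_t new_asize = b->size + b->size / 2 + n;
		char *p = realloc(b->data, new_asize);	/* realloc(NULL, n) acts like malloc(n) */
		if (p == NULL) return -1;	/* the old block is still valid */
		b->data = p;
		b->asize = new_asize;
	}
	memcpy(b->data + b->size, src, n);
	b->size += n;
	return 0;
}

/* Output callback: userdata carries the buffer, as in process_output(). */
static void collect(const char *text, size_t n, void *userdata) {
	sbuf_append((struct sbuf *)userdata, text, n);
}

/* Hypothetical stand-in for a renderer that emits its output piecewise. */
static void fake_render(void (*out)(const char *, size_t, void *), void *userdata) {
	out("<h1>", 4, userdata);
	out("Hello", 5, userdata);
	out("</h1>\n", 6, userdata);
}
```

Calling fake_render(collect, &b) with b initialized to { NULL, 0, 0 } leaves the 15 bytes of "<h1>Hello</h1>\n" in the buffer. The result is a byte range with an explicit length, not a NUL-terminated string.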

Now we come to something PHP specific. We encapsulate the C function into PHP_FUNCTION. The arguments of the routine are parsed with ZEND_PARSE_PARAMETERS_START(1, 2): the routine must have at least one argument and may have an optional second one. That is what is meant by (1,2). The return value is created with RETVAL_STRINGL(), which copies the given number of bytes into a new PHP string, so neither an extra estrndup() nor a terminating NUL byte is needed. In the FFI case we just return a pointer to a string.

/* {{{ string md4c_toHtml( string $markdown, [ int $flag ] )
 */
PHP_FUNCTION(md4c_toHtml) {	// return HTML string
	char *markdown;
	size_t markdown_len;
	int ret;
	zend_long flag = MD_DIALECT_GITHUB | MD_FLAG_NOINDENTEDCODEBLOCKS;	// Z_PARAM_LONG expects zend_long

	ZEND_PARSE_PARAMETERS_START(1, 2)
		Z_PARAM_STRING(markdown, markdown_len)
		Z_PARAM_OPTIONAL
		Z_PARAM_LONG(flag)
	ZEND_PARSE_PARAMETERS_END();

	if (mbuf.asize == 0) membuf_init(&mbuf,16777216);	// =16MB

	mbuf.size = 0;	// reset the buffer for this call
	ret = md_html(markdown, markdown_len, process_output,
		&mbuf, (MD_SIZE)flag, 0);
	if (ret < 0) {
		// sizeof() counts the trailing NUL, so subtract 1 to get the string length
		RETVAL_STRINGL("<br>- - - Error in Markdown - - -<br>\n",sizeof("<br>- - - Error in Markdown - - -<br>\n") - 1);
	} else {
		// copies mbuf.size bytes into a new PHP string
		RETVAL_STRINGL(mbuf.data,mbuf.size);
	}
}
/* }}}*/
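
One detail when handing a string literal to a length-taking macro such as RETVAL_STRINGL(): sizeof on a literal counts the terminating NUL byte, strlen() does not. A minimal sketch in plain C, independent of PHP. ERRMSG and errmsg_len() are illustrative names only.

```c
#include <stddef.h>
#include <string.h>

#define ERRMSG "<br>- - - Error in Markdown - - -<br>\n"

/* Length of the message without the terminating NUL, known at compile time. */
static size_t errmsg_len(void) {
	return sizeof(ERRMSG) - 1;
}
```

Passing the plain sizeof value as the length would put one extra NUL byte at the end of the PHP string.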

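The following two PHP extension specific functions are just for initialization and shutdown. The PHP internals documentation has a diagram of the sequence: module initialization (MINIT) runs once at startup, request initialization (RINIT) and request shutdown (RSHUTDOWN) run around every request, and module shutdown (MSHUTDOWN) runs once at the end.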

Init: Do nothing.

/* {{{ PHP_MINIT_FUNCTION
 */
PHP_MINIT_FUNCTION(md4c) {	// module initialization
	//REGISTER_INI_ENTRIES();
	//php_printf("In PHP_MINIT_FUNCTION(md4c): module initialization\n");

	return SUCCESS;
}
/* }}} */

Shutdown: Free the persistent buffer, if it was allocated.

/* {{{ PHP_MSHUTDOWN_FUNCTION
 */
PHP_MSHUTDOWN_FUNCTION(md4c) {	// module shutdown
	if (mbuf.data) pefree(mbuf.data,1);
	return SUCCESS;
}
/* }}} */

The following function prints out information when called via phpinfo().

/* {{{ PHP_MINFO_FUNCTION
 */
PHP_MINFO_FUNCTION(md4c) {
	php_info_print_table_start();
	php_info_print_table_row(2, "MD4C", "enabled");
	php_info_print_table_row(2, "PHP-MD4C version", "1.0");
	php_info_print_table_row(2, "MD4C version", "0.5.2");
	php_info_print_table_end();
}
/* }}} */

In the phpinfo() output this shows up as a table with these three rows.

The following describes the argument list and the function table.

/* {{{ arginfo
 */
ZEND_BEGIN_ARG_INFO(arginfo_md4c_test, 0)
ZEND_END_ARG_INFO()

// last argument of ZEND_BEGIN_ARG_INFO_EX: number of required arguments (1)
ZEND_BEGIN_ARG_INFO_EX(arginfo_md4c_toHtml, 0, 0, 1)
	ZEND_ARG_INFO(0, str)
	ZEND_ARG_INFO_WITH_DEFAULT_VALUE(0, flag, "MD_DIALECT_GITHUB | MD_FLAG_NOINDENTEDCODEBLOCKS")
ZEND_END_ARG_INFO()
/* }}} */

/* {{{ php_md4c_functions[]
 */
static const zend_function_entry php_md4c_functions[] = {
	PHP_FE(md4c_toHtml,	arginfo_md4c_toHtml)
	PHP_FE_END
};
/* }}} */

The zend_module_entry follows the usual pattern and ties all of the above together. PHP_MINIT is left as NULL because there is nothing to initialize.

/* {{{ md4c_module_entry
 */
zend_module_entry md4c_module_entry = {
	STANDARD_MODULE_HEADER,
	"md4c",						// Extension name
	php_md4c_functions,			// zend_function_entry
	NULL,	//PHP_MINIT(md4c),	// PHP_MINIT - Module initialization
	PHP_MSHUTDOWN(md4c),		// PHP_MSHUTDOWN - Module shutdown
	NULL,						// PHP_RINIT - Request initialization
	NULL,						// PHP_RSHUTDOWN - Request shutdown
	PHP_MINFO(md4c),			// PHP_MINFO - Module info
	"1.0",						// Version
	STANDARD_MODULE_PROPERTIES
};
/* }}} */

This seemingly innocent-looking statement is important: ZEND_GET_MODULE defines the get_module() function that PHP looks up when it loads the shared library. Without it you will get "PHP Startup: Unable to load dynamic library".

#ifdef COMPILE_DL_MD4C
# ifdef ZTS
ZEND_TSRMLS_CACHE_DEFINE()
# endif
#endif
ZEND_GET_MODULE(md4c)

2. M4 config file. The PHP extension requires a config.m4 file.

dnl config.m4 for php-md4c extension

PHP_ARG_WITH(md4c, [whether to enable MD4C support],
[  --with-md4c[[=DIR]]       Enable MD4C support.
                          DIR is the path to MD4C install prefix])

if test "$PHP_MD4C" != "no"; then

	AC_MSG_CHECKING([for md4c headers])
	for i in "$PHP_MD4C" "$prefix" /usr /usr/local; do
		if test -r "$i/include/md4c-html.h"; then
			PHP_MD4C_DIR=$i
			AC_MSG_RESULT([found in $i])
			break
		fi
	done
	if test -z "$PHP_MD4C_DIR"; then
		AC_MSG_RESULT([not found])
		AC_MSG_ERROR([Please install md4c])
	fi

	PHP_ADD_INCLUDE($PHP_MD4C_DIR/include)
	dnl recommended flags for compilation with gcc
	dnl CFLAGS="$CFLAGS -Wall -fno-strict-aliasing"

	export OLD_CPPFLAGS="$CPPFLAGS"
	export CPPFLAGS="$CPPFLAGS $INCLUDES -DHAVE_MD4C"
	AC_CHECK_HEADERS([md4c.h md4c-html.h], [], AC_MSG_ERROR(['md4c.h' header not found]))
	#AC_CHECK_HEADER([md4c-html.h], [], AC_MSG_ERROR(['md4c-html.h' header not found]))
	PHP_SUBST(MD4C_SHARED_LIBADD)

	PHP_ADD_LIBRARY_WITH_PATH(md4c, $PHP_MD4C_DIR/$PHP_LIBDIR, MD4C_SHARED_LIBADD)
	PHP_ADD_LIBRARY_WITH_PATH(md4c-html, $PHP_MD4C_DIR/$PHP_LIBDIR, MD4C_SHARED_LIBADD)
	export CPPFLAGS="$OLD_CPPFLAGS"

	PHP_SUBST(MD4C_SHARED_LIBADD)
	AC_DEFINE(HAVE_MD4C, 1, [ ])
	PHP_NEW_EXTENSION(md4c, md4c.c, $ext_shared)
fi

3. Compiling. Run

phpize
./configure
make

Symbols are as follows:

$ nm md4c.so
0000000000002160 r arginfo_md4c_test
0000000000003d00 d arginfo_md4c_toHtml
                 w __cxa_finalize@GLIBC_2.2.5
00000000000040a0 d __dso_handle
0000000000003dc0 d _DYNAMIC
                 U _emalloc
                 U _emalloc_64
                 U _estrndup
00000000000016c8 t _fini
                 U free@GLIBC_2.2.5
00000000000016c0 T get_module
0000000000003fe8 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000000021c8 r __GNU_EH_FRAME_HDR
0000000000001000 t _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000004180 b mbuf
00000000000040c0 D md4c_module_entry
                 U md_html
                 U memcpy@GLIBC_2.14
                 U php_error_docref
                 U php_info_print_table_end
                 U php_info_print_table_row
                 U php_info_print_table_start
0000000000003d60 d php_md4c_functions
                 U php_printf
0000000000001640 t process_output
0000000000001234 t process_output.cold
                 U _safe_malloc
                 U _safe_realloc
                 U __stack_chk_fail@GLIBC_2.4
                 U strlen@GLIBC_2.2.5
0000000000004168 d __TMC_END__
                 U zend_parse_arg_long_slow
                 U zend_parse_arg_str_slow
                 U zend_wrong_parameter_error
                 U zend_wrong_parameters_count_error
                 U zend_wrong_parameters_none_error
. . .
0000000000001380 T zif_md4c_toHtml
00000000000011cf t zif_md4c_toHtml.cold
0000000000001175 T zm_info_md4c
0000000000001350 T zm_shutdown_md4c
00000000000016b0 T zm_startup_md4c

4. Installing on Arch Linux. Copy the md4c.so library to /usr/lib/php/modules as root:

cp modules/md4c.so /usr/lib/php/modules

Finally activate the extension in php.ini:

extension=md4c

5. Notes on Windows. On Linux we use the installed MD4C library. On Windows, as noted in Installing Simplified Saaze on Windows 10 #2, it is advisable to amalgamate all MD4C source files into a single file for easier compilation.