{
    "mode": "man",
    "parameter": "perlhacktips",
    "section": "1",
    "url": "https://www.chedong.com/phpMan.php/man/perlhacktips/1/json",
    "generated": "2026-06-15T14:22:09Z",
    "sections": {
        "NAME": {
            "content": "perlhacktips - Tips for Perl core C code hacking\n",
            "subsections": []
        },
        "DESCRIPTION": {
            "content": "This document will help you learn the best way to go about hacking on the Perl core C code.\nIt covers common problems, debugging, profiling, and more.\n\nIf you haven't read perlhack and perlhacktut yet, you might want to do that first.\n",
            "subsections": []
        },
        "COMMON PROBLEMS": {
            "content": "Perl source plays by ANSI C89 rules: no C99 (or C++) extensions.  You don't care about some\nparticular platform having broken Perl? I hear there is still a strong demand for J2EE\nprogrammers.\n",
            "subsections": [
                {
                    "name": "Perl environment problems",
                    "content": "•   Not compiling with threading\n\nCompiling with threading (-Duseithreads) completely rewrites the function prototypes of\nPerl.  You better try your changes with that.  Related to this is the difference between\n\"Perl-less\" and \"Perl-ly\" APIs, for example:\n\nPerlsvsetiv(aTHX ...);\nsvsetiv(...);\n\nThe first one explicitly passes in the context, which is needed for e.g. threaded builds.\nThe second one does that implicitly; do not get them mixed.  If you are not passing in a\naTHX, you will need to do a dTHX as the first thing in the function.\n\nSee \"How multiple interpreters and concurrency are supported\" in perlguts for further\ndiscussion about context.\n\n•   Not compiling with -DDEBUGGING\n\nThe DEBUGGING define exposes more code to the compiler, therefore more ways for things to\ngo wrong.  You should try it.\n\n•   Introducing (non-read-only) globals\n\nDo not introduce any modifiable globals, truly global or file static.  They are bad form\nand complicate multithreading and other forms of concurrency.  The right way is to\nintroduce them as new interpreter variables, see intrpvar.h (at the very end for binary\ncompatibility).\n\nIntroducing read-only (const) globals is okay, as long as you verify with e.g. \"nm\nlibperl.a|egrep -v ' [TURtr] '\" (if your \"nm\" has BSD-style output) that the data you\nadded really is read-only.  (If it is, it shouldn't show up in the output of that\ncommand.)\n\nIf you want to have static strings, make them constant:\n\nstatic const char etc[] = \"...\";\n\nIf you want to have arrays of constant strings, note carefully the right combination of\n\"const\"s:\n\nstatic const char * const yippee[] =\n{\"hi\", \"ho\", \"silver\"};\n\n•   Not exporting your new function\n\nSome platforms (Win32, AIX, VMS, OS/2, to name a few) require any function that is part\nof the public API (the shared Perl library) to be explicitly marked as exported.  
See the\ndiscussion about embed.pl in perlguts.\n\n•   Exporting your new function\n\nThe new shiny result of either genuine new functionality or your arduous refactoring is\nnow ready and correctly exported.  So what could possibly go wrong?\n\nMaybe simply that your function did not need to be exported in the first place.  Perl has\na long and not so glorious history of exporting functions that it should not have.\n\nIf the function is used only inside one source code file, make it static.  See the\ndiscussion about embed.pl in perlguts.\n\nIf the function is used across several files, but intended only for Perl's internal use\n(and this should be the common case), do not export it to the public API.  See the\ndiscussion about embed.pl in perlguts.\n"
                },
                {
                    "name": "Portability problems",
                    "content": "The following are common causes of compilation and/or execution failures, not common to Perl\nas such.  The C FAQ is good bedtime reading.  Please test your changes with as many C\ncompilers and platforms as possible; we will, anyway, and it's nice to save oneself from\npublic embarrassment.\n\nIf using gcc, you can add the \"-std=c89\" option which will hopefully catch most of these\nunportabilities.  (However it might also catch incompatibilities in your system's header\nfiles.)\n\nUse the Configure \"-Dgccansipedantic\" flag to enable the gcc \"-ansi -pedantic\" flags which\nenforce stricter ANSI rules.\n\nIf using the \"gcc -Wall\" note that not all the possible warnings (like \"-Wuninitialized\") are\ngiven unless you also compile with \"-O\".\n\nNote that if using gcc, starting from Perl 5.9.5 the Perl core source code files (the ones at\nthe top level of the source code distribution, but not e.g. the extensions under ext/) are\nautomatically compiled with as many as possible of the \"-std=c89\", \"-ansi\", \"-pedantic\", and\na selection of \"-W\" flags (see cflags.SH).\n\nAlso study perlport carefully to avoid any bad assumptions about the operating system,\nfilesystems, character set, and so forth.\n\nYou may once in a while try a \"make microperl\" to see whether we can still compile Perl with\njust the bare minimum of interfaces.  (See README.micro.)\n\nDo not assume an operating system indicates a certain compiler.\n\n•   Casting pointers to integers or casting integers to pointers\n\nvoid castaway(U8* p)\n{\nIV i = p;\n\nor\n\nvoid castaway(U8* p)\n{\nIV i = (IV)p;\n\nBoth are bad, and broken, and unportable.  
Use the PTR2IV() macro that does it right.\n(Likewise, there are PTR2UV(), PTR2NV(), INT2PTR(), and NUM2PTR().)\n\n•   Casting between function pointers and data pointers\n\nTechnically speaking, casting between function pointers and data pointers is unportable\nand undefined.  Practically speaking it seems to work, but you should still use the\nFPTR2DPTR() and DPTR2FPTR() macros.  Sometimes you can also play games with unions.\n\n•   Assuming sizeof(int) == sizeof(long)\n\nThere are platforms where longs are 64 bits, and platforms where ints are 64 bits, and\nwhile we are out to shock you, even platforms where shorts are 64 bits.  This is all\nlegal according to the C standard.  (In other words, \"long long\" is not a portable way to\nspecify 64 bits, and \"long long\" is not even guaranteed to be any wider than \"long\".)\n\nInstead, use the definitions IV, UV, IVSIZE, I32SIZE, and so forth.  Avoid things like\nI32 because they are not guaranteed to be exactly 32 bits, they are at least 32 bits, nor\nare they guaranteed to be int or long.  If you really explicitly need 64-bit variables,\nuse I64 and U64, but only if guarded by HAS_QUAD.\n\n•   Assuming one can dereference any type of pointer for any type of data\n\nchar *p = ...;\nlong pony = *(long *)p;    /* BAD */\n\nMany platforms, quite rightly so, will give you a core dump instead of a pony if the p\nhappens not to be correctly aligned.\n\n•   Lvalue casts\n\n(int)*p = ...;    /* BAD */\n\nSimply not portable.  
Get your lvalue to be of the right type, or maybe use temporary\nvariables, or dirty tricks with unions.\n\n•   Assuming anything about structs (especially the ones you don't control, like the ones\ncoming from the system headers)\n\n•       That a certain field exists in a struct\n\n•       That no other fields exist besides the ones you know of\n\n•       That a field is of certain signedness, sizeof, or type\n\n•       That the fields are in a certain order\n\n•       While C guarantees the ordering specified in the struct definition,\nbetween different platforms the definitions might differ\n\n•       That the sizeof(struct) or the alignments are the same everywhere\n\n•       There might be padding bytes between the fields to align the fields - the\nbytes can be anything\n\n•       Structs are required to be aligned to the maximum alignment required by\nthe fields - which for native types is usually equivalent to sizeof()\nof the field\n\n•   Assuming the character set is ASCIIish\n\nPerl can compile and run under EBCDIC platforms.  See perlebcdic.  This is transparent\nfor the most part, but because the character sets differ, you shouldn't use numeric\n(decimal, octal, or hex) constants to refer to characters.  You can safely say 'A', but\nnot 0x41.  You can safely say '\\n', but not \"\\012\".  However, you can use macros defined\nin utf8.h to specify any code point portably.  \"LATIN1_TO_NATIVE(0xDF)\" is going to be\nthe code point that means LATIN SMALL LETTER SHARP S on whatever platform you are running\non (on ASCII platforms it compiles without adding any extra code, so there is zero\nperformance hit on those).  The acceptable inputs to \"LATIN1_TO_NATIVE\" are from 0x00\nthrough 0xFF.  If your input isn't guaranteed to be in that range, use\n\"UNICODE_TO_NATIVE\" instead.  
\"NATIVETOLATIN1\" and \"NATIVETOUNICODE\" translate the\nopposite direction.\n\nIf you need the string representation of a character that doesn't have a mnemonic name in\nC, you should add it to the list in regen/unicodeconstants.pl, and have Perl create\n\"#define\"'s for you, based on the current platform.\n\nNote that the \"isFOO\" and \"toFOO\" macros in handy.h work properly on native code points\nand strings.\n\nAlso, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper case alphabetic\ncharacters.  That is not true in EBCDIC.  Nor for 'a' to 'z'.  But '0' - '9' is an\nunbroken range in both systems.  Don't assume anything about other ranges.  (Note that\nspecial handling of ranges in regular expression patterns and transliterations makes it\nappear to Perl code that the aforementioned ranges are all unbroken.)\n\nMany of the comments in the existing code ignore the possibility of EBCDIC, and may be\nwrong therefore, even if the code works.  This is actually a tribute to the successful\ntransparent insertion of being able to handle EBCDIC without having to change pre-\nexisting code.\n\nUTF-8 and UTF-EBCDIC are two different encodings used to represent Unicode code points as\nsequences of bytes.  Macros  with the same names (but different definitions) in utf8.h\nand utfebcdic.h are used to allow the calling code to think that there is only one such\nencoding.  This is almost always referred to as \"utf8\", but it means the EBCDIC version\nas well.  Again, comments in the code may well be wrong even if the code itself is right.\nFor example, the concept of UTF-8 \"invariant characters\" differs between ASCII and\nEBCDIC.  On ASCII platforms, only characters that do not have the high-order bit set\n(i.e.  
whose ordinals are strict ASCII, 0 - 127) are invariant, and the documentation and\ncomments in the code may assume that, often referring to something like, say, \"hibit\".\nThe situation differs and is not so simple on EBCDIC machines, but as long as the code\nitself uses the \"NATIVE_IS_INVARIANT()\" macro appropriately, it works, even if the\ncomments are wrong.\n\nAs noted in \"TESTING\" in perlhack, when writing test scripts, the file t/charset_tools.pl\ncontains some helpful functions for writing tests valid on both ASCII and EBCDIC\nplatforms.  Sometimes, though, a test can't use a function and it's inconvenient to have\ndifferent test versions depending on the platform.  There are 20 code points that are the\nsame in all 4 character sets currently recognized by Perl (the 3 EBCDIC code pages plus\nISO 8859-1 (ASCII/Latin1)).  These can be used in such tests, though there is a small\npossibility that Perl will become available in yet another character set, breaking your\ntest.  All but one of these code points are C0 control characters.  The most significant\ncontrols that are the same are \"\\0\", \"\\r\", and \"\\N{VT}\" (also specifiable as \"\\cK\",\n\"\\x0B\", \"\\N{U+0B}\", or \"\\013\").  The single non-control is U+00B6 PILCROW SIGN.  The\ncontrols that are the same have the same bit pattern in all 4 character sets, regardless\nof the UTF8ness of the string containing them.  The bit pattern for U+B6 is the same in\nall 4 for non-UTF8 strings, but differs in each when its containing string is UTF-8\nencoded.  The only other code points that have some sort of sameness across all 4\ncharacter sets are the pair 0xDC and 0xFC.  Together these represent upper- and lowercase\nLATIN LETTER U WITH DIAERESIS, but which is upper and which is lower may be reversed:\n0xDC is the capital in Latin1 and 0xFC is the small letter, while 0xFC is the capital in\nEBCDIC and 0xDC is the small one.  
This factoid may be exploited in writing case\ninsensitive tests that are the same across all 4 character sets.\n\n•   Assuming the character set is just ASCII\n\nASCII is a 7 bit encoding, but bytes have 8 bits in them.  The 128 extra characters have\ndifferent meanings depending on the locale.  Absent a locale, currently these extra\ncharacters are generally considered to be unassigned, and this has presented some\nproblems.  This has been changing since 5.12 so that these characters can be\nconsidered to be Latin-1 (ISO-8859-1).\n\n•   Mixing #define and #ifdef\n\n#define BURGLE(x) ... \\\n#ifdef BURGLE_OLD_STYLE        /* BAD */\n... do it the old way ... \\\n#else\n... do it the new way ... \\\n#endif\n\nYou cannot portably \"stack\" cpp directives.  For example in the above you need two\nseparate BURGLE() #defines, one for each #ifdef branch.\n\n•   Adding non-comment stuff after #endif or #else\n\n#ifdef SNOSH\n...\n#else !SNOSH    /* BAD */\n...\n#endif SNOSH    /* BAD */\n\nThe #endif and #else cannot portably have anything non-comment after them.  If you want\nto document what is going on (which is a good idea especially if the branches are long), use\n(C) comments:\n\n#ifdef SNOSH\n...\n#else /* !SNOSH */\n...\n#endif /* SNOSH */\n\nThe gcc option \"-Wendif-labels\" warns about the bad variant (by default on starting from\nPerl 5.9.4).\n\n•   Having a comma after the last element of an enum list\n\nenum color {\nCERULEAN,\nCHARTREUSE,\nCINNABAR,     /* BAD */\n};\n\nis not portable.  Leave out the last comma.\n\nAlso note that whether enums are implicitly morphable to ints varies between compilers,\nyou might need to (int).\n\n•   Using //-comments\n\n// This function bamfoodles the zorklator.   /* BAD */\n\nThat is C99 or C++.  Perl is C89.  
Using the //-comments is silently allowed by many C\ncompilers but cranking up the ANSI C89 strictness (which we like to do) causes the\ncompilation to fail.\n\n•   Mixing declarations and code\n\nvoid zorklator()\n{\nint n = 3;\nset_zorkmids(n);    /* BAD */\nint q = 4;\n\nThat is C99 or C++.  Some C compilers allow that, but you shouldn't.\n\nThe gcc option \"-Wdeclaration-after-statement\" scans for such problems (by default on\nstarting from Perl 5.9.4).\n\n•   Introducing variables inside for()\n\nfor(int i = ...; ...; ...) {    /* BAD */\n\nThat is C99 or C++.  While it would indeed be awfully nice to have that also in C89, to\nlimit the scope of the loop variable, alas, we cannot.\n\n•   Mixing signed char pointers with unsigned char pointers\n\nint foo(char *s) { ... }\n...\nunsigned char *t = ...; /* Or U8* t = ... */\nfoo(t);   /* BAD */\n\nWhile this is legal practice, it is certainly dubious, and downright fatal in at least\none platform: for example VMS cc considers this a fatal error.  One cause for people\noften making this mistake is that a \"naked char\" and therefore dereferencing a \"naked\nchar pointer\" have an undefined signedness: it depends on the compiler and the flags of\nthe compiler and the underlying platform whether the result is signed or unsigned.  For\nthis very same reason using a 'char' as an array index is bad.\n\n•   Macros that have string constants and their arguments as substrings of the string\nconstants\n\n#define FOO(n) printf(\"number = %d\\n\", n)    /* BAD */\nFOO(10);\n\nPre-ANSI semantics for that was equivalent to\n\nprintf(\"10umber = %d\\10\");\n\nwhich is probably not what you were expecting.  
Unfortunately at least one reasonably\ncommon and modern C compiler does \"real backward compatibility\" here, in AIX that is what\nstill happens even though the rest of the AIX compiler is very happily C89.\n\n•   Using printf formats for non-basic C types\n\nIV i = ...;\nprintf(\"i = %d\\n\", i);    /* BAD */\n\nWhile this might by accident work on some platform (where IV happens to be an \"int\"), in\ngeneral it cannot.  IV might be something larger.  The situation is even worse with more\nspecific types (defined by Perl's configuration step in config.h):\n\nUid_t who = ...;\nprintf(\"who = %d\\n\", who);    /* BAD */\n\nThe problem here is that Uid_t might be not only not \"int\"-wide but it might also be\nunsigned, in which case large uids would be printed as negative values.\n\nThere is no simple solution to this because of printf()'s limited intelligence, but for\nmany types the right format is available with either an 'f' or '_f' suffix, for example:\n\nIVdf /* IV in decimal */\nUVxf /* UV in hexadecimal */\n\nprintf(\"i = %\"IVdf\"\\n\", i); /* The IVdf is a string constant. */\n\nUid_t_f /* Uid_t in decimal */\n\nprintf(\"who = %\"Uid_t_f\"\\n\", who);\n\nOr you can try casting to a \"wide enough\" type:\n\nprintf(\"i = %\"IVdf\"\\n\", (IV)something_very_small_and_signed);\n\nSee \"Formatted Printing of Size_t and SSize_t\" in perlguts for how to print those.\n\nAlso remember that the %p format really does require a void pointer:\n\nU8* p = ...;\nprintf(\"p = %p\\n\", (void*)p);\n\nThe gcc option \"-Wformat\" scans for such problems.\n\n•   Blindly using variadic macros\n\ngcc has had them for a while with its own syntax, and C99 brought them with a\nstandardized syntax.  Don't use the former, and use the latter only if\nHAS_C99_VARIADIC_MACROS is defined.\n\n•   Blindly passing va_list\n\nNot all platforms support passing va_list to further varargs (stdarg) functions.  
The\nright thing to do is to copy the va_list using Perl_va_copy() if NEED_VA_COPY is\ndefined.\n\n•   Using gcc statement expressions\n\nval = ({...;...;...});    /* BAD */\n\nWhile a nice extension, it's not portable.  The Perl code does admittedly use them if\navailable to gain some extra speed (essentially as a funky form of inlining), but you\nshouldn't.\n\n•   Binding together several statements in a macro\n\nUse the macros STMT_START and STMT_END.\n\nSTMT_START {\n...\n} STMT_END\n\n•   Testing for operating systems or versions when you should be testing for features\n\n#ifdef __FOONIX__    /* BAD */\nfoo = quux();\n#endif\n\nUnless you know with 100% certainty that quux() is only ever available for the \"Foonix\"\noperating system and that it is available and correctly working for all past, present, and\nfuture versions of \"Foonix\", the above is very wrong.  This is more correct (though still\nnot perfect, because the below is a compile-time check):\n\n#ifdef HAS_QUUX\nfoo = quux();\n#endif\n\nHow does HAS_QUUX become defined where it needs to be?  Well, if Foonix happens to be\nUnixy enough to be able to run the Configure script, and Configure has been taught about\ndetecting and testing quux(), HAS_QUUX will be correctly defined.  In other\nplatforms, the corresponding configuration step will hopefully do the same.\n\nIn a pinch, if you cannot wait for Configure to be educated, or if you have a good hunch\nof where quux() might be available, you can temporarily try the following:\n\n#if (defined(__FOONIX__) || defined(__BARNIX__))\n# define HAS_QUUX\n#endif\n\n...\n\n#ifdef HAS_QUUX\nfoo = quux();\n#endif\n\nBut in any case, try to keep the features and operating systems separate.\n\nA good resource on the predefined macros for various operating systems, compilers, and so\nforth is <http://sourceforge.net/p/predef/wiki/Home/>\n\n•   Assuming the contents of static memory pointed to by the return values of Perl wrappers\nfor C library functions doesn't change.  
Many C library functions return pointers to\nstatic storage that can be overwritten by subsequent calls to the same or related\nfunctions.  Perl has light-weight wrappers for some of these functions, which don't\nmake copies of the static memory.  A good example is the interface to the environment\nvariables that are in effect for the program.  Perl has \"PerlEnv_getenv\" to get values\nfrom the environment.  But the return is a pointer to static memory in the C library.  If\nyou are using the value to immediately test for something, that's fine.  If you save\nthe value and expect it to be unchanged by later processing, you would be wrong, but\nperhaps you wouldn't know it because different C library implementations behave\ndifferently, and the one on the platform you're testing on might work for your situation.\nBut on some platforms, a subsequent call to \"PerlEnv_getenv\" or related function WILL\noverwrite the memory that your first call points to.  This has led to some hard-to-debug\nproblems.  Do a \"savepv\" in perlapi to make a copy, thus avoiding these problems.  You\nwill have to free the copy when you're done to avoid memory leaks.  If you don't have\ncontrol over when it gets freed, you'll need to make the copy in a mortal scalar, like\nso:\n\nif ((s = PerlEnv_getenv(\"foo\")) == NULL) {\n... /* handle NULL case */\n}\nelse {\ns = SvPVX(sv_2mortal(newSVpv(s, 0)));\n}\n\nThe above example works only if \"s\" is \"NUL\"-terminated; otherwise you have to pass its\nlength to \"newSVpv\".\n"
                },
                {
                    "name": "Problematic System Interfaces",
                    "content": "•   Perl strings are NOT the same as C strings:  They may contain \"NUL\" characters, whereas a\nC string is terminated by the first \"NUL\".  That is why Perl API functions that deal with\nstrings generally take a pointer to the first byte and either a length or a pointer to\nthe byte just beyond the final one.\n\nAnd this is the reason that many of the C library string handling functions should not be\nused.  They don't cope with the full generality of Perl strings.  It may be that your\ntest cases don't have embedded \"NUL\"s, and so the tests pass, whereas there may well\neventually arise real-world cases where they fail.  A lesson here is to include \"NUL\"s in\nyour tests.  Now it's fairly rare in most real world cases to get \"NUL\"s, so your code\nmay seem to work, until one day a \"NUL\" comes along.\n\nHere's an example.  It used to be a common paradigm, for decades, in the perl core to use\n\"strchr(\"list\", c)\" to see if the character \"c\" is any of the ones given in \"list\", a\ndouble-quote-enclosed string of the set of characters that we are seeing if \"c\" is one\nof.  As long as \"c\" isn't a \"NUL\", it works.  But when \"c\" is a \"NUL\", \"strchr\" returns a\npointer to the terminating \"NUL\" in \"list\".   This likely will result in a segfault or a\nsecurity issue when the caller uses that end pointer as the starting point to read from.\n\nA solution to this and many similar issues is to use the \"mem\"-foo C library functions\ninstead.  In this case \"memchr\" can be used to see if \"c\" is in \"list\" and works even if\n\"c\" is \"NUL\".  These functions need an additional parameter to give the string length.\nIn the case of literal string parameters, perl has defined macros that calculate the\nlength for you.  See \"String Handling\" in perlapi.\n\n•   malloc(0), realloc(0), calloc(0, 0) are non-portable.  To be portable allocate at least\none byte.  
(In general you should rarely need to work at this low level, but instead use\nthe various malloc wrappers.)\n\n•   snprintf() - the return type is unportable.  Use my_snprintf() instead.\n"
                },
                {
                    "name": "Security problems",
                    "content": "Last but not least, here are various tips for safer coding.  See also perlclib for libc/stdio\nreplacements one should use.\n\n•   Do not use gets()\n\nOr we will publicly ridicule you.  Seriously.\n\n•   Do not use tmpfile()\n\nUse mkstemp() instead.\n\n•   Do not use strcpy() or strcat() or strncpy() or strncat()\n\nUse mystrlcpy() and mystrlcat() instead: they either use the native implementation, or\nPerl's own implementation (borrowed from the public domain implementation of INN).\n\n•   Do not use sprintf() or vsprintf()\n\nIf you really want just plain byte strings, use mysnprintf() and myvsnprintf() instead,\nwhich will try to use snprintf() and vsnprintf() if those safer APIs are available.  If\nyou want something fancier than a plain byte string, use \"Perlform\"() or SVs and\n\"Perlsvcatpvf()\".\n\nNote that glibc \"printf()\", \"sprintf()\", etc. are buggy before glibc version 2.17.  They\nwon't allow a \"%.s\" format with a precision to create a string that isn't valid UTF-8 if\nthe current underlying locale of the program is UTF-8.  What happens is that the %s and\nits operand are simply skipped without any notice.\n<https://sourceware.org/bugzilla/showbug.cgi?id=6530>.\n\n•   Do not use atoi()\n\nUse grokatoUV() instead.  atoi() has ill-defined behavior on overflows, and cannot be\nused for incremental parsing.  It is also affected by locale, which is bad.\n\n•   Do not use strtol() or strtoul()\n\nUse grokatoUV() instead.  strtol() or strtoul() (or their IV/UV-friendly macro\ndisguises, Strtol() and Strtoul(), or Atol() and Atoul() are affected by locale, which is\nbad.\n"
                }
            ]
        },
        "DEBUGGING": {
            "content": "You can compile a special debugging version of Perl, which allows you to use the \"-D\" option\nof Perl to tell more about what Perl is doing.  But sometimes there is no alternative than to\ndive in with a debugger, either to see the stack trace of a core dump (very useful in a bug\nreport), or trying to figure out what went wrong before the core dump happened, or how did we\nend up having wrong or unexpected results.\n",
            "subsections": [
                {
                    "name": "Poking at Perl",
                    "content": "To really poke around with Perl, you'll probably want to build Perl for debugging, like this:\n\n./Configure -d -DDEBUGGING\nmake\n\n\"-DDEBUGGING\" turns on the C compiler's \"-g\" flag to have it produce debugging information\nwhich will allow us to step through a running program, and to see in which C function we are\nat (without the debugging information we might see only the numerical addresses of the\nfunctions, which is not very helpful). It will also turn on the \"DEBUGGING\" compilation\nsymbol which enables all the internal debugging code in Perl.  There are a whole bunch of\nthings you can debug with this: perlrun lists them all, and the best way to find out about\nthem is to play about with them.  The most useful options are probably\n\nl  Context (loop) stack processing\ns  Stack snapshots (with v, displays all stacks)\nt  Trace execution\no  Method and overloading resolution\nc  String/numeric conversions\n\nFor example\n\n$ perl -Dst -e '$a + 1'\n....\n(-e:1)      gvsv(main::a)\n=>  UNDEF\n(-e:1)      const(IV(1))\n=>  UNDEF  IV(1)\n(-e:1)      add\n=>  NV(1)\n\nSome of the functionality of the debugging code can be achieved with a non-debugging perl by\nusing XS modules:\n\n-Dr => use re 'debug'\n-Dx => use O 'Debug'\n"
                },
                {
                    "name": "Using a source-level debugger",
                    "content": "If the debugging output of \"-D\" doesn't help you, it's time to step through perl's execution\nwith a source-level debugger.\n\n•  We'll use \"gdb\" for our examples here; the principles will apply to any debugger (many\nvendors call their debugger \"dbx\"), but check the manual of the one you're using.\n\nTo fire up the debugger, type\n\ngdb ./perl\n\nOr if you have a core dump:\n\ngdb ./perl core\n\nYou'll want to do that in your Perl source tree so the debugger can read the source code.\nYou should see the copyright message, followed by the prompt.\n\n(gdb)\n\n\"help\" will get you into the documentation, but here are the most useful commands:\n\n•  run [args]\n\nRun the program with the given arguments.\n\n•  break functionname\n\n•  break source.c:xxx\n\nTells the debugger that we'll want to pause execution when we reach either the named\nfunction (but see \"Internal Functions\" in perlguts!) or the given line in the named source\nfile.\n\n•  step\n\nSteps through the program a line at a time.\n\n•  next\n\nSteps through the program a line at a time, without descending into functions.\n\n•  continue\n\nRun until the next breakpoint.\n\n•  finish\n\nRun until the end of the current function, then stop again.\n\n•  'enter'\n\nJust pressing Enter will do the most recent operation again - it's a blessing when\nstepping through miles of source code.\n\n•  ptype\n\nPrints the C definition of the argument given.\n\n(gdb) ptype PLop\ntype = struct op {\nOP *opnext;\nOP *opsibparent;\nOP *(*opppaddr)(void);\nPADOFFSET optarg;\nunsigned int optype : 9;\nunsigned int opopt : 1;\nunsigned int opslabbed : 1;\nunsigned int opsavefree : 1;\nunsigned int opstatic : 1;\nunsigned int opfolded : 1;\nunsigned int opspare : 2;\nU8 opflags;\nU8 opprivate;\n} *\n\n•  print\n\nExecute the given C code and print its results.  WARNING: Perl makes heavy use of macros,\nand gdb does not necessarily support macros (see later \"gdb macro support\").  
You'll have\nto substitute them yourself, or to invoke cpp on the source code files (see \"The .i\nTargets\").  So, for instance, you can't say\n\nprint SvPV_nolen(sv)\n\nbut you have to say\n\nprint Perl_sv_2pv_nolen(sv)\n\nYou may find it helpful to have a \"macro dictionary\", which you can produce by saying \"cpp\n-dM perl.c | sort\".  Even then, cpp won't recursively apply those macros for you.\n"
                },
                {
                    "name": "gdb macro support",
                    "content": "Recent versions of gdb have fairly good macro support, but in order to use it you'll need to\ncompile perl with macro definitions included in the debugging information.  Using gcc version\n3.1, this means configuring with \"-Doptimize=-g3\".  Other compilers might use a different\nswitch (if they support debugging macros at all).\n"
                },
                {
                    "name": "Dumping Perl Data Structures",
                    "content": "One way to get around this macro hell is to use the dumping functions in dump.c; these work a\nlittle like an internal Devel::Peek, but they also cover OPs and other structures that you\ncan't get at from Perl.  Let's take an example.  We'll use the \"$a = $b + $c\" we used before,\nbut give it a bit of context: \"$b = \"6XXXX\"; $c = 2.3;\".  Where's a good place to stop and\npoke around?\n\nWhat about \"ppadd\", the function we examined earlier to implement the \"+\" operator:\n\n(gdb) break Perlppadd\nBreakpoint 1 at 0x46249f: file pphot.c, line 309.\n\nNotice we use \"Perlppadd\" and not \"ppadd\" - see \"Internal Functions\" in perlguts.  With\nthe breakpoint in place, we can run our program:\n\n(gdb) run -e '$b = \"6XXXX\"; $c = 2.3; $a = $b + $c'\n\nLots of junk will go past as gdb reads in the relevant source files and libraries, and then:\n\nBreakpoint 1, Perlppadd () at pphot.c:309\n1396    dSP; dATARGET; bool useleft; SV *svl, *svr;\n(gdb) step\n311           dPOPTOPnnrlul;\n(gdb)\n\nWe looked at this bit of code before, and we said that \"dPOPTOPnnrlul\" arranges for two\n\"NV\"s to be placed into \"left\" and \"right\" - let's slightly expand it:\n\n#define dPOPTOPnnrlul  NV right = POPn; \\\nSV *leftsv = TOPs; \\\nNV left = USELEFT(leftsv) ? SvNV(leftsv) : 0.0\n\n\"POPn\" takes the SV from the top of the stack and obtains its NV either directly (if \"SvNOK\"\nis set) or by calling the \"sv2nv\" function.  \"TOPs\" takes the next SV from the top of the\nstack - yes, \"POPn\" uses \"TOPs\" - but doesn't remove it.  We then use \"SvNV\" to get the NV\nfrom \"leftsv\" in the same way as before - yes, \"POPn\" uses \"SvNV\".\n\nSince we don't have an NV for $b, we'll have to use \"sv2nv\" to convert it.  
If we step\nagain, we'll find ourselves there:\n\n(gdb) step\nPerl_sv_2nv (sv=0xa0675d0) at sv.c:1669\n1669        if (!sv)\n(gdb)\n\nWe can now use \"Perl_sv_dump\" to investigate the SV:\n\n(gdb) print Perl_sv_dump(sv)\nSV = PV(0xa057cc0) at 0xa0675d0\nREFCNT = 1\nFLAGS = (POK,pPOK)\nPV = 0xa06a510 \"6XXXX\"\\0\nCUR = 5\nLEN = 6\n$1 = void\n\nWe know we're going to get 6 from this, so let's finish the subroutine:\n\n(gdb) finish\nRun till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671\n0x462669 in Perl_pp_add () at pp_hot.c:311\n311           dPOPTOPnnrl_ul;\n\nWe can also dump out this op: the current op is always stored in \"PL_op\", and we can dump it\nwith \"Perl_op_dump\".  This'll give us similar output to the CPAN module B::Debug.\n\n(gdb) print Perl_op_dump(PL_op)\n{\n13  TYPE = add  ===> 14\nTARG = 1\nFLAGS = (SCALAR,KIDS)\n{\nTYPE = null  ===> (12)\n(was rv2sv)\nFLAGS = (SCALAR,KIDS)\n{\n11          TYPE = gvsv  ===> 12\nFLAGS = (SCALAR)\nGV = main::b\n}\n}\n\n# finish this later #\n"
                },
                {
                    "name": "Using gdb to look at specific parts of a program",
                    "content": "With the example above, you knew to look for \"Perlppadd\", but what if there were multiple\ncalls to it all over the place, or you didn't know what the op was you were looking for?\n\nOne way to do this is to inject a rare call somewhere near what you're looking for.  For\nexample, you could add \"study\" before your method:\n\nstudy;\n\nAnd in gdb do:\n\n(gdb) break Perlppstudy\n\nAnd then step until you hit what you're looking for.  This works well in a loop if you want\nto only break at certain iterations:\n\nfor my $c (1..100) {\nstudy if $c == 50;\n}\n"
                },
                {
                    "name": "Using gdb to look at what the parser/lexer are doing",
                    "content": "If you want to see what perl is doing when parsing/lexing your code, you can use \"BEGIN {}\":\n\nprint \"Before\\n\";\nBEGIN { study; }\nprint \"After\\n\";\n\nAnd in gdb:\n\n(gdb) break Perlppstudy\n\nIf you want to see what the parser/lexer is doing inside of \"if\" blocks and the like you need\nto be a little trickier:\n\nif ($a && $b && do { BEGIN { study } 1 } && $c) { ... }\n"
                }
            ]
        },
        "SOURCE CODE STATIC ANALYSIS": {
            "content": "Various tools exist for analysing C source code statically, as opposed to dynamically, that\nis, without executing the code.  It is possible to detect resource leaks, undefined\nbehaviour, type mismatches, portability problems, code paths that would cause illegal memory\naccesses, and other similar problems by just parsing the C code and looking at the resulting\ngraph, what does it tell about the execution and data flows.  As a matter of fact, this is\nexactly how C compilers know to give warnings about dubious code.\n",
            "subsections": [
                {
                    "name": "lint",
                    "content": "The good old C code quality inspector, \"lint\", is available in several platforms, but please\nbe aware that there are several different implementations of it by different vendors, which\nmeans that the flags are not identical across different platforms.\n\nThere is a \"lint\" target in Makefile, but you may have to diddle with the flags (see above).\n"
                },
                {
                    "name": "Coverity",
                    "content": "Coverity (<http://www.coverity.com/>) is a product similar to lint and as a testbed for their\nproduct they periodically check several open source projects, and they give out accounts to\nopen source developers to the defect databases.\n\nThere is Coverity setup for the perl5 project: <https://scan.coverity.com/projects/perl5>\n"
                },
                {
                    "name": "HP-UX cadvise (Code Advisor)",
                    "content": "HP has a C/C++ static analyzer product for HP-UX caller Code Advisor.  (Link not given here\nbecause the URL is horribly long and seems horribly unstable; use the search engine of your\nchoice to find it.)  The use of the \"cadvisecc\" recipe with \"Configure ...\n-Dcc=./cadvisecc\" (see cadvise \"User Guide\") is recommended; as is the use of \"+wall\".\n"
                },
                {
                    "name": "cpd (cut-and-paste detector)",
                    "content": "The cpd tool detects cut-and-paste coding.  If one instance of the cut-and-pasted code\nchanges, all the other spots should probably be changed, too.  Therefore such code should\nprobably be turned into a subroutine or a macro.\n\ncpd (<http://pmd.sourceforge.net/cpd.html>) is part of the pmd project\n(<http://pmd.sourceforge.net/>).  pmd was originally written for static analysis of Java\ncode, but later the cpd part of it was extended to parse also C and C++.\n\nDownload the pmd-bin-X.Y.zip () from the SourceForge site, extract the pmd-X.Y.jar from it,\nand then run that on source code thusly:\n\njava -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD \\\n--minimum-tokens 100 --files /some/where/src --language c > cpd.txt\n\nYou may run into memory limits, in which case you should use the -Xmx option:\n\njava -Xmx512M ...\n"
                },
                {
                    "name": "gcc warnings",
                    "content": "Though much can be written about the inconsistency and coverage problems of gcc warnings\n(like \"-Wall\" not meaning \"all the warnings\", or some common portability problems not being\ncovered by \"-Wall\", or \"-ansi\" and \"-pedantic\" both being a poorly defined collection of\nwarnings, and so forth), gcc is still a useful tool in keeping our coding nose clean.\n\nThe \"-Wall\" is by default on.\n\nThe \"-ansi\" (and its sidekick, \"-pedantic\") would be nice to be on always, but unfortunately\nthey are not safe on all platforms, they can for example cause fatal conflicts with the\nsystem headers (Solaris being a prime example).  If Configure \"-Dgccansipedantic\" is used,\nthe \"cflags\" frontend selects \"-ansi -pedantic\" for the platforms where they are known to be\nsafe.\n\nThe following extra flags are added:\n\n•   \"-Wendif-labels\"\n\n•   \"-Wextra\"\n\n•   \"-Wc++-compat\"\n\n•   \"-Wwrite-strings\"\n\n•   \"-Werror=declaration-after-statement\"\n\n•   \"-Werror=pointer-arith\"\n\nThe following flags would be nice to have but they would first need their own Augean\nstablemaster:\n\n•   \"-Wshadow\"\n\n•   \"-Wstrict-prototypes\"\n\nThe \"-Wtraditional\" is another example of the annoying tendency of gcc to bundle a lot of\nwarnings under one switch (it would be impossible to deploy in practice because it would\ncomplain a lot) but it does contain some warnings that would be beneficial to have available\non their own, such as the warning about string constants inside macros containing the macro\narguments: this behaved differently pre-ANSI than it does in ANSI, and some C compilers are\nstill in transition, AIX being an example.\n"
                },
                {
                    "name": "Warnings of other C compilers",
                    "content": "Other C compilers (yes, there are other C compilers than gcc) often have their \"strict ANSI\"\nor \"strict ANSI with some portability extensions\" modes on, like for example the Sun Workshop\nhas its \"-Xa\" mode on (though implicitly), or the DEC (these days, HP...) has its \"-std1\"\nmode on.\n"
                }
            ]
        },
        "MEMORY DEBUGGERS": {
            "content": "NOTE 1: Running under older memory debuggers such as Purify, valgrind or Third Degree greatly\nslows down the execution: seconds become minutes, minutes become hours.  For example as of\nPerl 5.8.1, the ext/Encode/t/Unicode.t takes extraordinarily long to complete under e.g.\nPurify, Third Degree, and valgrind.  Under valgrind it takes more than six hours, even on a\nsnappy computer.  The said test must be doing something that is quite unfriendly for memory\ndebuggers.  If you don't feel like waiting, that you can simply kill away the perl process.\nRoughly valgrind slows down execution by factor 10, AddressSanitizer by factor 2.\n\nNOTE 2: To minimize the number of memory leak false alarms (see \"PERLDESTRUCTLEVEL\" for\nmore information), you have to set the environment variable PERLDESTRUCTLEVEL to 2.  For\nexample, like this:\n\nenv PERLDESTRUCTLEVEL=2 valgrind ./perl -Ilib ...\n\nNOTE 3: There are known memory leaks when there are compile-time errors within eval or\nrequire, seeing \"Sdoeval\" in the call stack is a good sign of these.  Fixing these leaks is\nnon-trivial, unfortunately, but they must be fixed eventually.\n\nNOTE 4: DynaLoader will not clean up after itself completely unless Perl is built with the\nConfigure option \"-Accflags=-DDLUNLOADALLATEXIT\".\n",
            "subsections": [
                {
                    "name": "valgrind",
                    "content": "The valgrind tool can be used to find out both memory leaks and illegal heap memory accesses.\nAs of version 3.3.0, Valgrind only supports Linux on x86, x86-64 and PowerPC and Darwin (OS\nX) on x86 and x86-64.  The special \"test.valgrind\" target can be used to run the tests under\nvalgrind.  Found errors and memory leaks are logged in files named testfile.valgrind and by\ndefault output is displayed inline.\n\nExample usage:\n\nmake test.valgrind\n\nSince valgrind adds significant overhead, tests will take much longer to run.  The valgrind\ntests support being run in parallel to help with this:\n\nTESTJOBS=9 make test.valgrind\n\nNote that the above two invocations will be very verbose as reachable memory and leak-\nchecking is enabled by default.  If you want to just see pure errors, try:\n\nVGOPTS='-q --leak-check=no --show-reachable=no' TESTJOBS=9 \\\nmake test.valgrind\n\nValgrind also provides a cachegrind tool, invoked on perl as:\n\nVGOPTS=--tool=cachegrind make test.valgrind\n\nAs system libraries (most notably glibc) are also triggering errors, valgrind allows to\nsuppress such errors using suppression files.  The default suppression file that comes with\nvalgrind already catches a lot of them.  Some additional suppressions are defined in\nt/perl.supp.\n\nTo get valgrind and for more information see\n\nhttp://valgrind.org/\n"
                },
                {
                    "name": "AddressSanitizer",
                    "content": "AddressSanitizer (\"ASan\") consists of a compiler instrumentation module and a run-time\n\"malloc\" library. ASan is available for a variety of architectures, operating systems, and\ncompilers (see project link below).  It checks for unsafe memory usage, such as use after\nfree and buffer overflow conditions, and is fast enough that you can easily compile your\ndebugging or optimized perl with it. Modern versions of ASan check for memory leaks by\ndefault on most platforms, otherwise (e.g. x8664 OS X) this feature can be enabled via\n\"ASANOPTIONS=detectleaks=1\".\n\nTo build perl with AddressSanitizer, your Configure invocation should look like:\n\nsh Configure -des -Dcc=clang \\\n-Accflags=-fsanitize=address -Aldflags=-fsanitize=address \\\n-Alddlflags=-shared\\ -fsanitize=address \\\n-fsanitize-blacklist=`pwd`/asanignore\n\nwhere these arguments mean:\n\n•   -Dcc=clang\n\nThis should be replaced by the full path to your clang executable if it is not in your\npath.\n\n•   -Accflags=-fsanitize=address\n\nCompile perl and extensions sources with AddressSanitizer.\n\n•   -Aldflags=-fsanitize=address\n\nLink the perl executable with AddressSanitizer.\n\n•   -Alddlflags=-shared\\ -fsanitize=address\n\nLink dynamic extensions with AddressSanitizer.  You must manually specify \"-shared\"\nbecause using \"-Alddlflags=-shared\" will prevent Configure from setting a default value\nfor \"lddlflags\", which usually contains \"-shared\" (at least on Linux).\n\n•   -fsanitize-blacklist=`pwd`/asanignore\n\nAddressSanitizer will ignore functions listed in the \"asanignore\" file. (This file\nshould contain a short explanation of why each of the functions is listed.)\n\nSee also <https://github.com/google/sanitizers/wiki/AddressSanitizer>.\n"
                }
            ]
        },
        "PROFILING": {
            "content": "Depending on your platform there are various ways of profiling Perl.\n\nThere are two commonly used techniques of profiling executables: statistical time-sampling\nand basic-block counting.\n\nThe first method takes periodically samples of the CPU program counter, and since the program\ncounter can be correlated with the code generated for functions, we get a statistical view of\nin which functions the program is spending its time.  The caveats are that very small/fast\nfunctions have lower probability of showing up in the profile, and that periodically\ninterrupting the program (this is usually done rather frequently, in the scale of\nmilliseconds) imposes an additional overhead that may skew the results.  The first problem\ncan be alleviated by running the code for longer (in general this is a good idea for\nprofiling), the second problem is usually kept in guard by the profiling tools themselves.\n\nThe second method divides up the generated code into basic blocks.  Basic blocks are sections\nof code that are entered only in the beginning and exited only at the end.  For example, a\nconditional jump starts a basic block.  Basic block profiling usually works by instrumenting\nthe code by adding enter basic block #nnnn book-keeping code to the generated code.  During\nthe execution of the code the basic block counters are then updated appropriately.  The\ncaveat is that the added extra code can skew the results: again, the profiling tools usually\ntry to factor their own effects out of the results.\n",
            "subsections": [
                {
                    "name": "Gprof Profiling",
                    "content": "gprof is a profiling tool available in many Unix platforms which uses statistical time-\nsampling.  You can build a profiled version of perl by compiling using gcc with the flag\n\"-pg\".  Either edit config.sh or re-run Configure.  Running the profiled version of Perl will\ncreate an output file called gmon.out which contains the profiling data collected during the\nexecution.\n\nquick hint:\n\n$ sh Configure -des -Dusedevel -Accflags='-pg' \\\n-Aldflags='-pg' -Alddlflags='-pg -shared' \\\n&& make perl\n$ ./perl ... # creates gmon.out in current directory\n$ gprof ./perl > out\n$ less out\n\n(you probably need to add \"-shared\" to the <-Alddlflags> line until RT #118199 is resolved)\n\nThe gprof tool can then display the collected data in various ways.  Usually gprof\nunderstands the following options:\n\n•   -a\n\nSuppress statically defined functions from the profile.\n\n•   -b\n\nSuppress the verbose descriptions in the profile.\n\n•   -e routine\n\nExclude the given routine and its descendants from the profile.\n\n•   -f routine\n\nDisplay only the given routine and its descendants in the profile.\n\n•   -s\n\nGenerate a summary file called gmon.sum which then may be given to subsequent gprof runs\nto accumulate data over several runs.\n\n•   -z\n\nDisplay routines that have zero usage.\n\nFor more detailed explanation of the available commands and output formats, see your own\nlocal documentation of gprof.\n"
                },
                {
                    "name": "GCC gcov Profiling",
                    "content": "basic block profiling is officially available in gcc 3.0 and later.  You can build a profiled\nversion of perl by compiling using gcc with the flags \"-fprofile-arcs -ftest-coverage\".\nEither edit config.sh or re-run Configure.\n\nquick hint:\n\n$ sh Configure -des -Dusedevel -Doptimize='-g' \\\n-Accflags='-fprofile-arcs -ftest-coverage' \\\n-Aldflags='-fprofile-arcs -ftest-coverage' \\\n-Alddlflags='-fprofile-arcs -ftest-coverage -shared' \\\n&& make perl\n$ rm -f regexec.c.gcov regexec.gcda\n$ ./perl ...\n$ gcov regexec.c\n$ less regexec.c.gcov\n\n(you probably need to add \"-shared\" to the <-Alddlflags> line until RT #118199 is resolved)\n\nRunning the profiled version of Perl will cause profile output to be generated.  For each\nsource file an accompanying .gcda file will be created.\n\nTo display the results you use the gcov utility (which should be installed if you have gcc\n3.0 or newer installed).  gcov is run on source code files, like this\n\ngcov sv.c\n\nwhich will cause sv.c.gcov to be created.  The .gcov files contain the source code annotated\nwith relative frequencies of execution indicated by \"#\" markers.  If you want to generate\n.gcov files for all profiled object files, you can run something like this:\n\nfor file in `find . -name \\*.gcno`\ndo sh -c \"cd `dirname $file` && gcov `basename $file .gcno`\"\ndone\n\nUseful options of gcov include \"-b\" which will summarise the basic block, branch, and\nfunction call coverage, and \"-c\" which instead of relative frequencies will use the actual\ncounts.  For more information on the use of gcov and basic block profiling with gcc, see the\nlatest GNU CC manual.  As of gcc 4.8, this is at\n<http://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro>\n"
                }
            ]
        },
        "MISCELLANEOUS TRICKS": {
            "content": "PERLDESTRUCTLEVEL\nIf you want to run any of the tests yourself manually using e.g.  valgrind, please note that\nby default perl does not explicitly cleanup all the memory it has allocated (such as global\nmemory arenas) but instead lets the exit() of the whole program \"take care\" of such\nallocations, also known as \"global destruction of objects\".\n\nThere is a way to tell perl to do complete cleanup: set the environment variable\nPERLDESTRUCTLEVEL to a non-zero value.  The t/TEST wrapper does set this to 2, and this is\nwhat you need to do too, if you don't want to see the \"global leaks\": For example, for\nrunning under valgrind\n\nenv PERLDESTRUCTLEVEL=2 valgrind ./perl -Ilib t/foo/bar.t\n\n(Note: the modperl apache module uses also this environment variable for its own purposes\nand extended its semantics.  Refer to the modperl documentation for more information.  Also,\nspawned threads do the equivalent of setting this variable to the value 1.)\n\nIf, at the end of a run you get the message N scalars leaked, you can recompile with\n\"-DDEBUGLEAKINGSCALARS\", (\"Configure -Accflags=-DDEBUGLEAKINGSCALARS\"), which will cause\nthe addresses of all those leaked SVs to be dumped along with details as to where each SV was\noriginally allocated.  This information is also displayed by Devel::Peek.  Note that the\nextra details recorded with each SV increases memory usage, so it shouldn't be used in\nproduction environments.  It also converts \"newSV()\" from a macro into a real function, so\nyou can use your favourite debugger to discover where those pesky SVs were allocated.\n\nIf you see that you're leaking memory at runtime, but neither valgrind nor\n\"-DDEBUGLEAKINGSCALARS\" will find anything, you're probably leaking SVs that are still\nreachable and will be properly cleaned up during destruction of the interpreter.  In such\ncases, using the \"-Dm\" switch can point you to the source of the leak.  
If the executable was\nbuilt with \"-DDEBUG_LEAKING_SCALARS\", \"-Dm\" will output SV allocations in addition to memory\nallocations.  Each SV allocation has a distinct serial number that will be written on\ncreation and destruction of the SV.  So if you're executing the leaking code in a loop, you\nneed to look for SVs that are created, but never destroyed between each cycle.  If such an SV\nis found, set a conditional breakpoint within \"new_SV()\" and make it break only when\n\"PL_sv_serial\" is equal to the serial number of the leaking SV.  Then you will catch the\ninterpreter in exactly the state where the leaking SV is allocated, which is sufficient in\nmany cases to find the source of the leak.\n\nAs \"-Dm\" is using the PerlIO layer for output, it will by itself allocate quite a bunch of\nSVs, which are hidden to avoid recursion.  You can bypass the PerlIO layer if you use the SV\nlogging provided by \"-DPERL_MEM_LOG\" instead.\n\nPERL_MEM_LOG\nIf compiled with \"-DPERL_MEM_LOG\" (\"-Accflags=-DPERL_MEM_LOG\"), both memory and SV\nallocations go through logging functions, which is handy for breakpoint setting.\n\nUnless \"-DPERL_MEM_LOG_NOIMPL\" (\"-Accflags=-DPERL_MEM_LOG_NOIMPL\") is also compiled, the\nlogging functions read $ENV{PERL_MEM_LOG} to determine whether to log the event, and if so\nhow:\n\n$ENV{PERL_MEM_LOG} =~ /m/           Log all memory ops\n$ENV{PERL_MEM_LOG} =~ /s/           Log all SV ops\n$ENV{PERL_MEM_LOG} =~ /t/           include timestamp in Log\n$ENV{PERL_MEM_LOG} =~ /^(\\d+)/      write to FD given (default is 2)\n\nMemory logging is somewhat similar to \"-Dm\" but is independent of \"-DDEBUGGING\", and at a\nhigher level; all uses of Newx(), Renew(), and Safefree() are logged with the caller's source\ncode file and line number (and C function name, if supported by the C compiler).  In\ncontrast, \"-Dm\" is directly at the point of \"malloc()\".  
SV logging is similar.\n\nSince the logging doesn't use PerlIO, all SV allocations are logged and no extra SV\nallocations are introduced by enabling the logging.  If compiled with\n\"-DDEBUG_LEAKING_SCALARS\", the serial number for each SV allocation is also logged.\n",
            "subsections": [
                {
                    "name": "DDD over gdb",
                    "content": "Those debugging perl with the DDD frontend over gdb may find the following useful:\n\nYou can extend the data conversion shortcuts menu, so for example you can display an SV's IV\nvalue with one click, without doing any typing.  To do that simply edit ~/.ddd/init file and\nadd after:\n\n! Display shortcuts.\nDdd*gdbDisplayShortcuts: \\\n/t ()   // Convert to Bin\\n\\\n/d ()   // Convert to Dec\\n\\\n/x ()   // Convert to Hex\\n\\\n/o ()   // Convert to Oct(\\n\\\n\nthe following two lines:\n\n((XPV*) (())->svany )->xpvpv  // 2pvx\\n\\\n((XPVIV*) (())->svany )->xiviv // 2ivx\n\nso now you can do ivx and pvx lookups or you can plug there the svpeek \"conversion\":\n\nPerlsvpeek(myperl, (SV*)()) // svpeek\n\n(The myperl is for threaded builds.)  Just remember that every line, but the last one,\nshould end with \\n\\\n\nAlternatively edit the init file interactively via: 3rd mouse button -> New Display -> Edit\nMenu\n\nNote: you can define up to 20 conversion shortcuts in the gdb section.\n"
                },
                {
                    "name": "C backtrace",
                    "content": "On some platforms Perl supports retrieving the C level backtrace (similar to what symbolic\ndebuggers like gdb do).\n\nThe backtrace returns the stack trace of the C call frames, with the symbol names (function\nnames), the object names (like \"perl\"), and if it can, also the source code locations\n(file:line).\n\nThe supported platforms are Linux, and OS X (some *BSD might work at least partly, but they\nhave not yet been tested).\n\nThis feature hasn't been tested with multiple threads, but it will only show the backtrace of\nthe thread doing the backtracing.\n\nThe feature needs to be enabled with \"Configure -Dusecbacktrace\".\n\nThe \"-Dusecbacktrace\" also enables keeping the debug information when compiling/linking\n(often: \"-g\").  Many compilers/linkers do support having both optimization and keeping the\ndebug information.  The debug information is needed for the symbol names and the source\nlocations.\n\nStatic functions might not be visible for the backtrace.\n\nSource code locations, even if available, can often be missing or misleading if the compiler\nhas e.g. inlined code.  Optimizer can make matching the source code and the object code quite\nchallenging.\n\nLinux\nYou must have the BFD (-lbfd) library installed, otherwise \"perl\" will fail to link.  The\nBFD is usually distributed as part of the GNU binutils.\n\nSummary: \"Configure ... -Dusecbacktrace\" and you need \"-lbfd\".\n\nOS X\nThe source code locations are supported only if you have the Developer Tools installed.\n(BFD is not needed.)\n\nSummary: \"Configure ... 
-Dusecbacktrace\" and installing the Developer Tools would be\ngood.\n\nOptionally, for trying out the feature, you may want to enable automatic dumping of the\nbacktrace just before a warning or croak (die) message is emitted, by adding\n\"-Accflags=-DUSE_C_BACKTRACE_ON_ERROR\" for Configure.\n\nUnless the above additional feature is enabled, nothing about the backtrace functionality is\nvisible, except for the Perl/XS level.\n\nFurthermore, even if you have enabled this feature to be compiled, you need to enable it in\nruntime with an environment variable: \"PERL_C_BACKTRACE_ON_ERROR=10\".  It must be an integer\nhigher than zero, telling the desired frame count.\n\nRetrieving the backtrace from Perl level (using for example an XS extension) would be much\nless exciting than one would hope: normally you would see \"runops\", \"entersub\", and not much\nelse.  This API is intended to be called from within the Perl implementation, not from Perl\nlevel execution.\n\nThe C API for the backtrace is as follows:\n\nget_c_backtrace\nfree_c_backtrace\nget_c_backtrace_dump\ndump_c_backtrace\n"
                },
                {
                    "name": "Poison",
                    "content": "If you see in a debugger a memory area mysteriously full of 0xABABABAB or 0xEFEFEFEF, you may\nbe seeing the effect of the Poison() macros, see perlclib.\n"
                },
                {
                    "name": "Read-only optrees",
                    "content": "Under ithreads the optree is read only.  If you want to enforce this, to check for write\naccesses from buggy code, compile with \"-Accflags=-DPERLDEBUGREADONLYOPS\" to enable code\nthat allocates op memory via \"mmap\", and sets it read-only when it is attached to a\nsubroutine.  Any write access to an op results in a \"SIGBUS\" and abort.\n\nThis code is intended for development only, and may not be portable even to all Unix\nvariants.  Also, it is an 80% solution, in that it isn't able to make all ops read only.\nSpecifically it does not apply to op slabs belonging to \"BEGIN\" blocks.\n\nHowever, as an 80% solution it is still effective, as it has caught bugs in the past.\n"
                },
                {
                    "name": "When is a bool not a bool?",
                    "content": "On pre-C99 compilers, \"bool\" is defined as equivalent to \"char\".  Consequently assignment of\nany larger type to a \"bool\" is unsafe and may be truncated.  The \"cBOOL\" macro exists to cast\nit correctly; you may also find that using it is shorter and clearer than writing out the\nequivalent conditional expression longhand.\n\nOn those platforms and compilers where \"bool\" really is a boolean (C++, C99), it is easy to\nforget the cast.  You can force \"bool\" to be a \"char\" by compiling with\n\"-Accflags=-DPERLBOOLASCHAR\".  You may also wish to run \"Configure\" with something like\n\n-Accflags='-Wconversion -Wno-sign-conversion -Wno-shorten-64-to-32'\n\nor your compiler's equivalent to make it easier to spot any unsafe truncations that show up.\n\nThe \"TRUE\" and \"FALSE\" macros are available for situations where using them would clarify\nintent. (But they always just mean the same as the integers 1 and 0 regardless, so using them\nisn't compulsory.)\n"
                },
                {
                    "name": "The .i Targets",
                    "content": "You can expand the macros in a foo.c file by saying\n\nmake foo.i\n\nwhich will expand the macros using cpp.  Don't be scared by the results.\n"
                }
            ]
        },
        "AUTHOR": {
            "content": "This document was originally written by Nathan Torkington, and is maintained by the\nperl5-porters mailing list.\n\n\n\nperl v5.34.0                                 2025-07-25                              PERLHACKTIPS(1)",
            "subsections": []
        }
    },
    "summary": "perlhacktips - Tips for Perl core C code hacking",
    "flags": [],
    "examples": [],
    "see_also": []
}