PHP Documentation Meeting 2003 - Agenda

$Id: 2003_meeting_agenda.html 156097 2004-04-15 10:29:49Z didou $

Members of the PHP Documentation Team met in 2002 in Stuttgart to discuss several problems of the PHP documentation and find ways to improve the content, the technical background, the communication, the legal issues, etc. Now it is 2003, and the time has come to organize another face-to-face meeting to discuss what has to be done to achieve the goals not yet reached, and to define new ones.

The meeting will take advantage of LinuxTag, which takes place in Karlsruhe from July 10 to 13, 2003. The meeting will be on July 10 and will be a whole-day event. Location: Room 1.32 (located in the conference hall).

Here is a list of those who are currently known to be at LinuxTag and are probably interested in the meeting [in alphabetical order, without accents]:

If there is any problem with this list [you are on it but will not be there, or the other way round], fix the list in phpdoc CVS, or contact Goba.

This summary was created in the hope that it will be useful and will help us make better use of the very few hours available to discuss the topics. Note that in 2002 we had a multi-day event; that does not seem possible this time, but we still have plenty to discuss.

Topics for discussion

If you have a new topic for discussion, feel free to open a new thread on it at phpdoc. For topics described here, see the linked articles, RFCs, proposals and discussions, and see if you can add something new or sum up the discussions better. In case you have pointers to more info on some topics, add those to the page in phpdoc CVS, or contact Goba.

  1. Crediting manual contributors

    This problem has been in the air for a long time now. We currently have no rules on who can be listed as an author / editor. In the past this was decided by the editors on a case by case basis, after community discussion. The current license legally puts the manual in the hands of those listed on the front page, though there are many more contributors who helped with adding content, finetuning the build system, etc.

    There were several suggestions on improving the situation. At the 2002 meeting we found that we need one or a few license guys who can handle license questions, so there won't be a need to contact all listed authors / editors in case of a license related request. Who those should be has not been decided yet; Goba was proposed.

    However, no guidelines were provided on how to give credit to contributors, and this is still an open question. There were some threads on this at phpdoc. It seems that the initial human based nominations were accepted and agreed upon, but there was no agreement regarding the mass listing of smaller contributors.

    Some important questions:
    • One list or a "main" and an "others" list?
    • Human based inclusion or machine based?
    • Who should decide on license questions (including translations)?
    • What to do with the inactive contributors?
    • What names/titles exist for those that contribute? Helper, Author, Editor, Technical Editor, etc. And how are they defined?
  2. A user friendly translation / editing environment

    The PHP documentation has many translations, some of which are halted, near dead projects. The main problem with them is probably that the joining requirements are too high [Linux / cygwin, CVS, XML]. Or at least they seem too high for regular PHP programmers, who would be happy to help. It would also often be convenient for experienced helpers to just concentrate on the content, and not bother with XML.

    Sandro Zic as well as others have many good ideas on integrating some WYSIWYG editors [optionally] into the workflow, so we can get more contributors to translate and fix documentation. A convenient system is also needed to easily update what is translated, as old translations left alone are sometimes much worse than non-translated content.

    Some important questions:
    • How to integrate such stuff into the current system?
    • Authorize the submissions before committing?
    • What impact would a WYSIWYG editing method have on our diffing+mailing system, as WYSIWYG editors are known to 'rewrite' files and produce useless diffs?
    • Where will the required diff -u and make test commands be executed? People really need to check these before any commit is done and we don't have the resources to execute these for everyone (or do we?)
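
    As a rough illustration of the last question, the two checks could be wrapped in a small script that a contributor, or a server acting on a contributor's behalf, runs before each commit. This is only a sketch: the function name and the MAKE and CVS override variables are made up for this example, and it assumes it is run from an already configured phpdoc checkout.

```shell
#!/bin/sh
# Hypothetical pre-commit helper; not part of phpdoc.
# MAKE and CVS can be overridden, e.g. for a dry run.
MAKE=${MAKE:-make}
CVS=${CVS:-cvs}

check_before_commit() {
    # "make test" validates the manual sources.
    if ! $MAKE test; then
        echo "make test failed, not committing" >&2
        return 1
    fi
    # Show a unified diff of the given files for review.
    # cvs diff exits non-zero when differences exist, so its status is ignored.
    $CVS diff -u "$@"
    return 0
}

# Example: check_before_commit en/some/file.xml && cvs commit en/some/file.xml
```

    Where such a script would run (on the contributor's machine or on a shared server) is exactly the open resource question; the sketch does not answer it.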
  3. Handling user notes

    We have very few volunteers working on the manual user notes now. This low number is partly due to the fact that those guys are not recognized at all; the php-notes list is not even widely known to exist. Still, those who work there do a good job in keeping the notes somewhat clean and integrating useful content into the manual.

    There were some ideas on improving this system, including the adoption of the voting system used on the PHP-GTK site, an approval system for the user notes, and having extensions dedicated to given user note maintainers. Another good question is whether we will supply the user notes with the offline formats as users request it, and in what form [eg. without email addresses to prevent further mass email harvesting].
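
    To make the email address point concrete, here is a minimal sketch of a filter that could be applied to exported notes before packaging them for offline formats. The function name, the file names in the usage comment and the replacement text are assumptions for this example, and the pattern is deliberately loose rather than a full address parser.

```shell
#!/bin/sh
# Hypothetical filter: replace anything that looks like an email address.
# Reads notes on standard input, writes the cleaned text to standard output.
strip_emails() {
    sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[email removed]/g'
}

# Example: strip_emails < notes.txt > notes-offline.txt
```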

    Long ago there was discussion (and even committed code) about a system that allows people (anyone, with or without CVS accounts) to "maintain" a manual page's notes. For example, Joe would manage the strlen() notes. This is not to say that only Joe could edit them, but Joe would at least focus on this page and/or other pages.

    See also: the PHP-GTK manual for some ideas on how the rating works there, and the 2002 meeting protocol on the user note related findings there. Also, the notes maintainer proposal.
  4. Build system [aka getting rid of DSSSL]

    The build system is the foundation of our documentation framework. There are several problems with it, depending on where you are looking at it from.

    • Getting to know the build system is hard for newcomers [especially coming from the Windows world].
    • Maintaining the build system is not easy. We don't have the expertise to customize the DSSSL stylesheets, which are one of the building blocks of the system, and this stops many advances [see also parts, reference grouping].
    • Getting some results out of the build system takes a very long time, so it is not usable for quick translation testing.
    • Due to the previous point, the builds for all the languages take days on the build server at 100% processor load, so building the manuals is a heavy task.

    Let me explain the process of building the manual, so you can see the problems even if you are not working for the docteam every day. Red lines mark this explanation, so in case you are not that interested, you know what to skip :) I know that this may be boring, but the quality of the translations and the manual largely depends on the build system, so it is a very important issue [eg. PDF or extended CHM building].

    For a working build system, one needs the usual Linux tools [cvs, autoconf, make, gcc for some operations], plus a PHP CLI or CGI setup and [open]jade with [o]nsgmls. To build some format of the manual you need to do:

    • cvs checkout phpdoc
    • cd phpdoc
    • cvs checkout phpdoc-LANG-dir, replacing LANG with the desired language, repeating this for all the languages needed [except English]
    • autoconf - this will create ./configure
    • ./configure --with-lang=LANG
      This will do some preparations:
      • search for tools [jade, openjade, PHP, etc.]
      • replace configuration vars in all the files ending in ".in"
      • create file entities for all the XML files used
      • run [o]nsgmls to create a list of entities and link endpoints that are referenced but not available in the document, so that the build process works even if such errors exist
      After this, the build system is configured, and you can choose a format to build the manual.
    • To test the manual for errors, you can run make test here, which uses [o]nsgmls to find errors. Similar to this is make test_man_gen which is run on the build server, and it is more forgiving for link endpoint errors, in case there are any left after the above configure run.
    • In case you would like to get HTML output, you can run make html, which will invoke [open]jade and will create HTML output in about 30 minutes. In case of RTL languages [Hebrew, Arabic] a special output parser [written in PHP] runs through the output to finetune it for RTL rules.
    • For PHP coded output to use on a mirror site, you can run make phpweb and expect the output in slightly more time than make html, as this outputs more interlinks than the HTML version. The RTL patch is applied here too. Before the [open]jade run, a special phpweb_entities file is created [using a PHP script] to make the php.net links relative to the mirror sites hosting the manuals. This entity file is removed after this build.
    • To get one big HTML file, you can run make bightml, and that will produce the output in significantly less time than the two previous output methods, as no chunking is needed and interlinks are easier to calculate. No RTL patch is applied for this output so far.
    • The PalmDoc version [make palmdoc] depends on a txt version already built [which is done by filtering the bightml output through lynx]. This target runs the scripts/makedoc program to create the PalmDoc version out of the txt [it compresses the text file].
    • The Palm iSilo version [make isilo] uses the iSilo386 external program which should be installed on the build machine. It compresses the bightml version using the iSilo format.
    • The PDF version [make pdf] "is created" by first using [open]jade to convert the manual to tex format. Then jadetex is run three times on this tex file to create a DVI [multipass is needed to make the output look right]. Then dvipdfm is run to create a PDF out of the DVI. This process is not working currently, as we managed to exceed some limits in the processing tools.
      There is a special pdfjadetex tool that creates PDF output from the generated tex source right away instead of DVI, but this is even more limited by hard coded table sizes within the tex binaries.
      The 3 passes are needed to render the final document because tex uses a streaming approach, creating output immediately from the input and the state information collected up to that point. So e.g. a table of contents at the beginning of a document can't be created on the first run, as the chapter and section titles are not known yet. These are collected on the first run and put into a special .aux file. On the second run the table of contents is created from this .aux file. Due to the now inserted toc all following pages are shifted, so the page numbers in the toc are not yet correct. So a final third run is needed after inserting the toc (and other stuff referencing page numbers or symbolic labels). On very rare occasions even a 4th run is needed, as e.g. a page number changing from 99 to 100 in the 3rd run may lead to an additional page break somewhere in the document ...
    • To create a CHM version, the make html output is used as a basis. A custom PHP postprocessor runs through that, gathers the table of contents information, and rewrites some parts. Then hhc [the HTML Help Compiler] is run to create the CHM. Hhc is a free program, only available for Windows.
    • To create an extended CHM version, you need XSLT tools [xsltproc]. You need to run make chm_xsl and then download and unbz2 the actual complete user notes package. Then you can run make_chm in the htmlhelp dir of phpdoc [not the CHM dir!]. This uses some PHP regex magic to rewrite some parts of the XSL output, similarly to the normal CHM version, and also uses hhc.
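
    The basic steps listed above can be strung together roughly as follows. This is a sketch only: the run and build_manual names and the DRY_RUN switch are invented for this example, the CVS login / CVSROOT setup is assumed to be done already, and only the commands named in the list are used.

```shell
#!/bin/sh
# Hypothetical wrapper around the checkout / configure / build steps.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_manual() {
    lang=$1            # e.g. "de"; English needs no extra checkout
    target=${2:-html}  # a make target named above: html, phpweb, bightml, ...

    run cvs checkout phpdoc || return 1
    run cd phpdoc || return 1
    if [ "$lang" != "en" ]; then
        run cvs checkout "phpdoc-$lang-dir" || return 1
    fi
    run autoconf || return 1                      # creates ./configure
    run ./configure "--with-lang=$lang" || return 1
    run make test || return 1                     # validate before building
    run make "$target"
}

# Example: DRY_RUN=1 build_manual de phpweb
```

    Note that every call repeats the full configure and build; nothing from an earlier run is reused.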

    As you can see, the problem with the build system is that it does not attempt to reuse many already built parts. The html and phpweb outputs are built completely separately, for example, while they are 95% the same content (the navigation parts differ). A further problem is that the whole manual is built every time. That means no quick check possibility for translators, but more importantly a huge waste of time on the build machine, as most of the translations still have heavy untranslated content.
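
    A first step towards avoiding full rebuilds would be knowing which source files changed since the last build. The following is a minimal sketch of that idea, not something the current build system does; the changed_since name and the .last_build stamp file are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: list XML sources modified after a stamp file was last touched.
changed_since() {
    stamp=$1; dir=$2
    if [ ! -f "$stamp" ]; then
        # No earlier build recorded: every file counts as changed.
        find "$dir" -name '*.xml' -type f | sort
    else
        find "$dir" -name '*.xml' -type f -newer "$stamp" | sort
    fi
}

# Example: changed_since .last_build en
# After a successful build, record the time with: touch .last_build
```

    Finding the changed files is the easy part; the build tools would still need a way to process only those files, which the system described above does not offer.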

    Using PHP regex magic for providing RTL support, and building the two CHM versions is also not an ideal solution, as those are very vulnerable to changes in the output format. The extended CHM builds are in fact halted because of some significant changes in the XSL output format lately.

    The whole point in getting rid of DSSSL is that we need some solutions we can customize and modify as needed. XSLT is ideal for that, as there are some guys who know that standard, and it is easily readable, so anyone can help. But XSLT by itself does not solve the "build time waste" problems. The negative point of the XSLT based HTML generation is that when we deal with a translated manual, there can be encoding collisions between entities and untranslated documents, though there are proposed solutions for that.

    Goba had a suggestion on the XSLT base to build a TOC first and then skip non-translated parts on the XSLT level. That would enable quick testing too. Wez has a "livedocs" concept which may lead to a more maintainable solution than XSLT. Ongoing work on that shows that it can quickly become a usable solution for all of the HTML formats. Damien also uses a similar solution to parse DocBook XML with PHP to create the special French versions of the manual, including the PDF [filtering HTML through the htmldoc PDF generator].
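
    Skipping non-translated parts presupposes a list of which files are not translated. Assuming a translation tree mirrors the English tree file by file (an assumption of this sketch, as are the function name and the example paths), such a list could be produced like this:

```shell
#!/bin/sh
# Hypothetical helper: print English XML files that have no translated counterpart.
# Paths are printed relative to the English directory.
list_untranslated() {
    en_dir=$1; lang_dir=$2
    ( cd "$en_dir" && find . -name '*.xml' -type f ) | sort |
    while IFS= read -r f; do
        [ -f "$lang_dir/$f" ] || echo "${f#./}"
    done
}

# Example: list_untranslated en de
```

    This only detects missing files; a translated file that is present but outdated would need a separate check.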

    Regarding the offline versions of the manual, Goba had an idea of a "self-hosted manual" which would provide integration support for PEAR, PHP-GTK and any other docs, and would use XML for navigation and HTML and/or XML for formatted pages. Local searching would be supported with a PHP based solution. Even customized PDF building would be possible with this version. Wez's livedocs will be the base of this for the PHP docs.

    Some important questions:
    • Where to go from DSSSL? XSLT or custom PHP solution?
    • How to get a full PDF version working again?
    • Use an xmllint based "make test" and "./configure" instead of the [o]nsgmls based ones [as it is currently]?
    • Employ a distributed build system to make automatic manual building work again?
    • Add some priorities to the build system so English manuals, and phpweb manuals are built more often?
    • How to merge the two CHM versions, and make it automatically buildable?
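
    On the PDF question: the multipass jadetex run described earlier uses a fixed count of three passes. A common alternative is to rerun until the .aux file stops changing, which also covers the rare 4th-run case. The sketch below shows that loop; the run_until_stable name, the passes_run variable and the manual.tex / manual.aux / manual.dvi file names are assumptions for this example, and it does nothing about the tool limits that currently break the PDF build.

```shell
#!/bin/sh
# Hypothetical helper: rerun a command until a watched file stops changing.
# Usage: run_until_stable FILE MAX_PASSES COMMAND [ARGS...]
# Sets passes_run to the number of passes used on success.
run_until_stable() {
    watched=$1; max=$2; shift 2
    prev=""
    pass=1
    while [ "$pass" -le "$max" ]; do
        "$@" || return 1
        cur=$(cksum < "$watched")
        if [ "$cur" = "$prev" ]; then
            passes_run=$pass
            return 0          # this pass changed nothing: output is final
        fi
        prev=$cur
        pass=$((pass + 1))
    done
    return 2                  # still changing after MAX_PASSES passes
}

# Example: run_until_stable manual.aux 4 jadetex manual.tex && dvipdfm manual.dvi
```

    A pass counts as final when the watched file is identical to what the previous pass left behind, so a document that settles after two passes is confirmed on the third.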
  5. Structuring the documentation

    Having structured see also sections in the documentation, grouping of reference sections by type, and manpage like function documentation are old ideas. The problem with most of these ideas is that the current DSSSL based build system blocks us from going on with them, as we lack the expertise to customize the DSSSL sheets to our liking.

    There is also no support in DocBook for the kind of see also lists we would like to use, and there is absolutely no support for reference part grouping. So to achieve that we need to customize DocBook somehow, or rewrite the whole manual structure. The PHP-GTK documentation group has good experience with using a modified DocBook DTD, which is very well supported by DocBook.

    Still, we cannot step in this direction unless the build system is ready for the modified input. Also note that the flow of the extensions documented in the TOC is the worst thing we can provide for a newcomer, and we have received criticism for this point many times in the past.

    Some important questions:
    • Is it OK to use a modified DocBook DTD?
    • Is the manpage like function documentation still preferred?
    • What is higher priority in the above suggestion list (reference grouping, structured see alsos, manpage structure)?
    See also: RFC/reference_grouping, RFC/manual.xml.in, and dtds/phpbook.dtd which adds support for reference grouping to DocBook XML.
  6. PHP 5 and PECL

    It is a largely open question when and how we should document the upcoming new features in PHP 5, as well as how to document the PHP classes using DocBook. It is important that when PHP 5 comes out [even in RCs?], the documentation is up to date. But it is also important that we do not place PHP 5 related content into the manual without special notices, so users won't get confused.

    With the upcoming PHP releases and with PHP 5, many extensions will be moved [some are already moved] to PECL. The PHP documentation is getting increasingly ready for a copy-paste move of the docs of those extensions [having all extension related docs in one place]. But it is still not decided whether all moved extensions' docs should move to the PECL docs, or whether some (like the bundled extensions) should stay at phpdoc.

    It is also important to maintain some consistency in moved docs. Eg. have a listing of removed extensions, make the URL shortcuts redirect to the PECL manual, move / remove the user notes related to those extensions, etc. Also, before anything is officially moved out of phpdoc, it should be available online in the peardoc manual.

    Some important questions:
    • How / when to document PHP 5 features?
    • What extensions will be moved to PECL and when?
    • Is it true that every extension will be moved into PECL, but most still bundled?
    • Will peardoc/PECL continue to use our old extension.xml format? This makes moving more difficult.
    Some ideas, phpdoc/RFC/removed_extensions, phpdoc/RFC/moved_extensions, and a PHP 5 Faqt that links to PHP 5 related resources.
  7. What to do with PHP 3 documentation

    In an effort to preserve history and serve possible PHP 3 users, we need to keep this information available, but at the same time it adds clutter to the documentation. One proposal is to list all PHP 3 information in its own appendix, and add appropriate links/notes to it. Within a manual page, one will see a simple link titled "PHP 3 Note", and it will link to the appropriate location in the "PHP 3 Appendix". There it might mention that a feature was added in PHP 3, etc. Most people seemed to like this idea, but no specifics have been discussed.

  8. Better 'documentation experience' for users

    This point includes discussion of ideas for improving the presentation of the online and offline manuals. Manual search solutions, PHP source code highlighting, better table of contents, indexes and other stuff would greatly add to the user experience. As PEAR is becoming more important, peardoc items may need quick access from php.net, using URL shortcuts and searches (eg. php.net/sqlite).

    The extended CHM is the best version for onscreen browsing on Windows. We should build on its experience and add new features to other formats too [eg. include user notes in downloadable versions?].

    The role of the different formats should also be discussed. Do we need two different Palm versions and two different CHM versions, and does anyone use the bightml version [except for converting it to PDF]? Would a self-hosted manual obsolete the CHMs?

    See also the parts on different formats in the build system topic.
  9. Separation of 'PHP Manual' and 'PHP Internals Manual'

    Having 'PHP Internals' information in a manual mainly aimed at users leads to much confusion. Development pages can be returned in searches, and users can get there inadvertently. Having these sections in the manual also slows down the builds, and makes the manual downloads bigger for no reason. Those parts are not used by most of the reader base.

    There should be an easy way to separate the manual from the internals part and have two books separately built. The internals book would include information on development for PHP 3 / 4 and the streams development docs currently included in the manual. How this can be integrated in the current build system is still a question.

    Some important questions:
    • Do we need translation support for the Internals Manual?
    • Should it sit in the phpdoc CVS module?
    • PECL internals are also affected, so this Internals Manual should cover both?
    • How to integrate it with the build system?
    I don't know of any material to put here...

Contact

Contact the PHP Documentation Mailing List in case of questions and suggestions for discussions. In case of suggestions regarding this page, please write to Goba (goba at php dot net) or fix the page yourself in phpdoc/RFC/2003_meeting_agenda.html.